Data don’t scare ’em

The contents of the Immune Epitope Database (IEDB) are astonishing. The free, searchable site is home to data from more than 1.6 million immunology experiments, making it a one-stop shop for understanding and predicting the body’s response to viruses, bacteria, cancer, allergens, and more.

The IEDB was established in 2003 by LJI professors Alessandro Sette, Dr. Biol. Sci., and Bjoern Peters, Ph.D. Fifteen years later, its user base is booming, and LJI was recently awarded a $22 million contract renewal from the National Institute of Allergy and Infectious Diseases (NIAID) to continue hosting and expanding the IEDB for the next seven years. The renewal reflects how vital the IEDB has become for making sense of immunological data.

But the IEDB is so much more than a database. Behind the scenes is a hard-working team dedicated to guiding life-saving research.

The data hunters

Randi Vita, M.D., is the kind of person who really likes puzzles. An immunologist by training, Dr. Vita joined LJI in 2005 to work as a “biocurator” for the IEDB.

“The rise of public databases was relatively new back then, and they needed people to manually curate data,” says Dr. Vita, who today serves as scientific curator and quality assurance manager for the IEDB. “It turns out I love data.”

Good thing, too, as the popularity of techniques like genome sequencing has ushered in what scientists call the age of “big data.” While studies a few decades ago reported only a handful of data points, studies today routinely include supplemental tables with 10,000 data points or more. The challenge for Dr. Vita and her colleagues is to gather these reams of epitope data in one place.

In just the IEDB’s first eight years, the curation team sorted through a staggering number of studies. Between 2004 and 2012, the team manually curated almost 16,000 published papers, which included data from more than 704,000 experiments. Curators read each paper, highlighted key terms, and determined how to organize and label the epitopes it described. As a result, more than 120,000 epitopes were entered into the IEDB.

Since then, the curators have worked to keep the database up to date. “It’s an extremely manual process,” says Dr. Vita. “We do a lot of interpreting and entering of data.” As of January 2019, the IEDB housed more than 535,000 epitopes. In recent years, the database has also expanded to include data on T and B immune cells themselves, which helps scientists better understand how the body recognizes danger.

These efforts mean that important data won’t be forgotten in the tables of scientific papers. Scientists curious about tuberculosis, for example, can search the IEDB and find epitope data gathered from studies published decades apart in journals around the world.

Biocurator Nima Salimi, M.Sc., is another self-confessed data fan, and he likes working behind the scenes on the IEDB.

“I feel like Marie Kondo,” Salimi says. “We read these papers, and we have to extract and organize just the relevant data we need.”

Salimi says people regularly ask how much an IEDB subscription costs. “It’s unbelievable that it’s free. It’s such an amazing resource to have at your disposal,” he says. “I’m always proud to say that I’m part of a project like this. By supporting this kind of science, you’re not only promoting the discovery of information but also leveraging of that information to cure disease.”

The tool makers

“The IEDB does pretty much everything,” says John Sidney, M.Sc., an associate scientist at LJI. Sidney worked with Drs. Sette and Peters to establish the IEDB, and he still helps advise the curation team. He’s also happy to call himself a “guinea pig” as he tests out new IEDB tools.

These tools can dramatically speed up research. For example, Sidney’s lab studies the immune system molecules that mark pathogens for destruction. This research requires an understanding of peptides, the short fragments of proteins that these molecules recognize.

The problem is that large organisms can make tens of thousands of peptides. “It would cost millions of dollars to even make the peptides to start a study,” says Sidney.

With the IEDB’s prediction tools, Sidney can narrow the candidate peptides down to just 1 or 2 percent of the total, saving vast amounts of money and time. “The IEDB’s bioinformatics tools are really invaluable,” Sidney says.

So far, Sidney has used the IEDB to study dengue, tuberculosis, HPV, hepatitis, Marburg virus, malaria, cancer, and even whooping cough.

Jason Greenbaum, Ph.D., director of the Bioinformatics Core Facility at LJI, has been involved in tool development for the IEDB since 2006, and he’s seen the tool user base triple in the last 10 years. With the renewed NIAID contract, Dr. Greenbaum’s mission is to expand the tools available to scientists.

One of his dedicated developers is Sinu Paul, Ph.D., a bioinformatics scientist. Before Dr. Paul had even earned his Ph.D., he knew he wanted to work at LJI. “I wanted to be part of the IEDB team,” says Dr. Paul. “I wanted to explore what we could do with new technology in the field of biology.”

The IEDB team has also found a way to address diseases we haven’t even heard of yet. When a disease emerges, immunologists around the world can use the IEDB’s prediction tools to learn about a new pathogen by studying its relatives. “We can use these tools to quickly predict epitopes,” says Dr. Paul.

“If tomorrow there is a new biohazard or concern, you can come to the IEDB, plug in the protein sequence of a new strain of the flu, for example, and find the peptides that would be relevant to studying the immune response,” Sidney explains.
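For readers curious about what “plugging in a protein sequence” involves, here is a minimal Python sketch of the general idea. It is not the IEDB’s actual tools or API. It assumes that candidate peptides are overlapping 9-residue windows (a common length for peptides displayed by MHC class I molecules, but an assumption here) and that ranking comes from a caller-supplied scoring function standing in for a real epitope predictor. The names `overlapping_peptides` and `shortlist` are hypothetical, and the 2 percent default simply mirrors the 1 to 2 percent figure Sidney describes.

```python
from typing import Callable


def overlapping_peptides(protein: str, length: int = 9) -> list[str]:
    """Return every overlapping peptide of `length` residues in a protein sequence.

    Assumption: 9-residue windows; real prediction tools support other lengths.
    """
    protein = protein.upper()
    # A sequence of n residues yields n - length + 1 windows (none if it is too short).
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]


def shortlist(
    peptides: list[str],
    score: Callable[[str], float],
    fraction: float = 0.02,
) -> list[str]:
    """Keep the top `fraction` of peptides, ranked by a caller-supplied score.

    `score` is a hypothetical placeholder for a real epitope predictor;
    higher scores are treated as more promising. At least one peptide is kept.
    """
    keep = max(1, round(len(peptides) * fraction))
    return sorted(peptides, key=score, reverse=True)[:keep]
```

With a real predictor passed in as `score`, `shortlist(overlapping_peptides(sequence), score)` would return the small fraction of windows worth synthesizing and testing. Any toy stand-in, such as counting a particular amino acid, only demonstrates the mechanics and carries no biological meaning.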

In fact, researchers turned to the IEDB when Zika first emerged as a global health emergency. They knew Zika was related to dengue, so they analyzed dengue epitopes in the IEDB and extrapolated from that data to jump-start new Zika studies.

“This work gives people a foundation to build on,” says Dr. Paul.

A global team

Even scientists who have never used the IEDB have benefited from the team’s work.

As the team built the database, they established a system for how to label and organize data in immunology. Tackling the questions of how to categorize things is part of a field called ontology, and it has become a passion for Dr. Peters and many others with the IEDB. (Salimi says he finds himself mentally categorizing and subcategorizing things outside work.) This consistent language allows scientists anywhere to share their results and start new collaborations.

“Science is one of the things that has no frontiers,” says Dr. Sette. “There’s a common language in science. You can visit a local university on the other side of the world and talk about the same things. That’s not to be taken for granted. The IEDB has been really at the forefront of making sure this communication without frontiers is maintained in science as the data becomes bigger and bigger.”

Drs. Sette and Peters reflect on how the IEDB has evolved since its founding. “It’s like watching a child grow up,” Dr. Sette says.

“I would consider the IEDB my biggest achievement,” adds Dr. Peters. “In my mind, this project has moved immunology into the twenty-second century.”

The co-founders have also poured their own data into the IEDB and helped researchers worldwide with direct data submissions. The IEDB is turning the world’s immunologists into a collaborative community, and LJI is at the epitope epicenter.