Children's Hospital Colorado

The COVIDome: Data Sharing and Collaboration in the Fight Against COVID-19 

6/7/2022


How can data science impact the ways we treat and understand COVID-19?


One of the hallmarks of the COVID-19 pandemic has been uncertainty. Faced with a novel disease that spreads rapidly and encompasses a variety of symptoms, doctors and researchers have come up against new clinical mysteries on a near-daily basis. For Tell Bennett, MD, a critical care specialist and informatics and data scientist, the key to mitigating that uncertainty and finding answers lies in the data.

Developing the data platforms

While there is a wealth of medical data related to COVID-19 across disparate hospitals, it takes collaboration and hard work to turn that data into useful medical outcomes. To that end, Dr. Bennett has been a key player in developing collaborative data platforms that help researchers better understand COVID-19, its impact and possible treatment options.

“The pandemic was incredibly fast moving and questions continuously arrived that needed answering,” Dr. Bennett says. “Having integrated data sources ready to go was critical to addressing those questions. A few of these different platforms have contributed, and we hope will continue to be available.”

Those platforms — the National COVID Cohort Collective (N3C), the COVIDome and a monoclonal antibody effectiveness project — have earned national attention and have yielded multiple published papers.

Still, health informatics work is not simple. To create these multi-source data marts, researchers must address data security issues, obtain HIPAA authorizations, work with honest brokers who de-identify data, and pull together information from a vast number of sources into one pipeline. Add in COVID-19’s twists, turns and variants, and it becomes clear just how difficult this important work can be.

Nonetheless, the results are both exciting and impressive, Dr. Bennett says, and the techniques used to build data infrastructure for these projects can be used to pool data on other diseases as well, opening doors well beyond COVID-19.

COVID-19 research in action

Dr. Bennett’s work with N3C, a national agglomeration of electronic medical record data on clinical characteristics, outcomes, severity and risk factors for COVID-19 patients, is one such project.

“N3C is all focused on COVID-19 in its current form,” Dr. Bennett says, “but it is a model for how to integrate large bodies of electronic health record data securely for wide use. There’s a vision to have that model apply [to other health issues].”

The repository currently aggregates data from 72 different sites around the country, representing 13 million people and nearly 5 million positive COVID-19 cases. It incorporates 2.2 billion medication records and nearly 7 billion lab results. All that data is available for researchers, clinicians and citizen scientists to access and has already produced several publications, including a handful from Dr. Bennett himself.

One of the studies, published in February 2022, focuses on COVID-19 in the pediatric population. After surveying N3C-generated data for more than one million children, Dr. Bennett’s team found that 15.6% tested positive and 13.9% met the criteria for severe disease. The study identified several risk factors for severe disease and concluded that, with further study, these findings could be translated into improved outcomes.

Dr. Bennett’s most recent study, funded by an $11 million National Institutes of Health grant, aims to better understand long COVID as part of a national cohort known as RECOVER, or “Researching COVID to Enhance Recovery.”

Enter the COVIDome

The COVIDome, described as a “highly collaborative effort to generate multidimensional datasets from biospecimens from COVID-19 patients and controls,” started with 73 participants hospitalized on the Anschutz Medical Campus with confirmed positive cases of COVID-19. Researchers collected blood samples from the participants, which were analyzed to produce a plasma proteome, plasma metabolome, cytokine profiles, red blood cell metabolome and an immune map. This information, collectively known as multiomics data, was then annotated with clinical data.

The COVIDome’s creators, including Dr. Bennett, made this data publicly available through a database in the hopes it would enable researchers to develop better prevention strategies and diagnostic and therapeutic tools for managing patients with COVID-19.

The work is also supported by Health Data Compass, a unique informatics collaboration between Children’s Hospital Colorado and the CU School of Medicine, and has so far produced three published studies (two of which are available online).

The first, published in March 2021, examined the COVIDome data to find a better way to stratify COVID-19 patients for targeted therapeutics. The study found that a patient’s seroconversion status (the point at which their body begins developing COVID-19 antibodies) was associated with distinct pathophysiological states. Researchers propose that given this information, seroconversion status could be used to determine the best treatment for each patient.

Another study, published in August 2021, used the portal to compare various COVID-omics biosignatures against C-reactive protein, or CRP, an established biomarker of poor COVID prognosis. Researchers found a clear association between elevated CRP levels and damage-associated molecular patterns, depletion of protective serpins and mitochondrial metabolic dysfunction.

In addition to providing critical data for studies on the Anschutz Medical Campus, the COVIDome, like N3C, has attracted researchers elsewhere as well. In fact, between its launch in November 2020 and June 2021, the dashboard saw more than 800 unique users from 36 countries around the world.

Effectiveness of monoclonal antibody treatment

Researchers at the Colorado Clinical and Translational Sciences Institute on the Anschutz Medical Campus have been able to build on the successes of N3C and the COVIDome. Leveraging that existing data infrastructure, they’ve developed an ongoing research project, funded by an $8.7 million NIH grant, to determine the effectiveness of monoclonal antibody treatments against COVID-19.

Early in the pandemic, patients who visited the emergency room with COVID-19 were examined to determine whether hospitalization was necessary. If it wasn’t, they were sent home with changes to watch for. With the FDA authorization of monoclonal antibody treatments, that changed, but information gaps remained, both for patients and clinicians.

To remedy that, researchers set out to create a database that would allow them to monitor how this treatment worked, says emergency physician and the project’s lead investigator Adit Ginde, MD.

“From the informatics side, we were looking at real-world effectiveness of these treatments and wanted to build a robust data infrastructure statewide across UCHealth, Denver Health, and in collaboration with the Colorado Department of Public Health and Environment, to basically have a pipeline for real-world data to study these treatments as the pandemic changed,” he says.

Building a platform to do this took Dr. Ginde, Dr. Bennett and their team roughly six months, as it required compiling and linking data from multiple sources, including electronic medical records, Epic, statewide vaccine registries, hospitalization data and death records. The team also took special care to link data to study health inequities and to use data to examine the ways in which information related to the monoclonal antibody treatment was disseminated to patients, including through social media, websites and community organizations.

This work has earned national attention from federal partners, including the CDC, FDA and the federal COVID response team, among others.

“We present to them the data and the findings to help with policy decisions. They're also very interested in what we're doing and how we've been able to then keep up as we switched to the delta variant and then omicron,” Dr. Ginde says. “We’ve been able to incorporate that information into the common data pipeline and be able to start providing information that can really inform the federal response.”