TOPMed

About TOPMed

The goal of the National Heart, Lung, and Blood Institute’s (NHLBI) Trans-Omics for Precision Medicine (TOPMed) program is to generate scientific resources that improve understanding of heart, lung, blood, and sleep disorders and advance precision medicine. Precision medicine is an emerging approach to disease prevention and treatment that considers each patient’s unique genes and environment.

The TOPMed program collects whole-genome sequences and other -omics data. In biology, -omics refers to measurable differences or changes in biological molecules, such as genes, metabolites, proteins, and RNA. TOPMed integrates these data with molecular, behavioral, imaging, environmental, and clinical data from diverse participants in NHLBI’s population and epidemiology studies. Integrating genomic and other -omics data from diverse populations helps researchers broaden their analyses, identify factors that increase or decrease the risk of disease, identify disease subtypes, and develop more targeted and personalized treatments.

The TOPMed program is an important part of NHLBI’s efforts to harness data science to drive precision medicine. Within that broader effort, TOPMed makes its genomic data, together with the pre-existing phenotypic data from the parent studies, available in the NHLBI BioData Catalyst® (BDC) ecosystem to researchers who have been granted access through the NIH Database of Genotypes and Phenotypes (dbGaP). BDC is a cloud-based ecosystem that provides tools, applications, and workflows in secure workspaces where researchers can find, access, share, store, and compute on data hosted in the ecosystem or data they bring to it.

The TOPMed program also complements the NIH All of Us Research Program, which is an effort to gather data from one million or more people living in the United States to accelerate research that may improve health.

TOPMed has contributed to research reported in almost 150 publications and more than 200 abstracts presented at professional meetings.

At a Glance

TOPMed comprised over 180,000 whole-genome sequences, around 60% of them from participants of predominantly non-European ancestry.
TOPMed program studies are collecting -omics data such as metabolic profiles, epigenetic data, and protein and RNA expression patterns.
The TOPMed program is leveraging data from NHLBI’s clinical and population studies.


As of September 2021, TOPMed consisted of approximately 180,000 participants from more than 85 different studies with varying designs.
These study designs contribute in complementary ways: prospective cohorts provide data on many disease risk factors, subclinical disease measures, and incident disease cases; case-control studies provide large numbers of prevalent disease cases; and extended family structures and population isolates improve statistical power to detect rare-variant effects.

The phenotype pie chart shows a breakdown of the numbers and percentages of participants in studies with a focus on heart, lung, blood, and sleep, as well as the percentage belonging to cohort studies that have collected many different phenotypes.

How does the TOPMed program contribute to scientific discoveries?

Biomarkers associated with increased or decreased risk of heart, lung, blood, and sleep disorders
Interactions between the environment and genes that affect health
Potential targets for new treatments
New ways to define heart, lung, blood, and sleep disorders or subtypes of these disorders based on molecular signatures
Targeted ways to develop and test personalized treatments in specific patients
Advances in precision medicine to predict, prevent, diagnose, and treat heart, lung, blood, and sleep disorders

These contributions open many possibilities for improving lives, such as developing precision treatments for patients affected by racial and ethnic health disparities and finding new ways to screen, diagnose, and treat patients based on the genetic underpinnings of their diseases, disorders, and conditions.

Who is part of the TOPMed Program?

The success of the NHLBI TOPMed Program depends on the involvement of many entities.

The TOPMed Research Community includes more than 1,400 members: investigators representing more than 90 NHLBI cohort and clinical studies, their mentees, and others whom the principal investigators (PIs) sponsor to join. Members of the TOPMed Research Community sit on one or more of TOPMed’s 32 phenotype-focused working groups and 10 committees. Learn more about the TOPMed Research Community (members), Working Groups, and Committees.

The NHLBI provides administrative coordinating center (ACC) services through a contract with Westat, Inc., informatics research center (IRC) services through a contract with the University of Michigan Center for Statistical Genetics, and sequencing services through contracts with four sequencing centers. Learn more about NHLBI and the TOPMed Centers.

How does the TOPMed program process data?

1. NHLBI notifies the TOPMed Administrative Coordinating Center (ACC), TOPMed Informatics Research Center (IRC), and TOPMed sequencing centers about newly approved studies.
2. The TOPMed IRC and sequencing centers contact the study investigators with the following items and tasks:
   a. The IRC sends unique sample identifiers to the study investigators, who then assign them to their selected samples.
   b. The sequencing centers send a sample manifest to the study investigators to collect the information needed to process the samples.
3. The study investigators fill in the sample and identifier information and send it, together with the selected samples, to the sequencing centers.
4. The sequencing centers process the samples, perform sample-level quality control (QC), and send the resulting data to the IRC.
5. The IRC performs additional data processing and QC and transfers the data to the TOPMed Exchange Area in dbGaP, where they are made available to TOPMed Working Groups and the originating parent study teams (i.e., before public release).
6. After approximately six months (the priority period), the data are submitted to dbGaP and assigned accessions for controlled access by the scientific community. The IRC also submits selected data to the NHLBI BioData Catalyst ecosystem, where researchers with access permissions can use data analysis tools, applications, and workflows to accelerate their research.
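To make the order of handoffs and the priority period concrete, the workflow above can be sketched as a small Python model. This is a hypothetical illustration only: the `Stage` labels, the `next_stage`, `add_months`, and `approx_public_release` helpers, and the choice to treat "approximately six months" as exactly six calendar months are all assumptions, not part of any TOPMed, dbGaP, or BDC system.

```python
import calendar
from datetime import date
from enum import Enum
from typing import Optional


class Stage(Enum):
    # Hypothetical labels for the workflow steps described above, in order.
    NHLBI_NOTIFIES_CENTERS = 1
    IDS_AND_MANIFEST_SENT = 2
    SAMPLES_SHIPPED = 3
    SEQUENCING_AND_SAMPLE_QC = 4
    IRC_QC_AND_EXCHANGE_AREA = 5
    DBGAP_AND_BDC_RELEASE = 6


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows `stage`, or None after the final release."""
    members = list(Stage)  # Enum iteration follows definition order
    i = members.index(stage)
    return members[i + 1] if i + 1 < len(members) else None


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the last day of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def approx_public_release(exchange_area_date: date, priority_months: int = 6) -> date:
    """Estimate when data leave the priority period (assumed to be ~6 months)."""
    return add_months(exchange_area_date, priority_months)


if __name__ == "__main__":
    # Walk the stages in order.
    stage: Optional[Stage] = Stage.NHLBI_NOTIFIES_CENTERS
    while stage is not None:
        print(stage.name)
        stage = next_stage(stage)
    # A hypothetical Exchange Area date of 2021-08-31 maps to 2022-02-28,
    # because the day is clamped to the end of February.
    print(approx_public_release(date(2021, 8, 31)))
```

Clamping the day keeps the estimate valid for month-end dates; since the real priority period is only approximate, the result should be read as a rough planning date rather than an official release date.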

TOPMed Phases

TOPMed adds new studies in yearly “Phases.” The number of studies per Phase has varied from year to year (from 3 to 21), and the number of biospecimens needed varies from study to study.

The TOPMed IRC harmonizes whole-genome sequences from many available studies to produce periodic joint genotype call sets. Each of these central datasets is called a “freeze,” and TOPMed data are being made available to the scientific community as a series of “data freezes” through the following channels:

genotypes and phenotypes via dbGaP
read alignments via the Sequence Read Archive (SRA)
variant summary information via the Bravo variant server
single nucleotide polymorphisms via dbSNP

TOPMed whole-genome sequencing (WGS) data are contained in study-specific accessions whose names include “NHLBI TOPMed,” while most phenotypic data are in the parent study accessions. The TOPMed accessions can be identified by searching the dbGaP website for “TOPMed.” More information about the available data and how to access them can be found on the Data Access page.