Trans-Omics for Precision Medicine | TOPMed

What is TOPMed?

Trans-Omics for Precision Medicine (TOPMed) is a program of the National Heart, Lung, and Blood Institute (NHLBI), part of the National Institutes of Health. It aims to improve scientific understanding of the fundamental biological processes that underlie heart, lung, blood, and sleep (HLBS) disorders and to advance precision medicine in ways that lead to disease treatments tailored to individuals’ unique genes and environments.

TOPMed supports these scientific advances through the integration of whole-genome sequencing (WGS) and other omics data (e.g., metabolic profiles, epigenomics, protein and RNA expression patterns) with molecular, behavioral, imaging, environmental, and clinical data from pre-existing parent studies that have large samples of human subjects with rich phenotypic characterization and environmental exposure data. TOPMed also collects environmental and behavioral data, such as dietary habits, physical activity, and socioeconomic factors, to provide a more comprehensive understanding of the factors that contribute to these disorders.

NHLBI Artificial Intelligence Initiative (NHLBI-AI)

The NHLBI-AI Initiative was created to stimulate the development of AI methods and models for discovery research that significantly advance the understanding of health and the prevention, diagnosis, and treatment of heart, lung, blood, and sleep (HLBS) conditions. In addition, the Initiative will develop infrastructure to support AI studies by the broader research community. NHLBI-AI began in FY2024 with the establishment of an AI Coordinating Center (AICC) tasked with managing the logistics of building the consortium and bringing AI experts into collaboration with subject matter experts on existing NHLBI programs, such as TOPMed and BioData Catalyst. We envision that the NHLBI-AI Initiative will be composed of the Data Science Center (DSC), a Precision Medicine Coordinating Center (PMCC), several modular precision medicine programs, and a series of independently funded AI research projects supported by these cores. The PMCC will support the transformation, harmonization, and sharing of existing and new data so that NHLBI-AI investigators can pose a broader range of clinically impactful research questions.

Explore the Data

The TOPMed program provides data resources for researchers studying heart, lung, blood, and sleep disorders. These data resources include various types of genomic and other data, such as whole-genome sequencing, whole-exome sequencing, RNA sequencing, epigenetic data, metabolomic data, and proteomic data. Researchers who wish to access TOPMed data, including electronic health records, medical imaging data, and other patient health information, must get approval through the Database of Genotypes and Phenotypes (dbGaP). Once approval is granted, researchers can access the data from NHLBI BioData Catalyst® (BDC) or dbGaP.

BioData Catalyst (BDC)

  • Cloud-based computing ecosystem

  • Features secure workspaces, tools, applications, and workflows

  • Hosts data and supports collaboration

The TOPMed program uses BioData Catalyst (BDC) as a resource to facilitate research efforts. BDC is a cloud-based ecosystem where researchers can access NHLBI datasets, including TOPMed data, and leverage innovative data analysis tools, applications, and workflows to accelerate their research efforts. Additionally, BDC allows researchers to bring their own data, collaborate, and share their findings with other researchers in the community, ultimately driving discovery and scientific advancement in precision medicine.

TOPMed Data Freeze 10

  • Variant discovery was initially made on approximately 206,000 samples.
  • 781 million single nucleotide variants were identified.
  • 62 million short insertion/deletion variants were identified and passed variant quality control (QC).

Note: These variant counts are slightly smaller than the corresponding numbers in data freeze 9 due to omitting sites that show no variation in TOPMed samples. More information about WGS methods can be found by selecting a freeze listed on the Methods page.
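
The effect described in the note, where variant counts shrink once sites with no variation in a sample subset are omitted, can be illustrated with a minimal sketch. The genotype matrix, the sample indices, and the helper `count_variable_sites` are all hypothetical, and "no variation" is assumed here to mean that no kept sample carries a non-reference allele. This is an illustration only, not the TOPMed calling pipeline.

```python
from typing import Sequence


def count_variable_sites(
    genotypes: Sequence[Sequence[int]], keep_samples: Sequence[int]
) -> int:
    """Count sites where at least one kept sample carries a non-reference allele.

    genotypes[s][i] is the alternate-allele count (0, 1, or 2) of sample i at site s.
    """
    count = 0
    for site in genotypes:
        if any(site[i] > 0 for i in keep_samples):
            count += 1
    return count


# Toy call set: 4 sites x 4 samples (hypothetical values).
genotypes = [
    [0, 1, 0, 0],  # variable in sample 1
    [0, 0, 0, 2],  # variable only in sample 3
    [1, 0, 0, 0],  # variable in sample 0
    [0, 0, 0, 0],  # no variation in any sample
]

all_samples = [0, 1, 2, 3]
subset = [0, 1, 2]  # assume sample 3 is outside the subset of interest

n_all = count_variable_sites(genotypes, all_samples)
n_subset = count_variable_sites(genotypes, subset)
print(f"variable sites, all samples: {n_all}")
print(f"variable sites, subset only: {n_subset}")
```

In this toy example the second site varies only in the excluded sample, so restricting to the subset lowers the site count without any change in the underlying calls.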

Omics Data Releases

TOPMed Omics data processing is being performed by several sequencing centers. The program requires that omics data be submitted to dbGaP with thorough documentation of bio-sampling, laboratory methods, and sample provenance. Visit the Methods webpage and scroll to the Standards, Pipelines and Flowcharts for Data Processing section to find available documented omics pipelines specific to omics type and phase. Below is a summary of the approved data sources for each study/cohort name categorized by data type.

TOPMed WGS and Omics Summary of Approved Projects
TOPMed WGS and Omics Summary of Approved Data
For each study/cohort, the entries below give the number released, the number approved, and the number of subjects for each data type.

WHI: Women's Health Initiative
Populations: Women aged 50-79 years
  • WGS: 11,035 released of 11,310 approved (subjects: 11,027)
  • RNA-seq: 2,391 released of 2,365 approved (subjects: 2,368)
  • Methylation: 4,314 released of 4,400 approved (subjects: 3,118)
  • Metabolomics: 4,399 released of 4,400 approved (subjects: 3,219)
  • Proteomics: 0 released of 1,000 approved

VTE: Venous Thromboembolism project
Populations: African American (20%)
dbGaP ID: phs001211, phs001402, phs000993
  • WGS: 5,206 released of 10,531 approved (subjects: 5,202)
  • RNA-seq: 0 released of 6,111 approved (subjects: 0)
  • Methylation: 14,989 released of 16,524 approved (subjects: 11,153)
  • Metabolomics or Proteomics (data type not recoverable from the source): 0 released of 16,524 approved (subjects: 0)

SAGE: Study of African Americans, Asthma, Genes and Environment
Populations: African American
  • WGS: 1,951 released of 1,951 approved (subjects: 1,949)
  • RNA-seq: 917 released of 1,000 approved (subjects: 864)

MESA: Multi-Ethnic Study of Atherosclerosis
Populations: Multi-Ethnic populations
dbGaP ID: phs001416
  • WGS: 7,886 released of 7,107 approved (subjects: 7,878)
  • RNA-seq: 2,940 released of 8,903 approved (subjects: 1,347)
  • Methylation: 2,084 released of 13,400 approved (subjects: 976)
  • Metabolomics: 13,258 released of 14,760 approved (subjects: 3,806)
  • Proteomics: 8,880 released of 16,200 approved (subjects: 3,323)

GALAII: Gene-Environment, Admixture and Latino Asthmatics Study
Populations: Latino
  • WGS: 3,674 released of 3,674 approved (subjects: 3,666)
  • RNA-seq: 2,215 released of 2,500 approved (subjects: 2,191)

FHS: Framingham Heart Study
Populations: Three generation European ancestry pedigrees
  • WGS: 4,145 released of 4,089 approved (subjects: 4,133)
  • RNA-seq: 2,691 released of 5,832 approved (subjects: 2,691)
  • Methylation: 1,808 released of 4,099 approved (subjects: 1,804)
  • Metabolomics: 0 released of 7,117 approved (subjects: 0)
  • Proteomics: 0 released of 6,752 approved (subjects: 0)

CRA_CAMP: The Genetic Epidemiology of Asthma in Costa Rica and the Childhood Asthma Management Program
Populations: Costa Rica is a Special Hispanic Population with Asthma Prevalence at 24%
dbGaP ID: phs001726, phs000988
  • WGS: 6,462 released of 6,647 approved (subjects: 5,979)
  • Methylation: 1,346 released of 3,000 approved (subjects: 1,346)
  • Metabolomics: 2,843 released of 3,000 approved (subjects: 2,672)

COPD: Genetic Epidemiology of COPD
Populations: >10,000 non-Hispanic white and African-American, nearly all current and former smokers with and without COPD
dbGaP ID: phs000951, phs000946
  • WGS: 10,825 released of 10,829 approved (subjects: 10,677)
  • RNA-seq: 723 released of 800 approved (subjects: 386)
  • Methylation: 12,215 released of 11,843 approved (subjects: 7,357)
  • Metabolomics: 9,141 released of 8,353 approved (subjects: 6,607)

CARDIA: Whole Genome Sequence Analysis in Early Cerebral Small Vessel Disease
Populations: African American and White Young Adults
dbGaP ID: phs001612
  • WGS: 2,759 released of 3,472 approved (subjects: 2,759)
  • RNA-seq: 5,090 released of 6,000 approved (subjects: 2,773)
  • Methylation: 6,441 released of 9,480 approved (subjects: 2,037)
  • Metabolomics: 7,859 released of 9,000 approved (subjects: 3,071)
  • Proteomics: 7,710 released of 9,000 approved (subjects: 3,066)

Total Released
  • WGS: 53,943
  • RNA-seq: 16,967
  • Methylation: 43,197
  • Metabolomics: 37,500
  • Proteomics: 16,590
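
The "Total Released" figures can be cross-checked against the per-study released counts. The sketch below is a minimal consistency check and rests on one assumption: where a study lists fewer than five data-type entries, the placement of each entry is inferred as the one that reproduces the reported totals, and data types with no entry are recorded as 0. The names `released`, `reported`, and `computed` are illustrative.

```python
DATA_TYPES = ("WGS", "RNA-seq", "Methylation", "Metabolomics", "Proteomics")

# Released counts per study, in DATA_TYPES order.
# Assumption: data types with no entry for a study are recorded as 0.
released = {
    "WHI":      (11035, 2391, 4314, 4399, 0),
    "VTE":      (5206, 0, 14989, 0, 0),
    "SAGE":     (1951, 917, 0, 0, 0),
    "MESA":     (7886, 2940, 2084, 13258, 8880),
    "GALAII":   (3674, 2215, 0, 0, 0),
    "FHS":      (4145, 2691, 1808, 0, 0),
    "CRA_CAMP": (6462, 0, 1346, 2843, 0),
    "COPD":     (10825, 723, 12215, 9141, 0),
    "CARDIA":   (2759, 5090, 6441, 7859, 7710),
}

# "Total Released" figures as reported in the summary.
reported = (53943, 16967, 43197, 37500, 16590)

# Sum each data-type column across all studies.
computed = tuple(sum(column) for column in zip(*released.values()))

for name, got, want in zip(DATA_TYPES, computed, reported):
    status = "ok" if got == want else "MISMATCH"
    print(f"{name:<13} computed={got:>7,} reported={want:>7,} {status}")
```

Under that assumption, every column sum agrees with the reported total.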

Published Papers That Used TOPMed Data

Published papers utilize TOPMed data to address topics related to heart, lung, blood, and sleep disorders.

  • Clinics and genetics of hyperhemolysis syndrome in patients with sickle cell disease. Transfusion. PMID: 40172242
  • Mitochondrial DNA Copy Number Variation in Asthma Risk, Severity, and Exacerbations. J Allergy Clin Immunol. PMID: 39237012
  • Sequencing in over 50,000 cases identifies coding and structural variation underlying atrial fibrillation risk. Nature Genetics. PMID: 40050430
  • A statistical framework for powerful multi-trait rare variant analysis in large-scale whole-genome sequencing studies. Nature Computational Science. PMID: 39920506
  • Proteomic Signature of HIV-Associated Subclinical Left Atrial Remodeling and Incident Heart Failure. Nat Commun. PMID: 39800750

Resources for the Scientific Community

TOPMed data are being made available to the scientific community as a series of “data freezes,” distributed through the following resources:

  • genotypes and phenotypes via dbGaP
  • read alignments via the Sequence Read Archive (SRA)
  • variant summary information via the Bravo variant server
  • single nucleotide polymorphisms via dbSNP

TOPMed WGS data are contained in study-specific accessions with names containing “NHLBI TOPMed,” while most phenotypic data are in parent study accessions. The TOPMed accessions can be identified by searching the dbGaP website for “TOPMed.” More information about the available data and how to access it can be found on the Data Access page.
