TOPMed

What is TOPMed?

Trans-Omics for Precision Medicine (TOPMed) is a program of the National Heart, Lung, and Blood Institute (NHLBI), part of the National Institutes of Health. It aims to improve scientific understanding of the fundamental biological processes that underlie heart, lung, blood, and sleep (HLBS) disorders and to advance precision medicine in ways that lead to disease treatments tailored to individuals’ unique genes and environments.

TOPMed supports these scientific advances through the integration of whole-genome sequencing (WGS) and other omics data (e.g., metabolic profiles, epigenomics, protein and RNA expression patterns) with molecular, behavioral, imaging, environmental, and clinical data from pre-existing parent studies that have large samples of human subjects with rich phenotypic characterization and environmental exposure data. TOPMed also collects environmental and behavioral data, such as dietary habits, physical activity, and socioeconomic factors, to provide a more comprehensive understanding of the factors that contribute to these disorders.

TOPMed Artificial Intelligence Initiative (TOPMed-AI)

NHLBI’s TOPMed program aims to leverage the power of artificial intelligence (AI) and machine learning (ML) to accelerate the understanding of HLBS disorders. By utilizing the vast genomic data resources available through TOPMed and the computing infrastructure of BioData Catalyst (BDC), researchers will be able to develop advanced AI methods to analyze complex data and identify patterns that may lead to new insights and potential innovations for precision medicine. The initiative will bring together AI/ML and other multidisciplinary experts to collaborate on innovative approaches to analyze and interpret TOPMed data. The coordination center (AI-CC) at Westat serves as the central hub for coordinating research projects.

Initial use-cases for the TOPMed-AI initiative include:

Women’s health across the lifespan, starting with a focus on mid-life/menopause transition.
Imaging of lung disease: radiogenomics focusing on chest CT data, with other imaging data to be included as the program evolves.

Explore the Data

The TOPMed program provides data resources for researchers studying heart, lung, blood, and sleep disorders. These data resources include various types of genomic and other data, such as whole-genome sequencing, whole-exome sequencing, RNA sequencing, epigenetic data, metabolomic data, and proteomic data. Researchers who wish to access TOPMed data, including electronic health records, medical imaging data, and other patient health information, must get approval through the Database of Genotypes and Phenotypes (dbGaP). Once approval is granted, researchers can access the data from NHLBI BioData Catalyst® (BDC) or dbGaP.

BioData Catalyst (BDC)

Cloud-based computing ecosystem
Features secure workspaces, tools, applications, and workflows
Hosts data and supports collaboration

The TOPMed program uses BioData Catalyst (BDC) as a resource to facilitate research efforts. BDC is a cloud-based ecosystem where researchers can access NHLBI datasets, including TOPMed data, and leverage innovative data analysis tools, applications, and workflows to accelerate their research efforts. Additionally, BDC allows researchers to bring their own data, collaborate, and share their findings with other researchers in the community, ultimately driving discovery and scientific advancement in precision medicine.

Access BioData Catalyst

TOPMed Data Freeze 9

Variant discovery was initially made on approximately 206,000 samples.
781 million single nucleotide variants were identified.
62 million short insertion/deletion variants were identified and passed variant quality control (QC).

Note: These variant counts are slightly smaller than the corresponding numbers in data freeze 9 because sites that show no variation in TOPMed samples are omitted. More information about WGS methods can be found by selecting a freeze listed on the Methods page.

View TOPMed data freeze 9

Omics Data Releases

TOPMed Omics data processing is being performed by several sequencing centers. The program requires that omics data be submitted to dbGaP with thorough documentation of bio-sampling, laboratory methods, and sample provenance. Visit the Methods webpage and scroll to the Standards, Pipelines and Flowcharts for Data Processing section to find available documented omics pipelines specific to omics type and phase. Below is a summary of the approved data sources for each study/cohort name categorized by data type.

TOPMed WGS and Omics Summary of Approved Projects

TOPMed WGS and Omics Summary of Approved Data
Short Name	Study/Cohort	PI	Populations	dbGaP ID	WGS	RNA-seq	Methylation	Metabolomics	Proteomics
CARDIA	Whole Genome Sequence Analysis in Early Cerebral Small Vessel Disease	Myriam Fornage	African American and White Young Adults	phs001612	2,759 Released of 3,472 approved Subjects: 2759	5,090 Released of 6,000 approved Subjects: 2773	6,441 Released of 9,480 approved Subjects: 2037	7,859 Released of 9,000 approved Subjects: 3071	7,710 Released of 9,000 approved Subjects: 3066
COPD	Genetic Epidemiology of COPD	Ed Silverman	African American (30%)	phs000951 phs000946	10,825 Released of 10,829 approved Subjects: 10677	723 Released of 800 approved Subjects: 386	12,215 Released of 11,843 approved Subjects: 7357	9,141 Released of 8,353 approved Subjects: 6607
CRA_CAMP	The Genetic Epidemiology of Asthma in Costa Rica and the Childhood Asthma Management Program	Scott Weiss	Costa Rica is a Special Hispanic Population with Asthma Prevalence at 24%	phs001726 phs000988	6,462 Released of 6,647 approved Subjects: 5979		1,346 Released of 3,000 approved Subjects: 1346	2,843 Released of 3,000 approved Subjects: 2672
FHS	Framingham Heart Study	Joanne Murabito	Three generation European ancestry pedigrees		4,145 Released of 4,089 approved Subjects: 4133	2,691 Released of 5,832 approved Subjects: 2691	1,808 Released of 4,099 approved Subjects: 1804	0 Released of 7,117 approved Subjects: 0	0 Released of 6,752 approved Subjects: 0
GALAII	Gene-Environment, Admixture and Latino Asthmatics Study	Elad Ziv	Latino		3,674 Released of 3,674 approved Subjects: 3666	2,215 Released of 2,500 approved Subjects: 2191
MESA	Multi-Ethnic Study of Atherosclerosis	Jerome Rotter	Multi-Ethnic populations	phs001416	7,886 Released of 7,107 approved Subjects: 7878	2,940 Released of 8,903 approved Subjects: 1347	2,084 Released of 13,400 approved Subjects: 976	13,258 Released of 14,760 approved Subjects: 3806	8,880 Released of 16,200 approved Subjects: 3323
SAGE	Study of African Americans, Asthma, Genes and Environment	Esteban Burchard	African American		1,951 Released of 1,951 approved Subjects: 1949	917 Released of 1,000 approved Subjects: 864
VTE	Venous Thromboembolism project	Eric Boerwinkle		phs001211 phs001402 phs000993	5,206 Released of 10,531 approved Subjects: 5202	0 Released of 6,111 approved Subjects: 0	14,989 Released of 16,524 approved Subjects: 11153	0 Released of 16,524 approved Subjects: 0
WHI	Women's Health Initiative	Charles Kooperberg	Women aged 50-79 years		11,035 Released of 11,310 approved Subjects: 11027	2,391 Released of 2,365 approved Subjects: 2368	4,314 Released of 4,400 approved Subjects: 3118	4,399 Released of 4,400 approved Subjects: 3219	0 Released of 1,000 approved
Total Released					53,943	16,967	43,197	37,500	16,590
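Each data cell in the table packs three numbers into one string (samples released, samples approved, and a subject count). As a rough illustration of how such cells could be read programmatically, here is a minimal Python sketch. The names `CELL_PATTERN` and `parse_cell` are hypothetical, and the pattern assumes cells are worded exactly as they appear in the table above; it is not an official TOPMed tool.

```python
import re

# Assumed cell wording, e.g. "2,759 Released of 3,472 approved Subjects: 2759".
# The "Subjects:" part is optional because some cells omit it.
CELL_PATTERN = re.compile(
    r"(?P<released>[\d,]+) Released of (?P<approved>[\d,]+) approved"
    r"(?: Subjects: (?P<subjects>[\d,]+))?"
)


def parse_cell(text):
    """Return released/approved/subjects as integers, or None if the cell does not match."""
    match = CELL_PATTERN.fullmatch(text.strip())
    if match is None:
        return None
    return {
        key: int(value.replace(",", "")) if value is not None else None
        for key, value in match.groupdict().items()
    }


# WGS cells copied from the nine study rows of the table above.
wgs_cells = [
    "2,759 Released of 3,472 approved Subjects: 2759",
    "10,825 Released of 10,829 approved Subjects: 10677",
    "6,462 Released of 6,647 approved Subjects: 5979",
    "4,145 Released of 4,089 approved Subjects: 4133",
    "3,674 Released of 3,674 approved Subjects: 3666",
    "7,886 Released of 7,107 approved Subjects: 7878",
    "1,951 Released of 1,951 approved Subjects: 1949",
    "5,206 Released of 10,531 approved Subjects: 5202",
    "11,035 Released of 11,310 approved Subjects: 11027",
]

total_released = sum(parse_cell(cell)["released"] for cell in wgs_cells)
print(total_released)  # 53943, matching the WGS entry in the "Total Released" row
```

The same function can be applied to the other omics columns; cells that are blank in the table simply have no string to parse.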

Notes:

“phs” is a dbGaP study accession number prefix indicating a phenotype study. A study accession number is a unique, stable, and versioned identifier.

AFGen dbGaP IDs: phs001435, phs001543, phs001624, phs001732, phs001600, phs001189, phs001546, phs001606, phs001547, phs001725, phs001545, phs000993, phs001598, phs001062, phs001434, phs001544, phs001024, phs001601, phs001933, phs000997, phs001032, phs001040

ATGC dbGaP IDs: phs001728, phs001729, phs001730, phs001602, phs001603, phs001604, phs001605, phs000920, phs001542, phs001661, phs001727, phs000921, phs001467

In the table, some phs links may redirect to a dbGaP error page. This happens because TOPMed dbGaP study webpages are not available until the study accession is released.
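For readers who handle these accession numbers in scripts, the following Python sketch shows one way to tidy a comma-separated ID list and split an accession into its parts. It assumes the commonly seen dbGaP form `phs` plus six digits, optionally followed by `.v<N>` (version) and `.p<N>` (participant set). The function name `parse_accession` and the versioned example are hypothetical; the page itself lists only unversioned IDs.

```python
import re

# Assumed accession shape: "phs" + six digits, with optional ".v<N>" and ".p<N>" suffixes.
ACCESSION_PATTERN = re.compile(
    r"phs(?P<study>\d{6})(?:\.v(?P<version>\d+))?(?:\.p(?P<participant_set>\d+))?"
)


def parse_accession(text):
    """Split an accession into base ID, version, and participant set (None if absent)."""
    match = ACCESSION_PATTERN.fullmatch(text.strip())
    if match is None:
        return None
    version = match.group("version")
    participant_set = match.group("participant_set")
    return {
        "base": "phs" + match.group("study"),
        "version": int(version) if version is not None else None,
        "participant_set": int(participant_set) if participant_set is not None else None,
    }


# A few IDs as they might be copied from a list with uneven spacing around commas.
raw_ids = "phs001728 , phs001729 , phs000920, phs001542"
clean_ids = [item.strip() for item in raw_ids.split(",")]
print(clean_ids)

# Unversioned ID taken from the table (CARDIA).
print(parse_accession("phs001612"))

# Hypothetical versioned form, shown only to illustrate the optional suffixes.
print(parse_accession("phs001612.v1.p1"))
```

Keeping the base ID separate from the version makes it easier to match a study across releases while still recording which version was used.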

TOPMed is generating a rich multi-omics resource that will include approximately 40,000 samples with RNA sequencing, 37,000 with metabolomic profiling, 57,000 with DNA methylation profiling, and 4,000 with proteomic assays. These projected totals cover all stages of progress, from DNA/RNA currently being extracted to samples that are undergoing, or have completed, the sequencing/profiling pipelines. Most omics data are therefore still being generated and will be released in the future.

Published Papers That Used TOPMed Data

Published papers utilize TOPMed data to address topics related to heart, lung, blood, and sleep disorders.

Title	Journal Name	Publication Date	PMID
Identification of proteins associated with type 2 diabetes risk in diverse racial and ethnic populations	Diabetologia	2024	39349773
Genetics of Latin American Diversity (GLAD) Project: insights into population genetics and association studies in recently admixed groups in the Americas	Cell Genomics	2024	39486408
Machine learning-based clustering identifies obesity subgroups with differential multi-omics profiles and metabolic patterns	Obesity	2024	39497627
Cardiovascular Risk Factors and Genetic Risk in Transthyretin V142I Carriers	JACC Heart Failure	2024	39520444
Genomic and Serological Rheumatoid Arthritis Biomarkers, MUC5B Promoter Variant, and Interstitial Lung Abnormalities	Annals of the American Thoracic Society	2024	39405163

Resources for the Scientific Community

TOPMed data are being made available to the scientific community as a series of “data freezes”:

genotypes and phenotypes via dbGaP
read alignments via the Sequence Read Archive (SRA)
variant summary information via the Bravo variant server
single nucleotide polymorphisms via dbSNP

TOPMed WGS data are contained in study-specific accessions with names containing “NHLBI TOPMed,” while most phenotypic data are in parent study accessions. The TOPMed accessions can be identified by searching the dbGaP website for “TOPMed.” More information about the available data and how to access it can be found on the Data Access page.
