TOPMed

What is TOPMed?

Trans-Omics for Precision Medicine (TOPMed) is a program of the National Heart, Lung, and Blood Institute (NHLBI), part of the National Institutes of Health. It aims to improve scientific understanding of the fundamental biological processes that underlie heart, lung, blood, and sleep (HLBS) disorders, and to advance precision medicine in ways that lead to disease treatments tailored to individuals’ unique genes and environments.

TOPMed supports these scientific advances through the integration of whole-genome sequencing (WGS) and other omics data (e.g., metabolic profiles, epigenomics, protein and RNA expression patterns) with molecular, behavioral, imaging, environmental, and clinical data from pre-existing parent studies that have large samples of human subjects with rich phenotypic characterization and environmental exposure data. TOPMed also collects environmental and behavioral data, such as dietary habits, physical activity, and socioeconomic factors, to provide a more comprehensive understanding of the factors that contribute to these disorders.

Explore the Data

The TOPMed program provides data resources for researchers studying heart, lung, blood, and sleep disorders. These data resources include various types of genomic and other data, such as whole-genome sequencing, whole-exome sequencing, RNA sequencing, epigenetic data, metabolomic data, and proteomic data. Researchers who wish to access TOPMed data, including electronic health records, medical imaging data, and other patient health information, must get approval through the Database of Genotypes and Phenotypes (dbGaP). Once approval is granted, researchers can access the data from NHLBI BioData Catalyst® (BDC) or dbGaP.

BioData Catalyst (BDC)

Cloud-based computing ecosystem
Features secure workspaces, tools, applications, and workflows
Hosts data and supports collaboration

The TOPMed program uses BioData Catalyst (BDC) to facilitate research. BDC is a cloud-based ecosystem where researchers can access NHLBI datasets, including TOPMed data, and use data analysis tools, applications, and workflows to accelerate their work. BDC also allows researchers to bring their own data, collaborate, and share their findings with other researchers in the community, supporting discovery and scientific advancement in precision medicine.

Access BioData Catalyst

TOPMed Data Freeze 9

Variant discovery was initially made on approximately 206,000 samples.
781 million single nucleotide variants were identified.
62 million short insertion/deletion variants were identified and passed variant quality control (QC).

Note: These variant counts are slightly smaller than the corresponding numbers in data freeze 9 due to omitting sites that show no variation in TOPMed samples. More information about WGS methods can be found by selecting a freeze listed on the Methods page.
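The filtering step described in the note above (omitting sites that show no variation in the sampled individuals) can be sketched as follows. This is only an illustration, not the TOPMed pipeline: the `Site` record, its field names, and the rule that a site counts as "showing no variation" when its alternate allele count is zero or equal to the total allele count are all assumptions made for the example.

```python
from typing import Iterable, Iterator, NamedTuple


class Site(NamedTuple):
    """Hypothetical per-site summary; field names are illustrative only."""
    chrom: str
    pos: int
    alt_allele_count: int  # number of alternate alleles observed in the samples
    total_alleles: int     # number of called alleles in the samples


def is_polymorphic(site: Site) -> bool:
    # Assumption: a site varies among the samples only if both the
    # reference and the alternate allele are observed at least once.
    return 0 < site.alt_allele_count < site.total_alleles


def drop_monomorphic(sites: Iterable[Site]) -> Iterator[Site]:
    """Yield only the sites that show variation among the samples."""
    return (s for s in sites if is_polymorphic(s))
```

Counting the sites that survive such a filter would give totals slightly below the unfiltered counts, which is the effect the note describes.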

View TOPMed data freeze 9

OMICS Data Processing & Sources

TOPMed Omics data processing is being performed by several sequencing centers. The program requires that omics data be submitted to dbGaP with thorough documentation of bio-sampling, laboratory methods, and sample provenance. Visit the Methods webpage and scroll to the Standards, Pipelines and Flowcharts for Data Processing section to find available documented omics pipelines specific to omics type and phase. Below is a summary of the approved data sources for each study/cohort name categorized by data type.

TOPMed WGS and Omics Summary of Approved Projects

TOPMed WGS and Omics Summary of Approved Data
Short Name	Study/Cohort Name	Populations	dbGaP ID	WGS	RNA-seq	Methylation	Metabolomics	Proteomics
AA_CAC	African American Coronary Artery Calcification project	African American families	phs002194	1,159
AFGen	Identification of Common Genetic Variants for Atrial Fibrillation and PR Interval - Atrial Fibrillation Genetics Consortium	European ancestry	AFGen dbGaP IDs	12,742
ARIC+VTE	Venous Thromboembolism project	African American (20%)	phs001211 phs001402 phs000993	10,531	6,111	16,524	16,524
ATGC	Asthma Translational Genomics Collaborative	African American, Mexican, and Puerto Rican individuals	ATGC dbGaP IDs	16,494	9,290
Africa6K	Integrative Genomic Studies of Heart and Blood Related Traits in Africans	African ancestry	phs002194	6,392	2,934
Amish	Genetics of Cardiometabolic Health in the Amish	Old Order Amish large extended pedigrees	phs000956	1,120
BAGS	Barbados Asthma Genetics Study	Barbados families of African descent with >40% of asthmatic members	phs001143	1,085
BCC-PREG	The Boston-Colombia Collaborative for Adverse Pregnancy Outcomes	White, Black, Asian, Hispanic, White Hispanic, Afro-Caribbean, Amerindians, Mixed	Please see this TOPMed Project's Parent Studies.	14,615
BioMe	Mount Sinai BioMe Biobank	African American (24%), Hispanic/Latino (35%), European (32%), Other (10%)	phs001644	11,626
Boston-Brazil_SCD	Boston-Brazil Collaborative Study of Sickle Cell Disease	Brazilian	phs001599	415
Total				196,938	64,412	81,033	82,534	34,014

Notes:

“phs” is a dbGaP study accession number prefix indicating a phenotype study. A study accession number is a unique, stable, and versioned identifier.

AFGen dbGaP IDs: phs001435, phs001543, phs001624, phs001732, phs001600, phs001189, phs001546, phs001606, phs001547, phs001725, phs001545, phs000993, phs001598, phs001062, phs001434, phs001544, phs001024, phs001601, phs001933, phs000997, phs001032, phs001040

ATGC dbGaP IDs: phs001728, phs001729, phs001730, phs001602, phs001603, phs001604, phs001605, phs000920, phs001542, phs001661, phs001727, phs000921, phs001467

In the table, you may encounter phs links that redirect to a dbGaP error page. If so, this is because the TOPMed dbGaP study webpages are not available until the study accession is released.
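For readers who want to pull study accessions out of lists like the ones above, here is a minimal Python sketch. It assumes the commonly seen dbGaP form of "phs" followed by six digits, optionally followed by a version suffix (".vN") and a participant-set suffix (".pN"). The `Accession` type and `find_accessions` function are hypothetical names for this example, and the versioned sample string in the usage note is made up rather than taken from the table.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed pattern: "phs" + six digits, with optional ".vN" and ".pN" suffixes.
ACCESSION_RE = re.compile(r"\bphs(\d{6})(?:\.v(\d+))?(?:\.p(\d+))?\b")


class Accession(NamedTuple):
    study: str                      # e.g. "phs001435"
    version: Optional[int]          # None when no ".vN" suffix is present
    participant_set: Optional[int]  # None when no ".pN" suffix is present


def find_accessions(text: str) -> List[Accession]:
    """Return every accession-like token found in free text, in order."""
    found = []
    for match in ACCESSION_RE.finditer(text):
        digits, version, pset = match.groups()
        found.append(
            Accession(
                study="phs" + digits,
                version=int(version) if version else None,
                participant_set=int(pset) if pset else None,
            )
        )
    return found
```

Calling `find_accessions("phs001728 , phs001729 , phs000920")` returns three unversioned accessions, while a hypothetical versioned string such as "phs000000.v1.p1" would also populate the `version` and `participant_set` fields.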

TOPMed is generating a rich resource of multi-omics data that will include approximately 40,000 samples undergoing RNA-sequencing, 37,000 samples from metabolomics profiling, 57,000 samples from DNA methylation, and 4,000 samples from proteomics assaying. These projected totals include all stages of progress, from DNA/RNA that are currently being extracted to those that are undergoing sequencing/profiling or those that have completed the sequencing/profiling pipelines. Therefore, most omics data are in the process of being generated and will be released in the future.

Published Papers That Used TOPMed Data

Published papers utilize TOPMed data to address topics related to heart, lung, blood, and sleep disorders.

Title	Journal Name	Publication Date	PMID
Whole Genome Sequencing Based Analysis of Inflammation Biomarkers in the Trans-Omics for Precision Medicine (TOPMed) Consortium	Human Molecular Genetics	2024	38747556
Metabolite signatures associated with microRNA miR-143-3p serve as drivers of poor lung function trajectories in childhood asthma	eBioMedicine	2024	38458111
Transcriptome-wide association study of the plasma proteome reveals cis and trans regulatory mechanisms underlying complex traits	American Journal of Human Genetics	2024	38320554
A genetic association study of circulating coagulation Factor VIII and von Willebrand Factor levels	Blood	2024	38320121
Interaction molecular QTL mapping discovers cellular and environmental modifiers of genetic regulatory effects	American Journal of Human Genetics	2024	38181730

Resources for the Scientific Community

TOPMed data are being made available to the scientific community as a series of “data freezes”:

genotypes and phenotypes via dbGaP
read alignments via the Sequence Read Archive (SRA)
variant summary information via the Bravo variant server
single nucleotide polymorphisms via dbSNP

TOPMed WGS data are contained in study-specific accessions with names containing “NHLBI TOPMed,” while most phenotypic data are in parent study accessions. The TOPMed accessions can be identified by searching the dbGaP website for “TOPMed.” More information about the available data and how to access it can be found on the Data Access page.
