
Trans-Omics for Precision Medicine | TOPMed


What is TOPMed?

Trans-Omics for Precision Medicine (TOPMed) is a program of the National Heart, Lung, and Blood Institute (NHLBI), part of the National Institutes of Health (NIH). It aims to improve scientific understanding of the fundamental biological processes that underlie heart, lung, blood, and sleep (HLBS) disorders and to advance precision medicine, leading to disease treatments tailored to individuals’ unique genes and environments.

TOPMed supports these scientific advances by integrating whole-genome sequencing (WGS) and other omics data (e.g., metabolic profiles, epigenomics, and protein and RNA expression patterns) with molecular, behavioral, imaging, environmental, and clinical data from pre-existing parent studies that have large samples of human subjects with rich phenotypic characterization and environmental exposure data. The environmental and behavioral data include measures such as dietary habits, physical activity, and socioeconomic factors, giving a more comprehensive picture of the factors that contribute to these disorders.

TOPMed Artificial Intelligence Initiative (TOPMed-AI)

NHLBI’s TOPMed program aims to use artificial intelligence (AI) and machine learning (ML) to accelerate the understanding of HLBS disorders. Using the extensive genomic data resources available through TOPMed and the computing infrastructure of BioData Catalyst (BDC), researchers will be able to develop advanced AI methods that analyze complex data and identify patterns that may lead to new insights and innovations in precision medicine. The initiative will bring together AI/ML experts and other multidisciplinary experts to collaborate on innovative approaches to analyzing and interpreting TOPMed data. The coordination center (AI-CC) at Westat serves as the central hub for coordinating research projects.

Initial use-cases for the TOPMed-AI initiative include:

  • Women’s health across the lifespan, starting with a focus on the midlife/menopause transition.
  • Imaging of lung disease: radiogenomics focusing on chest CT data, with other imaging data included as the program evolves.

Explore the Data

The TOPMed program provides data resources for researchers studying heart, lung, blood, and sleep disorders. These resources include genomic and other omics data, such as whole-genome sequencing, whole-exome sequencing, RNA sequencing, epigenetic, metabolomic, and proteomic data. Researchers who wish to access TOPMed data, including electronic health records, medical imaging data, and other patient health information, must obtain approval through the Database of Genotypes and Phenotypes (dbGaP). Once approval is granted, researchers can access the data through NHLBI BioData Catalyst® (BDC) or dbGaP.

BioData Catalyst (BDC)

  • Cloud-based computing ecosystem

  • Features secure workspaces, tools, applications, and workflows

  • Hosts data and supports collaboration

The TOPMed program uses BioData Catalyst (BDC) to facilitate research. BDC is a cloud-based ecosystem where researchers can access NHLBI datasets, including TOPMed data, and use data analysis tools, applications, and workflows to accelerate their work. BDC also lets researchers bring their own data, collaborate, and share findings with others in the community, driving discovery and scientific advancement in precision medicine.

TOPMed Data Freeze 9

  • Variant discovery was initially made on approximately 206,000 samples.
  • 781 million single nucleotide variants were identified.
  • 62 million short insertion/deletion variants were identified and passed variant quality control (QC).

Note: These variant counts are slightly smaller than the corresponding numbers in data freeze 9 because sites that show no variation in TOPMed samples were omitted. For more information about WGS methods, select a freeze listed on the Methods page.

OMICS Data Processing & Sources

TOPMed omics data processing is being performed by several sequencing centers. The program requires that omics data be submitted to dbGaP with thorough documentation of bio-sampling, laboratory methods, and sample provenance. To find the documented omics pipelines for each omics type and phase, visit the Methods page and scroll to the Standards, Pipelines and Flowcharts for Data Processing section. The table below summarizes the approved data for each study or cohort, organized by data type.

TOPMed WGS and Omics Summary of Approved Projects
TOPMed WGS and Omics Summary of Approved Data
Short Name Study/Cohort Name Populations dbGaP ID WGS RNA-seq Methylation Metabolomics Proteomics
CARDIA Whole Genome Sequence Analysis in Early Cerebral Small Vessel Disease African American and Caucasian young adults phs001612 3,472 6,000 approved 9,480 approved 12,000 approved 12,000 approved
CFS Cleveland Family Study African American phs000954 1,300
CHS Cardiovascular Health Study Adults in the USA aged 65 and older phs001368 4,780
COPDGene Genetic Epidemiology of COPD Non-Hispanic White and African-American, current and former smokers with and without COPD (>10,000) phs000951 phs000946 10,829 800 approved 11,843 approved 8,353 approved
COPDMet Plasma and BALF Metabolomics in COPDGene and SPIROMICS Current, former, and never smokers Please see this TOPMed Project's Parent Studies.
CRA_CAMP The Genetic Epidemiology of Asthma in Costa Rica and the Childhood Asthma Management Program Hispanic population with asthma prevalence at 24% phs001726 phs000988 6,647 3,000 approved 3,000 approved
DS_CHD Down Syndrome Associated Atrioventricular Septal Defects: New Omic Resources Parent-Offspring Trios Please see this TOPMed Project's Parent Studies. 469
ECLIPSE Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints Adults aged 40-75 with at least 10 pack-years of smoking phs001472 2,355
FHS Framingham Heart Study 3 generation EA pedigrees
GEM-OSA Genetics, Epigenetics and Metabolomics of OSA subtypes Canada and United States Please see this TOPMed Project's Parent Studies. 3,000 3,000 approved 3,000 approved
Totals (all approved projects): WGS 196,938; RNA-seq 64,412; Methylation 81,033; Metabolomics 82,534; Proteomics 34,014

Published Papers That Used TOPMed Data

The following published papers used TOPMed data to study heart, lung, blood, and sleep disorders.

Title Journal Name Publication Date PMID
Metagenomic Study of the MESA: Detection of Gemella morbillorum and Association With Coronary Heart Disease J Am Heart Assoc. 39344648
Genetic variants associated with white blood cell count amongst individuals with sickle cell disease British Journal of Haematology 39279196
A genome-wide association study of alloimmunization in the TOPMed OMG-SCD cohort identifies a locus on chromosome 12 Transfusion 38966903
Whole Genome Sequencing Based Analysis of Inflammation Biomarkers in the Trans-Omics for Precision Medicine (TOPMed) Consortium Human Molecular Genetics 38747556
Metabolite signatures associated with microRNA miR-143-3p serve as drivers of poor lung function trajectories in childhood asthma eBioMedicine 38458111

Resources for the Scientific Community

TOPMed data are being made available to the scientific community as a series of “data freezes”:

  • genotypes and phenotypes via dbGaP
  • read alignments via the Sequence Read Archive (SRA)
  • variant summary information via the BRAVO variant server
  • single nucleotide polymorphisms via dbSNP

TOPMed WGS data are contained in study-specific accessions with names containing “NHLBI TOPMed,” while most phenotypic data are in parent study accessions. The TOPMed accessions can be identified by searching the dbGaP website for “TOPMed.” More information about the available data and how to access it can be found on the Data Access page.
