About TOPMed

Updated 06/27/2022



Overview

The Trans-Omics for Precision Medicine (TOPMed) program, sponsored by the National Institutes of Health (NIH) National Heart, Lung, and Blood Institute (NHLBI), is part of a broader Precision Medicine Initiative, which aims to provide disease treatments tailored to an individual’s unique genes and environment. TOPMed contributes to this Initiative by integrating whole-genome sequencing (WGS) and other omics data (e.g., metabolic profiles, epigenomics, and protein and RNA expression patterns) with molecular, behavioral, imaging, environmental, and clinical data.

Study Characteristics

A primary goal of the TOPMed program is to improve scientific understanding of the fundamental biological processes that underlie heart, lung, blood, and sleep (HLBS) disorders. TOPMed is providing deep WGS and other omics data to pre-existing ‘parent’ studies that have large samples of human subjects with rich phenotypic characterization and environmental exposure data.

Study Designs

As of September 2021, TOPMed includes ~180k participants from >85 studies with varying designs. Prospective cohorts contribute extensive data on disease risk factors and subclinical disease measures, as well as large numbers of incident disease cases; case-control studies contribute large numbers of prevalent disease cases; and extended family structures and population isolates improve power to detect rare-variant effects. The phenotype pie chart below shows the numbers and percentages of participants in studies focused on HLBS disorders, as well as the percentage belonging to cohort studies that have collected many different phenotypes. It also shows areas of focus within each of the major HLBS categories.

Sample numbers by phenotype area (N=180k total)

Participant Diversity

Achieving ancestral and ethnic diversity is a priority in selecting contributing studies. Currently, 60% of the 180k sequenced participants are of predominantly non-European ancestry. Discovery of genotype-phenotype associations frequently includes pooled analysis across ancestry groups and studies, using statistical models that account for population structure and relatedness.
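One common way to account for population structure is to include genetic principal components (PCs) as covariates in a per-variant regression. The Python sketch below illustrates that idea only; it is not TOPMed's analysis pipeline. It assumes genotypes coded as 0/1/2 alternate-allele dosages, a quantitative trait, and PCs computed elsewhere, and the function name `pc_adjusted_beta` is hypothetical. It also ignores relatedness, which is typically handled with a linear mixed model that includes a genetic relationship matrix.

```python
import numpy as np


def pc_adjusted_beta(genotype, phenotype, pcs):
    """Estimate one variant's additive effect on a quantitative trait by
    ordinary least squares, with ancestry PCs as covariates.

    genotype:  shape (n,), alternate-allele dosages coded 0/1/2 (assumed coding)
    phenotype: shape (n,), quantitative trait values
    pcs:       shape (n, k), ancestry principal components (k may be 0)
    Returns (beta, standard_error) for the genotype term.
    Relatedness is NOT modeled here.
    """
    n = len(phenotype)
    # Design matrix columns: intercept, genotype, then the k PCs.
    X = np.column_stack([np.ones(n), genotype, pcs])
    coef, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
    residuals = phenotype - X @ coef
    sigma2 = residuals @ residuals / (n - X.shape[1])
    # Sampling covariance of the OLS coefficients: sigma^2 (X^T X)^-1.
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[1], np.sqrt(cov[1, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5000
    pc1 = rng.normal(size=n)
    # Simulated confounding: allele frequency and trait both track PC1,
    # but the variant has no true effect on the trait.
    genotype = rng.binomial(2, 1 / (1 + np.exp(-pc1)))
    phenotype = 2.0 * pc1 + rng.normal(size=n)
    unadjusted, _ = pc_adjusted_beta(genotype, phenotype, np.empty((n, 0)))
    adjusted, se = pc_adjusted_beta(genotype, phenotype, pc1[:, None])
    print(f"unadjusted beta={unadjusted:.2f}, adjusted beta={adjusted:.2f} (SE {se:.2f})")
```

In the simulated example, the unadjusted estimate picks up a spurious association driven by ancestry, while adding PC1 as a covariate pulls the estimate back toward zero.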

The pie chart below summarizes TOPMed participant diversity using a combination of self-identified or ascriptive race/ethnicity categories, study inclusion criteria, or other demographic information provided by study investigators. Please note that while groupings may correlate to some extent with genetic ancestry, TOPMed recommends distinguishing between genetically and non-genetically inferred descriptions in analyses and publications, as described in these Guidelines on the use and reporting of race, ethnicity, and ancestry in TOPMed.

Sample numbers by ancestry/ethnicity (N=180k total)

Whole Genome Sequencing

WGS was performed by several sequencing centers to a median depth of 30X using DNA from blood, PCR-free library construction, and Illumina HiSeq X technology. A Support Vector Machine (SVM) quality filter was trained using known variants as positive examples and Mendelian-inconsistent variants as negative examples. The Informatics Research Center performs joint genotype calling across all available samples to produce genotype data “freezes.”
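Mendelian-inconsistent variants can only be identified in related samples. In a parent-offspring trio, the child's genotype must be composable from one allele transmitted by each parent, so a violation points to a likely genotyping error at that site. The Python sketch below shows this check for biallelic sites; the 0/1/2 alternate-allele coding and the function names are illustrative assumptions, and the code shows only how such labels could be derived, not the SVM filter itself.

```python
from itertools import product

# Alleles a parent can transmit, keyed by genotype coded as the number of
# alternate alleles: 0 = ref/ref, 1 = ref/alt, 2 = alt/alt (assumed coding).
TRANSMISSIBLE = {0: (0,), 1: (0, 1), 2: (1,)}


def is_mendelian_consistent(father: int, mother: int, child: int) -> bool:
    """True if the child's genotype can be formed from one allele of each parent."""
    possible = {a + b for a, b in product(TRANSMISSIBLE[father], TRANSMISSIBLE[mother])}
    return child in possible


def inconsistent_sites(trio_genotypes: list[tuple[int, int, int]]) -> list[int]:
    """Indices of sites whose (father, mother, child) genotypes violate Mendelian inheritance."""
    return [
        i
        for i, (father, mother, child) in enumerate(trio_genotypes)
        if not is_mendelian_consistent(father, mother, child)
    ]
```

For example, `inconsistent_sites([(0, 0, 0), (0, 0, 1), (2, 2, 1), (1, 1, 2)])` returns `[1, 2]`. A true de novo mutation would also be flagged by this check, so a flagged site is a candidate error rather than proof of one.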

In TOPMed data freeze 8, variant discovery on ~186k samples identified 811 million single nucleotide variants and 66 million short insertion/deletion variants that passed variant QC.

In TOPMed data freeze 9, variant discovery was initially performed on ~206k samples, including samples from the Centers for Common Disease Genomics (CCDG), and the results were then subset to 158,470 TOPMed samples plus 2,504 1000 Genomes samples. In total, 781 million single nucleotide variants and 62 million short insertion/deletion variants were identified and passed variant QC. These variant counts are slightly smaller than the corresponding numbers in data freeze 8 because sites that show no variation in TOPMed samples were omitted. More information about WGS methods can be found under Sequencing and Data Processing Methods.
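The smaller freeze 9 counts follow from how subsetting works: a site discovered in the larger call set may show no variation once only the retained samples are considered. The Python sketch below keeps only sites that remain variable in a chosen subset of samples. It is an illustration, not the TOPMed pipeline; the in-memory genotype layout (sites by samples, alternate-allele counts, `None` for missing calls), the function name `variable_sites`, and the definition of "variable" (both the reference and the alternate allele observed among called genotypes) are all assumptions.

```python
from typing import Optional, Sequence


def variable_sites(genotypes: Sequence[Sequence[Optional[int]]],
                   keep: Sequence[int]) -> list[int]:
    """Indices of sites that are still variable after subsetting samples.

    genotypes: genotypes[site][sample] = alternate-allele count (0, 1, 2),
               or None for a missing call (assumed layout).
    keep:      column indices of the samples retained in the subset.
    A site is kept if both alleles are observed among the retained,
    non-missing genotypes (assumed definition of "shows variation").
    """
    result = []
    for site, row in enumerate(genotypes):
        called = [row[j] for j in keep if row[j] is not None]
        alt_alleles = sum(called)
        total_alleles = 2 * len(called)
        # 0 alt alleles = all ref/ref; alt == total = all alt/alt; both are monomorphic.
        if 0 < alt_alleles < total_alleles:
            result.append(site)
    return result
```

For example, with `genotypes = [[0, 1, 0, 0], [0, 0, 0, 2], [1, 1, None, 0]]`, `variable_sites(genotypes, [0, 1, 2])` returns `[0, 2]`: site 1 carries its only alternate alleles in the dropped sample 3, so it no longer shows variation.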

Omics

TOPMed omics data processing is being performed by several sequencing centers. The program requires that omics data be submitted to dbGaP along with thorough documentation of biosampling and laboratory methods and of sample provenance. Visit the Standards webpage to find documented omics pipelines for each omics type and phase. The table below summarizes the approved data for each study/cohort, by data type.

TOPMed WGS and Omics Summary of Approved Data

| Short Name | Study/Cohort Name | PI(s) | Populations | dbGaP ID | WGS | RNA-seq | Methylation | Metabolomics | Proteomics |
| ATGC | Asthma Translational Genomics Collaborative | Burchard, Esteban; Williams, L. Keoki | | ATGC dbGaP IDs (see Notes) | 16,494 | 9,290 | | | |
| MESA | Multi-Ethnic Study of Atherosclerosis | Rotter, Jerome; Rich, Stephen | Multi-ethnic populations | phs001416 | 7,107 | 8,903 | 2,086 | 12,800 | 14,200 |
| HCHS_SOL | Hispanic Community Health Study - Study of Latinos | Kaplan, Robert; North, Kari | | phs001395 | 7,834 | 7,733 | | 12,226 | |
| ARIC+VTE | Venous Thromboembolism project | Boerwinkle, Eric | 20% African American | phs001211, phs001402, phs000993 | 10,531 | 6,111 | 16,524 | | |
| CARDIA | Cell Disease Whole Genome Sequence Analysis in Early Cerebral Small Vessel Disease | Fornage, Myriam; Hou, Lifang | | phs001612 | 3,472 | 6,000 | 9,480 | 9,000 | 9,000 |
| MLOF | My Life, Our Future: Genotyping for Progress in Hemophilia | Konkle, Barbara; Johnsen, Jill | | phs001515 | 5,670 | 4,500 | | | |
| PVDOMICS | Pulmonary Vascular Disease Omics Analyses | Erzurum, Serpil; Barnard, John; Geraci, Mark; Beck, Gerald; Comhair, Suzy | | phs002358 | 1,137 | 4,388 | 1,800 | | |
| SPIROMICS | SubPopulations and InteRmediate Outcome Measures In COPD Study | Meyers, Deborah A. | | phs001927 | 2,711 | 3,980 | | 3,417 | |
| FHS | Framingham Heart Study | Ramachandran, Vasan S.; Levy, Dan; Heard-Costa, Nancy | Three-generation EA pedigrees | phs000974 | 7,077 | 3,807 | 1,813 | 3,025 | |
| Africa6K | Integrative Genomic Studies of Heart and Blood Related Traits in Africans | Tishkoff, Sarah; Williams, Scott | | phs002194 | 6,392 | 2,934 | | | |
| HIPS | Hemophilia Inhibitor PUPs study | Brown, Deborah | | phs002302 | 25 | 2,596 | | | |
| WHI | Women's Health Initiative | Kooperberg, Charles; Reiner, Alex | | phs001237 | 11,310 | 2,365 | 4,400 | 4,400 | 1,000 |
| LTRC | Lung Tissue Research Consortium | Silverman, Edwin | | phs001662 | 1,541 | 1,548 | 3,041 | 1,548 | 1,548 |
| IPF | Whole Genome Sequencing in Familial and Sporadic Idiopathic Pulmonary Fibrosis | Schwartz, David; Fingerlin, Tasha | | phs001607 | 2,883 | 835 | | | |
| PharmHU | The Pharmacogenomics of Hydroxyurea in Sickle Cell Disease | Boerwinkle, Eric; Sheehan, Vivien; Pace, Betty Sue | | phs001466 | 862 | 826 | | | |
| COPDGene | Genetic Epidemiology of COPD | Silverman, Edwin | 30% African American | phs000951, phs000946 | 10,829 | 800 | 11,843 | 8,353 | |
| TOPCHeF | Trans-Omics for Precision Medicine for Congestive Heart Failure | Taylor, Matthew; Mestroni, Luisa; Graw, Sharon | | phs002038 | 839 | 776 | | | |
| nuMoM2b-HHS | nuMoM2b-Heart Health Study | Blue, Nathan; McNeil, Becky | | | 4,341 | 600 | | | |
| MDS | Genomics of Myelodysplastic Syndromes | Walter, Matthew; Goll, Johannes; Lindsley, R. Coleman; Saber, Wael; Padron, Eric; Miller, Christopher | | phs002360 | 473 | 145 | | | |
| SCVI | Stanford Cardiovascular Institute iPSC Biobank Study | Wu, Joseph; Bustamante, Carlos | | phs002338 | 1,163 | 82 | | | |
| AA_CAC | African American Coronary Artery Calcification project | Taylor, Kent D.; Rotter, Jerome | African American families | phs002194 | 1,159 | | | | |
| AFGen | Identification of Common Genetic Variants for Atrial Fibrillation and PR Interval - Atrial Fibrillation Genetics Consortium | Ellinor, Patrick | | AFGen dbGaP IDs (see Notes) | 12,742 | | | | |
| Amish | Genetics of Cardiometabolic Health in the Amish | Mitchell, Braxton D. | Old Order Amish; large extended pedigrees | phs000956 | 1,120 | | | | |
| PGX_Asthma | Pharmacogenomics of Bronchodilator Response in Minority Children with Asthma | Burchard, Esteban; Hernandez, Ryan | 500 AA, 500 Puerto Rican, and 500 Mexican extreme non-responders with asthma | Please see this TOPMed Project's Parent Studies | 1,500 | | | | |
| BAGS | Barbados Asthma Genetics Study | Barnes, Kathleen | Barbadian families of African descent in which >40% of members have asthma | phs001143 | 1,085 | | | | |
| BCC-PREG | The Boston-Colombia Collaborative for Adverse Pregnancy Outcomes | Gray, Kathryn J.; Casa Romero, Juan P. | | Please see this TOPMed Project's Parent Studies | 14,615 | | | | |
| BioMe | Mount Sinai BioMe Biobank | Loos, Ruth J.F.; Kenny, Eimear | | phs001644 | 11,626 | | | | |
| Boston-Brazil_SCD | Boston-Brazil Collaborative Study of Sickle Cell Disease | Sankaran, Vijay G. | | phs001599 | 415 | | | | |
| CFS | Cleveland Family Study | Redline, Susan | African American | phs000954 | 1,300 | | | | |
| CHS | Cardiovascular Health Study | Psaty, Bruce; Tracy, Russell | | phs001368 | 4,780 | | | | |
| COPDMet | Plasma and BALF Metabolomics in COPDGene and SPIROMICS | Bowler, Russell | | Please see this TOPMed Project's Parent Studies | 0 | | | | |
| CRA_CAMP | The Genetic Epidemiology of Asthma in Costa Rica and the Childhood Asthma Management Program | Weiss, Scott T. | Costa Rican Hispanic population with an asthma prevalence of 24% | phs001726, phs000988 | 6,647 | | 3,000 | 3,000 | |
| DS_CHD | Down Syndrome Associated Atrioventricular Septal Defects: New Omic Resources | Sherman, Stephanie L. | | Please see this TOPMed Project's Parent Studies | 469 | | | | |
| ECLIPSE | Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints | Silverman, Edwin | | phs001472 | 2,355 | | | | |
| GEM-OSA | Genetics, Epigenetics and Metabolomics of OSA Subtypes | Pack, Allan; Carrier, Julie; Magalang, Ulysses; Mignot, Emmanuel; Ayas, Najib | | Please see this TOPMed Project's Parent Studies | 3,000 | | 3,000 | 3,000 | |
| GeneSTAR | Genetic Studies of Atherosclerosis Risk | Mathias, Rasika | African American families, European families | phs001218 | 1,780 | | | | |
| GenSalt | Genetic Epidemiology Network of Salt Sensitivity | He, Jiang | | phs001217 | 1,858 | | | | |
| GOLDN | Genetics of Lipid Lowering Drug and Diet Network | Arnett, Donna K. | European families | phs001359 | 965 | | | | |
| HLKSCD | Genetic Variation of Heart, Lung, and Kidney Disease in Sickle Cell Disease: Pre- and Post-Curative Therapies | DeBaun, Michael; Eapen, Mary; Kang, Guolian; Edwards, Todd; Weiss, Mitch; Estepp, Jeremie; Gordeuk, Victor; Li, Bingshan; Saraf, Santosh | | | 1,780 | | | | |
| HyperGEN_GENOA | Hypertension Genetic Epidemiology Network and Genetic Epidemiology Network of Arteriopathy | Arnett, Donna K. | African American families | phs001345, phs001293 | 3,153 | | | | |
| JHS | Jackson Heart Study | Carson, April; Raffield, Laura | African American; mixed family-based and population-based | phs000964 | 3,418 | | | | |
| OMG_SCD | Outcome Modifying Genes in Sickle Cell Disease | Ashley-Koch, Allison; Telen, Marilyn | | phs001608 | 653 | | | | |
| PCGC_CHD | Pediatric Cardiac Genomics Consortium's Congenital Heart Disease | Gelb, Bruce; Seidman, Christine | | phs001735 | 3,888 | | | | |
| PROMIS | Pakistan Risk of Myocardial Infarction Study | Saleheen, Danish | South Asian ancestry from Pakistan | phs001569 | 9,204 | | | | |
| PUSH_SCD | Pulmonary Hypertension and the Hypoxic Response in Sickle Cell Disease | Nekhai, Sergei | | phs001682 | 423 | | | | |
| REDS-III_Brazil | Recipient Epidemiology and Donor Evaluation Study-III | Custer, Brian; Kelly, Shannon | Brazilian | phs001468 | 2,746 | | | | |
| SAFS | San Antonio Family Studies | Blangero, John; Curran, Joanne | Mexican Americans in SAFHS extended pedigrees | phs001215 | 1,819 | | | | |
| Samoan | Samoan Adiposity Study | McGarvey, Stephen | Samoan | phs000972 | 1,295 | | | | |
| Sarcoidosis | Genetics of Sarcoidosis in African Americans | Montgomery, Courtney | African American families | phs001207 | 1,330 | | | | |
| SARP | Severe Asthma Research Program | Meyers, Deborah A. | | phs001446 | 1,890 | | | | |
| THRV | Taiwan Study of Hypertension using Rare Variants | Rotter, Jerome; Chen, Yii-Der Ida | Taiwan Chinese families | phs001387 | 2,170 | | | | |
| UNID_CM | The Genetic Causes of Unexplained Cardiomyopathies | Seidman, Jonathan; Seidman, Christine | | | 779 | | | | |
| Walk-PHaSST | Treatment of Pulmonary Hypertension and Sickle Cell Disease with Sildenafil Therapy | Gladwin, Mark; Zhang, Yingze | | phs001514 | 437 | | | | |
| TOTAL | | | | | 205,092 | 68,219 | 56,987 | 60,769 | 25,748 |

Notes:

AFGen dbGaP IDs: phs001435, phs001543, phs001624, phs001732, phs001600, phs001189, phs001546, phs001606, phs001547, phs001725, phs001545, phs000993, phs001598, phs001062, phs001434, phs001544, phs001024, phs001601, phs001933, phs000997, phs001032, phs001040

ATGC dbGaP IDs: phs001728, phs001729, phs001730, phs001602, phs001603, phs001604, phs001605, phs000920, phs001542, phs001661, phs001727, phs000921, phs001467

Note: Some phs links in the table above may redirect to a dbGaP error page. This is because TOPMed dbGaP study webpages do not go live until the study accession is released.

Note: TOPMed is generating a rich resource of multi-omics data that will include approximately 40K samples undergoing RNA sequencing, 37K samples undergoing metabolomics profiling, 57K samples undergoing DNA methylation profiling, and 4K samples undergoing proteomics assays. These projected totals cover all stages of progress, from samples whose DNA/RNA is currently being extracted, to those undergoing sequencing/profiling, to those that have completed the sequencing/profiling pipelines. Therefore, most omics data are still being generated and will be released in the future.

 

Resources for the Scientific Community

TOPMed data are being made available to the scientific community as a series of “data freezes”: genotypes and phenotypes via dbGaP; read alignments via the Sequence Read Archive (SRA); and variant summary information via the Bravo variant server (see figure below) and dbSNP. Genotypes for a set of 55k samples have been released on dbGaP (freeze 5), and a release of >140k samples is expected by mid-2020 (freeze 8). TOPMed WGS data are contained in study-specific accessions whose names contain “NHLBI TOPMed”, while most phenotypic data are in parent study accessions. The TOPMed accessions can be identified by searching the dbGaP website for “TOPMed”. More information about what data are available and how to access them can be found on the Data Access page.

TOPMed is currently adding other omics assays to samples that have been whole-genome sequenced, including RNA-seq, metabolomics, proteomics, and epigenomics.

Overview of Bravo variant server resources



This content was adapted from a poster presented at the 2018 American Society of Human Genetics (ASHG) meeting, “Overview of the NHLBI Trans-Omics for Precision Medicine (TOPMed) program: Whole genome sequencing of >100,000 deeply phenotyped individuals” (Poster 3145/T).