Overview of the NHLBI Trans-Omics for Precision Medicine (TOPMed) program: whole-genome sequencing of >100,000 deeply phenotyped individuals

Authors
Cathy Laurie, Tom Blackwell, Goncalo Abecasis, Ken Rice, James Wilson, Deborah Nickerson, Stacey Gabriel, Richard Gibbs, Susan Dutcher, Soren Germer, Donna Arnett, Allison Ashley-Koch, Kathleen Barnes, Eric Boerwinkle, Steve Rich, Ed Silverman, Rebecca Beer, Julie Mikulla, Pothur Srinivas, Weiniu Gan, George Papanicolaou, Cashell Jaquish, and the NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium
Name and Date of Professional Meeting
American Society of Human Genetics, October 18, 2018
Associated paper proposal(s)
Working Group(s)
Abstract Text
A primary goal of the NHLBI TOPMed program is to improve scientific understanding of the fundamental biological processes that underlie heart, lung, blood, and sleep disorders. TOPMed is providing deep whole-genome sequencing (WGS) to pre-existing ‘parent’ studies that have large samples of human subjects with deep phenotypic characterization (biochemical, physiological, clinical, behavioral, and anatomical measures) and environmental exposure data. WGS was performed to an average depth of >30x. A support vector machine quality filter was trained on known variants (positive examples) and Mendelian-inconsistent variants (negative examples). The average pairwise non-reference genotype discordance rate among 69 pairs of duplicate samples was estimated at 5 × 10⁻⁵. Currently, >100,000 samples have completed sequencing, with ~700 million variants discovered. Genotypes are being called jointly across all studies to facilitate cross-study analyses, including the use of common controls.
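The abstract does not spell out how the duplicate-pair discordance rate is computed. The sketch below assumes one common definition: for each duplicate pair, count sites where both calls are present and at least one is non-reference, take the fraction of those sites whose genotypes differ, then average the per-pair rates. Genotypes are assumed to be encoded as alternate-allele counts (0, 1, 2) with None for a missing call. The function names are hypothetical and the example data are illustrative only, not TOPMed data. At the reported rate of 5 × 10⁻⁵, roughly one informative genotype in 20,000 would disagree between duplicates.

```python
from typing import Optional, Sequence, Tuple

# Genotype encoding (assumption): 0 = hom-ref, 1 = het, 2 = hom-alt, None = missing.
Genotypes = Sequence[Optional[int]]


def nonref_discordance(g1: Genotypes, g2: Genotypes) -> Tuple[int, int]:
    """Return (discordant, informative) site counts for one duplicate pair.

    A site is informative when both calls are present and at least one of
    them is non-reference; shared hom-ref calls are skipped so that the rate
    is not inflated by the vast number of invariant sites.
    """
    discordant = informative = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # missing calls are excluded (assumption)
        if a == 0 and b == 0:
            continue  # both hom-ref: not informative
        informative += 1
        if a != b:
            discordant += 1
    return discordant, informative


def average_pairwise_rate(pairs: Sequence[Tuple[Genotypes, Genotypes]]) -> float:
    """Average the per-pair non-reference discordance rates (hypothetical helper)."""
    rates = []
    for g1, g2 in pairs:
        d, n = nonref_discordance(g1, g2)
        if n > 0:
            rates.append(d / n)
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Two toy duplicate pairs (illustrative values only).
    pairs = [
        ([0, 1, 2, 0, 1], [0, 1, 2, 0, 2]),  # 1 discordant of 3 informative
        ([1, 0, None, 2], [1, 0, 1, 2]),     # 0 discordant of 2 informative
    ]
    print(average_pairwise_rate(pairs))
```

Averaging per-pair rates weights every duplicate pair equally; pooling the discordant and informative counts across all pairs before dividing is an equally plausible alternative, and the abstract does not say which was used.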

TOPMed studies include prospective cohorts; cross-sectional case-control studies; family-based studies; and ‘synthetic cohorts’ consisting of cases from multiple sources with common controls from other sources. The prospective cohorts provide extensive data on disease risk factors and subclinical disease measures, as well as large numbers of incident disease cases, while the case-control studies provide large numbers of prevalent disease cases. Extended family structures can improve power to detect rare variant effects. Well-represented diseases include asthma, COPD, atrial fibrillation, stroke, and venous thromboembolism. Among currently sequenced individuals, the approximate ancestral/ethnic composition is 43% European-American, 31% African-American, 19% Hispanic/Latino, 5% Asian, and 3% other groups. Discovery of genotype-phenotype associations frequently includes pooled analysis across ancestry groups and studies, using statistical models that account for population structure and relatedness.

The resources of TOPMed are being made available to the scientific community as a series of ‘data freezes’: genotypes and phenotypes via dbGaP; read alignments via the SRA; and variants via the Bravo variant server and dbSNP. Genotypes for a set of 18.5k samples have been released on dbGaP, another freeze of 55k is being released in Q2 2018, and a freeze of >100k is planned for Q1 2019. Genotypic data are contained in study-specific accessions with names containing “NHLBI TOPMed”, while most phenotypic data are in parent study accessions.