TOPMed

Harmonized Phenotypes

TOPMed Phenotype Harmonization Project

The main goal of the TOPMed harmonization project is to provide harmonized phenotypes that are well-documented, reproducible, and homogeneous across studies. In harmonized datasets and documentation, “phenotype” refers to the observable characteristic (e.g., diastolic blood pressure) and “variable” refers to the specific vector of data values for a given phenotype (e.g., bp_diastolic_1). To enable reproducibility, all study data were acquired from dbGaP.

Datasets and documentation of the harmonized variables were submitted to two repositories: dbGaP and BioData Catalyst. Full documentation for each harmonized variable is provided in a GitHub repository. The documentation for each harmonized variable includes the identifiers of the original dbGaP study variables used in harmonization as well as the code that was used to transform them into the harmonized variable. This repository also includes a reproducible example that instructs users how to use the documentation to reproduce a simulated harmonized variable.

TOPMed Phenotype Tagging Project

Over 16,000 dbGaP study variables were tagged with 65 phenotype concepts from the heart, lung, blood, and sleep domains. These tags enable researchers to identify variables of interest that can be used in future harmonization efforts. The results of the tagging project are available in the dbGaP user interface. All tags are mapped to a UMLS Concept Unique Identifier (CUI), which is required for identifying the tagged variables on dbGaP.

Instructions for Identifying Tagged Variables on dbGaP

The following are examples of different methods to search for tagged variables: Entrez search and faceted search.

Entrez search

  • In your web browser, visit the dbGaP Entrez advanced search page.
  • In the search builder, select Common Data Element Resource and enter “umls” into the associated text box, or add “umls[Common Data Element Resource]” to the search box. Alternatively, select Common Data Element Term and enter the CUI of a UMLS term into the associated text box, or add a query such as “C0005890[Common Data Element Term]” to the search box.
  • The Studies tab of the search results displays all of the studies that contain tagged variables.
  • The Variables tab of the search results displays all of the dbGaP variables that are tagged with at least one UMLS term. Click on a variable name to see more information on the variable page.
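
The query strings described in the steps above can also be assembled programmatically. The following is a minimal sketch, not an official dbGaP client: the field tags are copied from the steps above, while the function names and the base URL (`DBGAP_SEARCH_URL`) are assumptions made for illustration and should be checked against the actual dbGaP Entrez search page. The only format check applied is that a UMLS CUI is the letter "C" followed by seven digits.

```python
from urllib.parse import urlencode

# Field tags quoted in the search steps above.
RESOURCE_FIELD = "Common Data Element Resource"
TERM_FIELD = "Common Data Element Term"

# Assumption: base URL of the dbGaP Entrez search page; verify before use.
DBGAP_SEARCH_URL = "https://www.ncbi.nlm.nih.gov/gap/"


def resource_query(resource: str = "umls") -> str:
    """Build a query matching everything tagged from the given resource."""
    return f"{resource}[{RESOURCE_FIELD}]"


def cui_query(cui: str) -> str:
    """Build a query matching variables tagged with one UMLS CUI."""
    cui = cui.strip().upper()
    # A UMLS CUI is "C" followed by seven digits, e.g. C0005890.
    if not (len(cui) == 8 and cui[0] == "C" and cui[1:].isdigit()):
        raise ValueError(f"not a UMLS CUI: {cui!r}")
    return f"{cui}[{TERM_FIELD}]"


def search_url(query: str) -> str:
    """URL-encode a query as the 'term' parameter of the assumed search URL."""
    return f"{DBGAP_SEARCH_URL}?{urlencode({'term': query})}"


if __name__ == "__main__":
    print(resource_query())
    print(cui_query("C0005890"))
    print(search_url(cui_query("C0005890")))
```

The strings returned by `resource_query` and `cui_query` can be pasted directly into the search box on the advanced search page, which avoids depending on the assumed URL at all.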

Faceted search

  • In your web browser, visit the dbGaP faceted search page.
  • Click on the Variables tab.
  • Under the Common Data Elements filter, check UMLS. This will display all of the dbGaP study variables that are tagged with a UMLS term.
  • For a given variable listed on the right, you can click on the UMLS link to go directly to the variable’s information page with the full UMLS term name.
  • To search for variables tagged with a specific UMLS term, search for the term’s CUI in the search box in the upper left corner of the page.

Mapped Phenotype Tags

| Phenotype Domain | Description | UMLS CUI | UMLS Term | Tag Name (phenotype concept) |
| --- | --- | --- | --- | --- |
| Anthropometry | Waist circumference measurement | C0455829 | Waist circumference | Waist circumference |
| Anthropometry | Ratio of waist to hip circumference | C0205682 | Waist-hip ratio | Waist-hip ratio |
| Anthropometry | Body weight measurement | C0005910 | Body weight | Weight |
| Demographics | Self-reported sex or gender identity | C0017249 | Gender identity | Gender |
| Demographics | Self-reported race, ancestry or ethnicity | C1830369 | Race or ethnicity | Race/ancestry/ethnicity |
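
When several tags need to be searched, it can help to keep the mapping in a small lookup structure. The sketch below transcribes only the five rows listed above, not the full set of 65 tags. The `PhenotypeTag` type and both helper functions are hypothetical names introduced for illustration.

```python
from typing import List, NamedTuple


class PhenotypeTag(NamedTuple):
    domain: str
    description: str
    cui: str
    umls_term: str
    tag_name: str


# The five rows shown in the table above (the full project defines 65 tags).
TAGS = [
    PhenotypeTag("Anthropometry", "Waist circumference measurement",
                 "C0455829", "Waist circumference", "Waist circumference"),
    PhenotypeTag("Anthropometry", "Ratio of waist to hip circumference",
                 "C0205682", "Waist-hip ratio", "Waist-hip ratio"),
    PhenotypeTag("Anthropometry", "Body weight measurement",
                 "C0005910", "Body weight", "Weight"),
    PhenotypeTag("Demographics", "Self-reported sex or gender identity",
                 "C0017249", "Gender identity", "Gender"),
    PhenotypeTag("Demographics", "Self-reported race, ancestry or ethnicity",
                 "C1830369", "Race or ethnicity", "Race/ancestry/ethnicity"),
]


def cuis_for_domain(domain: str) -> List[str]:
    """Return the CUIs of all listed tags in a phenotype domain."""
    wanted = domain.strip().lower()
    return [tag.cui for tag in TAGS if tag.domain.lower() == wanted]


def cui_for_tag(tag_name: str) -> str:
    """Return the CUI for a tag name; raise KeyError if it is not listed."""
    wanted = tag_name.strip().lower()
    for tag in TAGS:
        if tag.tag_name.lower() == wanted:
            return tag.cui
    raise KeyError(tag_name)


if __name__ == "__main__":
    print(cuis_for_domain("Anthropometry"))
    print(cui_for_tag("Weight"))
```

A CUI returned by either helper is the value to enter in the Common Data Element Term field or in the faceted search box.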

Citation

Information about these projects is available in a published manuscript. If you use the datasets described on this page, please cite the following paper:

Stilp AM, Emery LS, Broome JG, Buth EJ, Khan AT, Laurie CA, Wang FF, Wong Q, Chen D, D’Augustine CM, Heard-Costa NL, Hohensee CR, Johnson WC, Juarez LD, Liu J, Mutalik KM, Raffield LM, Wiggins KL, de Vries PS, Kelly TN, Kooperberg C, Natarajan P, Peloso GM, Peyser PA, Reiner AP, Arnett DK, Aslibekyan S, Barnes KC, Bielak LF, Bis JC, Cade BE, Chen MH, Correa A, Cupples LA, de Andrade M, Ellinor PT, Fornage M, Franceschini N, Gan W, Ganesh SK, Graffelman J, Grove ML, Guo X, Hawley NL, Hsu WL, Jackson RD, Jaquish CE, Johnson AD, Kardia SLR, Kelly S, Lee J, Mathias RA, McGarvey ST, Mitchell BD, Montasser ME, Morrison AC, North KE, Nouraie SM, Oelsner EC, Pankratz N, Rich SS, Rotter JI, Smith JA, Taylor KD, Vasan RS, Weeks DE, Weiss ST, Wilson CG, Yanek LR, Psaty BM, Heckbert SR, Laurie CC. A System for Phenotype Harmonization in the National Heart, Lung, and Blood Institute Trans-Omics for Precision Medicine (TOPMed) Program. Am J Epidemiol. 2021 Oct 1;190(10):1977-1992. doi: 10.1093/aje/kwab115. PMID: 33861317; PMCID: PMC8485147.
