
Alcohol and tobacco polygenic risk score prediction within and across diverse ancestries

Authors
Gretchen Saunders, GWAS & Sequencing Consortium of Alcohol and Nicotine use, & Trans-Omics for Precision Medicine
Name and Date of Professional Meeting
ASHG (October, 2020)
Associated paper proposal(s)
Working Group(s)
Abstract Text
The vast majority of genome-wide association studies (GWAS) to date have been restricted to European-only samples. While polygenic risk scores (PRSs) derived from these GWAS summary statistics can be applied to non-European ancestral groups, bias is introduced as a function of the divergence from European ancestries, making their accuracy and utility unclear within diverse groups. We use summary statistics from the GWAS & Sequencing Consortium of Alcohol and Nicotine use (GSCAN), covering up to 3.3 million individuals with ancestry from four major ancestral groups: African (AFR), American (AMR), East Asian (EAS), and European (EUR), to generate PRSs for smoking initiation, cigarettes per day, smoking cessation, age of smoking initiation, and drinks per week. The PRSs were validated in an independent sample of individuals from the National Longitudinal Study of Adolescent to Adult Health (Add Health; N = 2204 African, N = 1133 Hispanic, N = 6092 European, N = 525 East Asian, corresponding roughly to the ancestral groups in the GWAS meta-analysis). Predictive accuracy of each PRS was estimated as the change in R² between base and full models, where base models included the covariates of sex, age, and the first 10 principal components, and full models additionally included the PRS. Using EUR-based summary statistics to predict alcohol and tobacco use in European individuals results in incremental R² values ranging from 1% to 7.3%. Using summary statistics and validation samples from the same non-EUR ancestry results in incremental R² values ranging from 0-1% for African, 0-2.6% for East Asian, and 0.3-3.1% for Hispanic ancestries. In non-European groups, the predictive accuracy of own-ancestry PRSs is approximately the same as, or lower than, that of European ancestry-based scores.
For every ancestral group, PRSs derived from the combined-ancestry summary statistics (all ancestries pooled) almost universally have the highest predictive accuracy: 1-7.7% for European ancestries, 0.2-1.6% for African ancestries, 0.1-3.5% for East Asian ancestries, and 0.3-4.8% for Hispanic ancestries. Polygenic prediction in EUR ancestries remains higher than in non-EUR ancestries, likely for several reasons, including large differences in discovery and validation sample sizes, lower imputation accuracy, differences in LD structure, and phenotypic heterogeneity. Our results highlight the gain in prediction accuracy from using combined-ancestry GWAS summary statistics rather than single-ancestry polygenic scores. We discuss the potential bias of using European-based scores for prediction in non-European cohorts, as this has implications for disparities in the utility of polygenic scores.
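
The incremental R² used above as the accuracy measure (full-model R² minus base-model R²) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration rather than the authors' pipeline: the data are simulated, the function names `r_squared` and `incremental_r2` are hypothetical, and ordinary least squares is used throughout, although the abstract does not state which regression model was fitted for each phenotype.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added here)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def incremental_r2(covariates, prs, y):
    """Change in R^2 when the PRS is added to a covariates-only base model."""
    base = r_squared(covariates, y)
    full = r_squared(np.column_stack([covariates, prs]), y)
    return full - base


# Simulated stand-ins (assumption): 12 covariate columns playing the role of
# sex, age, and the first 10 principal components, plus one standardized PRS.
rng = np.random.default_rng(0)
n = 5000
covariates = rng.normal(size=(n, 12))
prs = rng.normal(size=n)
y = 0.1 * covariates[:, 0] + 0.2 * prs + rng.normal(size=n)

delta_r2 = incremental_r2(covariates, prs, y)
print(f"incremental R^2: {delta_r2:.4f}")
```

Because the base model is nested within the full model, the difference cannot be negative beyond numerical error. For a binary phenotype such as smoking initiation, a pseudo-R² from a logistic model would be a common substitute, but that choice is likewise an assumption here.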