
Smoking

The extent of allelic heterogeneity in a trans-ancestry GWAS meta-analysis of alcohol and tobacco addiction in 3.4 million individuals.

Authors
Gretchen Saunders, GWAS & Sequencing Consortium of Alcohol and Nicotine use (GSCAN), and Trans-Omics for Precision Medicine (TOPMed)
Name and Date of Professional Meeting
ASHG October 2022
Associated paper proposal(s)
If not associated with a paper proposal
Association with a paper proposal is in progress.
Working Group(s)
Abstract Text
The use and abuse of nicotine and alcohol account for >100 million disability-adjusted life years across the globe, constituting one of the world’s leading public health problems. Despite this, the majority of genome-wide association studies (GWAS) thus far have been restricted to individuals of European ancestry, representing <1% of known worldwide genetic variation. Here, we leveraged a trans-ancestry GWAS of nicotine and alcohol use in up to 3.4 million individuals from 60 studies with recent ancestry from Africa (N=119,589), America (N=286,026), East Asia (N=296,438), and Europe (N=2,669,029). Overall, we identified 2,143 loci and 3,823 independent variants associated with our five substance use phenotypes: smoking initiation, age of initiation of regular smoking, cigarettes per day, smoking cessation, and alcoholic drinks per week. The trans-ancestry meta-analysis method allows us to quantify the extent to which associated variants differ in effect size by ancestry along four dimensions estimated from multi-dimensional scaling (MDS) of allele frequencies from each participating study. We found that 79.3% (N = 3,032) of independent variants did not differ in effect size magnitude by ancestry. Of the remaining 791 variants, 136 (3.6% of all independent variants) showed strong evidence for allelic heterogeneity, indicating that the effect sizes of these variants differ as a function of at least one axis of genetic variation. A single missense variant in the alcohol dehydrogenase gene ADH1B, known to be protective against alcohol consumption, showed effect size differences on three axes of ancestry variation: on average, an increase on any of these three MDS components was associated with a reduced effect size of the protective allele. Overall, we found that variants associated with alcohol and tobacco use have largely the same effects across populations.
This is consistent with the idea that the underlying genetic architecture of alcohol and tobacco use is similar across ancestries, and it informs our understanding of why polygenic risk scores show reduced portability across populations. Although GWAS-identified variants are not necessarily causal themselves, these results suggest that the generally low cross-population predictive accuracy of polygenic scores that has been widely observed may be largely due to factors other than differences in causal effect sizes, potentially highlighting the importance of differences in linkage disequilibrium patterns and allele frequencies.
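The per-variant heterogeneity test described in this abstract regresses study-level effect estimates on ancestry axes. As an illustration only, the Python sketch below fits an inverse-variance-weighted meta-regression on a single ancestry axis rather than the four MDS axes used here, assuming known per-study standard errors and no residual between-study heterogeneity. The function name and example values are hypothetical, and this is not the consortium's meta-analysis software.

```python
import math


def meta_regression_one_axis(betas, ses, axis):
    """Inverse-variance-weighted regression of per-study effect sizes on one
    ancestry axis. Returns (intercept, slope, slope_se, two_sided_p).

    Hypothetical helper: assumes known standard errors and no residual
    between-study heterogeneity (fixed-effect meta-regression).
    """
    w = [1.0 / s ** 2 for s in ses]                      # inverse-variance weights
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, axis))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, axis))
    sy = sum(wi * yi for wi, yi in zip(w, betas))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, axis, betas))
    det = sw * sxx - sx * sx                             # determinant of X'WX
    if det <= 0:
        raise ValueError("the ancestry axis must vary across studies")
    slope = (sw * sxy - sx * sy) / det
    intercept = (sxx * sy - sx * sxy) / det
    slope_se = math.sqrt(sw / det)                       # from (X'WX)^-1
    z = slope / slope_se
    p = math.erfc(abs(z) / math.sqrt(2.0))               # two-sided normal p-value
    return intercept, slope, slope_se, p


# Hypothetical example: five studies whose effect shrinks along the axis.
axis = [-1.0, -0.5, 0.0, 0.5, 1.0]
betas = [0.10 - 0.05 * x for x in axis]
ses = [0.01] * 5
print(meta_regression_one_axis(betas, ses, axis))
```

A small slope p-value would flag an ancestry-dependent effect size, as reported for ADH1B; handling several axes would replace the closed-form 2 × 2 solve with a general weighted least-squares fit.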

Trans-Ethnic Fine-Mapping using 3.4 Million Individuals from diverse ancestries elucidates the genetic architecture for Smoking Addiction Phenotypes

Authors
Xingyan Wang, GSCAN Consortium, TOPMed Smoking Working Group
Name and Date of Professional Meeting
ASHG 2021 (October 18-22)
Associated paper proposal(s)
If not associated with a paper proposal
Association with a paper proposal is in progress.
Working Group(s)
Abstract Text
Smoking addictions are heritable traits and leading causes of many diseases. Recently, a breakthrough in addiction genetics was made through the GSCAN meta-analysis, which identified 406 loci associated with different smoking behavioral traits in individuals of European ancestry. Yet the genetic architecture in non-European ancestries remains elusive. To address this challenge, the GSCAN consortium aggregated datasets from 101 studies with a total of 3.4 million individuals from diverse ancestries (2,669,029 European, 296,395 Asian, 286,026 Admixed American, and 119,589 African American). Together, we identify 2,007 loci, of which 464 are novel. This dataset offers an unprecedented opportunity to advance our understanding of the genetic architecture of smoking behavior in global populations.

We propose an improved meta-regression-based model for trans-ancestry genetic effect distributions. Specifically, we use the principal components (PCs) of genome-wide allele frequencies as proxies for continuously varying cohort-level ancestry. We model the genetic effect from each study as a mixture of models with different numbers of PCs, which can capture different extents of heterogeneity for different variants. For example, the model with 0 PCs corresponds to homogeneous effects. Because the 1st PC separates European and Asian ancestry, the model with 1 PC can be interpreted as having heterogeneous effects along the European-Asian cline. By imposing a Dirichlet-Multinomial prior, we borrow strength across variants to learn the genetic architecture and fine-map causal variants.
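Borrowing strength across variants can be sketched as shared mixture weights over the 0-PC, 1-PC, ... models, updated by maximum a posteriori EM under a symmetric Dirichlet prior. This is a minimal sketch assuming each variant's log marginal likelihood under every candidate model has already been computed (the `loglik` input and the function names are hypothetical); it illustrates the idea and is not the authors' model.

```python
import math


def _responsibilities(loglik, pi):
    """Posterior probability of each candidate model for every variant."""
    log_pi = [math.log(p) if p > 0 else float("-inf") for p in pi]
    resp = []
    for row in loglik:
        logs = [lp + ll for lp, ll in zip(log_pi, row)]
        m = max(logs)                                    # log-sum-exp for stability
        ex = [math.exp(x - m) for x in logs]
        tot = sum(ex)
        resp.append([e / tot for e in ex])
    return resp


def fit_model_weights(loglik, alpha=1.0, n_iter=200):
    """MAP-EM estimate of model weights pi_k shared across variants.

    loglik[v][k]: log marginal likelihood of variant v under the model with
    k ancestry PCs (assumed precomputed). alpha >= 1 is a symmetric Dirichlet
    concentration; larger values shrink pi toward uniform.
    Returns (pi, resp), where resp[v][k] = P(model k | data for variant v).
    """
    n_var, n_mod = len(loglik), len(loglik[0])
    pi = [1.0 / n_mod] * n_mod
    for _ in range(n_iter):
        resp = _responsibilities(loglik, pi)                   # E-step
        counts = [sum(r[k] for r in resp) for k in range(n_mod)]
        denom = n_var + n_mod * (alpha - 1.0)
        pi = [(c + alpha - 1.0) / denom for c in counts]       # M-step (MAP)
    return pi, _responsibilities(loglik, pi)


# Hypothetical log-likelihoods: 8 variants favor 0 PCs, 2 favor 1 PC.
loglik = [[0.0, -5.0, -10.0]] * 8 + [[-5.0, 0.0, -5.0]] * 2
pi, resp = fit_model_weights(loglik)
print(pi)
```

Variants whose data strongly favor one model keep that model, while ambiguous variants are pulled toward the genome-wide weights.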

We perform simulations across scenarios in which variants have either homogeneous effects or ancestry-specific effects. We show that our method greatly improves fine-mapping resolution and allows us to estimate the fractions of loci that show homogeneous and ancestry-specific effects. We apply our method to the GSCAN study of 4 phenotypes: age of initiation of regular smoking (AgeSmk), cigarettes per day (CigDay), smoking initiation (SmkInit), and smoking cessation (SmkCes). Among the 2,007 significant loci identified, with a median of 3,274 variants per locus, our method fine-maps 34.5% to fewer than 6 variants and 1.51 genes in 90% credible sets, a significant improvement over fine-mapping using European ancestry only. On average, 81% of loci show homogeneous effects. In addition, 13% of loci are best supported by the mixture component with 1 PC, indicating that the variants have distinct effects along the European-Asian cline but homogeneous effects in other ancestry groups. Our new results and continued research will elucidate the genetic architecture in global ancestries.
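A 90% credible set is typically built by ranking variants by posterior probability and adding them until the cumulative probability reaches 90%. The sketch below assumes a single causal variant per locus and posteriors that can be normalized to sum to one; the helper name and variant IDs are hypothetical.

```python
def credible_set(variant_ids, posteriors, coverage=0.90):
    """Smallest set of variants whose normalized posterior probabilities reach
    `coverage`, assuming exactly one causal variant in the locus."""
    total = sum(posteriors)
    if total <= 0:
        raise ValueError("posteriors must have a positive sum")
    ranked = sorted(zip(variant_ids, posteriors), key=lambda t: t[1], reverse=True)
    chosen, cumulative = [], 0.0
    for vid, p in ranked:
        chosen.append(vid)
        cumulative += p / total
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical posteriors for four variants in one locus
print(credible_set(["rsA", "rsB", "rsC", "rsD"], [0.05, 0.70, 0.15, 0.10]))
# ['rsB', 'rsC', 'rsD']
```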

A Statistical framework to assess replicability of signals from trans-ethnic genome-wide association meta-analysis: Applications to smoking/drinking addiction traits using 3.4 million individuals.

Authors
Chen Wang, GSCAN - the GWAS and Sequencing Consortium of Alcohol and Nicotine Use
Name and Date of Professional Meeting
ASHG Annual Meeting (October 18-22, 2021)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Consortium studies often use genome-wide association meta-analysis (GWAMA) to aggregate summary statistics from multiple studies and empower genetic discovery. It is standard practice to replicate association signals in an independent dataset. Yet, as discovery studies continue to grow larger and more diverse, it becomes difficult to identify a large enough replication sample, especially for studies of non-European ancestry. Without replication, identified association signals are much more likely to be spurious and can confound downstream studies. To address this challenge, we propose a novel statistical framework, RATES (Replicability Assessment in Trans-Ethnic Studies), to assess replicability without a replication sample. RATES first models variation in genetic effects across studies using meta-regression with principal components of genome-wide allele frequencies as covariates, adjusting for genetic effect heterogeneity due to ancestry. Next, RATES leverages the strength and consistency of the residual association signals across variants and studies to calculate a “posterior probability of replicability” (PPR), based on the rationale that replicable association signals tend to be significantly associated across multiple studies. We also developed a parametric bootstrap method to evaluate p-values for the PPR. We performed extensive simulations in which 1) the genetic effects are homogeneous across ancestries, 2) the genetic effects are ancestry-specific, and 3) false-positive signals occur in some studies in the meta-analysis. We compared RATES with popular meta-analysis methods, including fixed effect (FE), random effects (RE and RE2), and binary effect (BE) meta-analysis, and meta-regression (MR-MEGA). We showed that when outliers are present, only RATES yields correct type I error, while other methods (e.g., FE or RE2) can have more than 4-fold inflated type I error.
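The parametric bootstrap mentioned above can be illustrated generically: simulate per-study z-scores under the null many times and count how often the simulated statistic is at least as extreme as the observed one. In this sketch the statistic (a count of nominally significant studies) is a hypothetical placeholder, not the PPR, and null z-scores are assumed to be independent N(0, 1) draws, ignoring sample overlap between studies.

```python
import random


def count_nominal(zs, threshold=1.96):
    """Placeholder statistic: number of studies with |z| above a nominal threshold."""
    return sum(1 for z in zs if abs(z) > threshold)


def parametric_bootstrap_p(zs, statistic, n_boot=2000, seed=1):
    """Parametric-bootstrap p-value for `statistic` under the global null in which
    each study's z-score is an independent N(0, 1) draw."""
    rng = random.Random(seed)
    observed = statistic(zs)
    exceed = 0
    for _ in range(n_boot):
        null_zs = [rng.gauss(0.0, 1.0) for _ in zs]
        if statistic(null_zs) >= observed:
            exceed += 1
    # add-one correction: the estimate is never exactly zero
    return (exceed + 1) / (n_boot + 1)


# Hypothetical residual z-scores for one variant across six studies
print(parametric_bootstrap_p([2.4, 3.1, 2.2, 0.4, 2.8, 2.5], count_nominal))
```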
RATES also gives higher or comparable power in all scenarios, even in simulations that favor alternative methods. For variants with ancestry-specific effects, the power of RATES is 7% to over 400% higher than that of the second-best meta-analysis method. We further applied RATES to smoking/drinking addiction traits using 3.4 million individuals of different ethnic groups. As a first step, RATES confirmed that all reported sentinel variants have PPR > 99%. When comparing mean chi-square values converted from p-values, RATES yields chi-square values that are 9% higher than the second-best method (RE2). Applying RATES to rare and low-frequency variants that are typically filtered out, we further identified novel signals of biological relevance beyond those found by GWAMA of common variants.
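The mean chi-square comparison requires converting each p-value back to a 1-df chi-square statistic. A minimal standard-library sketch, assuming two-sided, normal-based p-values; the p-values and method labels are hypothetical.

```python
from statistics import NormalDist


def p_to_chisq1(p):
    """Convert a two-sided p-value to a 1-df chi-square statistic (z squared)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    z = NormalDist().inv_cdf(p / 2.0)    # lower-tail quantile; sign vanishes when squared
    return z * z


def mean_chisq(pvalues):
    """Mean 1-df chi-square across a set of p-values."""
    return sum(p_to_chisq1(p) for p in pvalues) / len(pvalues)


# Hypothetical p-values for the same variants from two methods
method_a = [1e-10, 3e-8, 2e-12]
method_b = [5e-9, 1e-7, 4e-11]
print(mean_chisq(method_a) / mean_chisq(method_b) - 1.0)   # relative gain of A over B
```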

Trans-ethnic Transcriptome-wide Association Study and fine-mapping analysis in 3.4 million individuals shed light on the genetic architecture of alcohol and smoking addiction

Authors
Fang Chen, Xingyan Wang, Gretchen Saunders, Trans-Omics for Precision Medicine, GWAS and Sequencing Consortium of Alcohol and Nicotine Use
Name and Date of Professional Meeting
ASHG 2021 (October 18-22, 2021)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Cigarette smoking is a well-established major heritable risk factor for human diseases. Alcohol consumption has also been increasingly associated with the development of chronic diseases and other serious problems, such as heart disease, stroke, and cancer. The availability of large datasets in recent years has enabled a breakthrough in the genetics of smoking and alcohol addiction, with more than 400 associated genomic loci discovered to date. These numbers keep growing thanks to rapidly increasing sample sizes and the inclusion of ethnically diverse populations. However, it remains challenging to map non-coding variants to their target genes and to establish their biological and clinical relevance.
To address this gap, we developed TESLA (trans-ethnic transcriptome-wide association study approach using an optimal linear combination of association statistics), a novel trans-ethnic TWAS approach that has provably optimal power to integrate trans-ethnic GWAS with eQTL data from a possibly unmatched ancestry. TESLA uses a mixed effect meta-regression approach to model ancestry-specific effects across the studies in a meta-analysis. We showed through simulations that TESLA substantially outperforms other strategies, including TWAS using fixed effect meta-analysis results and TWAS using only studies from matched ancestries.
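For orientation, the sketch below computes the FUSION-style per-gene statistic used by summary-statistic TWAS, (w'z)/sqrt(w'Rw), for a single ancestry. TESLA's meta-regression and optimal linear combination across ancestries are not reproduced; the function name and inputs are hypothetical, and GWAS z-scores, eQTL weights, and the LD matrix are assumed to be aligned to the same alleles.

```python
import math


def twas_z(weights, zscores, ld):
    """Summary-statistic TWAS z-score for one gene: (w' z) / sqrt(w' R w).

    weights: expression-prediction weights for the gene's cis-SNPs
    zscores: GWAS z-scores for the same SNPs, aligned to the same alleles
    ld:      SNP-by-SNP LD correlation matrix R (list of lists)
    """
    n = len(weights)
    if len(zscores) != n or len(ld) != n:
        raise ValueError("weights, zscores and ld must have matching dimensions")
    numerator = sum(w * z for w, z in zip(weights, zscores))
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    if variance <= 0:
        raise ValueError("w' R w must be positive")
    return numerator / math.sqrt(variance)


# Hypothetical gene with two cis-SNPs in moderate LD
print(twas_z([0.4, 0.2], [5.0, 3.0], [[1.0, 0.3], [0.3, 1.0]]))
```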
Using TESLA, we aggregated trans-ethnic GWAS and whole-genome sequencing data from the GSCAN consortium (total N = 3.4 million) with GTEx v8 eQTL data in 49 tissues to further empower gene discovery for tobacco and alcohol use behaviors. We identified 2,652 genes; among them, 148 are novel, lying outside the 1-million-base-pair window around GWAS sentinel variants. These results provide many more susceptibility genes for alcohol and smoking addiction than previously identified. Consistent with previous studies, our results also showed a general lack of tissue specificity across all phenotypes: the most significant genes in each tissue often rank at the top across many tissues. Further, we performed a fine-mapping analysis in these TWAS risk loci to prioritize putative causal genes. For 778 loci, the 90% credible set was narrowed down to a single gene. These results could help us develop a deeper and broader understanding of the genetic architecture of smoking and alcohol use behaviors.
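The novelty criterion above can be expressed as a simple window check. The abstract does not state whether the 1-Mb distance is measured from the gene body, the transcription start site, or elsewhere; this sketch assumes the gene body, uses hypothetical names and coordinates, and assumes all coordinates share one genome build.

```python
def is_novel_gene(gene_chrom, gene_start, gene_end, sentinels, window=1_000_000):
    """True if no GWAS sentinel variant lies within `window` bp of the gene body.

    sentinels: iterable of (chromosome, position) pairs on the same genome build
    as the gene coordinates. Measuring from the gene body is an assumption.
    """
    for chrom, pos in sentinels:
        if chrom == gene_chrom and gene_start - window <= pos <= gene_end + window:
            return False
    return True


# Hypothetical sentinel variants and gene coordinates
sentinels = [("7", 5_000_000), ("15", 78_900_000)]
print(is_novel_gene("7", 5_500_000, 5_600_000, sentinels))   # False: within 1 Mb
print(is_novel_gene("7", 7_000_000, 7_100_000, sentinels))   # True
```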

Colocalization of GWAS-associated loci for alcohol and tobacco use phenotypes with expression quantitative trait loci in 49 tissues

Authors
Jacqueline M. Otto, GWAS & Sequencing Consortium of Alcohol and Nicotine Use (GSCAN), Trans-Omics for Precision Medicine (TOPMed)
Name and Date of Professional Meeting
ASHG (October 2020)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Background: Alcohol and tobacco use are moderately heritable behaviors (h2 ≈ 0.50), and genome-wide association studies (GWAS) have identified a large number of associated loci. Though the majority of findings map to intergenic regions of the genome, GWAS heritability is known to be enriched at expression quantitative trait loci (eQTLs). Therefore, identifying relevant gene and tissue expression for GWAS loci may inform our understanding of disease mechanisms underlying these signals. In the current study, we tested whether GWAS-associated variants for alcohol and tobacco use colocalized with variants associated with nearby gene expression (cis-eQTLs).
Methods: Summary statistics for variants with minor allele frequency (MAF) > 0.001 were obtained from the most recent GSCAN meta-analyses of drinks per week (DPW), smoking initiation (SI), age of smoking initiation (AI), cigarettes smoked per day (CPD), and smoking cessation (SC) in a sample of European-ancestry individuals (N = 618,489 to 2,669,029, depending on the phenotype). Fine-mapped summary statistics for variants with MAF ≥ 0.01 were obtained from eQTL mapping in 49 tissues as part of the GTEx consortium v8 release. After assigning GWAS variants to approximately LD-independent blocks across the genome, colocalization analyses were conducted using fastENLOC, a Bayesian hierarchical colocalization method, for all 5 × 49 = 245 GWAS trait-tissue pairs.
Results: After correction for multiple testing, a number of independent gene loci (DPW = 151; SI = 374; AI = 2; CPD = 18; SC = 37) were found to have regional colocalization probabilities (RCPs) > 0.80, indicating regions that likely harbor a shared causal variant for both the molecular and the behavioral trait. These signals were present in brain and non-brain tissues and included several candidate genes previously associated with smoking and psychiatric phenotypes, such as CHRNA2 and SYT3. When aggregated across all tissues, the median proportion of loci with a suggestive RCP > 0.50 overlapping a genome-wide significant variant ranged from 17% to 40%, depending on the trait.
Discussion: Common variants associated with alcohol and tobacco use exhibit colocalization with variants influencing tissue-specific gene expression, though these patterns occur generally and do not appear to be specific to brain tissues. Relatively smaller sample sizes for eQTL mapping in brain tissues may have limited power to detect shared causal variation in the current study. Future investigations might test for colocalization with eQTL data from phenotype-relevant postmortem brain tissues or conduct additional functional studies to validate select candidate loci.
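The regional colocalization probability (RCP) thresholds above can be illustrated with a deliberately simplified calculation. This sketch assumes at most one causal variant per trait in each LD block and independence between the GWAS and eQTL fine-mapping results, so the RCP is the sum over variants of the product of the two posterior inclusion probabilities (PIPs); fastENLOC additionally estimates an eQTL enrichment parameter that informs the GWAS prior, which is omitted here. All names and values are hypothetical.

```python
from collections import defaultdict


def regional_coloc_prob(gwas_pip, eqtl_pip, block_of):
    """Simplified regional colocalization probability (RCP) per LD block.

    gwas_pip, eqtl_pip: dict variant -> posterior inclusion probability
    block_of:           dict variant -> LD-block id
    Assumes at most one causal variant per trait per block and independence
    between the two traits, so RCP = sum_j PIP_gwas(j) * PIP_eqtl(j).
    """
    rcp = defaultdict(float)
    for variant, p_gwas in gwas_pip.items():
        p_eqtl = eqtl_pip.get(variant, 0.0)     # variant absent from eQTL data
        rcp[block_of[variant]] += p_gwas * p_eqtl
    return dict(rcp)


def colocalized_blocks(rcp, threshold=0.80):
    """LD blocks whose RCP exceeds the threshold."""
    return sorted(block for block, value in rcp.items() if value > threshold)


# Hypothetical PIPs for four variants in two LD blocks
gwas = {"v1": 0.90, "v2": 0.10, "v3": 0.50, "v4": 0.50}
eqtl = {"v1": 0.95, "v2": 0.05, "v4": 0.10}
blocks = {"v1": "block1", "v2": "block1", "v3": "block2", "v4": "block2"}
rcp = regional_coloc_prob(gwas, eqtl, blocks)
print(rcp, colocalized_blocks(rcp))
```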

Alcohol and tobacco polygenic risk score prediction within and across diverse ancestries

Authors
Gretchen Saunders, GWAS & Sequencing Consortium of Alcohol and Nicotine use, & Trans-Omics for Precision Medicine
Name and Date of Professional Meeting
ASHG (October 2020)
Associated paper proposal(s)
Working Group(s)
Abstract Text
The vast majority of genome-wide association studies thus far have been restricted to European-only samples. While polygenic risk scores derived from these GWAS summary statistics can be applied to non-European ancestral groups, bias is introduced as a function of divergence from European ancestries, making their accuracy and utility unclear within diverse groups. We use summary statistics from the GWAS & Sequencing Consortium of Alcohol and Nicotine use (GSCAN) of up to 3.3 million individuals with ancestry from four major ancestral groups, African (AFR), American (AMR), East Asian (EAS), and European (EUR), to generate polygenic risk scores for smoking initiation, cigarettes per day, smoking cessation, age of smoking initiation, and drinks per week. Polygenic risk scores (PRSs) were validated in an independent sample of individuals from the National Longitudinal Study of Adolescent to Adult Health (Add Health; N = 2,204 African, N = 1,133 Hispanic, N = 6,092 European, N = 525 East Asian, corresponding roughly to the ancestral groups in the GWAS meta-analysis). Predictive accuracy of each PRS was estimated by the change in R² between base and full models, where base models included sex, age, and the first 10 principal components as covariates, and full models additionally included the PRS. Using EUR-based summary statistics to predict alcohol and tobacco use in European individuals results in incremental R² values ranging from 1% to 7.3%. Using summary statistics and validation samples from the same non-EUR ancestry results in incremental R² values ranging from 0% to 1% for African, 0% to 2.6% for East Asian, and 0.3% to 3.1% for Hispanic ancestries. The predictive accuracy of own-ancestry PRSs in non-European groups is approximately the same as, or lower than, that of European ancestry-based scores.
For every ancestral group, PRSs derived from the combined all-ancestry summary statistics almost universally have the highest predictive accuracy: 1% to 7.7% for European, 0.2% to 1.6% for African, 0.1% to 3.5% for East Asian, and 0.3% to 4.8% for Hispanic ancestries. Polygenic prediction in EUR ancestries remains higher than in non-EUR ancestries, likely for several reasons, including large differences in discovery and validation sample sizes, lower imputation accuracy, differences in LD structure, and phenotypic heterogeneity. Our results highlight the increased prediction accuracy gained by using combined-ancestry GWAS summary statistics rather than single-ancestry polygenic scores. We discuss the potential bias of using European-based scores for prediction in non-European cohorts, as this has implications for disparities in the utility of polygenic scores.
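The incremental R² used to score each PRS can be sketched in plain Python: compute the score as a dosage-weighted sum, fit base and full ordinary least-squares (OLS) models, and take the difference in R². This is a minimal sketch, assuming dosages and GWAS weights are already aligned to the same effect alleles and using OLS R² throughout; the abstract does not say how binary traits such as smoking initiation were handled, and all function names and data are hypothetical.

```python
def prs(dosages, weights):
    """Polygenic score per individual: sum of effect-allele dosage x GWAS weight."""
    return [sum(d * w for d, w in zip(row, weights)) for row in dosages]


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y, covariates):
    """R^2 of an OLS fit of y on an intercept plus the given covariate columns."""
    rows = zip(*covariates) if covariates else ([] for _ in y)
    X = [[1.0] + list(row) for row in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in X]
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def incremental_r2(y, base_covariates, score):
    """Delta R^2 = R^2(base covariates + PRS) - R^2(base covariates only)."""
    return r_squared(y, base_covariates + [score]) - r_squared(y, base_covariates)


# Hypothetical data: 8 individuals, 3 SNPs, covariates age and sex
dosages = [[0, 1, 2], [2, 2, 0], [1, 0, 1], [0, 0, 0],
           [1, 1, 1], [2, 0, 1], [0, 2, 2], [1, 2, 0]]
score = prs(dosages, [0.12, -0.05, 0.08])
age = [20, 35, 50, 41, 29, 60, 33, 47]
sex = [0, 1, 0, 1, 1, 0, 0, 1]
y = [3.1, 2.4, 4.0, 2.9, 3.3, 4.6, 3.5, 2.8]
print(incremental_r2(y, [age, sex], score))
```

In the analysis described above, the base model would instead hold sex, age, and the first 10 principal components.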

Near-optimal trans-ethnic association and fine mapping of smoking associated genes integrating GWAS and TOPMed sequence data of 1.3 million individuals

Authors
Y. Jiang; TOPMed smoking working group and GSCAN consortium
Name and Date of Professional Meeting
American Society of Human Genetics (ASHG) annual meeting
Associated paper proposal(s)
Working Group(s)
Abstract Text
Tobacco use is a heritable risk factor for numerous diseases, and 353 associated genes have been identified in European samples. Yet its genetic architecture in non-European populations remains elusive. To address this, we assembled TOPMed whole-genome sequences of ~150,000 individuals from diverse US populations as well as GWAS data of up to 1.2 million individuals. Four smoking phenotypes were studied: smoking initiation, cigarettes per day, smoking cessation, and age of smoking initiation.

To analyze these exceptionally rich datasets, we developed a novel mixed effect meta-regression method for near-optimal trans-ethnic meta-analysis (MEMO). MEMO summarizes the ancestry of each study using principal components of genome-wide allele frequencies. It models between-study heterogeneity in genetic effects due to genetic ancestry differences as fixed effects and heterogeneity due to non-ancestry exposure differences as random effects. For each SNP, MEMO adaptively selects the fixed and random effects that best model the genetic effect heterogeneity. It thus combines the strengths of fixed-effect meta-analysis, random-effects meta-analysis, and meta-regression. In simulations, MEMO is consistently the most powerful method (or close to it) across a wide variety of scenarios, even when the simulated disease model favors alternative methods. We further extend MEMO to fine mapping, where it can distinguish causal variants with homogeneous effects from those with ancestry-specific effects. Owing to its improved model of multi-ethnic genetic effects, MEMO considerably improves fine-mapping resolution. Simulations show that the method is well calibrated; on average, the posterior probability of association for causal variants estimated by our method is 50% higher, and our 95% credible interval for causal variants is ~33% shorter, than with alternative trans-ethnic fine-mapping methods.
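MEMO is described as combining fixed-effect meta-analysis, random-effects meta-analysis, and meta-regression. The sketch below shows only the first two ingredients for one variant, an inverse-variance fixed-effect estimate and a DerSimonian-Laird random-effects estimate, with no ancestry PCs and no adaptive model selection, so it is not MEMO itself. The function name and inputs are hypothetical.

```python
import math


def fixed_and_random_effects(betas, ses):
    """Inverse-variance fixed-effect (FE) and DerSimonian-Laird random-effects (RE)
    estimates for one variant across studies."""
    w = [1.0 / s ** 2 for s in ses]
    sw = sum(w)
    beta_fe = sum(wi * b for wi, b in zip(w, betas)) / sw
    se_fe = math.sqrt(1.0 / sw)
    # Cochran's Q: heterogeneity beyond what sampling error predicts
    q = sum(wi * (b - beta_fe) ** 2 for wi, b in zip(w, betas))
    df = len(betas) - 1
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # moment estimator of tau^2
    # RE weights add the between-study variance tau^2 to each study's variance
    w_re = [1.0 / (s ** 2 + tau2) for s in ses]
    beta_re = sum(wi * b for wi, b in zip(w_re, betas)) / sum(w_re)
    se_re = math.sqrt(1.0 / sum(w_re))
    return {"beta_fe": beta_fe, "se_fe": se_fe, "q": q, "tau2": tau2,
            "beta_re": beta_re, "se_re": se_re}


# Hypothetical effect estimates from three studies that disagree
print(fixed_and_random_effects([0.0, 0.2, 0.4], [0.02, 0.02, 0.02]))
```

When the studies agree, tau² collapses to zero and the two estimates coincide; when they disagree, the random-effects standard error widens, which is the behavior an adaptive method can exploit variant by variant.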

Applying MEMO, we identified 265 loci with p < 5 × 10⁻⁹, of which 27 are novel, and >400 independent secondary associations. Our fine mapping narrowed the 95% credible interval for causal variants to fewer than 10 variants for 76 loci, 17 of which contain a single SNP. We estimated that 56% of causal variants show homogeneous effects across ancestries, while another 26% and 12% show African-specific and Hispanic-specific effects, respectively. In conclusion, our results elucidate the genetic architecture of smoking traits, and the methods we developed will be valuable for other studies.

Near-Optimal Trans-ethnic Association and Fine Mapping of Smoking Associated Genes Integrating GWAS and TOPMed sequence Data of 1.3 million individuals

Authors
Yu Jiang
TOPMed Smoking Working Group
GWAS & Sequencing Consortium of Alcohol and Nicotine use
Name and Date of Professional Meeting
ASHG October 2019
Associated paper proposal(s)
Working Group(s)
Abstract Text
Tobacco use is a heritable risk factor for numerous diseases, and 353 associated genes have been identified in European samples. Yet its genetic architecture in non-European populations remains elusive. To address this, we assembled TOPMed whole-genome sequences of ~150,000 individuals from diverse US populations as well as GWAS data of up to 1.2 million individuals. Four smoking phenotypes were studied: smoking initiation, cigarettes per day, smoking cessation, and age of smoking initiation.

To analyze these exceptionally rich datasets, we developed a novel mixed effect meta-regression method for near-optimal trans-ethnic meta-analysis (MEMO). MEMO summarizes the ancestry of each study using principal components of genome-wide allele frequencies. It models between-study heterogeneity in genetic effects due to genetic ancestry differences as fixed effects and heterogeneity due to non-ancestry exposure differences as random effects. For each SNP, MEMO adaptively selects the fixed and random effects that best model the genetic effect heterogeneity. It thus combines the strengths of fixed-effect meta-analysis, random-effects meta-analysis, and meta-regression. In simulations, MEMO is consistently the most powerful method (or close to it) across a wide variety of scenarios, even when the simulated disease model favors alternative methods. We further extend MEMO to fine mapping, where it can distinguish causal variants with homogeneous effects from those with ancestry-specific effects. Owing to its improved model of multi-ethnic genetic effects, MEMO considerably improves fine-mapping resolution. Simulations show that the method is well calibrated; on average, the posterior probability of association for causal variants estimated by our method is 50% higher, and our 95% credible interval for causal variants is ~33% shorter, than with alternative trans-ethnic fine-mapping methods.

Applying MEMO, we identified 265 loci with p < 5 × 10⁻⁹, of which 27 are novel, and >400 independent secondary associations. Our fine mapping narrowed the 95% credible interval for causal variants to fewer than 10 variants for 76 loci, 17 of which contain a single SNP. We estimated that 56% of causal variants show homogeneous effects across ancestries, while another 26% and 12% show African-specific and Hispanic-specific effects, respectively. In conclusion, our results elucidate the genetic architecture of smoking traits, and the methods we developed will be valuable for other studies.

Trans-ethnic GWAS meta-analysis of tobacco and alcohol use

Authors
MengZhen Liu
Trans-Omics for Precision Medicine (TOPMed)
GWAS & Sequencing Consortium of Alcohol and Nicotine use (GSCAN)
Name and Date of Professional Meeting
ASHG October 2019
Associated paper proposal(s)
Working Group(s)
Abstract Text
Alcohol and nicotine use are together the largest preventable causes of morbidity and mortality in the United States. While there has been a steady decline in cigarette smoking in the US, in 2017 the 1-year prevalence of current smoking was 14%, with a disproportionate rate of use among individuals from disadvantaged backgrounds. Regular alcohol use and misuse remain a persistent issue among a majority of Americans and are associated with a range of medical issues, including liver damage and neuropsychiatric impairments. Previous genetic association research has focused predominantly on individuals of European ancestry. We expand upon existing research by conducting a GWAS meta-analysis of alcohol and nicotine use across multiple ancestries. Our approach maximizes power for variant detection, allows evaluation of ancestry-specific variant effects, and provides greater fine-mapping resolution. We have conducted a meta-analysis in over 3.4 million individuals of diverse ancestry to discover genetic variation contributing to multiple stages of smoking, including whether an individual has ever been a regular smoker (N=3,377,408; 75 studies; 780 loci [404 novel]), age of initiation of regular smoking (N=731,870; 64 studies; 39 loci [30 novel]), cigarettes per day (N=782,790; 81 studies; 157 loci [103 novel]), and smoking cessation (N=1,400,906; 742,222 studies; 112 loci [91 novel]), as well as drinks per week, a measure of alcohol consumption (N=2,896,131; 62 studies; 376 loci [286 novel]). The largest non-European ancestry subset is composed of individuals of East Asian ancestry, with sample sizes ranging from N=62,943 for age of initiation of regular smoking to N=293,145 for ever/never smoker status. We will report trans-ethnic discovery and evidence for ancestry-moderated variant effects, as well as heritabilities and the utility of polygenic scores from this dataset.