Authors |
Angel C.Y. Mak1, Linda Kachuri2, Donglei Hu1, Celeste Eng1, Scott Huntsman1, Jennifer R. Elhawary1, Namrata Gupta3, Stacey Gabriel3, Shujie Xiao4, Hongsheng Gui4, L. Keoki Williams4,5, José R. Rodríguez Santana6, Michael LeNoir7, Kevin L. Keys1,8&, Akinyemi Oni-Orisan9,10,11, Sam S. Oh1, Max A. Seibold12, Christopher R. Gignoux13, Noah Zaitlen14,15, Esteban G. Burchard1,10#, Elad Ziv1,11,16#
|
Abstract Text |
Background: Non-European populations are under-represented in both genome-wide association studies (GWAS) and expression quantitative trait locus (eQTL) reference databases. The lack of adequate eQTL datasets limits fine-mapping of GWAS results and the application of transcriptome-wide association studies (TWAS) in non-European ancestry populations. We leveraged whole-genome and RNA sequencing data from 2,280 African American, Mexican American, and Puerto Rican children with and without asthma to investigate the relationship between genetic ancestry and heritability of gene expression. We quantified the prevalence of ancestry-specific eQTLs, developed gene expression models for TWAS, and demonstrated the resulting gains in predictive power.
Results: Heritability (h²) of gene expression in cis was highest in participants with higher African (AFR) ancestry and lowest in participants with higher Indigenous American (IAM) ancestry. Participants with >50% AFR ancestry (AFRhigh: h²=0.17) had significantly higher h² than those with <10% AFR ancestry (AFRlow: h²=0.13, p=2.0×10⁻¹⁴⁷). Among participants with >50% IAM ancestry, heritability was lower (h²=0.12) than among those with <10% IAM ancestry (h²=0.16, p=2.0×10⁻¹⁴⁷). The pattern of higher heritability with AFR ancestry and lower heritability with IAM ancestry held when we used locus-specific ancestry for the heritability comparisons. We developed a framework that identifies ancestry-specific eQTLs while accounting for linkage disequilibrium. Among 9,635 heritable genes studied in the AFRhigh individuals, over 25% had ancestry-specific eQTLs. We generated gene expression imputation models for 11,807 genes (mean cross-validation R²=0.16) and compared them with models based on GTEx and the Multi-Ethnic Study of Atherosclerosis (MESA) in a TWAS of 28 traits from the Population Architecture using Genomics and Epidemiology (PAGE) Consortium. The number of genes examined in our models was 38% and 53% higher than in GTEx and MESA, respectively. Applying our models to multi-ancestry GWAS results from PAGE identified 321 significantly associated genes (FDR<0.05) and yielded more associated genes than MESA (in 85% of analyses, p=2.6×10⁻³) and GTEx (in 83% of analyses, p=7.5×10⁻³).
Discussion: We found that cis-heritability of gene expression tracked with heterozygosity (highest in AFR and lowest in IAM). We also found that ancestry-specific eQTLs occur in a large fraction of genes, underscoring the need for larger GWAS and RNA-seq sample sizes in AFR and IAM populations. Finally, we demonstrated the improved performance of ancestry-specific gene expression models for gene discovery in populations with mixed ancestry.
|
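The ancestry strata used in the heritability comparisons (>50% versus <10% global ancestry) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name, the constant names, and the "IAMhigh"/"IAMlow" labels are hypothetical, and the abstract itself names only AFRhigh and AFRlow. The sketch assumes each participant's global ancestry proportion is already estimated and expressed as a fraction between 0 and 1.

```python
from typing import Optional

# Thresholds taken from the abstract: ">50%" and "<10%" global ancestry.
HIGH_THRESHOLD = 0.50
LOW_THRESHOLD = 0.10


def ancestry_group(proportion: float, label: str = "AFR") -> Optional[str]:
    """Assign a participant to the high or low stratum for one ancestry.

    `proportion` is the participant's estimated global ancestry fraction
    (0.0 to 1.0) for the ancestry named by `label` (e.g. "AFR" or "IAM").
    Participants between the two thresholds, or exactly on one, belong to
    neither stratum and return None.
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("ancestry proportion must lie between 0 and 1")
    if proportion > HIGH_THRESHOLD:
        return f"{label}high"
    if proportion < LOW_THRESHOLD:
        return f"{label}low"
    return None


# Hypothetical participants, for illustration only (not study data).
for afr in (0.62, 0.30, 0.05):
    print(afr, ancestry_group(afr, "AFR"))
```

Both inequalities are strict, matching the ">50%" and "<10%" wording, so a participant with exactly 50% or 10% ancestry falls into neither group.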