Authors |
Xihao Li, Zilin Li, Corbin Quick, Hufeng Zhou, Sheila M. Gaynor, Han Chen, Jerome I. Rotter, Cristen J. Willer, Pradeep Natarajan, Gina M. Peloso, and Xihong Lin, on behalf of the TOPMed Lipids Working Group, BioData Catalyst Consortium
|
Abstract Text |
Introduction
Large-scale whole genome sequencing (WGS) studies have enabled the analysis of rare variants (RVs) associated with complex human traits. Existing RV meta-analysis approaches are not scalable when applied to WGS data.
Methods
We propose MetaSTAAR (Meta-analysis of variant-Set Test for Association using Annotation infoRmation), a powerful and resource-efficient rare variant meta-analysis framework for large-scale whole genome sequencing association studies. MetaSTAAR accounts for population structure and relatedness for both continuous and dichotomous traits by fitting generalized linear mixed models with sparse genetic relatedness matrices. Because it stores the linkage disequilibrium (LD) information of RVs in sparse matrix format, the workflow is storage-efficient and computationally scalable for large-scale WGS data. The framework builds on the STAAR method, which dynamically incorporates multiple functional annotations to boost the power of rare variant association analysis. It supports two kinds of RV-set analysis: gene-centric analysis, which groups the variants of each gene into functional categories, and genetic region analysis, which uses sliding windows. MetaSTAAR also enables conditional analyses to identify RV-set signals independent of nearby common variants.
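As a rough illustration of why sparse storage of per-study summary statistics makes meta-analysis cheap, the following minimal Python sketch sums score statistics and sparse covariance entries across studies and then computes a simple weighted burden test. It is not the MetaSTAAR implementation: the dictionary layout, the function names `merge_studies` and `burden_test`, the variant labels, and the unit weights are all assumptions made for this example, and a plain burden test stands in for the annotation-weighted STAAR tests.

```python
import math
from collections import defaultdict


def merge_studies(studies):
    """Sum score statistics and sparse covariance entries across studies.

    Hypothetical layout for each study:
      "U":   {variant: score statistic}
      "cov": {(variant_i, variant_j): covariance}, nonzero entries only,
             one entry per unordered pair (diagonal included).
    """
    U = defaultdict(float)
    cov = defaultdict(float)
    for study in studies:
        for variant, score in study["U"].items():
            U[variant] += score
        for (i, j), value in study["cov"].items():
            key = (i, j) if i <= j else (j, i)  # canonical pair order
            cov[key] += value
    return dict(U), dict(cov)


def burden_test(U, cov, weights):
    """Weighted burden statistic and its 1-df chi-square P-value."""
    score = sum(w * U.get(v, 0.0) for v, w in weights.items())
    variance = 0.0
    for (i, j), value in cov.items():
        if i in weights and j in weights:
            term = weights[i] * weights[j] * value
            # Off-diagonal pairs are stored once, so count them twice.
            variance += term if i == j else 2.0 * term
    stat = score * score / variance
    p_value = math.erfc(math.sqrt(stat / 2.0))  # upper tail of chi-square, 1 df
    return stat, p_value


# Toy summary statistics for two studies (made-up numbers).
study_a = {
    "U": {"v1": 1.2, "v2": -0.4},
    "cov": {("v1", "v1"): 2.0, ("v1", "v2"): 0.3, ("v2", "v2"): 1.5},
}
study_b = {
    "U": {"v1": 0.8, "v3": 1.0},
    "cov": {("v1", "v1"): 1.0, ("v3", "v3"): 0.5},
}

U, cov = merge_studies([study_a, study_b])
stat, p_value = burden_test(U, cov, {"v1": 1.0, "v2": 1.0, "v3": 1.0})
print(f"burden statistic = {stat:.4f}, P = {p_value:.4f}")
```

Only nonzero covariance entries are stored and visited, so both the storage and the cost of merging scale with the number of nonzero LD entries rather than with the square of the number of variants.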
Results
We applied MetaSTAAR to identify RV-sets associated with four quantitative lipid traits (LDL-C, HDL-C, TG and TC) in 30,138 related samples from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program Freeze 5 data, consisting of 14 ancestrally diverse study cohorts and 255 million variants in total. MetaSTAAR required 520 GB to store the summary statistics and LD matrices across the whole genome, at least 100 times less than the existing method RAREMETAL. In benchmarks, MetaSTAAR was also at least 10 times faster than RAREMETAL. In RV gene-centric analysis, MetaSTAAR identified 70 significant associations with lipid traits. In RV sliding window analysis, MetaSTAAR detected 257 significant 2-kb sliding windows associated with lipid traits. P-values from MetaSTAAR were highly concordant with those from the joint analysis of pooled individual-level data using STAAR, with correlation > 0.99 among significant regions.
Conclusion
We propose MetaSTAAR as a powerful and resource-efficient framework for rare variant meta-analysis that incorporates multiple variant functional annotations to further improve power. Currently, the proposed framework is the only available solution for performing rare variant meta-analysis at the scale of large whole genome sequencing studies.
Key Words:
Genome sequencing; Genome-wide association; Methodology; Rare variants; Statistical genetics
|