
MultiSTAAR: A statistical framework for powerful rare variant multi-trait analysis in biobank-scale sequencing studies

Authors
Xihao Li, Han Chen, Margaret Sunitha Selvaraj, Eric Van Buren, Kenneth M. Rice, Jerome I. Rotter, Gina M. Peloso, Pradeep Natarajan, Zilin Li, Zhonghua Liu and Xihong Lin, on behalf of the TOPMed Lipids Working Group
Name and Date of Professional Meeting
American Society of Human Genetics Annual Meeting (November 1-5, 2023)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Introduction
Biobank-scale sequencing studies have made it feasible to better understand rare variant contributions to complex human traits and diseases. Leveraging association strengths across multiple traits in rare variant association analysis of sequencing studies can improve statistical power over single-trait analysis and detect pleiotropic genes or noncoding regions. However, existing methods have limited ability to perform rare variant multi-trait analysis on biobank-scale sequencing data.

Methods
We propose MultiSTAAR, a powerful statistical framework and computationally scalable analytical pipeline for functionally-informed rare variant multi-trait analysis in biobank-scale sequencing studies. As a statistical framework, MultiSTAAR accounts for relatedness, population structure and correlation between phenotypes by jointly analyzing multiple traits, and further empowers rare variant association analysis by incorporating multiple functional annotations. As a comprehensive and robust analytical pipeline, MultiSTAAR facilitates functionally-informed multi-trait analysis of both coding and noncoding rare variants by incorporating multiple variant functional annotations for grouping and weighting. MultiSTAAR also provides conditional multi-trait analysis to dissect rare variant association signals independent of known variants.
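The Methods paragraph combines two ideas: jointly testing several correlated traits, and pooling evidence across functional annotations. The Python sketch below illustrates both in a deliberately simplified setting. It is not the MultiSTAAR implementation. It assumes unrelated samples with no covariates or population structure, so there is no mixed model. It uses only a weighted burden test, estimates the between-trait covariance from intercept-only null residuals, and combines p-values from several annotation-based weightings with the Cauchy (ACAT) combination, in the spirit of the STAAR omnibus test. All function names (`solve`, `chi2_sf`, `multi_trait_burden_p`, `acat`) and the simulated data are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve a x = b for a small square system (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def chi2_sf(x, k):
    """Upper-tail probability P(X > x) for a chi-square with integer k degrees of freedom."""
    if x <= 0:
        return 1.0
    if k % 2 == 0:  # even df: exp(-x/2) * sum_{i < k/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, k // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # odd df: erfc term plus sum of x^{(2j-1)/2} / (1*3*...*(2j-1))
    term, total = math.sqrt(x), 0.0
    for j in range(1, (k - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


def multi_trait_burden_p(genotypes, phenotypes, weights):
    """Joint K-trait burden score test (unrelated samples, intercept-only null model)."""
    n, k = len(phenotypes), len(phenotypes[0])
    burden = [sum(w * g for w, g in zip(weights, row)) for row in genotypes]
    b_mean = sum(burden) / n
    b_c = [v - b_mean for v in burden]
    ss_b = sum(v * v for v in b_c)
    if ss_b == 0:
        return 1.0  # no variation in the burden score: nothing to test
    means = [sum(row[t] for row in phenotypes) / n for t in range(k)]
    resid = [[row[t] - means[t] for t in range(k)] for row in phenotypes]
    # Between-trait residual covariance: this is where phenotype correlation enters.
    sigma = [[sum(r[s] * r[t] for r in resid) / (n - 1) for t in range(k)] for s in range(k)]
    # Score for trait t: U_t = sum_i (b_i - mean b) * (y_it - mean y_t).
    score = [sum(b_c[i] * resid[i][t] for i in range(n)) for t in range(k)]
    cov = [[ss_b * sigma[s][t] for t in range(k)] for s in range(k)]
    q = sum(u * v for u, v in zip(score, solve(cov, score)))  # U' Cov(U)^-1 U
    return chi2_sf(q, k)


def acat(pvals, weights=None):
    """Cauchy (ACAT) combination of possibly dependent p-values."""
    if weights is None:
        weights = [1.0] * len(pvals)
    t = sum(w * math.tan((0.5 - p) * math.pi) for p, w in zip(pvals, weights))
    return 0.5 - math.atan(t / sum(weights)) / math.pi


# Simulated toy data (not TOPMed): 300 unrelated samples, 10 toy variants, 3 traits.
rng = random.Random(2023)
n_samples, n_variants = 300, 10
geno = [[1 if rng.random() < 0.05 else 0 for _ in range(n_variants)] for _ in range(n_samples)]
pheno = []
for row in geno:
    shared = rng.gauss(0, 1)  # shared term makes traits 1 and 2 correlated
    pheno.append([1.5 * sum(row) + shared + rng.gauss(0, 1),
                  shared + rng.gauss(0, 1),
                  rng.gauss(0, 1)])
# Two hypothetical annotation-based weightings of the same variants.
weight_sets = [[1.0] * n_variants, [0.5 + 0.1 * j for j in range(n_variants)]]
p_each = [multi_trait_burden_p(geno, pheno, w) for w in weight_sets]
p_omni = acat(p_each)
```

In this simplified setting, the joint statistic is asymptotically chi-square with K degrees of freedom under the null. Because Cov(U) is scaled by the residual covariance between traits, correlated phenotypes are not treated as independent evidence. Accounting for relatedness and population structure, as the Methods describe, would require replacing the intercept-only null model with a multivariate mixed model.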

Results
We applied MultiSTAAR to whole-genome sequencing rare variant analysis of 61,838 ancestrally diverse participants from 20 studies in the NHLBI TOPMed consortium, jointly analyzing three quantitative lipid traits: LDL-C, HDL-C and TG. In gene-centric multi-trait analysis of rare variants, MultiSTAAR identified 43 conditionally significant associations with lipid traits, including 4 noncoding associations (enhancer DHS rare variants in NIPSNAP3A and LIPC; ncRNA rare variants in RP11-310H4.2 and MIR4497) that were not detected by any of the three single-trait functionally-informed analyses using STAARpipeline. In genetic region multi-trait analysis of rare variants, MultiSTAAR identified 7 conditionally significant 2-kb sliding windows associated with lipid traits, including two sliding windows in DOCK7 (chromosome 1: 62,651,447 - 62,653,446 bp; chromosome 1: 62,652,447 - 62,654,446 bp) and an intergenic sliding window (chromosome 1: 145,530,447 - 145,532,446 bp) that were not detected by single-trait analysis using STAARpipeline.
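The DOCK7 coordinates reported above imply windows of 62,653,446 - 62,651,447 + 1 = 2,000 bp whose start positions differ by 1,000 bp, so adjacent windows overlap by half. The short sketch below tiles a region in that way. It assumes 1-based inclusive coordinates and a 1-kb step, which is inferred from those two windows rather than stated. The helper name `sliding_windows` is hypothetical, and the choices of where the first window starts and how region edges are handled are also assumptions.

```python
def sliding_windows(region_start, region_end, size=2000, step=1000):
    """Return 1-based inclusive (start, end) windows of `size` bp starting every `step` bp.

    Windows that would extend past `region_end` are dropped (an assumed edge rule).
    """
    windows = []
    start = region_start
    while start + size - 1 <= region_end:
        windows.append((start, start + size - 1))
        start += step
    return windows


# Tiling the span covered by the two DOCK7 windows reproduces their coordinates.
print(sliding_windows(62_651_447, 62_654_446))
# [(62651447, 62653446), (62652447, 62654446)]
```

The same arithmetic confirms that the intergenic window (145,530,447 - 145,532,446 bp) is also 2,000 bp long.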

Summary
In summary, MultiSTAAR provides a powerful statistical framework and a computationally scalable analytical pipeline for multi-trait analysis of biobank-scale sequencing studies with complex study samples.