Skip to main content

StocSum: stochastic summary statistics for whole genome sequencing studies

Authors
Nannan Wang, Bing Yu, Goo Jun, Qibin Qi, Ramon A. Durazo-Arvizu, Sara Lindstrom, Alanna C. Morrison, Robert C. Kaplan, Eric Boerwinkle, Han Chen
Name and Date of Professional Meeting
Joint Statistical Meetings (August 5-10, 2023)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Genomic summary statistics, usually defined as single-variant test results from genome-wide association studies, have been widely used to advance the genetics field in a wide range of applications. Applications that involve multiple genetic variants also require their correlations or linkage disequilibrium (LD) information, often obtained from an external reference panel. In practice, it is usually difficult to find suitable external reference panels that represent the LD structure for underrepresented and admixed populations, or rare variants from whole genome sequencing (WGS) studies, limiting the scope of applications for genomic summary statistics. We develop StocSum, a novel reference-panel-free statistical framework for generating, managing, and analyzing stochastic summary statistics using random vector algorithms. Regardless of the complex sample correlation structure, StocSum always scales linearly with both the sample size and the number of genetic variants in computing stochastic summary statistics from individual-level data. We demonstrate the accuracy and computational efficiency of StocSum using two cohorts from the Trans-Omics for Precision Medicine WGS studies.
Back to top