Authors |
Nannan Wang, Bing Yu, Goo Jun, Qibin Qi, Ramon A. Durazo-Arvizu, Sara Lindstrom, Alanna C. Morrison, Robert C. Kaplan, Eric Boerwinkle, Han Chen
|
Abstract Text |
Genomic summary statistics, usually defined as single-variant test results from genome-wide association studies, are widely used across the genetics field. Applications that involve multiple genetic variants also require the correlations among those variants, known as linkage disequilibrium (LD) information, which are often obtained from an external reference panel. In practice, it is usually difficult to find external reference panels that adequately represent the LD structure of underrepresented and admixed populations, or of rare variants from whole-genome sequencing (WGS) studies, and this limits the scope of applications for genomic summary statistics. We develop StocSum, a novel reference-panel-free statistical framework for generating, managing, and analyzing stochastic summary statistics using random vector algorithms. Regardless of how complex the sample correlation structure is, the cost of computing stochastic summary statistics from individual-level data in StocSum scales linearly with both the sample size and the number of genetic variants. We demonstrate the accuracy and computational efficiency of StocSum using two cohorts from the Trans-Omics for Precision Medicine WGS studies.
|
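The general idea behind random vector algorithms can be illustrated with a minimal sketch. It is not StocSum's actual implementation, and it makes simplifying assumptions: samples are unrelated, so the random vectors are plain standard normal draws, and genotypes fit in memory as a dense matrix. The names `stochastic_ld_factor` and `n_vectors` are hypothetical. Projecting the centered genotype matrix onto B random vectors yields a variants-by-B matrix U whose product U Uᵀ approximates the variant covariance (LD) matrix, at a cost proportional to the sample size times the number of variants times B, without using a reference panel.

```python
import numpy as np

def stochastic_ld_factor(genotypes, n_vectors=2000, seed=0):
    """Return U (variants x n_vectors) such that U @ U.T approximates
    the centered cross-product Gc.T @ Gc of the genotype matrix.

    Assumption: unrelated samples, so random vectors are i.i.d. N(0, 1).
    """
    rng = np.random.default_rng(seed)
    n_samples = genotypes.shape[0]
    # Center each variant (column) at its sample mean.
    centered = genotypes - genotypes.mean(axis=0)
    # Random vectors Z satisfy E[Z @ Z.T] / n_vectors = identity.
    z = rng.standard_normal((n_samples, n_vectors))
    # One pass over the data: cost is O(samples * variants * n_vectors).
    return centered.T @ z / np.sqrt(n_vectors)

# Simulated genotypes (allele counts 0/1/2); purely illustrative data.
rng = np.random.default_rng(1)
G = rng.binomial(2, 0.3, size=(500, 20)).astype(float)

U = stochastic_ld_factor(G)
approx = U @ U.T
Gc = G - G.mean(axis=0)
exact = Gc.T @ Gc
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print(f"relative Frobenius error: {rel_err:.3f}")
```

Only U needs to be stored or shared, and the approximation error shrinks as `n_vectors` grows. Handling related samples would require random vectors drawn with a covariance that reflects the sample correlation structure, which this sketch does not attempt.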