Authors |
Eric Van Buren, Xihao Li, Zilin Li, Peter Orchard, Hufeng Zhou, Alex Reiner, Laura Raffield, Xihong Lin, NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, TOPMed Multi-Omics Working Group
|
Abstract Text |
Introduction
Through our previously developed cellSTAAR method, we demonstrated that integrating single-cell-sequencing-based epigenetic data can boost the power of gene-centric Rare Variant (RV) association tests (RVATs) to detect associations of candidate Cis-Regulatory Elements (cCREs) in complex human diseases. Integrating further kinds of multi-omics data to capture additional sources of functional variability that exist in the non-coding genome may increase power further.
Methods
We propose omicsSTAAR, a new method that robustly integrates several kinds of multi-omics data into gene-centric RVATs of non-coding regions. First, omicsSTAAR can integrate variant-level multi-omics datasets, such as those from methylation studies or eQTL summary statistics, to create custom variant sets of the most likely causal variants, weighted with corresponding functional annotations. Association p-values from each variant set are aggregated using the Cauchy Combination Test to create an omnibus p-value summarizing evidence across different categories of multi-omics data. Second, omicsSTAAR can integrate gene-level multi-omics datasets, such as those from RNA-seq and proteomics experiments, to weight omnibus gene-centric association p-values using “side-information” approaches such as Independent Hypothesis Weighting (IHW). Using such approaches, omicsSTAAR can account for the biological relevance of each gene as measured by expression or protein abundance in relevant tissues.
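As an illustration of the aggregation step described above, the following is a minimal sketch of the standard Cauchy Combination Test, which maps each p-value to a Cauchy variate, averages them with weights, and maps the result back to a p-value. The function name `cauchy_combination` and the default of equal weights are assumptions made for this example; they are not taken from the omicsSTAAR software, and the weights omicsSTAAR actually uses are not specified here.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the Cauchy Combination Test.

    Hypothetical helper for illustration only; not the omicsSTAAR API.
    Weights default to equal values and are normalized to sum to 1.
    """
    if not p_values:
        raise ValueError("p_values must not be empty")
    if any(not (0.0 < p < 1.0) for p in p_values):
        raise ValueError("each p-value must lie strictly between 0 and 1")

    if weights is None:
        weights = [1.0] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")

    total = sum(weights)
    # Each p-value becomes a standard Cauchy variate under the null.
    statistic = sum(
        (w / total) * math.tan((0.5 - p) * math.pi)
        for w, p in zip(weights, p_values)
    )
    # Upper tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(statistic) / math.pi


# Example: p-values from three hypothetical variant sets for one gene.
omnibus_p = cauchy_combination([0.004, 0.20, 0.65])
```

In this sketch the smallest p-values dominate the combined statistic, which is why the test is commonly used to form an omnibus p-value across correlated tests.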
Results
We applied omicsSTAAR to Freeze 8 (N = 60,000) of the NHLBI Trans-Omics for Precision Medicine (TOPMed) consortium data for four hematological traits: hemoglobin (HGB), hematocrit (HCT), platelet count (PLT), and white blood cell count (WBC). To demonstrate omicsSTAAR, we collected single-cell ATAC-seq data and two TOPMed blood-based datasets: RNA-seq from the WHI and FHS TOPMed cohorts (N = 2,072) and eQTL summary statistics based on 5,007 TOPMed participants. Our analysis reveals associations in several known genes for hematological traits, including HBQ1 and CD84, while showing variability in which kinds of omics data detect each association. We also demonstrate a substantial increase in the number of discoveries at a reduced significance threshold, both when combining association results from the variant-level multi-omics data (scATAC-seq and eQTL summary statistics) into an omnibus association p-value and when using gene-level multi-omics data (RNA-seq) to weight the gene-centric omnibus p-values.
|