
Analysis of densely imputed UK Biobank genetic data reveals disease-associated rare loss of function variation

Authors
SA Gagliano, W Zhou, D Taliun, J Nielsen, J LeFaive, R Dey, S Das, GR Abecasis
Name and Date of Professional Meeting
ASHG Annual Meeting (October 2018)
Associated paper proposal(s)
Working Group(s)
Abstract Text
Loss of function (LoF) variants, such as those that introduce a premature stop codon or shift the reading frame for the translation machinery, alter protein structure to the extent of eliminating or greatly diminishing protein function. This biological interpretability makes LoF variants a particularly important subset of genetic variation to study.

To pinpoint disease-associated LoF variants, we imputed additional variants into the UK Biobank cohort of half a million genotyped individuals, using 60,039 deeply whole-genome-sequenced individuals from the multi-ethnic TOPMed project as the reference panel. This allowed us to expand the number of genotyped or imputed variants characterized in the UK Biobank from 39,131,578 to 177,895,992. The vast majority (94%; 167,502,731) of the imputed variants are rare (alternate allele frequency < 0.5%), and 0.03% (49,892) of these rare variants are high-confidence predicted LoF. This dramatic increase in the number of variants in one of the largest health-based cohorts to date provides an ideal setting to assess the impact of LoF variation. We therefore conducted single-variant and gene-burden association tests for >1,400 binary traits constructed from health record billing codes.
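As an illustration of the gene-burden idea only, the following minimal sketch collapses all LoF variants in a gene into a single carrier flag per individual and tests whether carriers are enriched among cases. It is not the association method used in this study, which the abstract does not specify: the one-sided Fisher's exact test, the function names, and the toy genotypes are all assumptions chosen for brevity, and the exact computation shown here is only practical at toy scale.

```python
from math import comb


def collapse_carriers(genotypes):
    """Return one flag per individual: True if the person carries at least
    one alternate allele at any LoF variant in the gene.

    genotypes: list of per-individual lists of alternate-allele counts,
    one entry per LoF variant in the gene (hypothetical input layout).
    """
    return [any(count > 0 for count in person) for person in genotypes]


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for the 2x2 table
        [[a, b],   = [[case carriers,    case non-carriers],
         [c, d]]      [control carriers, control non-carriers]]
    i.e. the probability of seeing at least `a` carriers among cases
    under the hypergeometric null of no association.
    """
    n_cases = a + b
    n_carriers = a + c
    n_total = a + b + c + d
    upper = min(n_cases, n_carriers)
    favourable = sum(
        comb(n_carriers, k) * comb(n_total - n_carriers, n_cases - k)
        for k in range(a, upper + 1)
    )
    return favourable / comb(n_total, n_cases)


def burden_test(genotypes, is_case):
    """Collapse LoF variants per individual, then test carrier enrichment
    among cases. Assumed simplification: no covariates, no relatedness."""
    carriers = collapse_carriers(genotypes)
    a = sum(1 for car, case in zip(carriers, is_case) if car and case)
    b = sum(1 for car, case in zip(carriers, is_case) if not car and case)
    c = sum(1 for car, case in zip(carriers, is_case) if car and not case)
    d = sum(1 for car, case in zip(carriers, is_case) if not car and not case)
    return fisher_exact_greater(a, b, c, d)


# Toy data (invented): 6 individuals, 3 LoF variants in one gene.
toy_genotypes = [
    [0, 1, 0],  # case, carrier of variant 2
    [0, 0, 0],  # case, non-carrier
    [1, 0, 0],  # case, carrier of variant 1
    [0, 0, 0],  # control
    [0, 0, 0],  # control
    [0, 0, 0],  # control
]
toy_is_case = [True, True, True, False, False, False]

p_value = burden_test(toy_genotypes, toy_is_case)
```

Collapsing lets several individually rare variants contribute to one test, which is why a gene can show a burden signal even when none of its variants reaches significance alone. A biobank-scale analysis would additionally need to adjust for covariates, relatedness, and case-control imbalance, none of which this sketch attempts.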

In the single-variant analyses, we identified five rare LoF variants (absent from the 39M-variant dataset) associated with disease, including two variants associated with breast cancer: a frameshift indel in CHEK2 (chr22:28695868:AG:A; build 38; p=7.0E-22) and a SNP resulting in a gained stop in PALB2 (chr16:23621362:C:T; p=6.9E-14). Both are present in the ClinVar database as potentially pathogenic for familial breast cancer, but this is the first time these variants have been identified by GWAS.

Furthermore, we found significant burden signals in genes previously implicated in familial disease, but for which no LoF variants were significantly associated in the single-variant analyses. Examples include USH2A (a known gene for Usher syndrome, for which retinitis pigmentosa is a primary symptom), associated with hereditary retinal dystrophies with only 83 cases, and IFT140 (a known gene for Mainzer-Saldino syndrome, which is characterized by kidney disease), associated with kidney cyst with 1,257 cases.

We demonstrate that association studies in large-scale biobanks, even with a relatively small number of cases, are capable of yielding pathogenic findings that previously were only detected in clinical cases or difficult-to-collect family cohorts, in which it can be challenging to obtain robust associations.