Introduction

Genome-wide association studies (GWAS) have emerged as a cornerstone in the delivery of the Human Genome Project’s promise to revolutionize medicine.1 Far from being caused by individual mutations, complex traits that account for the overwhelming majority of illnesses have proven very difficult to study using classic positional mapping or hypothesis-driven case–control studies. Under the “common variant–common disease” model, a new approach was needed to address the polygenicity of these traits by taking into account the additive/synergistic interaction of multiple common variants and the heterogeneity of these combinatorial effects across individuals.2 To date, 1,751 studies have been published that collectively examined millions of polymorphic markers in hundreds of thousands of patients and controls to extract signals of association according to the National Human Genome Research Institute GWAS Catalog.3 The identification of 11,912 such signals may not have yet penetrated clinical practice for the purpose of prediction and prevention of common diseases, but it has greatly informed our understanding of their pathophysiology, and we are starting to see their potential to inform novel drug development.4, 5, 6, 7

The power of GWAS stems from their hypothesis-free design, which makes it possible to uncover unexpected genetic, and in turn, biological mechanisms. The fact that the medical relevance, historically inferred from Mendelian mutations, is unknown for most of the ~20,000 protein-coding human genes means that a GWAS signal is often detected in a gene with no established medical context. This challenge is further compounded by the finding that the majority of variants are not within the 1.5% coding part of the human genome.8 As a result, alleles that produce significant signals are often interpreted in light of the physically closest genes even though they may not be the source of the biological signal.9 Extensive follow-up functional analysis of the genes that are the putative source of the signal is often required especially when little had been published about the function of these genes.10

Mendelian mutations have long been recognized as the best source of information on gene function in humans.11, 12 Although gain of function and loss of function can both be informative in this regard, loss of function mutations are often more faithful in exposing the physiological context of genes. Biallelic truncating mutations are the most conspicuous forms of loss of function, especially those that occur early in the reading frame and across all known isoforms, essentially representing knockouts of these genes.13 We have previously shown that enhanced homozygosity as a function of autozygosity, a signature genomic feature of inbreeding, can facilitate the occurrence of naturally occurring human knockouts.14 These knockouts span the entire phenotypic spectrum from the apparently healthy to the very early embryonic lethal.15

In this paper, we expand our work on human knockouts by reporting their relevance to the study of complex traits. Specifically, we report five genes that had not been reported in a Mendelian context but had been suggested to influence complex trait genetic risk through GWAS signals. By comparing the phenotypes of patients who are knocked out for these genes to the purported complex trait association, we show the potential value of this approach in improving the specificity of GWAS interpretation.

Materials and methods

Human subjects

Human subjects are individuals who presented with phenotypes that matched existing institutional review board–approved research protocols (KFSHRC RAC 2070023, 2080006, 2121053). After informed consent was obtained, blood was collected from index and available family members as appropriate for DNA extraction and subsequent analysis. A separate consent to publish identifiable photographs was also obtained.

Autozygome analysis

Mapping of all autozygous segments per genome (autozygome) was performed as described previously. Briefly, regions of homozygosity >2 Mb, as determined by AutoSNPa, were used as surrogates of autozygosity. The genotyping platform was the Axiom single-nucleotide polymorphism (SNP) chip from Affymetrix (Santa Clara, CA, USA).
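The >2 Mb threshold can be illustrated with a minimal sketch. This is not the AutoSNPa algorithm; it is a simplified stand-in that assumes sorted SNP positions on a single chromosome, genotype calls coded as "AA", "AB" or "BB", and no tolerance for miscalls or missing data. The function name and input layout are hypothetical.

```python
from typing import List, Tuple

MIN_ROH_BP = 2_000_000          # ">2 Mb" threshold stated in the text
HOMOZYGOUS_CALLS = {"AA", "BB"}  # assumed genotype coding (hypothetical)


def runs_of_homozygosity(
    positions: List[int],
    genotypes: List[str],
    min_len_bp: int = MIN_ROH_BP,
) -> List[Tuple[int, int]]:
    """Return (start, end) of maximal runs of consecutive homozygous calls
    whose span is strictly greater than min_len_bp.

    positions must be sorted base-pair coordinates on one chromosome.
    Any call that is not homozygous (heterozygous or missing) ends a run.
    """
    runs: List[Tuple[int, int]] = []
    start = None
    end = None
    for pos, call in zip(positions, genotypes):
        if call in HOMOZYGOUS_CALLS:
            if start is None:
                start = pos      # open a new run
            end = pos            # extend the current run
        else:
            if start is not None and end - start > min_len_bp:
                runs.append((start, end))
            start = None         # close the run
    # Flush a run that reaches the last marker
    if start is not None and end - start > min_len_bp:
        runs.append((start, end))
    return runs
```

Dedicated tools typically allow for occasional genotyping errors and no-calls within a run, which this sketch deliberately omits.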

Exome analysis and variant interpretation

Exome capture was performed using the TruSeq Exome Enrichment kit (Illumina, San Diego, CA, USA) following the manufacturer’s protocol. Samples were prepared as an Illumina sequencing library, and in the second step, the sequencing libraries were enriched for the desired target using the Illumina Exome Enrichment protocol. The captured libraries were sequenced using an Illumina HiSeq 2000 sequencer. The reads were mapped against UCSC hg19 (http://genome.ucsc.edu/) by BWA (http://bio-bwa.sourceforge.net/). The SNPs and indels were detected by SAMtools (http://samtools.sourceforge.net/). For this study, we included homozygous truncating variants only if they met the following criteria: (i) within the autozygome, (ii) novel or rare (<0.001) in ExAC and 2,379 Saudi exomes, (iii) in a gene with no established Mendelian phenotype, and (iv) in a gene with suggested association to a complex trait in a previously published, peer-reviewed GWAS.
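The four inclusion criteria can be read as a simple conjunctive filter. The sketch below is illustrative only and does not reproduce the pipeline used in this study. The `Variant` fields, the set of consequence labels treated as truncating, and the gene lists passed in are all hypothetical. A frequency of `None` is assumed to mean the variant is absent from that database (novel).

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

MAX_FREQ = 0.001  # "rare (<0.001)" threshold stated in the text

# Assumed consequence labels for truncating variants (hypothetical vocabulary)
TRUNCATING = {"stop_gained", "frameshift", "splice_acceptor", "splice_donor"}

Segment = Tuple[str, int, int]  # (chromosome, start, end) of an autozygous block


@dataclass
class Variant:
    chrom: str
    pos: int
    gene: str
    consequence: str
    homozygous: bool
    exac_freq: Optional[float]   # None = not observed in ExAC
    saudi_freq: Optional[float]  # None = not observed in the Saudi exomes


def is_novel_or_rare(freq: Optional[float]) -> bool:
    """Criterion (ii) for a single database."""
    return freq is None or freq < MAX_FREQ


def in_autozygome(v: Variant, segments: List[Segment]) -> bool:
    """Criterion (i): the variant lies inside any autozygous segment."""
    return any(c == v.chrom and s <= v.pos <= e for c, s, e in segments)


def passes_filters(
    v: Variant,
    segments: List[Segment],
    mendelian_genes: Set[str],
    gwas_genes: Set[str],
) -> bool:
    """Apply the homozygous-truncating requirement plus criteria (i) to (iv)."""
    return (
        v.homozygous
        and v.consequence in TRUNCATING
        and in_autozygome(v, segments)               # (i)
        and is_novel_or_rare(v.exac_freq)            # (ii)
        and is_novel_or_rare(v.saudi_freq)           # (ii)
        and v.gene not in mendelian_genes            # (iii)
        and v.gene in gwas_genes                     # (iv)
    )
```

In practice, the Mendelian and GWAS gene sets would have to be curated from external resources. The sketch simply takes them as inputs.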

Results

We report in this study the identification of five knockout events that met our eligibility criteria and are described below (Table 1, Figure 1 and Supplementary Figure S1 online). Details of the clinical phenotype are provided in the accompanying Supplementary Note S1 (the file called “GWAS paper clinical Notes:v5”).

Table 1 Summary of clinical findings and mutation information in the study cohort
Figure 1

Clinical features of human knockouts studied in this cohort. (a–c) Clinical images of the index with TRAF3IP2 biallelic truncation showing extensive skin disease including the scalp. (d) Magnetic resonance image of the index with PXDNL biallelic splicing variant showing interhemispheric cyst. (e–g) Lack of obvious facial dysmorphism in the three siblings with RSRC1 biallelic truncation. (h) Hypotonic posture of the index with biallelic truncation of BTBD9. (i) Antenatal ultrasonography of the index with biallelic truncation of FRMD3 showing anhydramnios and echogenic kidneys of normal size.

BTBD9

Several GWAS have implicated this poorly characterized gene in the pathogenesis of restless leg syndrome and periodic limb movement of sleep, although the mechanism remains unclear.16 In a consanguineous family from Yemen with multiple deaths due to severe unexplained myopathy (normal creatine kinase) and normal brain magnetic resonance image, we identified a homozygous truncating mutation NM_001172418.1:c.1015C>T:p.(Arg339*) in the two affected siblings we had access to.

TRAF3IP2

This gene encodes an interactor with IL17RA and this interaction is thought to mediate the suggested role of TRAF3IP2 in psoriasis susceptibility, which was reported by several GWAS.17, 18 We have identified three siblings with severe eczema and elevated IgE who are all homozygous for the truncating mutation NM_147686.3:c.488_492delTACCT:p.(Leu163*).

RSRC1

The encoded protein is thought to play a role in RNA splicing and has been implicated in height and subjective wellbeing by GWAS.19, 20 We show that in a family of three siblings with nonsyndromic intellectual disability and normal brain magnetic resonance image, a homozygous truncating mutation NM_016625.3:c.268C>T:p.(Arg90*) fully segregated with the phenotype.

FRMD3

The encoded protein belongs to the 4.1 protein family and is highly enriched in the kidney. Several GWAS have implicated FRMD3 in the genetic risk of diabetic kidney disease.21, 22, 23 However, only its association with Lewy body disease is listed in the GWAS Catalog, based on the study by Beecham et al.24 Remarkably, we have identified a homozygous FRMD3 truncation NM_001244962.1:c.70C>T:p.(Arg24*) in a female patient who died shortly after birth with Potter sequence and severely echogenic kidneys with no identifiable cysts.

PXDNL

This gene encodes a protein thought to play a role in cytoskeleton remodeling and cell motility.24 An intragenic SNP was reported among the top association signals for neuritic plaques, which in turn correlated positively with Alzheimer disease risk.24 We have observed a homozygous splicing mutation NM_144651.4:c.695-2A>T in a patient with intellectual disability, partial agenesis of corpus callosum, and hypoplastic cerebellum and brainstem.

Discussion

The contribution of Mendelian genes to the genetic risk of complex traits has long intrigued the human genetics community.25 While early case–control studies that were designed to investigate the contribution of specific Mendelian genes did not generate rigorous associations, subsequent GWAS have indeed confirmed numerous associations between common and rare variants in Mendelian genes and the complex counterpart.25 Genes with Mendelian links can serve as landmarks to guide GWAS investigators as they attempt to interpret the likely source of biological signal when the genetic signal is not in a gene. For example, the study by Harley et al. on the genetics of systemic lupus erythematosus identified a nongenic SNP, and the authors attributed the signal to the closest gene, PXK, even though the gene has no obvious connection to autoimmunity.26 However, our subsequent finding of human knockouts for DNASE1L3, another gene in the same vicinity of the SNP as PXK, prompted a re-analysis that confirmed that DNASE1L3 was indeed the likely source of the signal.27, 28 On the other hand, the finding of Mendelian forms in the same gene implicated in GWAS provides additional and independent support that the signal identified by the latter is likely causal, as demonstrated by our finding that the GWAS-implicated LACC1 in inflammatory bowel disease can cause a Mendelian form of the disorder, and the GWAS-implicated ARL6IP6 in stroke can cause a Mendelian disease with severe cerebrovascular disease.29, 30

Human knockouts have historically contributed to the mapping of novel disease genes. However, the recent appreciation of the widespread occurrence of knockouts has prompted reconsideration of their phenotypic consequences, especially since some are clearly not associated with any conspicuous phenotype.13 Therefore, we caution against the conclusion that the phenotypes we report in this study are necessarily caused by the knockout events before additional patients are described. However, they can still be helpful in interpreting the GWAS signals that had been attributed to these genes. We note that in certain instances, the knockout phenotype likely represents the severe Mendelian counterpart of the complex trait with reported association. This is most evident in TRAF3IP2 (the severe skin phenotype versus psoriasis susceptibility), although we also note significant overlap in the case of FRMD3 (severely echogenic kidneys versus diabetic nephropathy susceptibility) and BTBD9 (severe myopathy versus restless leg syndrome susceptibility). In contrast, the observed knockout phenotypes for PXDNL and RSRC1 appear to be unrelated to the phenotypes with which variants in these genes have been associated in GWAS. It is worth highlighting that the essentially loss-of-function consequence of human knockout events leaves open the possibility that the observed discrepancy with GWAS is due to potential gene upregulation caused by the reported variants in those studies.

In conclusion, we suggest that naturally occurring knockout events in humans can aid in the interpretation of GWAS signals depending on the resulting phenotype. An expanded catalog of these events and their associated phenotypes is under way to inform research into Mendelian and complex traits.