Introduction

Whole-exome sequencing (WES) is a powerful application of massively parallel sequencing technology that allows the sequencing of the entire coding genome and its flanking sequence.1 Since its first application to identify the cause of a Mendelian disorder, it has become a standard tool in the study of these diseases, which are caused by variants in single genes.2,3 This success has quickly penetrated the clinical arena, where clinical exome sequencing (CES) is increasingly integrated in the diagnostic process of individuals with suspected genetic disorders. The power of CES to identify the causal variants without the need for expensive ancillary tests or subjective clinical evaluation is especially appealing for such heterogeneous disorders as developmental delay and intellectual disability. Recent studies show that CES is economically competitive with the traditional testing strategy, arguing for its potential use as a first-tier standard diagnostic test.4,5,6

Clinical sensitivity of CES, i.e., the percentage of cases for which CES identifies a likely causal variant for the phenotype undergoing investigation, is surprisingly comparable among large and heterogeneous case series. For example, Retterer et al. identified a likely causal variant in 28.8% of 3,040 samples.7 Similarly, Farwell and colleagues reported a yield of 30% among their first 500 tested cases.8 These figures are very similar to those generated by previous large studies such as those by Yang et al. and Lee et al. that reported yields of 25.2% for 2,000 cases and 26% for 814 cases, respectively.9,10 A lower yield was reported among 486 adults with a suspected genetic diagnosis for various indications (17.5%).11

Although very high compared to other tests, this clinical sensitivity leaves >70% of patients with suspected genetic disorders undiagnosed molecularly. Nongenetic or non-Mendelian etiology may explain some of the “negative” CES; however, CES in conditions very likely to be Mendelian, e.g., retinal dystrophy and intellectual disability with positive family history, also suffered from a limited sensitivity.9,10,12 Another possible explanation is that CES was rarely ordered as a first-tier test in the published cohorts, so it is possible that tested cases were enriched for unlikely mutational mechanisms and novel genes that may be more difficult to uncover by CES. Recent studies of CES as a first-tier test did suggest a higher sensitivity (50–60%); this may explain the limited clinical sensitivity but only partially.6,13 Finally, there is a growing concern that cases with “negative” CES may harbor variants that are not captured or sequenced by the assay, although the magnitude of this is currently unknown.

One major challenge in addressing the issue of clinical sensitivity of CES is lack of data regarding the theoretical maximum potential of CES. In other words, the contribution of the various classes of mutations to the etiology of Mendelian diseases is unknown. For example, although the limited available data regarding WGS suggest that the contribution of nongenic variants to Mendelian diseases is very limited,14,15 this may have been an underestimate due to bias in interpreting this difficult class of mutations. One approach that has the potential to address this challenge is positional mapping. A Mendelian phenotype can be mapped to a single positional locus with sufficient meiotic events irrespective of the nature of the mutation. Therefore, interrogating a large number of phenotypes that are mapped to a single locus each can provide the much needed benchmark to calculate the maximum potential of CES as a diagnostic test, as we show in this study. After showing that the theoretical maximum yield of CES is much higher than what is experienced in practice, we demonstrate that the causal variants in the majority of “negative” CES can indeed be identified by improved variant filtration rather than increased coverage.

Materials and Methods

Human subjects

Families who map to single loci have been recruited as part of several positional mapping projects with appropriate consenting and IRB approval. They are all consanguineous and have several members with the same autosomal-recessive phenotype. Patients enrolled in this specific study have all been evaluated clinically for suspected genetic disorders and received a negative CES report, i.e., a report in which no likely causal variant for the indication of the test was identified. CES was performed in major diagnostic laboratories (names are not revealed for confidentiality).

Positional mapping

All families with single loci that we present in this paper were mapped using autozygome analysis essentially as described before.16 Briefly, we first performed genome-wide genotyping using the Axiom SNP Chip platform following the manufacturer’s instructions (Affymetrix). We then scanned the generated files for regions of homozygosity ≥2 Mb as surrogates of autozygosity given the parental consanguinity using AutoSNPa.17 The entire set of autozygous blocks per genome (autozygome) was compared among family members with the same phenotype. Only families with a single autozygous interval exclusively shared among the affected members are included in Supplementary Table S1 online. For our reanalysis of “negative” CES, we used the autozygome of the tested individual and, when applicable, other affected relatives to filter the exomic variants, essentially as described before.3,18
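The autozygome comparison described above can be illustrated with a minimal Python sketch. It is not the AutoSNPa algorithm: the function names, the genotype encoding ("AA", "AG"), and the example coordinates are hypothetical, no tolerance for miscalls is modelled, and the exclusion of blocks also present in unaffected relatives is omitted. Only the ≥2 Mb threshold is taken from the text.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start_bp, end_bp)
MIN_ROH_BP = 2_000_000           # the >=2 Mb threshold stated in the text


def runs_of_homozygosity(calls: List[Tuple[str, int, str]],
                         min_len: int = MIN_ROH_BP) -> List[Interval]:
    """Return homozygous runs spanning at least min_len bp.

    calls: (chromosome, position, genotype) tuples sorted by chromosome and
    position. Any heterozygous call or no-call ends the current run.
    """
    runs: List[Interval] = []
    chrom, start, end = None, 0, 0
    for c, pos, gt in calls:
        homozygous = len(gt) == 2 and gt.isalpha() and gt[0] == gt[1]
        if homozygous and c == chrom:
            end = pos  # extend the open run
            continue
        # Close the open run (if any) and keep it when long enough.
        if chrom is not None and end - start >= min_len:
            runs.append((chrom, start, end))
        if homozygous:
            chrom, start, end = c, pos, pos  # open a new run
        else:
            chrom = None
    if chrom is not None and end - start >= min_len:
        runs.append((chrom, start, end))
    return runs


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Pairwise overlap of two interval lists."""
    out = []
    for ca, sa, ea in a:
        for cb, sb, eb in b:
            if ca == cb:
                lo, hi = max(sa, sb), min(ea, eb)
                if lo < hi:
                    out.append((ca, lo, hi))
    return sorted(out)


def shared_autozygome(per_person: Dict[str, List[Interval]]) -> List[Interval]:
    """Intervals that are autozygous in every affected individual."""
    blocks = list(per_person.values())
    shared = blocks[0]
    for other in blocks[1:]:
        shared = intersect(shared, other)
    return shared


# Hypothetical autozygomes of two affected siblings.
affected = {
    "IV:1": [("chr1", 10_000_000, 18_000_000), ("chr7", 5_000_000, 9_000_000)],
    "IV:3": [("chr1", 12_000_000, 20_000_000)],
}
print(shared_autozygome(affected))
```

In this toy example, only the overlapping chr1 segment survives the comparison, which corresponds to a family that maps to a single locus.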

Exome sequencing and variant interpretation

Obtaining the raw CES data was not possible, so we had to repeat WES to perform the reanalysis of “negative” CES cases. Exome capture was performed using the TruSeq Exome Enrichment kit (Illumina) following the manufacturer’s protocol. Samples were prepared as an Illumina sequencing library; in the second step, the sequencing libraries were enriched for the desired target using the Illumina Exome Enrichment protocol. The captured libraries were sequenced on an Illumina HiSeq 2000 sequencer. The reads were mapped against UCSC hg19 using BWA. SNPs and indels were detected using SAMtools. Variants from WES were filtered such that only novel (or very low-frequency, <0.1%), coding/splicing, homozygous variants that are within the autozygome of the affected individual (or the shared autozygome of the affected individuals when applicable) were considered likely causal variants. Frequency of variants was determined using publicly available variant databases (1000 Genomes, Exome Variant Server, and ExAC Browser) as well as a database of 817 in-house ethnically matched exomes as described previously.19 The ACMG guidelines on variant interpretation and classification were followed.20 Because the ACMG guidelines apply only to established disease genes, we labeled variants in novel genes as variants of unknown significance.
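The filtering criteria above (novel or very rare, coding/splicing, homozygous, within the autozygome) can be expressed as a short Python sketch. This is an illustration under stated assumptions rather than the pipeline used in the study: the `Variant` record, the consequence labels, the gene names, and the strict "below 0.1%" comparison are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start_bp, end_bp)

MAX_ALLELE_FREQ = 0.001  # 0.1%; treating the cut-off as strict is an assumption
# Hypothetical annotation labels standing in for "coding/splicing".
KEPT_CONSEQUENCES = {"missense", "nonsense", "frameshift",
                     "inframe_indel", "splicing"}


@dataclass
class Variant:
    chrom: str
    pos: int
    gene: str
    consequence: str
    zygosity: str                  # "hom" or "het"
    max_pop_freq: Optional[float]  # None = absent from all databases (novel)


def in_autozygome(v: Variant, autozygome: List[Interval]) -> bool:
    """True if the variant lies inside any autozygous interval."""
    return any(c == v.chrom and s <= v.pos <= e for c, s, e in autozygome)


def is_candidate(v: Variant, autozygome: List[Interval]) -> bool:
    """Apply the four criteria: rare, coding/splicing, homozygous, mapped."""
    rare = v.max_pop_freq is None or v.max_pop_freq < MAX_ALLELE_FREQ
    return (rare
            and v.consequence in KEPT_CONSEQUENCES
            and v.zygosity == "hom"
            and in_autozygome(v, autozygome))


def filter_variants(variants: List[Variant],
                    autozygome: List[Interval]) -> List[Variant]:
    return [v for v in variants if is_candidate(v, autozygome)]


# Hypothetical input: one shared autozygous interval and four variants.
autozygome = [("chr1", 12_000_000, 18_000_000)]
variants = [
    Variant("chr1", 15_000_000, "GENE_A", "missense", "hom", None),    # kept
    Variant("chr1", 16_000_000, "GENE_B", "missense", "hom", 0.02),    # too common
    Variant("chr1", 17_000_000, "GENE_C", "synonymous", "hom", None),  # not coding-altering
    Variant("chr3", 4_000_000, "GENE_D", "nonsense", "hom", None),     # outside interval
]
print([v.gene for v in filter_variants(variants, autozygome)])
```

A filter keyed on annotation labels is only as good as the annotation: a variant labelled synonymous is discarded here even if it disrupts splicing, so such cases would need to be annotated as splicing upstream.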

Results

Very high theoretical maximum sensitivity of CES

Our group has been working on mapping recessive Mendelian phenotypes since 2007 by exploiting the special structure of the local population with consanguinity, large family structure, and free access to health care.21 By surveying the thousands of families recruited in the process, we were able to identify 104, mostly published, families in whom a phenotype that follows the autosomal-recessive mode of inheritance could be mapped to a single autozygous interval (Supplementary Figure S1 online shows the pedigree of all unpublished families). Using this as a denominator, we found that the underlying variant was identified within the critical interval in all but three (97%). Supplementary Table S1 online summarizes the classes of variants identified in these families. Although all identified variants were genic, intronic variants deserve a special mention. In aggregate, splicing variants accounted for 17% of the total variants and 2% were >50 nucleotides from the nearest exon. Because the latter group is not typically captured by the design of CES, we conclude that the hypothetical maximum yield of CES in the setting of autosomal recessive phenotypes is >95%.
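The arithmetic behind these figures can be retraced in a few lines of Python. The family counts come from the text; converting the rounded 2% of deep intronic variants into a whole number of families, and assuming one causal variant per solved family, are simplifications made only for this illustration.

```python
TOTAL_FAMILIES = 104           # families mapped to a single autozygous interval
UNSOLVED = 3                   # families with no variant found in the interval
DEEP_INTRONIC_FRACTION = 0.02  # variants >50 nt from the nearest exon (rounded in the text)

solved = TOTAL_FAMILIES - UNSOLVED
solved_rate = solved / TOTAL_FAMILIES

# Assumption: one causal variant per solved family, so the 2% share can be
# converted to an approximate number of families outside the capture design.
deep_intronic = round(DEEP_INTRONIC_FRACTION * solved)
capturable_rate = (solved - deep_intronic) / TOTAL_FAMILIES

print(f"solved within the critical interval: {solved}/{TOTAL_FAMILIES} = {solved_rate:.1%}")
print(f"solved and within typical capture design: {capturable_rate:.1%}")
```

Under these assumptions the first figure rounds to the reported 97%, and the second stays above 95% after the deep intronic variants are set aside.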

Our own experience with CES performed by the same major reference laboratories from which the literature on clinical utility of CES is derived revealed a much lower sensitivity (~30%) compared to the hypothetical maximum of 95%. This provided an opportunity to investigate the reasons for this discrepancy and whether positional mapping can unlock the full potential of CES. Therefore, we approached families with suspected autosomal recessive phenotypes in whom CES (some additionally had clinical WGS) did not reveal a likely causal variant, i.e., “negative,” and 33 agreed to recruitment under an IRB-approved research protocol (KFSHRC RAC2121053). We then proceeded with our previously published pipeline of autozygome analysis and WES.18

Reanalysis of “negative” CES using positional mapping markedly improves sensitivity

Table 1 summarizes the results of our analysis of “negative” CES. Overall, a likely causal variant was identified in 88% of the families. Even if we limited our analysis to established disease genes, a likely causal variant was identified in 48%, although we note that CES laboratories typically do report variants in novel genes; however, none of these was reported. Although the numbers are small, cases in which autosomal-recessive inheritance was clearly implied from the pedigree structure had a higher success rate compared to simplex cases presumed to be recessive on the basis of parental consanguinity only. This cohort of cases that had been “negative” on CES, despite the small numbers, gave us an opportunity to address why the causal variants were originally missed.

Table 1 Summary of cases with “negative” CES and the results of reanalysis using research-grade WES with autozygome filtering

One prominent class of mutations was splicing. In none of these cases was the splicing variant deep enough to be missed by the capture design of CES, although they were not in the canonical splice site. These include the +8 variant in RTTN,22 +4 in SMG9, and +5 in ECHS1. Of note, the +5 variant in ECHS1 was not called, even by clinical whole-genome sequencing, which was requested by the referring clinician after receiving a negative CES result. Even the deep -24 variant in COG6 was, in fact, included in the VCF by the reference laboratory but was not included in the final report.12 These results suggest that splicing variants beyond the canonical ±1/2 positions are very difficult to call with confidence as disease-causing by CES without supporting evidence from positional mapping. The CTU2 variant that abolishes the splice-donor site by replacing the last bp of the exon may have been missed because it masquerades as a synonymous change, although CES laboratories are usually aware of this well-established phenomenon.23 Surprisingly, none of the other “missed” variants could be presumed challenging a priori. For example, the missense variants we identified in ISCA2, ASNS, C3orf17, and SLC1A4 were all missed by CES, and VRK1 was also missed on WGS. Similarly, the nonsense variant in UNC80 is expected to be easily called by all next-generation sequencing platforms. Even the indels we identified in this resequencing study were not particularly challenging and were identified using the same sequencing platform that is typically used by CES laboratories, i.e., the Illumina HiSeq 2000 (e.g., the four-nucleotide deletion in GOLGA2).24 Again, it is worth highlighting that the latter variant was missed not only on CES but also on clinical WGS.

Discussion

CES is increasingly becoming the test of choice in clinical genomics, particularly in patients with challenging phenotypes. An important question that has arisen as a result of this widespread use is: how should a “negative” CES report be interpreted? It is tempting to speculate that coverage is the main culprit in the limited sensitivity of CES, i.e., the missed variants in “negative” CES are present outside the ~2% coding/flanking sequences that are targeted by CES. The limited available data for whole-genome sequencing (WGS), however, do not seem to support this. For example, the application of WGS for 152 cases with various clinical phenotypes that are suspected to be genetic in etiology only revealed one nongenic variant that would have been missed by WES: a small deletion 1.5 kb downstream of SOX3.14 The overall clinical sensitivity of WGS in that study was 21% and was not superior to that of CES.

In this report, we attempted to address this question by first calculating the theoretical maximum sensitivity of CES using an approach that is neutral to the class of variants. In fact, we suggest that the data we obtained by dissecting the various classes of variants identified in the largest published cohort of phenotypes that map to a single locus each can inform the wider question about the contribution of individual mutation classes to the etiology of Mendelian diseases. Unfortunately, our data are limited to autosomal-recessive “disease” phenotypes, so we caution against their generalization to other modes of inheritance but suspect similar data can be collected from laboratories that primarily deal with autosomal-dominant and X-linked phenotypes. Nonetheless, the finding that the overwhelming majority of autosomal-recessive disease variants are within genes is reassuring in that the extreme rarity of reports of nongenic autosomal-recessive disease variants likely reflects their genuine rarity rather than a bias against their detection.25,26 Even when deep splicing variants that are not covered by the current design of CES are excluded, the sensitivity of CES is much less than the theoretical maximum of 95%. This suggests that the solution to “negative” CES is not necessarily WGS because it is likely that the causal variants have in fact been sequenced but failed to be called by the laboratory director. This is highly consistent with the data we present regarding the resequencing of individuals with “negative” CES, in which we show that, in many instances, the causal variant was “missed” because it was filtered out at the filtration stage rather than because of inherent technical limitation. This highlights the need for improved filtering algorithms used in the analysis of CES. Indeed, we posit that investing in these improved algorithms should be a priority over investing in clinical WGS because the latter will probably miss the same variants when the same algorithms are used, as demonstrated by at least three cases in this cohort. Although positional mapping is only applicable in a subset of cases, our data strongly suggest that it should be incorporated in the filtering algorithms whenever applicable, and there are methods that allow this directly on VCF files.27 Assuming similar quality of WES performed on a research basis by us and CES performed previously (the latter very likely has a much higher quality), the only obvious variable to which we can attribute our higher success rate is the implementation of positional mapping.

In conclusion, we present an innovative method to calculate the theoretical maximum yield of CES in the setting of autosomal-recessive diseases, which we show to be much higher than currently experienced. We demonstrate that positional mapping can minimize this discrepancy by extracting causal variants from most “negative” CES. Our data strongly support the incorporation of positional mapping in the analysis of CES whenever applicable.

Disclosure

The authors declare no conflict of interest.