Increasing the sensitivity of clinical exome sequencing through improved filtration strategy

Shamseldin, Hanan E.; Maddirevula, Sateesh; Faqeih, Eissa; Ibrahim, Niema; Hashem, Mais; Shaheen, Ranad; Alkuraya, Fowzan S.

doi:10.1038/gim.2016.155

Original Research Article
Published: 06 October 2016

Genetics in Medicine volume 19, pages 593–598 (2017)

Abstract

Background:

Clinical exome sequencing (CES) has greatly improved the diagnostic process for individuals with suspected genetic disorders. However, the majority of tested individuals remain undiagnosed after CES. Although understanding the potential reasons for this limited sensitivity is critical for improving the delivery of clinical genomics, research in this area has been limited.

Materials and Methods:

We first calculated the theoretical maximum sensitivity of CES by analyzing >100 families in whom a Mendelian phenotype is mapped to a single locus. We then tested the hypothesis that positional mapping can limit the search space and thereby facilitate variant interpretation by reanalyzing 33 families with “negative” CES and applying positional mapping.

Results:

We found that >95% of families who map to a single locus harbored genic (as opposed to intergenic) variants that are potentially identifiable by CES. Our reanalysis of “negative” CES revealed likely causal variants in the majority (88%). Several of these solved cases had also undergone whole-genome sequencing with a negative result.

Conclusion:

The discrepancy between the theoretical maximum and the actual clinical sensitivity of CES is primarily in the variant filtration rather than the variant capture and sequencing phase. The solution to negative CES is not necessarily in expanding the coverage but rather in devising approaches that improve variant filtration. We suggest that positional mapping is one such approach.

Genet Med advance online publication 06 October 2016

Introduction

Whole-exome sequencing (WES) is a powerful application of massively parallel sequencing technology that allows the sequencing of the entire coding genome and its flanking sequence.¹ Since its first application to identify the cause of a Mendelian disorder, it has become a standard tool in the study of these diseases, which are caused by variants in single genes.^2,3 This success has quickly penetrated the clinical arena, where clinical exome sequencing (CES) is increasingly integrated in the diagnostic process of individuals with suspected genetic disorders. The power of CES to identify the causal variants without the need for expensive ancillary tests or subjective clinical evaluation is especially appealing for such heterogeneous disorders as developmental delay and intellectual disability. Recent studies show that CES is economically competitive with the traditional testing strategy, arguing for its potential use as a first-tier standard diagnostic test.^4,5,6

Clinical sensitivity of CES, i.e., the percentage of cases for which CES identifies a likely causal variant for the phenotype undergoing investigation, is surprisingly comparable among large and heterogeneous case series. For example, Retterer et al. identified a likely causal variant in 28.8% of 3,040 samples.⁷ Similarly, Farwell and colleagues reported a yield of 30% among their first 500 tested cases.⁸ These figures are very similar to those generated by previous large studies such as those by Yang et al. and Lee et al. that reported yields of 25.2% for 2,000 cases and 26% for 814 cases, respectively.^9,10 A lower yield was reported among 486 adults with a suspected genetic diagnosis for various indications (17.5%).¹¹

Although very high compared to other tests, this clinical sensitivity leaves >70% of patients with suspected genetic disorders undiagnosed molecularly. Nongenetic or non-Mendelian etiology may explain some of the “negative” CES; however, CES in conditions very likely to be Mendelian, e.g., retinal dystrophy and intellectual disability with positive family history, also suffered from a limited sensitivity.^9,10,12 Another possible explanation is that CES was rarely ordered as a first-tier test in the published cohorts, so it is possible that tested cases were enriched for unlikely mutational mechanisms and novel genes that may be more difficult to uncover by CES. Recent studies of CES as a first-tier test did suggest a higher sensitivity (50–60%); this may explain the limited clinical sensitivity but only partially.^6,13 Finally, there is a growing concern that cases with “negative” CES may harbor variants that are not captured or sequenced by the assay, although the magnitude of this is currently unknown.

One major challenge in addressing the issue of clinical sensitivity of CES is lack of data regarding the theoretical maximum potential of CES. In other words, the contribution of the various classes of mutations to the etiology of Mendelian diseases is unknown. For example, although the limited available data regarding WGS suggest that the contribution of nongenic variants to Mendelian diseases is very limited,^14,15 this may have been an underestimate due to bias in interpreting this difficult class of mutations. One approach that has the potential to address this challenge is positional mapping. A Mendelian phenotype can be mapped to a single positional locus with sufficient meiotic events irrespective of the nature of the mutation. Therefore, interrogating a large number of phenotypes that are mapped to a single locus each can provide the much needed benchmark to calculate the maximum potential of CES as a diagnostic test, as we show in this study. After showing that the theoretical maximum yield of CES is much higher than what is experienced in practice, we demonstrate that the causal variants in the majority of “negative” CES can indeed be identified by improved variant filtration rather than increased coverage.

Materials and Methods

Human subjects

Families who map to single loci were recruited as part of several positional mapping projects with appropriate informed consent and IRB approval. They are all consanguineous and have several members with the same autosomal-recessive phenotype. Patients enrolled in this specific study have all been evaluated clinically for suspected genetic disorders and received a negative CES report, i.e., a report in which no likely causal variant for the indication of the test was identified. CES was performed in major diagnostic laboratories (names are not revealed for confidentiality).

Positional mapping

All families with single loci that we present in this paper were mapped using autozygome analysis essentially as described before.¹⁶ Briefly, we first performed genome-wide genotyping using the Axiom SNP Chip platform following the manufacturer’s instructions (Affymetrix). We then scanned the generated files for regions of homozygosity ≥2 Mb as surrogates of autozygosity given the parental consanguinity using AutoSNPa.¹⁷ The entire set of autozygous blocks per genome (autozygome) was compared among family members with the same phenotype. Only families with a single autozygous interval exclusively shared among the affected members are included in Supplementary Table S1 online. For our reanalysis of “negative” CES, we used the autozygome of the tested individual and, when applicable, other affected relatives to filter the exomic variants, essentially as described before.^3,18
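The core of this mapping step can be sketched in a few lines of Python. This is a simplified illustration, not the AutoSNPa implementation: it assumes genotype calls are already reduced to (chromosome, position, is_homozygous) tuples, it tolerates no genotyping errors, it does not check that affected relatives share the same haplotype, and it does not subtract intervals also present in unaffected relatives. The names `Interval`, `runs_of_homozygosity`, and `shared_autozygome` are hypothetical.

```python
from dataclasses import dataclass

MIN_ROH_BP = 2_000_000  # the >=2 Mb threshold stated in the text


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int
    end: int


def runs_of_homozygosity(calls, min_len=MIN_ROH_BP):
    """Return homozygous runs of at least min_len bp.

    calls: iterable of (chrom, pos, is_homozygous), sorted by chrom then pos.
    """
    runs = []
    cur_chrom, run_start, run_end = None, None, None
    for chrom, pos, is_hom in calls:
        if is_hom and chrom == cur_chrom and run_start is not None:
            run_end = pos  # extend the open run
            continue
        # A heterozygous call or a new chromosome closes any open run.
        if run_start is not None and run_end - run_start >= min_len:
            runs.append(Interval(cur_chrom, run_start, run_end))
        cur_chrom = chrom
        run_start = run_end = pos if is_hom else None
    if run_start is not None and run_end - run_start >= min_len:
        runs.append(Interval(cur_chrom, run_start, run_end))
    return runs


def intersect(a, b):
    """Pairwise overlap of two interval lists."""
    out = []
    for x in a:
        for y in b:
            if x.chrom == y.chrom:
                start, end = max(x.start, y.start), min(x.end, y.end)
                if start < end:
                    out.append(Interval(x.chrom, start, end))
    return out


def shared_autozygome(per_individual_runs):
    """Intervals that are homozygous in every affected individual."""
    shared = per_individual_runs[0]
    for runs in per_individual_runs[1:]:
        shared = intersect(shared, runs)
    return shared


# Toy data: two affected relatives genotyped every 500 kb on chr1.
affected_a = [("chr1", p, True) for p in range(1_000_000, 5_000_001, 500_000)]
affected_a.append(("chr1", 5_500_000, False))
affected_b = [("chr1", 2_000_000, False)]
affected_b += [("chr1", p, True) for p in range(2_500_000, 6_000_001, 500_000)]

shared = shared_autozygome(
    [runs_of_homozygosity(affected_a), runs_of_homozygosity(affected_b)]
)
print(shared)
```

In this toy case the two relatives share one interval, chr1:2,500,000-5,000,000, which would then serve as the search space for the exome variants.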

Exome sequencing and variant interpretation

Obtaining the raw CES data was not possible, so we had to repeat WES to perform the reanalysis of “negative” CES cases. Exome capture was performed using the TruSeq Exome Enrichment kit (Illumina) following the manufacturer’s protocol. Samples were first prepared as an Illumina sequencing library; in the second step, the sequencing libraries were enriched for the desired target using the Illumina Exome Enrichment protocol. The captured libraries were sequenced using an Illumina HiSeq 2000 sequencer. The reads were mapped against UCSC hg19 using BWA. SNPs and indels were detected using SAMtools. Variants from WES were filtered such that only novel (or very low-frequency, <0.1%), coding/splicing, homozygous variants that are within the autozygome of the affected individual (or the shared autozygome of the affected individuals when applicable) were considered likely causal variants. Frequency of variants was determined using publicly available variant databases (1000 Genomes, Exome Variant Server, and ExAC Browser) as well as a database of 817 in-house ethnically matched exomes as described previously.¹⁹ The ACMG guidelines on variant interpretation and classification were followed.²⁰ Because the ACMG guidelines apply only to established disease genes, we labeled variants in novel genes as variants of unknown significance.
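The filtration criteria listed above (rare or novel, coding/splicing, homozygous, inside the autozygome) can be expressed as a single predicate. The following Python sketch is illustrative only and is not the authors' pipeline: the `Variant` fields, the consequence labels, and the representation of the autozygome as (chromosome, start, end) tuples are all assumptions, and the 0.1% cutoff is taken as the frequency threshold.

```python
from dataclasses import dataclass
from typing import Optional

MAX_ALLELE_FREQ = 0.001  # 0.1% frequency cutoff (assumed to be an upper bound)

# Hypothetical consequence labels standing in for "coding/splicing".
CODING_OR_SPLICING = {
    "missense", "nonsense", "frameshift", "inframe_indel", "splicing",
}


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    consequence: str
    genotype: str                  # "hom" or "het"
    max_pop_freq: Optional[float]  # None means absent from all databases


def in_autozygome(variant, autozygome):
    """True if the variant lies inside any autozygous interval."""
    return any(
        chrom == variant.chrom and start <= variant.pos <= end
        for chrom, start, end in autozygome
    )


def is_candidate(variant, autozygome, max_freq=MAX_ALLELE_FREQ):
    """Apply the four filters: rarity, consequence, zygosity, position."""
    rare = variant.max_pop_freq is None or variant.max_pop_freq < max_freq
    return (
        rare
        and variant.consequence in CODING_OR_SPLICING
        and variant.genotype == "hom"
        and in_autozygome(variant, autozygome)
    )


# Toy example: one autozygous interval and four variants.
autozygome = [("chr1", 2_500_000, 5_000_000)]
variants = [
    Variant("chr1", 3_000_000, "missense", "hom", None),     # passes all filters
    Variant("chr1", 3_200_000, "missense", "hom", 0.02),     # too common
    Variant("chr1", 3_400_000, "splicing", "het", None),     # heterozygous
    Variant("chr1", 9_000_000, "nonsense", "hom", 0.0001),   # outside autozygome
]
candidates = [v for v in variants if is_candidate(v, autozygome)]
print(candidates)
```

Restricting candidates to the autozygome is what shrinks the search space: in the toy data only one of four otherwise plausible variants remains for interpretation.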

Results

Very high theoretical maximum sensitivity of CES

Our group has been working on mapping recessive Mendelian phenotypes since 2007 by exploiting the special structure of the local population with consanguinity, large family structure, and free access to health care.²¹ By surveying the thousands of families recruited in the process, we were able to identify 104, mostly published, families in whom a phenotype that follows the autosomal-recessive mode of inheritance could be mapped to a single autozygous interval (Supplementary Figure S1 online shows the pedigree of all unpublished families). Using this as a denominator, we found that the underlying variant was identified within the critical interval in all but three (97%). Supplementary Table S1 online summarizes the classes of variants identified in these families. Although all identified variants were genic, intronic variants deserve a special mention. In aggregate, splicing variants accounted for 17% of the total variants and 2% were >50 nucleotides from the nearest exon. Because the latter group is not typically captured by the design of CES, we conclude that the hypothetical maximum yield of CES in the setting of autosomal recessive phenotypes is >95%.
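The arithmetic behind this estimate can be reproduced as follows. The text does not spell out the calculation, so this is one plausible reading: it assumes one causal variant per family and treats the 2% of variants lying >50 nucleotides from the nearest exon as a proportional reduction of the solved fraction. The variable names are illustrative.

```python
families_mapped = 104          # families mapping to a single autozygous interval
families_unsolved = 3          # no variant found within the critical interval
deep_intronic_fraction = 0.02  # variants >50 nt from the nearest exon

# Fraction of mapped families in which a causal variant was identified.
solved_fraction = (families_mapped - families_unsolved) / families_mapped

# Fraction still detectable once variants outside the capture design are excluded.
ces_ceiling = solved_fraction * (1 - deep_intronic_fraction)

print(f"solved: {solved_fraction:.1%}")   # solved: 97.1%
print(f"CES ceiling: {ces_ceiling:.1%}")  # CES ceiling: 95.2%
```

Under these assumptions the ceiling stays just above 95%, consistent with the figure quoted in the text.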

Our own experience with CES performed by the same major reference laboratories from which the literature on clinical utility of CES is derived revealed a much lower sensitivity (~30%) compared to the hypothetical maximum of 95%. This provided an opportunity to investigate the reasons for this discrepancy and whether positional mapping can unlock the full potential of CES. Therefore, we approached families with suspected autosomal recessive phenotypes in whom CES (some additionally had clinical WGS) did not reveal a likely causal variant, i.e., “negative,” and 33 agreed to recruitment under an IRB-approved research protocol (KFSHRC RAC2121053). We then proceeded with our previously published pipeline of autozygome analysis and WES.¹⁸

Reanalysis of “negative” CES using positional mapping markedly improves sensitivity

Table 1 summarizes the results of our analysis of “negative” CES. Overall, a likely causal variant was identified in 88% of the families. Even if we limited our analysis to established disease genes, a likely causal variant was identified in 48%, although we note that CES laboratories typically do report variants in novel genes; however, none of these was reported. Although the numbers are small, cases in which autosomal-recessive inheritance was clearly implied from the pedigree structure had a higher success rate compared to simplex cases presumed to be recessive on the basis of parental consanguinity only. This cohort of cases that had been “negative” on CES, despite the small numbers, gave us an opportunity to address why the causal variants were originally missed.

Table 1 Summary of cases with “negative” CES and the results of reanalysis using research-grade WES with autozygome filtering

One prominent class of mutations was splicing. In none of these cases was the splicing variant deep enough to be missed by the capture design of CES, although none was at a canonical splice site. These include the +8 variant in RTTN,²² +4 in SMG9, and +5 in ECHS1. Of note, the +5 variant in ECHS1 was not called even by clinical whole-genome sequencing, which was requested by the referring clinician after receiving a negative CES result. Even the deep -24 variant in COG6 was, in fact, included in the VCF by the reference laboratory but was not included in the final report.¹² These results suggest that splicing variants beyond the canonical ±1/2 positions are very difficult to call with confidence as disease-causing by CES without supporting evidence from positional mapping. The CTU2 variant that abolishes the splice-donor site by replacing the last base pair of the exon may have been missed because it masquerades as a synonymous change, although CES laboratories are usually aware of this well-established phenomenon.²³ Surprisingly, none of the other “missed” variants could be presumed challenging a priori. For example, the missense variants we identified in ISCA2, ASNS, C3orf17, and SLC1A4 were all missed by CES, and the variant in VRK1 was also missed on WGS. Similarly, the nonsense variant in UNC80 is expected to be easily called by all next-generation sequencing platforms. Even the indels we identified in this resequencing study were not particularly challenging: they were identified using the same sequencing platform that CES laboratories typically use, i.e., Illumina HiSeq 2000 (e.g., the four-nucleotide deletion in GOLGA2²⁴). Again, it is worth highlighting that the latter variant was missed not only on CES but also on clinical WGS.

Discussion

CES is increasingly becoming the test of choice in clinical genomics, particularly in patients with challenging phenotypes. An important question that has arisen as a result of this widespread use is: how should a “negative” CES report be interpreted? It is tempting to speculate that coverage is the main culprit in the limited sensitivity of CES, i.e., the missed variants in “negative” CES are present outside the ~2% coding/flanking sequences that are targeted by CES. The limited available data for whole-genome sequencing (WGS), however, do not seem to support this. For example, the application of WGS for 152 cases with various clinical phenotypes that are suspected to be genetic in etiology only revealed one nongenic variant that would have been missed by WES: a small deletion 1.5 kb downstream of SOX3.¹⁴ The overall clinical sensitivity of WGS in that study was 21% and was not superior to that of CES.

In this report, we attempted to address this question by first calculating the theoretical maximum sensitivity of CES using an approach that is neutral to the class of variants. In fact, we suggest that the data we obtained by dissecting the various classes of variants identified in the largest published cohort of phenotypes that map to a single locus each can inform the wider question about the contribution of individual mutation classes to the etiology of Mendelian diseases. Unfortunately, our data are limited to autosomal-recessive “disease” phenotypes, so we caution against their generalization to other modes of inheritance but suspect similar data can be collected from laboratories that primarily deal with autosomal-dominant and X-linked phenotypes. Nonetheless, the finding that the overwhelming majority of autosomal-recessive disease variants are within genes is reassuring in that the extreme rarity of reports of nongenic autosomal-recessive disease variants likely reflects their genuine rarity rather than a bias against their detection.^25,26 Even when deep splicing variants that are not covered by the current design of CES are excluded, the sensitivity of CES is much less than the theoretical maximum of 95%. This suggests that the solution to “negative” CES is not necessarily WGS because it is likely that the causal variants have in fact been sequenced but failed to be called by the laboratory director. This is highly consistent with the data we present regarding the resequencing of individuals with “negative” CES, in which we show that, in many instances, the causal variant was “missed” because it was filtered out at the filtration stage rather than because of an inherent technical limitation. This highlights the need for improved filtering algorithms used in the analysis of CES.
Indeed, we posit that investing in these improved algorithms should be a priority over investing in clinical WGS because the latter will probably miss the same variants when the same algorithms are used, as demonstrated by at least three cases in this cohort. Although positional mapping is only applicable in a subset of cases, our data strongly suggest that it should be incorporated in the filtering algorithms whenever applicable and there are methods that allow this directly on VCF files.²⁷ Assuming similar quality of WES performed on a research basis by us and CES performed previously (the latter very likely has a much higher quality), the only obvious variable to which we can attribute our higher success rate is the implementation of positional mapping.

In conclusion, we present an innovative method to calculate the theoretical maximum yield of CES in the setting of autosomal-recessive diseases, which we show to be much higher than currently experienced. We demonstrate that positional mapping can minimize this discrepancy by extracting causal variants from most “negative” CES. Our data strongly support the incorporation of positional mapping in the analysis of CES whenever applicable.

Disclosure

The authors declare no conflict of interest.

References

1. Ng SB, Turner EH, Robertson PD, et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature 2009;461:272–276.
2. Shendure J. Next-generation human genetics. Genome Biol 2011;12:408.
3. Alkuraya FS. Discovery of mutations for Mendelian disorders. Human Genetics. 2016:1–9.
4. Monroe GR, Frederix GW, Savelberg SMC, et al. Effectiveness of whole-exome sequencing and costs of the traditional diagnostic trajectory in children with intellectual disability. Genet Med 2016;18:949–956.
5. Gomez CM, Das S. Clinical exome sequencing: the new standard in genetic diagnosis. JAMA Neurol 2014;71:1215–1216.
6. Anazi S, Maddirevula S, Faqeih E, et al. Clinical genomics expands the morbid genome of intellectual disability and offers a high diagnostic yield. Mol Psychiatry 2016. doi: 10.1038/mp.2016.113 (e-pub ahead of print).
7. Retterer K, Juusola J, Cho MT, et al. Clinical application of whole-exome sequencing across clinical indications. Genet Med 2016;18:696–704.
8. Farwell KD, Shahmirzadi L, El-Khechen D, et al. Enhanced utility of family-centered diagnostic exome sequencing with inheritance model-based analysis: results from 500 unselected families with undiagnosed genetic conditions. Genet Med 2015;17:578–586.
9. Yang Y, Muzny DM, Xia F, et al. Molecular findings among patients referred for clinical whole-exome sequencing. JAMA 2014;312:1870–1879.
10. Lee H, Deignan JL, Dorrani N, et al. Clinical exome sequencing for genetic identification of rare Mendelian disorders. JAMA 2014;312:1880–1887.
11. Posey JE, Rosenfeld JA, James RA, et al. Molecular diagnostic experience of whole-exome sequencing in adult patients. Genet Med 2016;18:678–685.
12. Yavarna T, Al-Dewik N, Al-Mureikhi M, et al. High diagnostic yield of clinical exome sequencing in Middle Eastern patients with Mendelian disorders. Hum Genet 2015;134:967–980.
13. Stark Z, Tan TY, Chong B, et al. A prospective evaluation of whole-exome sequencing as a first-tier molecular test in infants with suspected monogenic disorders. Genet Med; e-pub ahead of print 3 March 2016.
14. Taylor JC, Martin HC, Lise S, et al. Factors influencing success of clinical genome sequencing across a broad spectrum of disorders. Nat Genet 2015;47:717–726.
15. Gilissen C, Hehir-Kwa JY, Thung DT, et al. Genome sequencing identifies major causes of severe intellectual disability. Nature 2014;511:344–347.
16. Alkuraya FS. Autozygome decoded. Genet Med 2010;12:765–771.
17. Carr IM, Flintoff KJ, Taylor GR, Markham AF, Bonthron DT. Interactive visual analysis of SNP data for rapid autozygosity mapping in consanguineous families. Hum Mutat 2006;27:1041–1046.
18. Alkuraya FS. The application of next-generation sequencing in the autozygosity mapping of human recessive diseases. Hum Genet 2013;132:1197–1211.
19. Saudi Mendeliome Group. Comprehensive gene panels provide advantages over clinical exome sequencing for Mendelian diseases. Genome Biol 2015;16:134.
20. Richards S, Aziz N, Bale S, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 2015;17:405–423.
21. Alkuraya FS. Genetics and genomic medicine in Saudi Arabia. Mol Genet Genomic Med 2014;2:369–378.
22. Shamseldin H, Alazami AM, Manning M, et al.; Care4Rare Canada Consortium. RTTN mutations cause primary microcephaly and primordial dwarfism in humans. Am J Hum Genet 2015;97:862–868.
23. Shaheen R, Patel N, Shamseldin H, et al. Accelerating matchmaking of novel dysmorphology syndromes through clinical and genomic characterization of a large cohort. Genet Med 2016;18:686–695.
24. Shamseldin HE, Bennett AH, Alfadhel M, Gupta V, Alkuraya FS. GOLGA2, encoding a master regulator of golgi apparatus, is mutated in a patient with a neuromuscular disorder. Hum Genet 2016;135:245–251.
25. Weedon MN, Cebola I, Patch AM, et al.; International Pancreatic Agenesis Consortium. Recessive mutations in a distal PTF1A enhancer cause isolated pancreatic agenesis. Nat Genet 2014;46:61–64.
26. Bae BI, Tietjen I, Atabay KD, et al. Evolutionarily dynamic alternative splicing of GPR56 regulates regional cerebral cortical patterning. Science 2014;343:764–768.
27. Carr IM, Bhaskar S, O’Sullivan J, et al. Autozygosity mapping with exome sequence data. Hum Mutat 2013;34:50–56.

Acknowledgements

We thank the families for their enthusiastic participation. We also thank the Sequencing and Genotyping Facilities at KFSHRC for their technical help. This work was supported by KACST grants 13-BIO1113-20 (F.S.A.) and KSCDR (F.S.A.).

Author information

The first two authors contributed equally to this work.

Authors and Affiliations

Department of Genetics, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia
Hanan E. Shamseldin, Sateesh Maddirevula, Niema Ibrahim, Mais Hashem, Ranad Shaheen & Fowzan S. Alkuraya
Department of Pediatric Subspecialties, Children’s Hospital, King Fahad Medical City, Riyadh, Saudi Arabia
Eissa Faqeih
Department of Anatomy and Cell Biology, College of Medicine, Alfaisal University, Riyadh, Saudi Arabia
Fowzan S. Alkuraya

Corresponding author

Correspondence to Fowzan S. Alkuraya.

Supplementary information

Supplementary Figure S1

(PPT 573 kb)

Supplementary Table S1

(DOCX 34 kb)

About this article

Cite this article

Shamseldin, H., Maddirevula, S., Faqeih, E. et al. Increasing the sensitivity of clinical exome sequencing through improved filtration strategy. Genet Med 19, 593–598 (2017). https://doi.org/10.1038/gim.2016.155

Received: 30 March 2016
Accepted: 24 August 2016
Published: 06 October 2016
Issue Date: May 2017
DOI: https://doi.org/10.1038/gim.2016.155
