Introduction

Hearing loss is the most common sensory deficit in humans, affecting about 1 of every 500 children,1 with a genetic etiology suspected in more than 50% of individuals.2 About 70% of hereditary hearing-loss cases are nonsyndromic, resulting from pathogenic changes in one of over 100 implicated genes.3,4 Furthermore, some forms of syndromic hearing loss initially present as nonsyndromic sensorineural hearing loss (NSHL); these so-called NSHL mimics may delay a clinical diagnosis. As such, genetic testing has been critical in distinguishing between different forms of NSHL and its mimics. The current genetic diagnostic rate for NSHL is still relatively low, ranging from 28 to 48% depending on the gene content assayed,5,6,7,8,9 suggesting that many hearing loss–causing genes are yet to be discovered. Clinical laboratories continuously face the challenge of keeping pace with this rapid rate of gene discovery as they try to maximize the clinical sensitivity of hearing-loss genetic diagnostic tests against a constantly expanding list of hearing-loss genes.3

As expected for a disease with high genetic heterogeneity, the majority of clinically offered NSHL genetic tests are based on targeted next-generation sequencing (NGS) panels.5 These panels use exon enrichment followed by NGS to thoroughly interrogate a predefined set of genes known to be associated with NSHL. NGS panels are highly efficient and offer advantages including (i) high sequence coverage of tested regions, (ii) relatively low sequencing cost, and (iii) streamlined data analysis, as only variants in clinically relevant genes are classified. However, a diagnosis is possible only if the disease-causing gene is included in the predefined panel design. Furthermore, updating the gene content of targeted NGS panels requires labor-intensive and time-consuming revalidation, rendering this approach relatively static and causing laboratories to lag behind the pace of new gene discovery. An alternative to targeted NGS panels is exome sequencing (ES), which sequences the majority of coding exons in the genome. In comparison with NGS panels, ES has several drawbacks that make it less efficient for the diagnosis of certain diseases. These include higher sequencing cost due to the larger capture size, combined with lower overall sequence depth; a greater analytical burden caused by the increased number of captured variants; and reporting challenges arising from the potential discovery of both incidental and secondary findings.10 However, exomes have been increasingly used for the diagnosis of genetic disorders and have undergone technical improvements, with optimized coverage of genes considered medically relevant. Therefore, for conditions such as hearing loss, with high genetic heterogeneity and a continuously expanding repertoire of disease-causing genes, a diagnostic test that combines the strengths of both methods, i.e., the efficiency of NGS panels and the dynamic gene content of ES, would be ideal.

Here we present the development of a tier-based, highly comprehensive, cost- and time-efficient genetic test, named AUDIOME, for the diagnosis of NSHL and its mimics. This test combines ES with targeted analysis of a subset of genes. To evaluate its technical and operational feasibility, we compared the exon-level coverage of genes of interest attained by ES and by a targeted NGS panel. We investigated the causes of insufficient coverage in both technologies and implemented appropriate sequence fill-in strategies for regions poorly covered by ES. The clinical utility of this test was affirmed by an initial analysis of 33 prospective cases.

Materials and methods

Curation of AUDIOME target genes

An initial list of 387 genes potentially implicated in hereditary hearing loss was compiled from literature, public disease databases, and in collaboration with clinicians.4,11,12,13 To evaluate the clinical validity of these genes, a thorough evidence-based curation process similar to recently reported ones was performed (Figure 1a).14,15,16 Briefly, each gene was assessed for the evidence of causality by curating literature for both genetic (segregation, case-control data, variant effects, etc.) and experimental (functional, in vitro, in vivo, etc.) data. Genes were also annotated for associated inheritance and potential disease mechanism. Based on the curated information, an evidence level for each gene was assigned according to the tiers described in Figure 1b. Only genes with moderate or strong evidence for causation of pediatric NSHL with congenital to young adult onset were included in AUDIOME.
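The inclusion rule in the last sentence amounts to a simple filter over curated records. The following is a minimal sketch, assuming each curated gene is stored with an ordinal evidence level and a typical onset label; the `Evidence` levels, onset labels, and placeholder gene symbols are hypothetical and only loosely mirror the tiers of Figure 1b, whose detailed criteria are not reproduced here.

```python
from dataclasses import dataclass
from enum import IntEnum


class Evidence(IntEnum):
    """Hypothetical ordinal evidence levels (higher = stronger support)."""
    NONE = 0
    LIMITED = 1
    MODERATE = 2
    STRONG = 3


# Hypothetical onset labels treated as "congenital to young adult".
PEDIATRIC_ONSET = {"congenital", "childhood", "adolescent", "young adult"}


@dataclass
class CuratedGene:
    symbol: str
    evidence: Evidence  # curated evidence level for NSHL or an NSHL mimic
    onset: str          # typical age of onset in curated reports


def include_in_panel(gene: CuratedGene) -> bool:
    """Keep genes with at least moderate evidence and congenital-to-young-adult onset."""
    return gene.evidence >= Evidence.MODERATE and gene.onset in PEDIATRIC_ONSET


def candidate_genes(genes: list[CuratedGene]) -> list[str]:
    """Genes with limited evidence, set aside for periodic re-review."""
    return [g.symbol for g in genes if g.evidence == Evidence.LIMITED]


# Placeholder genes, not real curation results.
curated = [
    CuratedGene("GENE_A", Evidence.STRONG, "congenital"),
    CuratedGene("GENE_B", Evidence.LIMITED, "congenital"),
    CuratedGene("GENE_C", Evidence.MODERATE, "adult"),
]
panel = [g.symbol for g in curated if include_in_panel(g)]
```

Here only GENE_A enters the panel: GENE_B lacks sufficient evidence and becomes a candidate gene, and GENE_C falls outside the onset window.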

Figure 1: Evidence-based gene curation strategy.

(a) Assessment of gene–disease association involved qualitative and quantitative evaluation of both genetic and functional evidence. Expert input was requested when necessary. Only genes with strong or moderate evidence for association with nonsyndromic hearing loss (NSHL) and/or syndromes mimicking NSHL were included in the AUDIOME panel. Genes with insufficient evidence for hearing-loss association were put aside as “candidate genes” and will be reviewed on a regular basis. HL, hearing loss. (b) Evidence levels with detailed evaluation criteria and number of genes. Numbers in parentheses indicate genes newly included since the initial launch of the test. The full AUDIOME gene list is available online (http://www.chop.edu/centers-programs/division-genomic-diagnostics).

Validation and patient cohorts

To assess the performance of AUDIOME, we generated (i) ES data from the HapMap NA12878 sample, (ii) ES data from de-identified retrospective clinical cases with hearing loss as the major clinical indication (N = 32), and (iii) sequence data from retrospective NSHL cases analyzed by a targeted NGS panel (N = 15). We also included de-identified findings from the first 33 prospective cases that underwent AUDIOME testing.

Exome and targeted NGS sequencing

ES was performed following the manufacturers' standard protocols. Briefly, exome-wide capture of the coding regions and splice sites was performed using the SureSelectXT Clinical Research Exome kit (Agilent, Santa Clara, CA), followed by cluster generation using TruSeq Rapid Cluster Kits (Illumina, San Diego, CA) and sequencing on the Illumina HiSeq 2500 platform with 2 × 100 bp paired-end reads at an average sequencing depth of 100 ×.

For the NGS panel, targeted DNA enrichment was done using the SureSelectQXT kit, followed by 2 × 150 bp paired-end sequencing on the Illumina MiSeq platform at an average depth of 900 ×.

Coverage analysis of the regions of interest

The genomic coordinates of AUDIOME genes were downloaded from the University of California–Santa Cruz Genome Browser (GRCh37). The regions of interest (ROIs) were created by adding 15 bp of intronic sequence to both ends of the coding exons of all RefSeq transcripts, resulting in a total of 366,523 bp. Coverage statistics for each ROI were calculated using the BamStats04 tool. The average, minimum, and maximum coverage and the percentage of fully covered regions were calculated using Genome Analysis Toolkit DepthOfCoverage version 3.6. A base pair covered by at least 15 sequencing reads was considered “fully covered.”17 Regions <15 × in ES were investigated to determine the potential cause of inadequate coverage, including bait availability, alternative splicing, GC content, and read mapping quality, which can be reduced by highly homologous genomic sequences such as pseudogenes.
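The ROI construction and the "fully covered" metric can be illustrated with a toy re-implementation. This is a minimal sketch, not the BamStats04 or DepthOfCoverage logic: it assumes 0-based half-open coordinates and a hypothetical input of per-base depths keyed by (chromosome, position), with positions absent from the map treated as 0 ×. Overlapping padded exons (e.g. from alternative RefSeq transcripts) are merged so that no base is counted twice.

```python
from statistics import mean

Interval = tuple[str, int, int]  # (chrom, start, end), 0-based half-open

PAD_BP = 15     # intronic padding on each side of a coding exon (from the text)
MIN_DEPTH = 15  # depth at which a base counts as "fully covered" (from the text)


def pad_exons(exons: list[Interval], pad: int = PAD_BP) -> list[Interval]:
    """Extend each exon by `pad` bases on both sides, clamping at position 0."""
    return [(chrom, max(0, start - pad), end + pad) for chrom, start, end in exons]


def merge(intervals: list[Interval]) -> list[Interval]:
    """Merge overlapping intervals, e.g. exons shared by several RefSeq transcripts."""
    merged: list[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def total_bp(intervals: list[Interval]) -> int:
    """Total ROI size in base pairs."""
    return sum(end - start for _, start, end in intervals)


def roi_stats(depth: dict[tuple[str, int], int], roi: Interval) -> dict[str, float]:
    """Mean/min/max depth and percent of bases at >= MIN_DEPTH; missing positions count as 0x."""
    chrom, start, end = roi
    d = [depth.get((chrom, pos), 0) for pos in range(start, end)]
    covered = sum(1 for x in d if x >= MIN_DEPTH)
    return {"mean": mean(d), "min": min(d), "max": max(d), "pct_covered": 100.0 * covered / len(d)}


# Toy coordinates and depths, for illustration only.
exons = [("chr13", 100, 200), ("chr13", 190, 250)]
rois = merge(pad_exons(exons))
depth = {("chr13", p): (5 if p < 95 else 20) for p in range(85, 265)}
stats = roi_stats(depth, rois[0])
```

In the toy example the two padded exons merge into one 180-bp ROI, of which the first 10 bases fall below 15 ×.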

Variant filtration and classification

Variant call format (VCF) files generated from MiSeq/HiSeq FASTQ data were processed using an in-house bioinformatics pipeline built on Novoalign version 3 and Genome Analysis Toolkit HaplotypeCaller version 3.6. Variant filtration and annotation were performed using Cartagenia NGS Bench software v4.2.2 or equivalent in-house software based on SnpEff 4.2.18 Only variants in the AUDIOME ROIs were retained. The filtration process was designed to capture known pathogenic variants as well as rare variants with a potential effect on protein function or gene splicing. Rare variants included (i) variants with minor allele frequency (MAF) <1% if previously reported in the Human Gene Mutation Database (HGMD), (ii) novel variants with MAF <0.1% in genes causing dominant forms of hearing loss, and (iii) novel variants with MAF <0.5% in genes causing recessive forms of hearing loss. MAF values were taken from the overall population and from most subpopulations (excluding Latino and Other) in the Exome Aggregation Consortium (ExAC). MAF cutoffs were estimated using Hardy–Weinberg equilibrium–based calculations that took into account hearing-loss disease attributes such as disease prevalence1 and mode of inheritance.19,20,21
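The three rarity rules and the Hardy–Weinberg reasoning behind frequency cutoffs can be sketched as follows. This is an illustration under stated assumptions, not the laboratory's filter: it assumes a variant must fall below the cutoff in every included ExAC population (so the maximum frequency is tested), and `max_credible_af` with its `attributable_fraction` parameter is a hypothetical, fully penetrant Hardy–Weinberg bound. The text does not give the exact calculation used, and this sketch does not attempt to reproduce the published cutoffs.

```python
import math
from dataclasses import dataclass, field

# ExAC subpopulations not used for filtering, per the text.
EXCLUDED_SUBPOPS = {"Latino", "Other"}


@dataclass
class Variant:
    gene_mode: str     # "dominant" or "recessive" (hypothetical label for the gene's inheritance)
    in_hgmd: bool      # previously reported in HGMD
    af_overall: float  # ExAC overall allele frequency
    af_subpops: dict[str, float] = field(default_factory=dict)


def filtering_af(v: Variant) -> float:
    """Highest AF across the overall population and included subpopulations (assumption)."""
    included = [af for pop, af in v.af_subpops.items() if pop not in EXCLUDED_SUBPOPS]
    return max([v.af_overall, *included])


def passes_rarity_filter(v: Variant) -> bool:
    af = filtering_af(v)
    if v.in_hgmd:
        return af < 0.01   # (i) reported in HGMD: MAF < 1%
    if v.gene_mode == "dominant":
        return af < 0.001  # (ii) novel, dominant gene: MAF < 0.1%
    if v.gene_mode == "recessive":
        return af < 0.005  # (iii) novel, recessive gene: MAF < 0.5%
    return False           # modes not covered by this sketch are dropped


def max_credible_af(prevalence: float, attributable_fraction: float, mode: str) -> float:
    """Hardy-Weinberg upper bound on one causal allele's frequency, assuming full penetrance.

    attributable_fraction is a hypothetical parameter: the largest share of all
    cases that a single variant is assumed to explain.
    """
    affected = prevalence * attributable_fraction
    if mode == "recessive":
        return math.sqrt(affected)  # homozygotes: q**2 = affected
    if mode == "dominant":
        return affected / 2         # heterozygotes: 2pq ~ 2q = affected when q is small
    raise ValueError(f"unsupported mode: {mode}")
```

With a prevalence of 1/500 and a single variant assumed to explain every case, the recessive bound is the square root of 0.002, about 4.5%; realistic attributable fractions for a heterogeneous condition drive the bound far lower, which is why per-gene inheritance matters when setting cutoffs.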

The clinical significance (pathogenic, likely pathogenic, uncertain significance, likely benign, or benign) of the retained variants was manually assessed following variant classification standards based on the American College of Medical Genetics and Genomics guideline.22

Sanger sequencing, long-range polymerase chain reaction, and targeted NGS analysis

Sanger sequencing was used to analyze the genes included in tier 1, to fill in regions with insufficient coverage, and, as needed, to analyze deep intronic variants and confirm reported variants, using previously described procedures.23

Long-range polymerase chain reaction (LR-PCR) followed by NGS was designed to detect sequence variants in the STRC gene, which shares extensive sequence homology with its pseudogene. Primers and conditions have been described elsewhere.24

Targeted single-nucleotide polymorphism array

Chromosomal single-nucleotide polymorphism (SNP) array analysis was carried out using the CytoSNP850Kv1.1 BeadChip (Illumina, San Diego, CA). Array data were filtered to include the AUDIOME ROIs plus 50 kb upstream and downstream of the gene start and end positions, respectively; the backbone resolution averaged one probe every 60 kb. The filtered data were analyzed using PennCNV.25 Reportable small copy-number variants covered by <5 probes were confirmed by an alternative assay.
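The pre- and post-processing around CNV calling reduce to two simple rules: keep probes within 50 kb of a panel gene, and flag calls supported by fewer than 5 probes for orthogonal confirmation. A minimal sketch, assuming gene spans are given as (chromosome, start, end) tuples with hypothetical coordinates; the CNV segmentation itself (PennCNV) is not reproduced here.

```python
from dataclasses import dataclass

FLANK_BP = 50_000  # window beyond gene start/end (from the text)
MIN_PROBES = 5     # calls with fewer probes are confirmed by another assay (from the text)


@dataclass
class Probe:
    chrom: str
    pos: int


@dataclass
class CNVCall:
    chrom: str
    start: int
    end: int
    n_probes: int


def in_gene_window(chrom: str, pos: int, genes: list[tuple[str, int, int]]) -> bool:
    """True if pos lies within [gene start - 50 kb, gene end + 50 kb] of any listed gene."""
    return any(c == chrom and s - FLANK_BP <= pos <= e + FLANK_BP for c, s, e in genes)


def filter_probes(probes: list[Probe], genes: list[tuple[str, int, int]]) -> list[Probe]:
    """Keep only probes informative for the panel genes before CNV calling."""
    return [p for p in probes if in_gene_window(p.chrom, p.pos, genes)]


def needs_confirmation(call: CNVCall) -> bool:
    """Small reportable CNVs supported by < 5 probes need an orthogonal assay."""
    return call.n_probes < MIN_PROBES
```

Filtering before calling keeps the analysis confined to panel genes, mirroring how variant filtration restricts the exome data to the AUDIOME ROIs.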

Results

Curated gene list

A total of 387 genes were curated for clinical validity (Figure 1). The curation process removed genes with limited evidence (N = 51), typically because only a few variants had been reported, with weak or no segregation or experimental data to support their pathogenicity. These genes will be reviewed periodically as new information emerges. Genes that either lack evidence supporting a role in hearing loss (N = 75) or are associated with other syndromes with evident syndromic features in the neonatal period (N = 144) were also excluded. The initial list used for validation and for testing of the first 14 prospective cases included 117 curated genes with at least moderate evidence of association with NSHL. After a full biannual re-curation, 4 genes were added to the gene list. Of the current 121 genes, 57 are known to cause NSHL, 34 are involved in syndromic hearing loss that may initially mimic NSHL, and the remaining 30 have been implicated in both syndromic and nonsyndromic forms of hearing loss. Targeting the “apparently” nonsyndromic genes is critical because hearing loss is often the main presenting feature, and molecular testing can identify unexpected, yet possibly manageable, later-onset syndromic features.

Tier-based testing strategy

To deliver genetic results in a timely and cost-effective manner, the AUDIOME test was designed hierarchically, taking into consideration the differing contributions of certain genes and pathogenic variants to the overall genetic causes of hearing loss. The test was therefore divided into two tiers (Figure 2). Tier 1 covers the analysis of four genomic regions: the DFNB1 and STRC loci, which are the most common genetic contributors to NSHL, and two mitochondrial genes, MT-RNR1 and MT-TS1, which would not be captured by ES. Sanger sequencing is performed to identify small sequence variants in GJB2 and the two mitochondrial genes, while SNP array analysis is used to detect large deletions in the DFNB1 and STRC loci. Tier 1 testing is expected to yield a genetic diagnosis in up to 50% of recessive hearing-loss cases, or in ~20% of all individuals with NSHL.20 Identification of biallelic pathogenic alterations in the DFNB1 or STRC loci, or of pathogenic mitochondrial variants matching the patient’s clinical indication and medical history, concludes the testing, and a report is issued. Individuals with such positive findings are charged only for tier 1 costs. If tier 1 testing is not diagnostic, tier 2 testing is initiated, which extends the examination of sequence and copy-number variants to the rest of the AUDIOME genes using ES and SNP array technologies. Tier 2 also includes Sanger sequencing of regions insufficiently covered by ES (Figure 2). In addition, sequence variants in STRC, which is highly homologous to its nonfunctional pseudogene, are identified through specific LR-PCR amplification of STRC followed by targeted NGS.
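The stopping rule between tiers can be written as a short decision function. This is a minimal sketch; the `Tier1Result` fields and step labels are hypothetical names that summarize the criteria described above, and the clinical judgment behind each flag (e.g. whether a mitochondrial variant matches the indication) is assumed to have been made already.

```python
from dataclasses import dataclass


@dataclass
class Tier1Result:
    dfnb1_biallelic: bool        # biallelic pathogenic alterations at DFNB1 (GJB2 Sanger + SNP array)
    strc_biallelic: bool         # biallelic pathogenic alterations at STRC (SNP array)
    mito_pathogenic_match: bool  # pathogenic MT-RNR1/MT-TS1 variant matching indication and history


TIER2_STEPS = (
    "ES of remaining AUDIOME genes",
    "SNP array CNV analysis",
    "Sanger fill-in of poorly covered regions",
    "STRC LR-PCR + targeted NGS",
)


def tier1_diagnostic(r: Tier1Result) -> bool:
    return r.dfnb1_biallelic or r.strc_biallelic or r.mito_pathogenic_match


def plan(r: Tier1Result) -> list[str]:
    """Remaining workflow; a diagnostic tier 1 ends testing (tier 1 costs only)."""
    if tier1_diagnostic(r):
        return ["issue final report (tier 1)"]
    return [*TIER2_STEPS, "issue final report (tier 2)"]
```

Because any single positive tier 1 finding short-circuits the workflow, the more expensive tier 2 steps run only for nondiagnostic samples.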

Figure 2: Hierarchical testing strategy of AUDIOME test.

Tier 1 is based on Sanger sequencing and targeted copy-number analysis. If tier 1 yields diagnostic findings, a final report is issued. Nondiagnostic samples advance to tier 2, which assesses the remaining AUDIOME genes by exome sequencing (ES) and Illumina single-nucleotide polymorphism array analysis. A final report is issued after tier 2 testing regardless of the diagnostic outcome.

Gene coverage

The average, minimum, and maximum coverage and the percentage of bases with <15 × read depth were evaluated for all AUDIOME ROIs in 17 retrospective clinical ES samples. At an average sequencing depth of 100 ×, the vast majority (98.24%) of ROI bases were covered at ≥15 ×. Only 15 genes showed areas of inadequate coverage, encompassing 1.76% of all ROI bases (Supplementary Table S1 online). Further analysis revealed that the presence of highly homologous sequences could explain the poor coverage of multiple exons in three genes (STRC, OTOA, and ESPN), while high GC content (>60%), and consequently inefficient hybridization and/or amplification, could be responsible for poor coverage in 10 other genes. No specific cause could be identified for the 2 remaining genes. In the nonhomologous regions, each poorly covered stretch was confined to a single exon and ranged from 10 to 244 bp, making these regions amenable to Sanger sequencing from a single PCR amplicon. An up-front fill-in strategy was implemented for 4 of these 12 single exons because they consistently had low coverage across all tested cases. For the remaining regions, Sanger fill-in was performed on a case-by-case basis depending on the data quality and coverage of individual samples. In our observation, most low-coverage regions were localized to a few to several dozen bases at the ends of exons, where read depth was just below the 15 × cutoff (data not shown).
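Locating poorly covered stretches, checking their GC content, and deciding whether a region consistently needs up-front fill-in can be sketched with a few helpers. This is a minimal illustration, assuming per-base depths for one region are available as a list per sample; the "consistently low" rule follows the laboratory criterion stated in the Figure 3 legend (read depth <10 × across 10 consecutive bases), interpreted here, as an assumption, as holding in every sample.

```python
HIGH_GC = 0.60  # GC fraction above which poor capture was attributed to GC content (from the text)


def low_coverage_runs(depths: list[int], threshold: int) -> list[tuple[int, int]]:
    """Half-open [start, end) index runs where depth < threshold."""
    runs: list[tuple[int, int]] = []
    start = None
    for i, d in enumerate(depths):
        if d < threshold and start is None:
            start = i
        elif d >= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(depths)))
    return runs


def gc_fraction(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def is_high_gc(seq: str) -> bool:
    return gc_fraction(seq) > HIGH_GC


def upfront_candidate(per_sample_depths: list[list[int]], min_run: int = 10, threshold: int = 10) -> bool:
    """True if every sample has >= min_run consecutive bases below `threshold` in the region."""
    return all(
        any(end - start >= min_run for start, end in low_coverage_runs(d, threshold))
        for d in per_sample_depths
    )
```

A region that passes `upfront_candidate` across the validation samples would be scheduled for Sanger fill-in in every case; the others fall back to case-by-case review.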

We established a standardized workflow to determine when a low-coverage region, including one harboring a known pathogenic deep intronic variant, needs to be filled in (Figure 3a). Regions with <10 × coverage in genes associated with autosomal dominant or X-linked disease were generally filled in. In contrast, a poorly covered region (<10 ×) in an autosomal recessive gene was filled in if one potential disease-causing variant had been detected in that gene, especially if there was significant phenotypic overlap between the gene-related condition and the patient’s clinical indication. Fill-in for other autosomal recessive genes was contingent on phenotypic overlap and on the presence of known disease-causing variants within these regions in disease databases. For regions with coverage between 10 and 15 ×, the sequencing data were first reviewed manually; if strand bias was present, additional fill-in might be needed. Insufficient coverage may also be caused by a deletion, which generally must be confirmed with another assay.
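The workflow above can be expressed as a decision function. This is a simplified sketch of the rules as written in this paragraph, not a transcription of Figure 3a; the field names and the returned labels are hypothetical, and regions in the homologous genes (STRC, OTOA, ESPN), as well as possible deletions, are assumed to be handled separately.

```python
from dataclasses import dataclass


@dataclass
class LowCovRegion:
    min_depth: float               # lowest read depth in the region
    mode: str                      # gene inheritance: "AD", "AR", or "XL"
    one_variant_in_gene: bool      # one potential disease-causing variant already found in this gene
    phenotype_overlap: bool        # gene-related condition overlaps the clinical indication
    known_variant_in_region: bool  # HGMD/ClinVar disease-causing variant lies in the region
    strand_bias: bool = False      # noted on manual review of 10-15x regions


def fill_in_decision(r: LowCovRegion) -> str:
    if r.min_depth >= 15:
        return "no fill-in"
    if r.min_depth >= 10:
        # 10-15x: manual review first; strand bias may warrant fill-in
        return "fill in" if r.strand_bias else "manual review"
    # Below 10x from here on.
    if r.mode in ("AD", "XL"):
        return "fill in"
    if r.one_variant_in_gene:
        return "fill in"  # AR gene with one hit: look for a second variant
    if r.phenotype_overlap and r.known_variant_in_region:
        return "fill in"
    return "no fill-in"
```

Encoding the rules this way makes the prioritization explicit: dominant and X-linked genes are filled in by default, whereas recessive genes require supporting evidence before extra Sanger work is spent on them.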

Figure 3: Sequencing fill-in for insufficiently covered regions of interest (ROIs).

(a) General fill-in strategies. This applies to low-coverage regions, including those with pathogenic variants in deep intronic regions. This strategy does not apply to regions with homologous sequences (STRC, OTOA, and ESPN). Disease-causing variants must have been previously reported in the Human Gene Mutation Database or ClinVar and must also meet our reporting criteria. AD, autosomal dominant; AR, autosomal recessive; XL, X-linked. *In our laboratory, loci showing a recurrent read depth <10 × across 10 consecutive bases were considered for up-front fill-in. (b) Comparison of coverage between exome sequencing (ES) and targeted panels. Sufficiently covered bases are defined as those with ≥15 × read depth, for ES across 17 samples and for the targeted next-generation sequencing (NGS) panel across 15 samples.

To assess whether the insufficient coverage observed in ES could be improved by increasing sequencing depth (e.g., by reducing the number of samples per batch), we analyzed the coverage of 10 of the inadequately covered genes that were also captured by the targeted NSHL NGS panel previously developed in our laboratory.26 Although the average sequencing depth of the targeted panel was approximately nine times higher than that of ES, the panel still contained inadequately covered regions (Figure 3b), mainly in the three aforementioned genes with highly homologous genomic sequences. This suggests that our ES approach does not result in more low-coverage regions than the targeted panel, and that these technically challenging regions have inherent sequence-based properties (homology or GC content) that cannot be substantially remedied by sequencing at higher depth.

Performance of the ES-based approach

To further assess the performance of this ES-based panel approach, we determined the analytical sensitivity and specificity for the detection of single-nucleotide variants (SNVs) and small insertions or deletions (indels <35 bp) within the AUDIOME ROIs. For this purpose, we used the HapMap NA12878 sample, for which high-quality sequence data are available through the National Institute of Standards and Technology–Genome in a Bottle consortium (NIST-GIAB).27 Within the AUDIOME ROIs, 192 SNVs and 7 small indels were cataloged in NIST-GIAB. Using the same ES protocol and pipeline, we sequenced NA12878 in duplicate in four independent runs. All of these NA12878 SNVs and indels were consistently called in all eight runs, resulting in an analytical sensitivity of 100% (199/199) with no false negatives. Only six additional SNVs were called, with ~2 extra calls per run; none was confirmed by Sanger sequencing, so all were deemed false positives. Notably, all false-positive variants were located in regions with low quality-by-depth scores (QD <13.3) and showed extreme strand bias in BAM files (Supplementary Table S2 online). On this basis, the analytical specificity was calculated to be over 99.99% for all runs.
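The sensitivity and specificity figures follow from standard definitions. The sketch below shows one way to obtain them; the denominator for specificity is an assumption, since the text does not state how true negatives were counted. Here every ROI position without a truth-set variant is treated as a negative site, each indel is counted as a single site, and ~2 false-positive calls per run are used.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truth-set variants that were called."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-variant sites correctly left uncalled."""
    return tn / (tn + fp)


ROI_BP = 366_523          # total ROI size (from the Methods)
TRUE_VARIANTS = 192 + 7   # NIST-GIAB SNVs + small indels in the ROIs
FP_PER_RUN = 2            # approximate extra calls per run

sens = sensitivity(tp=TRUE_VARIANTS, fn=0)

# Assumption: each ROI position without a truth-set variant is one negative site.
negatives = ROI_BP - TRUE_VARIANTS
spec = specificity(tn=negatives - FP_PER_RUN, fp=FP_PER_RUN)
print(f"sensitivity={sens:.2%}  specificity={spec:.4%}")
```

Because the negative sites vastly outnumber the few false-positive calls, specificity stays above 99.99% under almost any reasonable choice of denominator; the false-positive count is the more informative quality signal.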

Finally, 32 retrospective clinical exome cases with a total of 51 reported variants in the AUDIOME genes, classified as variants of uncertain significance (VUSs) or higher, were used to further validate variant detection by the AUDIOME bioinformatics filtration pipeline. As expected, all 51 (100%) variants were retained and correctly identified by our customized filtration algorithm (data not shown).

Diagnostic yield based on 33 consecutive NSHL cases

A total of 33 prospective cases with bilateral NSHL seen in our genetics of hearing loss clinic28 were tested using the clinically validated AUDIOME test (Table 1). A molecular diagnosis was made in 11 cases (33.3%), 7 of which had a positive finding in tier 1, including two cases with homozygous STRC deletions detected by targeted SNP array. The other 4 cases were positive in tier 2, with pathogenic variants identified in both nonsyndromic (LOXHD1 and MYO15A) and syndromic (USH2A and SOX10) genes. In one of these cases, Sanger fill-in detected a known deep intronic disease-causing variant in USH2A after a pathogenic variant in this gene had been identified by ES. Eight of the 33 cases (24.2%) were likely positive, with either one VUS in a dominant gene, or one pathogenic variant and one VUS (or two VUSs) in a recessive gene, and with a good phenotypic match to the individual’s clinical presentation; however, further familial studies of phase and segregation are needed to determine the clinical significance of these variants. Of note, three of these eight cases carried two GJB2 variants identified in tier 1, but tier 2 was performed because of conflicting interpretations of pathogenicity for these variants (c.101T>C and c.109G>A) in ClinVar.29 These cases remained likely positive, as no additional changes were observed in tier 2.

Table 1 Testing results for the prospective bilateral sensorineural hearing loss cases

Tier 1 testing identified 7 positive cases; including the 3 likely positive cases with two GJB2 variants, the diagnostic yield of tier 1 testing was 30.3% (Figure 4). The diagnostic yield of tier 2 testing was 27.3%, comprising 4 positive and 5 likely positive cases. The remaining 14 cases (42.4%) were either inconclusive owing to partial phenotypic overlap (n = 2, 6.1%) or negative (n = 12, 36.4%) owing to limited variant evidence and/or identification of only a single variant in an autosomal recessive gene. The number of Sanger fill-ins varied from run to run, averaging 6 per case: 4 up-front and 0–5 case-specific, including deep intronic fill-ins.
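The yield percentages can be cross-checked from the case counts reported in this section. The sketch below simply re-derives them; the category names are illustrative labels, and the counts are the ones given in the text (7 + 3 tier 1, 4 + 5 tier 2, 2 inconclusive, 12 negative).

```python
TOTAL = 33
counts = {
    "tier1_positive": 7,
    "tier1_likely_positive_gjb2": 3,
    "tier2_positive": 4,
    "tier2_likely_positive": 5,
    "inconclusive": 2,
    "negative": 12,
}
assert sum(counts.values()) == TOTAL  # every case falls in exactly one category


def pct(n: int) -> float:
    """Percentage of the cohort, rounded to one decimal as in the text."""
    return round(100 * n / TOTAL, 1)


tier1_yield = pct(counts["tier1_positive"] + counts["tier1_likely_positive_gjb2"])
tier2_yield = pct(counts["tier2_positive"] + counts["tier2_likely_positive"])
positive = pct(counts["tier1_positive"] + counts["tier2_positive"])
likely_positive = pct(counts["tier1_likely_positive_gjb2"] + counts["tier2_likely_positive"])
overall_yield = pct(TOTAL - counts["inconclusive"] - counts["negative"])
unresolved = pct(counts["inconclusive"] + counts["negative"])
```

The tier-specific yields (10/33 and 9/33) sum to the overall yield of 19/33, the figure quoted in the Discussion.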

Figure 4: Diagnostic yield for the first 33 prospective cases tested by the AUDIOME panel.

The tiered design can make the AUDIOME test more cost- and time-effective. Δ, difference in cost or time. ES, exome sequencing.

Discussion

With improved ES data quality and plunging sequencing costs, ES-based gene panels are becoming an attractive diagnostic strategy for many genetically heterogeneous conditions because of their flexibility in updating gene content, their ability to reflex to exome-wide analysis, and their potential to streamline laboratory operations.30,31,32,33 In this study, we present the development and implementation of this approach for the genetic diagnosis of NSHL, which is among the most etiologically heterogeneous disorders.

To apply the targeted analysis approach of NGS panels and reduce the time needed for variant analysis across the exome, we implemented a bioinformatics filtration strategy that captures variants in genes of interest only. For clinical diagnostic purposes, only genes with high clinical validity were included in this test. We conducted a thorough evidence-based gene curation and selected an initial set of 117 genes. In contrast to targeted panels, this list can be expanded as soon as a new NSHL-associated gene with strong clinical validity is identified, simply by modifying the bioinformatics filtering process and ensuring sufficient gene coverage. The inclusion of a new gene does not require extensive wet-bench validation.31 This flexibility in gene content yields two major advantages for both laboratories and patients: a single platform can be used for multiple panels with minimal validation effort, and one individual’s data can be reanalyzed and updated over time as the gene list evolves. First and foremost, ES-based panels make diagnostic testing highly dynamic and able to quickly incorporate new scientific discoveries, whereas targeted NGS panels tend to be static. Within the first 6 months after launching the AUDIOME test, we had already expanded our gene list with four novel NSHL genes (CDC14A, CEP78, DMXL2, and PDZD7),34,35,36,37,38 attesting to the highly dynamic genetic landscape of this disorder. Secondly, for nondiagnostic cases, the analysis can readily be reflexed to other genes of potential interest, or to the entire exome, to identify candidate disease-causing genes. This reflex costs a fraction of a new test because the sequence data have already been generated and only reanalysis-associated costs are incurred. Implementation of a relatively simple bioinformatics solution would enable clinical laboratories to carry out automatic reanalysis of previously negative or inconclusive cases on a regular basis. While this approach has great potential, reanalysis remains a challenge for laboratories, especially regarding billing and insurance coverage, the costs of bioinformatics data storage and long-term compatibility, variant reclassification, and return of results.
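The reason a gene-list update needs no new sequencing is that the panel is just a filter over exome-wide calls that already exist. A minimal sketch, assuming stored variants are simple records with a gene symbol and a hypothetical identifier (the production filtering used Cartagenia or SnpEff-based software, not this code): reanalysis after an update reduces to selecting variants in the newly added genes.

```python
def virtual_panel(variants: list[dict[str, str]], genes: set[str]) -> list[dict[str, str]]:
    """Restrict stored exome-wide variant calls to the current panel gene list."""
    return [v for v in variants if v["gene"] in genes]


def reanalysis_delta(
    variants: list[dict[str, str]], old_genes: set[str], new_genes: set[str]
) -> list[dict[str, str]]:
    """Variants surfaced only because genes were added to the panel."""
    added = new_genes - old_genes
    return [v for v in variants if v["gene"] in added]


# Hypothetical stored calls for one individual (identifiers are placeholders).
stored = [
    {"gene": "MYO15A", "id": "v1"},
    {"gene": "PDZD7", "id": "v2"},
    {"gene": "TTN", "id": "v3"},
]
old_panel = {"GJB2", "MYO15A"}
new_panel = old_panel | {"CDC14A", "CEP78", "DMXL2", "PDZD7"}
newly_relevant = reanalysis_delta(stored, old_panel, new_panel)
```

Only the variant in the newly added gene needs fresh review, which keeps periodic reanalysis inexpensive; reflexing to the whole exome is the same operation with the filter removed.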

The main concern associated with developing ES-based gene panels is sequencing depth, even with the continuing improvement of ES capture. However, our data demonstrated that ES-based coverage of hearing-loss genes was highly comparable to that attained with targeted NGS approaches. Furthermore, using the NIST reference sample, we showed that the ES approach has very high sensitivity and specificity within the AUDIOME ROIs. While evaluating ROI sequence coverage is a critical step in assessing whether an ES-based gene panel can serve as a good alternative to an NGS panel,39 we expect similarly adequate coverage for most other currently available ES kits, as many have increased the total number of baits used to capture coding regions of the human genome.

Our data showed that fill-in is needed for every case; however, the number of fill-ins per case could be minimized by prioritizing regions for fill-in. Because a coverage depth of 15 × is considered sufficient to detect germ-line variants,17,39 we analyzed our ES data to assess the proportion of ROI bases <15 × and found that only 1.76% of ROI bases had inadequate coverage. Four loci consistently showing low read depth (<10 ×) in multiple ES cases were selected for up-front fill-in. The remaining regions were evaluated for fill-in necessity on a case-by-case basis using criteria established in our laboratory, which can substantially reduce the number of fill-ins while maintaining current standards for targeted disease panels. In most genes with inadequate coverage, only a single exon (usually the first) with high GC content was affected. For these regions, conventional PCR followed by Sanger sequencing was sufficient. Three genes (STRC, ESPN, and OTOA) had multiple exons with inadequate read depth, which can be explained by the presence of highly homologous sequences that lowered the mapping quality of the reads and led to exclusion of these low-coverage regions from further analysis. Increasing overall target sequencing depth would be effective for some GC-rich regions, but not for highly homologous sequences, because these areas are difficult to map confidently with short-read NGS technologies.23 In fact, our assessment of read depth for 10 such genes included in the targeted NGS panel previously developed in our laboratory showed that additional sequence fill-in would still be required even with a substantial (ninefold) increase in sequencing depth. Owing to their complex genomic nature, homologous regions require more sophisticated sequencing strategies such as LR-PCR amplification followed by NGS. So far, only STRC testing has been developed in our laboratory, as this gene accounts for up to 11.2% of NSHL cases.24

The clinical utility of this test was further confirmed by testing the first 33 cases with bilateral NSHL. Examination of the AUDIOME genes identified a positive or likely positive genetic etiology for NSHL, or its mimics, in 57.6% (19/33) of patients. While diagnostic rates vary among patient cohorts and gene panels, our diagnostic yield is comparable to that recently reported for a large multiethnic cohort tested with a comprehensive NGS panel of 66 to 89 genes.7 Among our patients, changes in tier 1 loci, especially the DFNB1 locus, were still the most common cause of NSHL, supporting a tier-based genetic testing strategy for NSHL, which substantially reduces cost and turnaround time compared with the more comprehensive sequencing approach of tier 2. The remaining 42.4% of patients without clear molecular findings highlight the advantage of an ES-based test that offers quick reflexing to exome analysis at no additional sequencing cost. Furthermore, reflexing to exome analysis enables research-based candidate gene searches, which are especially valuable for highly heterogeneous and not yet fully understood genetic conditions such as hearing loss, thus providing patients with continuous comprehensive care.

In conclusion, our study demonstrates that a tiered ES-based gene panel approach, complemented by sequencing fill-in of a limited number of regions, is a sensitive and efficient diagnostic strategy that can be applied to other highly heterogeneous genetic conditions such as epilepsy.