Introduction

Next-generation sequencing (NGS) has been widely applied to research studies and clinical diagnosis.1,2,3,4,5 This powerful technology has the ability to produce millions of sequence reads in a single run. It provides a fast and inexpensive approach for sequence analysis of the whole genome, the whole exome, targeted panels of genes, and even single-gene NGS tests. After mapping the enormous amount of sequence reads from an NGS run to a reference sequence, the initial variant calls are generated. Inevitably, some of these variants are false calls. Thus, bioinformatics analysis and filtering are essential to facilitate annotation and review of these massive NGS data sets.6

Sequence reads produced by NGS technologies are much shorter than those generated by traditional Sanger sequencing. The short sequence reads make accurate alignment more challenging, especially for regions with large insertion/deletion or with repetitive elements. Some NGS reads can correctly map to a targeted region, but analytical software cannot generate accurate variant calls. This is particularly true for indel changes. A great number of computational tools dedicated to almost all aspects of NGS data analyses have been developed during the past few years.7,8,9 However, there are no “gold standard” alignment tools for clinical application. Although stringent filtering can reduce the number of false sequencing calls, this process also filters misaligned reads that may contain useful information suggesting sequence aberrations or gene rearrangements. In this report, we present examples where a definitive molecular diagnosis was only achieved after thorough review considering both analysis of the NGS raw data with relaxed filtering and the NGS coverage depth.

Materials and Methods

Samples from all subjects in this study were submitted to the Baylor Miraca Genetics Laboratories for NGS-based panel analysis. The analyses were performed according to the institutional review board–approved protocols at Baylor College of Medicine. The sample preparation followed the manufacturer’s recommendation. SeqCap EZ solution-based capture was used to enrich a group of target genes followed by sequencing on the Illumina HiSeq2000 (Illumina, San Diego, CA) with 100 cycle single-end reads. All coding exons and 20 bp of the flanking intronic regions of the target genes were sequenced to an average depth of ~1000×. Raw data in base call files (.bcl format) were converted to qseq files before demultiplexing with CASAVA v1.7 software (Illumina). Demultiplexed data were processed further by NextGENe software for alignment (SoftGenetics, State College, PA). After removing low-quality reads, the reads with ≥80% match sequences were aligned to a reference, and nucleotide changes with ≥5% variant calls were scored.2 Assessment of the NGS data generated using the filtering parameters described above identified several sequence aberrations that included large indels and genomic rearrangements. A copy-number variation detection algorithm developed in house was used to evaluate the presence of large heterozygous exonic deletions.10 HGVS guidelines were followed for mutation nomenclature (http://www.hgvs.org/mutnomen/). Nucleotide numbering uses c.1 as the A nucleotide of the ATG translation initiation codon in the reference cDNA sequence.
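The two thresholds described above (reads with ≥80% matching sequence are aligned; nucleotide changes seen in ≥5% of reads are scored) can be illustrated with a minimal sketch. This is not the NextGENe implementation; the function names and the use of integer percentages are assumptions made only for illustration.

```python
# Minimal sketch of the two "relaxed" thresholds quoted in the Methods.
# Function and constant names are hypothetical, not part of NextGENe.

MIN_READ_MATCH_PCT = 80   # reads with >=80% matching bases are aligned
MIN_VARIANT_PCT = 5       # changes present in >=5% of reads are scored


def read_passes(matched_bases: int, read_length: int = 100) -> bool:
    """True if the read matches the reference over at least 80% of its length."""
    if read_length <= 0:
        return False
    # Integer arithmetic avoids floating-point edge cases at the threshold.
    return matched_bases * 100 >= MIN_READ_MATCH_PCT * read_length


def variant_is_scored(variant_reads: int, total_reads: int) -> bool:
    """True if the variant is present in at least 5% of the reads at a position."""
    if total_reads <= 0:
        return False
    return variant_reads * 100 >= MIN_VARIANT_PCT * total_reads
```

With 100-bp reads, a read with 80 matching bases is retained while one with 79 is not; at ~1000× depth, a change must appear in at least 50 reads to be scored.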

Results

Simple small deletions are relatively easy to visualize by commercially available sequence-analysis software

Given the read length of 100 bp in targeted sequences, deletions shorter than one-third of the read length can usually be precisely annotated by the NextGENe viewer. Patient 1 was a 1-year-old boy with hepatomegaly and hypertriglyceridemia (Table 1). Two heterozygous mutations, c.3843_3853del11 (p.S1282Nfs*6) and c.2681+1G>A, were detected in the AGL gene (NM_000642.2). The novel 11-nucleotide deletion (delGTCTGCTGTGG), c.3843_3853del, can be correctly identified and annotated by NGS analysis software (Figure 1). It is predicted to result in a premature stop codon (p.S1282Nfs*6), and is therefore categorized as a pathogenic variant.11 The c.2681+1G>A variant has been reported in patients with glycogen storage disease type III.12,13 This change is predicted to disrupt the invariant splicing donor site of intron 21 in the AGL gene. Mutations in the AGL gene have been reported to cause autosomal recessive glycogen storage disease type III (OMIM 610860).

Table 1 Summary of patients and detected mutations
Figure 1

Small deletions can be easily detected by capture-based deep sequencing. (a,b) A heterozygous splicing mutation c.2681+1G>A and a heterozygous 11-bp deletion c.3843_3853del (p.S1282Nfs*6) were easily visualized in the AGL gene of patient 1 by NextGENe software. (c) The Sanger sequence analysis confirmed the 11-bp deletion but required careful manual checking to correctly identify the heterozygous deletion.

It is worth noting that in Sanger sequencing chromatograms, heterozygous deletions/insertions typically present as overlapping normal and mutant sequence traces. The Sanger sequence-analysis software usually cannot reliably annotate deletion/insertion mutations and requires manual checking and alignment of the specific region ( Figure 1c ). Although NGS still requires a manual check for correct mutation nomenclature, it is much easier to annotate heterozygous deletion/insertion via NGS software. However, when the size of deletion/insertion is close to or greater than the NGS read length, the NGS reads may not be able to align properly to the reference sequence.

Misaligned NGS reads may indicate the presence of a genomic rearrangement

Patient 2 was a 15-year-old boy with a clinical diagnosis of glycogen storage disease type III. NGS analysis easily identified a single heterozygous pathogenic variant, c.4260-12A>G, in the AGL gene. This is a previously reported splice site mutation that creates a new acceptor site in intron 32.14,15 Using our “loose” filtering, a number of mismatched reads were also observed in the middle of exon 26 and the unmatched sequence was mapped to the deep intronic region of intron 26 ( Figure 2a1 ). Subsequent targeted array comparative genomic hybridization analysis ( Figure 2a2 ) and polymerase chain reaction (PCR) across the putative breakpoints confirmed a deletion of 1,525 bp, involving half of exon 26 and intron 26 (c.3481_3588+1417del1525; Figure 2a3 ).

Figure 2

Identification of large deletions and complicated genomic rearrangements by NGS analysis. (a) A heterozygous c.3481_3588+1417del1525 deletion was detected in the AGL gene. (a1) Misaligned reads were observed in exon 26 of the AGL gene. (a2) Targeted PCR analysis using primers flanking 5′ of exon 26 and 3′ of exon 27 confirmed a 1,525-bp deletion. (a3) This heterozygous loss of exon 26 was confirmed by array comparative genomic hybridization. (b) A heterozygous c.4259ins? (p.D1420fs) mutation was detected in the AGL gene. (b1) Examination of the NGS data showed a breakpoint at the c.4259 position of AGL exon 32, with additional mismatched sequence mapped to a genetic segment on chromosome 6 (6p12.1). (b2) PCR amplification using a forward primer located in AGL intron 31 and a reverse primer (2R) anchored at 6p12.1 detected a 214-bp product in this patient’s blood. (b3) Sanger sequencing of the chimeric PCR product showed the breakpoint sequence that matched to the NGS results. (c) Compound heterozygous novel variants in the ARSA gene were uncovered by NGS analysis. (c1,c3) The missense change, c.47G>A (p.G16D), was inherited from the father and (c2,c3) the 102-bp deletion, c.230_331del (p.A77_A110del), was derived from the mother. The presence of misaligned sequences at the deletion junctions facilitates the deduction of the actual sequence change. NGS, next-generation sequencing; PCR, polymerase chain reaction.

Similar misaligned reads were also observed at the exon/intron 32 boundary of the AGL gene in patient 3 (Figure 2b1). This patient was a 6-year-old boy with hypoglycemia and hepatomegaly, suggesting a glycogen storage disease. The cluster of misaligned reads indicates the presence of a sequence aberration with a breakpoint located at the c.4259 position of exon 32. More relaxed sequence alignment that allows mismatch of up to 50% reveals an 87-bp sequence mapped to a region on chromosome 6 (6p12.1). A forward primer located in the AGL gene on chromosome 1 and a reverse primer located on chromosome 6 (Figure 2b2) were used to amplify the putative junction region. Sanger sequencing of this 214-bp fragment confirmed the presence of a sequence that is identical to a segment of chromosome 6 (6p12.1; Figure 2b3). Collectively, these findings suggest a genomic rearrangement involving the AGL gene on chromosome 1 (1p21.2) and chromosome 6 (6p12.1). Although sequence analysis cannot determine the nature or size of this genomic rearrangement, this change is predicted to result in a reading frame shift (p.D1420fs) within this individual’s AGL gene on chromosome 1 (1p21.2). Chromosomal or fluorescence in situ hybridization analysis is recommended in such situations to characterize the rearrangement further.

Patient 4 was a 5-year-old boy with progressive loss of the ability to walk and a decline in intellectual development. Brain magnetic resonance imaging also showed increased white matter signal. Screening for lysosomal storage disease revealed arylsulfatase A (ARSA) enzyme deficiency (<5% of normal control). Previous NGS analysis performed elsewhere identified only one novel heterozygous missense variant, c.47G>A (p.G16D), in the ARSA gene (NM_000487.5) of this affected child. Arylsulfatase A deficiency causes autosomal recessive metachromatic leukodystrophy (OMIM 607574), consistent with the patient’s clinical presentation. However, lack of a second deleterious allele or any known pseudodeficiency variant left the molecular diagnosis in question. This sample was sent to our laboratory for further evaluation by capture-based NGS analysis to search for the second mutant allele. In addition to the previously detected c.47G>A (p.G16D) variant (Figure 2c1), two clusters of misaligned reads were observed in exon 2 of the ARSA gene, suggesting the presence of a large genetic rearrangement (Figure 2c2). Further PCR and Sanger sequencing analysis confirmed a heterozygous in-frame deletion of 102 bp, c.230_331del (p.A77_A110del; Table 1). The failure to detect the 102-bp deletion in the original NGS analysis was likely due either to failure of the capture/sequencing procedures or to removal of the relevant reads by the bioinformatics pipeline used.

Identification of insertions and duplications is challenging

Due to the short sequence reads produced by NGS, insertions, duplications, and deletions involving large segments are difficult to align accurately. When the size of the indel is greater than half of the sequence read length, the NGS reads containing the mutant sequence may fail to align properly. This is particularly true for duplications and insertions. Patient 5 was a 37-year-old man with developmental delay, ragged red fibers, and a clinical diagnosis of Leigh disease. The dual genome NGS panel analysis, which includes the whole mitochondrial genome and 164 nuclear genes involved in mitochondrial and metabolic disorders, detected a novel hemizygous 27-bp duplication in the PDHA1 gene (NM_000284.3), c.978_1004dup27 (dupCAGCAATCTTGCCAGTGTGGAAGAACT, p.S327_L335dup; Figure 3a). Mutations in the X-linked PDHA1 gene cause pyruvate dehydrogenase E1-alpha deficiency and Leigh disease, resulting in a wide range of clinical phenotypes (OMIM 300502). Although the in-frame c.978_1004dup27 mutation in PDHA1 is a novel finding, various duplications in this region have been previously reported in patients with PDH deficiency.16,17 In addition, there appears to be a higher incidence of normal or borderline intellectual disability in individuals who have in-frame insertions or deletions compared with those who have out-of-frame indels.18 Assay of pyruvate dehydrogenase complex activity can further correlate the clinical/biochemical effects of this in-frame duplication in the PDHA1 gene.

Figure 3

Large duplications detected by sequence aberrations seen in NGS reads. (a) A hemizygous duplication, c.978_1004dup27 in the X-linked PDHA1 gene, was detected by mismatched sequence reads and was subsequently confirmed by Sanger sequencing. (b) A heterozygous duplication, c.927_954dup28 in the FLCN gene, was deduced based on the NGS misaligned reads. NGS, next-generation sequencing.

Patient 6 was a 56-year-old man with multiple fibrofolliculomas. An NGS-based single-gene test detected a heterozygous duplication c.927_954dup28 (p.G319Sfs*80) in the FLCN gene (NM_144997.5; Figure 3b ). Mutations in the FLCN gene are the cause of autosomal dominant Birt-Hogg-Dube syndrome, which is characterized by hair follicle hamartomas, fibrofolliculomas, kidney tumors, and spontaneous pneumothorax (OMIM 135150). The same duplication detected in this patient has been previously reported in patients with Birt-Hogg-Dube syndrome (reported as c.1378_1405dup).19

The in-frame 27-bp duplication in PDHA1 and the 28-bp frame-shift duplication in FLCN have been confirmed by Sanger sequencing. In both cases, the NextGENe viewer showed a few misaligned reads that could easily have been filtered out because of too many mismatches, or otherwise overlooked if the raw sequence data were not carefully examined. Sequence similarity limits the direct illustration of duplications by NGS software. However, the presence of mismatched sequences at the putative breakpoints prompted further evaluation by Sanger sequencing, which confirmed both mutations.
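Whether a deletion or duplication preserves the reading frame follows directly from its length in the HGVS name. The sketch below checks this arithmetic for the variants reported here; it is only an illustration, and it handles nothing beyond simple exonic c.START_ENDdel / c.START_ENDdup names (intronic offsets such as +1417 are deliberately rejected). The function names are hypothetical.

```python
import re

# Accepts only simple coding-sequence ranges such as "c.978_1004dup27"
# or "c.230_331del"; an optional trailing length is ignored.
_SIMPLE_RANGE = re.compile(r"c\.(\d+)_(\d+)(del|dup)\d*")


def hgvs_span(name: str) -> int:
    """Return the length in bp of a simple exonic deletion or duplication."""
    match = _SIMPLE_RANGE.fullmatch(name)
    if match is None:
        raise ValueError(f"unsupported variant name: {name}")
    start, end = int(match.group(1)), int(match.group(2))
    return end - start + 1  # HGVS positions are inclusive


def is_in_frame(name: str) -> bool:
    """True if the affected length is a multiple of three (whole codons)."""
    return hgvs_span(name) % 3 == 0
```

Applied to the variants above, c.978_1004dup27 spans 27 bp (9 codons, matching p.S327_L335dup) and c.230_331del spans 102 bp (34 codons, matching p.A77_A110del), so both are in-frame, whereas c.927_954dup28 (28 bp) and c.3843_3853del11 (11 bp) shift the frame.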

Comprehensive analysis of exonic deletions is feasible based on the deep NGS data

Capture-based enrichment/NGS can easily detect homozygous or hemizygous exonic deletions due to the absence of sequence reads in deleted regions (patients 7 and 8 in Table 1 ).2,20 Patient 7 was a 16-year-old girl with exercise intolerance and muscle weakness. The capture-based NGS showed no sequence reads on exons 4 to 5 of the GYS1 gene (NM_002103.4), suggesting a homozygous deletion ( Figure 4a1 ). Sanger sequence analysis of long-range PCR products revealed a 2,975-bp deletion, c.493-342_c.824-566del2975, involving exons 4 and 5 of the GYS1 gene ( Figure 4a2 ). Defects in the GYS1 gene can cause the autosomal recessive muscle form of glycogen storage disease type 0B (OMIM 138570).

Figure 4

Detection of exonic deletions by comprehensive CNV analysis utilizing NGS coverage depth. (a) Lack of NGS coverage involving exons 4 and 5 of the GYS1 gene identified a homozygous deletion (a1). PCR analysis with primers located in intron 3 and intron 5 of GYS1 confirmed this deletion (a2, a3). (b) A homozygous exon 18 deletion in the LPIN1 gene was detected by CNV analysis. (c,d) Reduced NGS sequence reads suggest heterozygous exonic deletions. (c) A heterozygous exon 18 deletion in the GAA gene and (d1) a heterozygous deletion of exons 8 and 9 in the NARS2 gene were detected. Targeted PCR or (d2) array comparative genomic hybridization was used to confirm the heterozygous exonic deletion. CNV, copy-number variant; NGS, next-generation sequencing; PCR, polymerase chain reaction.

Another example of homozygous exonic deletion is patient 8, who was a 1-year-old girl with hypotonia, feeding problems, and acute rhabdomyolysis. The copy-number analysis of the NGS result revealed no coverage for exon 18 of the LPIN1 gene, indicating homozygous deletion ( Figure 4b ). Deletion of exon 18 of the LPIN1 gene has been previously reported in multiple patients with severe rhabdomyolysis and myoglobinuria.21

For heterozygous exonic deletions, the coverage depth of the deleted exons is reduced to approximately half of the normalized coverage depth of undeleted exons.10 Figure 4c illustrates a heterozygous exonic deletion, c.2481+109_c.2646+38del538, in the GAA gene (NM_000152, OMIM 606800) (patient 9 in Table 1). This 538-bp deletion has breakpoints in intron 17 and intron 18 and removes the entire exon 18 (165 bp). The exon 18 deletion has been reported in patients with Pompe disease.22,23 This patient also harbors a heterozygous missense mutation, c.1979G>A (p.R660H), which was found to be in trans with the exon 18 deletion by parental testing.24,25 Because this patient is a compound heterozygote with a missense mutation and an exonic deletion, this result confirms the diagnosis of Pompe disease.
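The coverage-based reasoning (no reads for homozygous or hemizygous deletions, roughly half the normalized depth for heterozygous deletions) can be sketched as follows. This is not the in-house copy-number algorithm cited above, whose details are not given here; the median normalization, the ratio windows, and all names are assumptions chosen only to illustrate the idea, and the depths in the usage example are made-up numbers.

```python
from statistics import median


def normalized_ratios(sample_depth: dict, reference_depth: dict) -> dict:
    """Per-exon sample/reference depth ratio, scaled so the median exon is 1.0.

    Assumes every exon has a non-zero reference depth and that most exons
    in the sample are not deleted (so the median ratio is non-zero).
    """
    raw = {exon: sample_depth[exon] / reference_depth[exon] for exon in sample_depth}
    scale = median(raw.values())
    return {exon: ratio / scale for exon, ratio in raw.items()}


def classify(ratio: float, het_window=(0.35, 0.65), hom_max=0.05) -> str:
    """Label an exon from its normalized ratio (thresholds are hypothetical)."""
    if ratio <= hom_max:
        return "hom_del"   # essentially no reads: homozygous/hemizygous loss
    if het_window[0] <= ratio <= het_window[1]:
        return "het_del"   # about half the expected depth
    return "normal"


# Illustrative depths only, not patient data.
sample = {"ex16": 1000, "ex17": 1040, "ex18": 510, "ex19": 980, "ex20": 0}
reference = {exon: 1000 for exon in sample}
calls = {exon: classify(r) for exon, r in normalized_ratios(sample, reference).items()}
```

Any exon flagged this way would still need confirmation by an orthogonal method, as was done here with targeted PCR or array comparative genomic hybridization.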

Patient 10 was a 2-month-old girl with intrauterine growth restriction, perinatal insult, gastrointestinal reflux, anemia, lactic acidosis, ketosis, hyperglycemia, and abnormal magnetic resonance imaging who was suspected of having a multisystem mitochondrial disorder. NGS analysis of the Mitome200 panel revealed a heterozygous novel missense variant, c.641C>T (p.P214L), and reduced coverage involving exons 8 and 9 of the NARS2 gene, suggesting a deletion (Figure 4d1). Subsequent targeted array comparative genomic hybridization analysis confirmed a heterozygous deletion of exons 8 and 9 (Figure 4d2). The proline at amino acid position 214 of the NARS2 protein is highly conserved during evolution. Although not validated for clinical use, the computer-based algorithms SIFT and PolyPhen-2 predict the p.P214L variant to be deleterious. The NARS2 gene encodes the mitochondrial asparaginyl-tRNA synthetase. Although no particular disease or phenotype has been reported to be associated with NARS2 gene defects, a growing number of syndromes have been reported to be associated with dysfunction of mitochondrial translational machinery and with mutations affecting mitochondrial aminoacyl tRNA synthetase.26

Discussion

Capture-based enrichment followed by deep sequencing allows the detection of a wide spectrum of mutations. Unbiased enrichment of target sequences using probe hybridization followed by deep sequencing has the advantage of avoiding allele drop out due to SNPs in PCR primers or secondary DNA structures. This technology not only can accurately detect single nucleotide substitutions and small insertion/deletions but also has the ability to identify exonic copy-number changes and large genomic rearrangements. This report underscores the importance of critical review of the NGS raw data and careful examination of all suspicious variant calls. It should be remembered that using stringent analytical parameters may filter out important mismatched sequences and thus result in false-negative results. Examination of misaligned NGS reads under “loose” filtering and alignment conditions may reveal insights into hidden molecular aberrations. This is particularly important in an autosomal recessive condition when one mutation has been identified and a second mutant allele has not yet been detected.

The relaxed filtering allows an 80% (or even lower) match in alignment to reference sequences. By doing so, the mismatched reads at breakpoints can be captured for further analysis. Allele frequencies ≥5% are scored to detect low-level mosaicisms (unpublished data). For true heterozygotes, the allele ratio is within a tight range of 45–55%. Although the relaxed filtering increases the number of false variant calls, most of these false calls are due to specific sequence structure in a particular gene or instrumental error. These false calls are readily identifiable and are almost always recurrent. In addition, most false calls tend to have extremely unbalanced forward and reverse NGS sequence reads. The majority of these false calls can be filtered through an internal database of collective clinical experience. After the removal of the recurrent and unbalanced false calls, the remaining suspicious calls (in less than 5% of cases) are worth further investigation for the presence of large deletion/insertion or mosaicism.
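The triage described in this paragraph can be summarized in a short sketch. The 5% scoring threshold and the 45–55% heterozygous range come from the text; the strand-balance cutoff, the structure of the recurrent-false-call set, and the function name are assumptions for illustration only, and homozygous or hemizygous calls would need their own branch.

```python
def triage_call(key, variant_fraction, forward_reads, reverse_reads,
                known_false_calls, min_strand_share=0.10):
    """Sort one variant call into a review category.

    key               -- any hashable identifier, e.g. (gene, variant name)
    known_false_calls -- set of keys for recurrent false calls (a stand-in
                         for the laboratory's internal database)
    min_strand_share  -- hypothetical cutoff: the minority strand must carry
                         at least this share of the variant reads
    """
    if variant_fraction < 0.05:
        return "not scored"              # below the 5% scoring threshold
    if key in known_false_calls:
        return "recurrent false call"
    total = forward_reads + reverse_reads
    if total == 0 or min(forward_reads, reverse_reads) / total < min_strand_share:
        return "strand-unbalanced"       # typical of false calls
    if 0.45 <= variant_fraction <= 0.55:
        return "heterozygous range"      # expected for true heterozygotes
    return "review"                      # possible mosaicism or large indel
```

Calls that end up in the "review" category correspond to the remaining suspicious calls that the text suggests are worth further investigation.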

PCR-based sequencing methods can detect point mutations but have limitations in the detection of heterozygous exonic deletions. Targeted array comparative genomic hybridization or multiplex ligation-dependent probe amplification is commonly used for copy-number changes involving one or more exons.27,28,29 However, these methods cannot detect point mutations. This report demonstrates the power of target capture deep NGS methods in the simultaneous detection of point mutations, exonic deletions, and genomic rearrangement. A reduction in coverage depth of approximately half for specific exons suggests heterozygous exonic deletions. Therefore, target capture/NGS can provide definitive molecular diagnosis in one comprehensive assay by simultaneous identification of a compound heterozygous point mutation and an intragenic exonic deletion. Furthermore, unbiased capture allows the determination of the genomic rearrangements when the junction points reside in the targeted regions. The ability to determine the breakpoint sequences in chromosomes 1 and 6 (patient 3 in Table 1 ) suggests that capture/NGS may be a promising tool for the detection of balanced chromosomal translocations in addition to point mutations and copy-number variants.

Disclosure

The Baylor Miraca Genetic Laboratories (BMGL) offers a broad spectrum of fee-based genetic testing, including various NGS panels. The Department of Molecular and Human Genetics at Baylor College of Medicine derives revenue from the genetic testing offered in the BMGL.