Introduction

Chromosomal microarray testing is a powerful tool for detecting genomic copy number aberrations. Combinatorial use of fluorescent dyes and oligonucleotide probes has enabled the detection of small, cytogenetically invisible chromosomal deletions or duplications. Pathogenic copy number variations (CNVs) were detected in 17% of patients with developmental delays of unknown etiology, using chromosomal microarray testing of the whole human genome as a clinical diagnostic tool.1 In 2010, chromosomal microarray testing was recommended as a first-tier diagnostic tool for patients with developmental delays of unknown etiology.2 Currently, this method is widely used worldwide to detect disease-causing CNVs.

Next-generation sequencing (NGS) is another prevalent method that can be used to obtain a large number of short-read nucleotide sequences at high throughput. NGS can be used as a clinical diagnostic tool in patients suspected of having single gene disorders. Thus, whole-exome sequencing (WES) has become a standard application for the detection of gene mutations responsible for human disease, especially single-nucleotide variations (SNVs) and small insertions/deletions (indels).3

Currently, researchers are unable to reach a consensus on the most effective clinical diagnostic tool for the first-tier examination of patients with developmental delays of unknown etiology. Efforts are being made to use WES data to detect CNVs to overcome this dilemma, and some studies have suggested the predominant use of an exome-first approach.4–9 However, the possible use of lower-density exome data extracted by specifically targeting only disease-related genes for the screening of CNVs needs to be confirmed.

In this study, clinical exome sequencing was performed for 61 patients with developmental delays of unknown etiology and 24 possible disease-causing SNVs were detected; some of the novel SNVs were reported elsewhere.10–14 The remaining samples, which showed no possible candidate variants, might contain pathogenic CNVs. Thus, these samples were analyzed using NGS data extracted with a targeted re-sequencing approach. Here, we discuss the possible use of NGS data for the detection of pathogenic CNVs as an NGS-first approach.

Material and methods

Samples and grouping

This study was performed in accordance with the Declaration of Helsinki. In addition, the requisite permissions were obtained from the ethics committee of Tokyo Women’s Medical University.

A total of 61 patients with developmental delays of unknown etiology were enrolled in this study. All patients were clinically evaluated by neuropediatricians at the institutions. All patients were subjected to conventional G-banding prior to this study. Written informed consent was obtained from the families of all patients after comprehensive genetic counseling regarding the appropriate methods of dealing with genetic information and possible incidental findings. Blood samples were collected from all patients and their parents. Samples from 34/61 patients were previously analyzed by chromosomal microarray testing, which did not identify definite disease-causing CNVs. Therefore, samples from these patients were analyzed by NGS and enrolled in the preliminary study (designated the microarray-first approach) to validate the eXome Hidden Markov Model (XHMM) system constructed in this study. Thirty samples that did not show CNVs >200 kb were used for normalization. The remaining four patients possessed five CNVs that were over 200 kb in length; however, these CNVs were considered non-pathogenic and were used as positive controls. The remaining 27 patients were first analyzed by NGS and then subsequently enrolled in a non-biased and blinded study (designated the NGS-first approach). All samples tested by the NGS-first approach were retrospectively re-analyzed by chromosomal microarray testing. Consequently, all 61 patients were analyzed by both microarray and NGS (Table 1).

Table 1 Number of samples and detected CNVs

A total of 52 samples from healthy parents were used for a family-based trio-analysis. Four parental samples for patients #2 and #3 were analyzed using the microarray-first approach; the parental samples of patient #3 were also analyzed by NGS. The remaining 48 samples obtained from the healthy parents were analyzed by NGS. The sample obtained from patient #6’s mother was re-analyzed by microarray. The numbers of samples analyzed by each method are summarized in Table 1.

Genetic analysis methods

Chromosomal microarray testing was performed using the Agilent Human Genome 60 K Array (Agilent Technologies, Santa Clara, CA, USA) as previously described.15 Particular attention was paid to CNVs >200 kb in length because these CNVs were most likely to be detected and most CNVs <200 kb were familial and benign.9 CNVs that do not include genes targeted by the TruSight One panel (Illumina, San Diego, CA, USA) cannot be detected by XHMM; therefore, five CNVs that included TruSight One panel genes were selected as positive controls for the XHMM, even though they were considered non-pathogenic. Upon the identification of suspected disease-causing CNVs, the parental samples were analyzed by chromosomal microarray testing and/or fluorescence in situ hybridization (FISH) using human bacterial artificial chromosomes as probes, as previously described.15,16 Metaphase spreads were prepared from peripheral blood lymphocytes using standard methods.

Targeted exome sequencing was performed using the TruSight One sequencing panel (Illumina) comprising 125,396 probes to capture 11,946,514-bp targeted exon regions. These exon regions consisted of 4,813 genes that might be associated with known clinical phenotypes based on the Online Mendelian Inheritance in Man database (OMIM; http://www.omim.org/). The sequence library was constructed using 50 ng of genomic DNA, and the 151-bp paired-end reads were sequenced using the MiSeq next-generation sequencer (Illumina) as previously described.10–14 The extracted data were mapped to a reference genome using the BWA Enrichment v1.0 cloud software (Illumina). GRCh37/hg19 was used as the reference sequence in this study. On average, we obtained 2.6 Gb of targeted aligned sequences and a mean coverage depth of 105.3× (target coverage at 20×: 95.1%). The extracted variants were annotated and filtered using the Variant Studio software (Illumina). Candidate SNVs that could be related to the clinical features of patients with developmental delays were further analyzed through a family-based approach using parental samples.

The XHMM was used to quantify copy numbers throughout the genome.4,7,9 Briefly, BAM files generated from the GATK data-processing pipeline were accumulated and the depth of coverage was calculated. The mean coverage of the targeted regions was obtained and normalized using principal component analysis. The principal component analysis-normalized depths were filtered and z-scores were obtained for these depths. The calculated data were validated using negative samples (30 BAM files from patients who showed no significant CNVs in previous chromosomal microarray testing and 15 BAM files from their healthy parents; Table 1). BAM files from patients #1, #2, #3 and #4 and from the mother of patient #3 were used as the positive controls. Twenty-seven BAM files from patients and 34 BAM files from their healthy parents subjected to the NGS-first approach were used for the XHMM analysis in addition to the BAM files subjected to the microarray-first approach (Table 1). The final CNV results were visualized in the primary genomic structure as a graph.
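The normalization idea described above (mean depth per target, removal of dominant principal components, then z-scores) can be illustrated with a minimal sketch. This is not the XHMM implementation: the synthetic depth matrix, the simulated batch effect, the number of removed components (`n_remove=1`), and the function names `pca_normalize` and `sample_zscores` are all assumptions made for illustration only.

```python
import numpy as np

def pca_normalize(depth, n_remove=1):
    """Remove the top `n_remove` principal components from a
    samples x targets read-depth matrix (illustrative, not XHMM itself)."""
    # Center each target (column) on its mean depth across samples.
    centered = depth - depth.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions in target space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    top = vt[:n_remove]
    # Subtract each sample's projection onto the removed components.
    return centered - centered @ top.T @ top

def sample_zscores(normalized):
    """Z-score each sample (row) across its targets."""
    mu = normalized.mean(axis=1, keepdims=True)
    sd = normalized.std(axis=1, keepdims=True)
    return (normalized - mu) / sd

# Synthetic data (assumption): 20 samples, 200 targets.
rng = np.random.default_rng(0)
n_samples, n_targets = 20, 200
target_eff = rng.uniform(50, 150, n_targets)      # per-target capture efficiency
gradient = np.linspace(-1, 1, n_targets)          # shared systematic bias pattern
batch_load = rng.normal(0, 10, n_samples)         # per-sample strength of that bias
noise = rng.normal(0, 2, (n_samples, n_targets))
depth = target_eff + np.outer(batch_load, gradient) + noise

# Simulate a heterozygous deletion: halve the depth of targets 50-59 in sample 0.
depth[0, 50:60] *= 0.5

z = sample_zscores(pca_normalize(depth, n_remove=1))
region_mean = z[:, 50:60].mean(axis=1)
print("sample with lowest mean z in targets 50-59:", int(region_mean.argmin()))
print("lowest mean z:", round(float(region_mean.min()), 2))
```

In this toy setting the strongest component is the shared bias, so removing it leaves the deletion visible as a run of strongly negative z-scores in one sample. In real data the number of components to remove has to be chosen with care, because removing too many can also remove true CNV signal.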

Results

Microarray-first approach

The microarray-first approach was used for 34 patients. Ten CNVs >200 kb in size were detected in eight patients. Among these, five CNVs did not include genes targeted by the TruSight One panel (indicating non-pathogenic CNVs). Therefore, the remaining five CNVs detected in four patients were listed as positive controls for the XHMM (Table 2).

Table 2 Comparison of the CNV data extracted from the microarray and XHMM

Two patients (patients #2 and #3) showed relatively large CNVs; these CNVs were initially suspected to be disease-causing CNVs. However, both CNVs were confirmed to be inherited from healthy parents and thus were considered non-pathogenic CNVs. Briefly, the microarray result in patient #2 was arr 10q11.22q11.23(46,964,973-51,595,050)×3 (Figure 1a). The parental samples of patient #2 were analyzed by microarray and the 10q11.22q11.23 duplication was identified in the father. This result suggested that this CNV was familial and not related to the developmental delay observed only in patient #2, although recurrent occurrence of reciprocal deletions/duplications of this region was previously reported.17 The microarray result in patient #3 was arr 10q22.3q23.2(81,697,501–88,517,433)×1 (Figure 1b). The 10q22.3q23.2 deletion in patient #3 was confirmed by FISH (Figure 2a). The parental FISH analyses showed a deletion in the mother, indicating a parental origin. The 10q22.3q23.2 deletion in the mother was retrospectively re-confirmed by microarray (Table 2). DNA samples from patients #2 and #3 and patient #3’s parental samples were further analyzed by NGS; however, this analysis yielded no possible disease-causing SNVs. The XHMM analysis of these samples detected CNVs that were almost identical to those identified by the microarray (Figure 1a and b, Table 2).

Figure 1

Genomic copy number variants detected by the microarray-first approach. (a) A gain of 10q11.22q11.23 is shown by microarray (upper) and XHMM (bottom). (b) A loss of 10q22.3q23.2 is shown by microarray (upper) and XHMM (bottom). Horizontal axes indicate physical positions of chromosome 10. Vertical axes indicate signal log2 ratio for microarray (upper) and z-scores for XHMM (bottom). The results of microarray are visualized in Gene View, created by the Agilent Genomic Workbench v.6.5 (Agilent Technologies). XHMM, eXome Hidden Markov Model.

Figure 2

Results of fluorescence in situ hybridization (FISH) analyses. (a) Loss of the green signal (a white arrowhead) labeled for RP11-185K11 (10q23.1: 84,628,854–84,778,657) indicates a deletion in patient #3. Red signals labeled for RP11-387K19 (10p15.3: 322,071–162,974) are markers of chromosome 10. (b) An unbalanced translocation between 5p and 14q is confirmed in patient #5 by an additional green signal (a white arrowhead) labeled for RP11-379F22 (14q32.33: 106,920,250–107,014,205) on chromosome 5, indicated by a red signal labeled for RP11-260C12 (5q35.2: 174,674,775–174,854,980).

Although one CNV initially identified by the microarray in patient #4 was successfully detected by XHMM, the other CNV in patient #4 was not detected. The CNV identified in patient #1 was detected by XHMM; however, the calculated size (640 kb) was substantially smaller than the size calculated by the microarray (2.2 Mb; Table 2).

NGS analysis of the 34 patients yielded 16 possible candidate SNVs, some of which were previously reported.10–14

NGS-first approach

Twenty-seven patients were analyzed by the NGS-first approach; possible disease-causing SNVs were identified in eight of these patients. All 27 BAM files extracted via NGS were analyzed by XHMM, and two samples from patients #5 and #6, in whom no possible disease-causing SNVs were identified through the NGS analysis, showed possible CNVs larger than 200 kb. Patient #5 showed a genomic copy number loss at 5p and a gain at 14q from which an unbalanced translocation between 5p and 14q was suspected (Figure 3a). These findings were re-confirmed by microarray as arr 5p15.33p15.2(1-12,748,960)×1,14q32.12q32.33(93,705,209–107,349,540)×3 (Figure 3a). FISH confirmed the unbalanced translocation der(5)t(5;14)(p15.2;q32.12) (Figure 2b). The parental examination showed no translocation, indicating a de novo occurrence in the proband. Patient #6 showed a genomic copy number gain in the 16p11.2 region (Figure 3b). Her mother showed the same result (Figure 3b), indicating a possible familial duplication of 16p11.2. The microarray result in this mother and daughter was arr 16p11.2(29,820,221–30,105,987)×3 (Figure 3c). This result was almost identical to the XHMM result.

Figure 3

Genomic copy number variants detected by the XHMM-first approach. (a) XHMM analysis shows genomic copy number loss and gain in the terminal region of 5p (left) and 14q (right), respectively (upper). Similar patterns are re-confirmed by microarray (lower). (b) XHMM shows a gain in 16p11.2 in both the proband and her mother. (c) Duplicated 16p11.2 region, observed in both the proband and her mother, is expanded and visualized in the Gene View (Agilent Genomic Workbench; Agilent Technologies). Horizontal axes indicate physical positions of the chromosomes. Vertical axes indicate signal log2 ratio of the microarray and z-scores for XHMM. XHMM, eXome Hidden Markov Model.

Samples from the remaining 25 patients that did not show any CNVs in the XHMM analysis were re-analyzed by microarray, but no CNVs larger than 200 kb were detected. This finding indicates that the presence or absence of pathogenic CNVs was not misdiagnosed by XHMM (no false-negative or false-positive detection of pathogenic CNVs); thus, the detection ratio for the pathogenic CNVs was 100%.
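The counts reported for the NGS-first approach can be turned into sensitivity and specificity with a short calculation. The sketch below uses the numbers stated above (two XHMM-positive patients confirmed by microarray, 25 XHMM-negative patients with no CNVs >200 kb on microarray). The Wilson score interval is an addition for illustration and is not part of the original analysis; the function name `wilson_interval` and the choice of z = 1.96 are assumptions.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (approx. 95% for z=1.96)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Counts taken from the NGS-first results, with microarray as the reference.
true_pos, false_neg = 2, 0    # patients #5 and #6
true_neg, false_pos = 25, 0   # remaining 25 patients

sensitivity = true_pos / (true_pos + false_neg)
specificity = true_neg / (true_neg + false_pos)

sens_low, sens_high = wilson_interval(true_pos, true_pos + false_neg)
spec_low, spec_high = wilson_interval(true_neg, true_neg + false_pos)

print(f"sensitivity = {sensitivity:.2f} (Wilson interval {sens_low:.2f}-{sens_high:.2f})")
print(f"specificity = {specificity:.2f} (Wilson interval {spec_low:.2f}-{spec_high:.2f})")
```

With only two positive cases the interval around the sensitivity estimate is wide, which is why the point estimate of 100% should be read together with the sample size.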

Patient reports

The clinical features of two patients (patients #2 and #3) who showed relatively large CNVs in the microarray-first approach and two patients (patients #5 and #6) who were diagnosed as having chromosomal aberrations by the NGS-first approach are described below.

A 16-month-old girl (patient #2) was born with a birth weight of 2594 g, a length of 49 cm, and an occipitofrontal circumference of 31 cm, indicating microcephaly. The patient presented mild developmental delay and her brain magnetic resonance imaging showed reduced brain volume.

A boy aged 5 years and 7 months (patient #3) was delivered by Cesarean section with a birth weight of 2430 g. He started to walk at 2 years; however, he was incapable of forming meaningful words even at this age, indicating a severe developmental delay.

A 1-year-old girl (patient #5) was born with a birth weight of 1,282 g, a length of 38 cm, and an occipitofrontal circumference of 29 cm. From early infancy, she showed severe wheezing due to laryngomalacia. This patient presented multiple congenital anomalies, including a high-arched palate, micrognathia and large thumbs. Based on these findings, she was suspected of having Rubinstein-Taybi syndrome.18 Congenital cataracts and bilateral moderate deafness were also noted. The severe laryngomalacia necessitated a tracheostomy. At 8 months of age, this patient was 55.5 cm (−5.8 s.d.) long, weighed 4508 g (−4.8 s.d.), and presented an occipitofrontal circumference of 38.5 cm (−2.7 s.d.), indicating a severe growth delay. She showed a severe developmental delay, with a developmental quotient of 28 evaluated by the Enjoji developmental test. Conventional G-banding showed a normal female karyotype of 46,XX. Retrospective evaluation identified a rounded face; however, the severe wheezing had masked the cat-like cry characteristic of cri du chat syndrome.19

A 10-year-old girl (patient #6) was born at 41 weeks of gestation with a birth weight of 3,240 g and an occipitofrontal circumference of 33 cm. She showed normal development in early infancy. However, at 3 years and 10 months, she suffered from partial seizures on her right side. Subsequently, complex partial seizures, secondary generalized seizures, and atypical-absence seizures were observed. After the occurrence of the seizures, the patient presented a decline in her language ability. Brain magnetic resonance imaging at 4 years showed no abnormality. The interictal electroencephalogram showed continuous slow waves and spikes. The seizures were controlled and no EEG abnormalities were observed after prescription of anti-epileptic drugs. At 10 years, the Wechsler Intelligence Scale for Children-III measured her intelligence quotient (IQ) as 68, indicating a mild intellectual disability. At present, the patient shows no abnormal neurological findings or abnormal behavior.

Discussion

In this study, a system was constructed to detect CNVs using BAM files extracted from the GATK pipeline after targeted re-sequencing, which was initially intended to identify disease-causing SNVs in patients with developmental delays of unknown etiology. CNV detection was accomplished using XHMM as a statistical tool. The term ‘whole-exome’ generally covers a region of approximately 58 Mb. In comparison, the TruSight One sequencing panel (Illumina) used in this study targeted regions of ~12 Mb; therefore, the region targeted by this panel was approximately 21% of the ‘whole-exome’ sequence. This limitation raised a concern as to whether the exome data extracted from this sparse target region could be sufficient to screen disease-causing CNVs.
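The proportion quoted above follows from simple arithmetic. As a brief check, the sketch below uses the exact panel target size given in the methods (11,946,514 bp) and the approximate whole-exome size of 58 Mb; the variable names are illustrative.

```python
# Panel target size from the methods; whole-exome size is the approximate figure cited.
panel_bp = 11_946_514
whole_exome_bp = 58_000_000

fraction = panel_bp / whole_exome_bp
print(f"panel covers {fraction:.1%} of the whole exome")
```

Using the rounded value of 12 Mb instead gives 12/58, or roughly 21%, as stated in the text.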

The TruSight One sequencing panel (Illumina) was constructed for the detection of disease-causing SNVs. For this purpose, this panel targets approximately 5,000 genes associated with known clinical disease phenotypes based on the Online Mendelian Inheritance in Man (OMIM; http://www.omim.org/). Indeed, we identified 24 possible disease-causing SNVs in 61 patients with developmental delays of unknown etiology, some of which were reported elsewhere.10–14 Therefore, the genes selected by this panel were strongly relevant to Mendelian diseases. This finding led to the consideration that CNVs in the genetic regions targeted by the TruSight One sequencing panel (Illumina) could have disease-causing traits in most cases.

In the preliminary study (microarray-first approach), we used five positive-control CNVs in four samples (these CNVs were previously identified by the microarray-first approach in these samples) (Table 2). Among these, four CNVs were successfully detected (detection ratio 80%), whereas the remaining CNV (in patient #4) was not detected by XHMM. This undetected CNV region (chr20:4,534,383–4,937,261) included two TruSight One genes (PRNP and PRND); however, these two genes consisted of only two exons each. The CNV identified by the microarray in patient #1 was detected by XHMM; however, the calculated size of this CNV was substantially different. Only one gene (CSMD1) in this CNV region (chr8:3,710,810–5,922,013) was targeted by the TruSight One panel, and only the first 5 of the 70 exons of CSMD1 were included in the CNV region. Therefore, the missed call and the size discrepancy could be attributed to the small number of targeted exons within these CNV regions. In comparison, three CNVs were successfully identified because a sufficient number of exons in the CNV regions were targeted by the TruSight One panel (Illumina). The lengths of the CNVs identified by microarray analysis and XHMM were not identical, and the accuracy of the XHMM analysis depended on the number or density of the exons included in the TruSight One panel within the region. Overall, this preliminary microarray-first approach showed a high detection ratio by XHMM, which encouraged us to perform the subsequent NGS-first approach.
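The dependence of detection on the number of targeted exons inside a CNV can be made concrete with a small interval-overlap sketch. The CNV coordinates below are those reported for patient #4; the exon target coordinates and the threshold `MIN_TARGETS` are hypothetical placeholders, not the real TruSight One manifest or an XHMM parameter.

```python
from typing import List, NamedTuple

class Interval(NamedTuple):
    chrom: str
    start: int
    end: int

def overlaps(a: Interval, b: Interval) -> bool:
    """True if two closed intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end

def targets_in_cnv(cnv: Interval, targets: List[Interval]) -> List[Interval]:
    """Return the capture targets that fall (at least partly) inside the CNV."""
    return [t for t in targets if overlaps(cnv, t)]

# CNV in patient #4 that XHMM did not detect (coordinates from the text).
cnv_patient4 = Interval("chr20", 4_534_383, 4_937_261)

# HYPOTHETICAL exon targets, for illustration only.
targets = [
    Interval("chr20", 4_600_000, 4_600_200),
    Interval("chr20", 4_650_000, 4_650_300),
    Interval("chr20", 5_100_000, 5_100_150),  # lies outside the CNV
    Interval("chr8", 4_000_000, 4_000_120),   # different chromosome
]

MIN_TARGETS = 5  # hypothetical minimum number of targets for a reliable call

hits = targets_in_cnv(cnv_patient4, targets)
likely_detectable = len(hits) >= MIN_TARGETS
print(f"{len(hits)} targets inside the CNV; likely detectable: {likely_detectable}")
```

A pre-check of this kind, run against the actual panel manifest, would indicate which microarray-defined CNVs a depth-based caller has any chance of recovering before the two methods are compared.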

Disease-causing chromosomal abnormalities were successfully identified in two patients in the non-biased and blinded study using the NGS-first approach. The ~300 kb-long 16p11.2 duplication was successfully identified in patient #6. Despite its short length, this region is gene-rich and contains seven of the genes included in the TruSight One panel. This finding may explain why XHMM clearly revealed this small duplication in both patient #6 and her mother. The z-score data calculated by XHMM did not reflect the exact copy number and could not be used to determine whether the suggested gain of this region was due to a duplication or triplication. Re-confirmation by microarray was necessary for the final diagnosis. Recently, the 16p11.2 duplication was reported as a risk factor for Rolandic epilepsy.20 Patient #6 showed continuous slow waves and spikes, which might be related to the 16p11.2 duplication in this patient.

Patient #5 was initially suspected of having a single gene disorder (Rubinstein-Taybi syndrome) based on her clinical features. The samples obtained from this patient were therefore subjected to NGS analysis rather than chromosomal microarray testing. The NGS data showed no disease-causing SNVs; however, the XHMM results suggested a possible unbalanced translocation with der(5)t(5;14), which was confirmed by the FISH analysis (this unbalanced translocation was overlooked during conventional karyotyping because of similar G-band patterns in both telomeric regions). A retrospective clinical evaluation of patient #5 revealed no contradiction with this diagnosis; the majority of the clinical features in patient #5 were compatible with those often observed in patients with the 5p- syndrome, although many features were modified by the 14q duplication.19

The 10q22.3q23.2 deletion in patient #3 was initially considered a benign CNV because the same deletion was identified in the healthy mother. Therefore, a mutation on the homologous allele that was unmasked by the deletion in the 10q22.3q23.2 region was suspected. However, this hypothesis was ruled out by the subsequent NGS analyses. van Bon et al.21 reported two cases with familial 10q22.3q23.3 deletions. On this basis, we concluded that this deletion might be pathogenic in patient #3 after all.

The TruSight One panel was developed with the aim of clinical sequencing. Consequently, this panel targets clinically relevant SNVs. In this study, BAM files extracted through NGS using the TruSight One panel were used to detect clinically relevant CNVs. XHMM was used for this purpose because it focuses on identifying rare CNVs with a population frequency <5%.4 A pre-normalization step is performed in XHMM to increase the homogeneity of the samples prior to the principal component analysis, which removes extremely variable targets unrelated to CNVs.8,9 Thus, CNVs detected via XHMM are likely to be pathogenic. The ability of the NGS-first approach to detect pathogenic CNVs was comparable to that of chromosomal microarray testing using a 60k platform, which was consistent with the results of previous reports on WES.5,6

In conclusion, pathogenic CNVs were successfully identified by XHMM using BAM files extracted through disease-related exome panel sequencing. The NGS-first approach yielded no false-negative or false-positive results, indicating the high sensitivity and high specificity of this method for the detection of pathogenic CNVs. Although the sample size was not sufficient to reach a final conclusion, the results of this study indicated that XHMM combined with the targeted exome data covering a 12 Mb region could be used as a first-tier screening approach for the identification of both SNVs and CNVs. Possible pathogenic CNVs identified by the NGS-first approach should be re-confirmed by microarray for the final diagnosis.