Introduction

Inherited bone marrow failure syndromes (IBMFSs) are a group of genetically heterogeneous disorders with impaired production of one or more blood cell types. They usually present during childhood and are frequently associated with physical malformations and a high risk of myelodysplastic syndrome (MDS), leukemia, and other types of cancer.1,2 Over the last several decades, mutations in over 100 genes have been reported to cause IBMFSs; these genes are involved in fundamental cellular pathways such as DNA repair, telomere maintenance, and ribosome biogenesis.3,4,5 Thus far, in all patients with IBMFSs and an identified genotype, mutations in one gene have been found sufficient to result in a given phenotype. In our recently published data, 59% of the patients with classified IBMFSs and 18% of the patients with unclassified IBMFSs were found to have point mutations using an NGS gene panel assay of about 70 known IBMFS genes.6

Copy number variants (CNVs) are submicroscopic deletions or duplications of DNA stretches ranging from several hundred base pairs to about 3 Mb.7 They are common causes of inherited diseases,8,9,10 and have been recognized as an important cause of IBMFSs,10 for example, Fanconi anemia11,12 and Diamond–Blackfan anemia.13,14 We previously found that 16.4% of the IBMFS patients without identified point mutations had pathogenic CNVs by single nucleotide polymorphism arrays or comparative genomic hybridization arrays.3 Unfortunately, all current methods to detect CNVs suffer from major limitations. Metaphase cytogenetics can only detect variants greater than 3 Mb and may miss abnormalities in areas that are not well visualized. Furthermore, the exact size of CNVs and whether the cytogenetic abnormality affects the copy number of a specific gene cannot be determined by metaphase cytogenetics. Comparative genomic hybridization arrays and single nucleotide polymorphism arrays only detect CNVs that are larger than 100 and 10–50 kb, respectively. Some important genes may not be covered by these arrays due to difficulties in designing proper probes. Last, small indels can be missed by Sanger sequencing if the primers are designed to anneal to the normal sequence of the indel region. Whole genome sequencing may overcome some of the above limitations;15 however, currently the rate of CNV detection by whole genome sequencing is only 35–50%.

Since next generation sequencing (NGS) gene panel assays are increasingly used to identify point mutations in clinical and research laboratories,6,16,17 it would be a great advantage if they could also be used to detect CNVs. Several previous studies suggested that CNVs can be inferred from NGS data.18,19 Importantly, Anders and Huber20 compared high-throughput sequencing data by several different methods using a negative binomial distribution and found that the normalized coverage ratio can control type I error.

The aim of this study was to determine whether CNVs can be detected in data from NGS gene panel assays using methods that compare nucleotide read depth in test samples to normalized control coverage values. We also aimed to characterize the identified CNVs and to evaluate their clinical relevance. To address these aims we studied a large number of patients with IBMFSs by an IBMFS NGS gene panel assay.

Results

Characteristics of the patients

Two hundred and fifty-eight IBMFS patients without a diagnostic genotype were analyzed by the IBMFS NGS gene panel assay. Fifty-three percent were male. At the time of analysis, 138 patients had a specific syndromic diagnosis and 120 patients had an unclassified IBMFS (Table 1). The most common diagnosis among tested patients who had a classified IBMFS and unknown mutations was Diamond–Blackfan anemia, followed by Fanconi anemia, Kostmann/severe congenital neutropenia, Shwachman–Diamond syndrome, and dyskeratosis congenita. Among the 120 patients with unclassified IBMFSs, the largest group had global bone marrow failure (pancytopenia). Importantly, 20 patients with unclassified IBMFSs had clones/MDS/leukemia at the time they were tested by the NGS gene panel assay.

Table 1 Clinical diagnosis of patients tested by the next generation sequencing IBMFS gene panel assay.

Pathogenic point variants detected by the NGS gene panel

Pathogenic point variants were revealed in 90 patients (35%) studied by the NGS panel batches 1–3. Among the 138 patients with classified syndromes, pathogenic point variants were identified in 73 patients (53%), whereas among the 120 unclassified patients only 17 (14%) were genotyped. The genes identified to harbor pathogenic point variants by the NGS panel assay are listed in Table 2. The identified genes were related to various disorders on the IBMFS spectrum, including IBMFSs with pancytopenia, such as Shwachman–Diamond syndrome, Fanconi anemia, and dyskeratosis congenita; IBMFSs with predominantly anemia, such as Diamond–Blackfan anemia and inherited sideroblastic anemia; IBMFSs with predominantly neutropenia, such as Kostmann/severe congenital neutropenia; and IBMFSs with predominantly thrombocytopenia, such as familial thrombocytopenia. The identified genes function in various cellular pathways, including DNA repair (e.g., FANCA and FANCG), ribosome biogenesis (e.g., SBDS and RPS19), telomere maintenance (e.g., TINF2 and DKC1), and hematopoietic signaling (e.g., GATA2). Analysis of point variants from NGS panel batches 1 and 2 was published previously.6 Table 2 shows the results of a reanalysis of pathogenic point variants from these batches and of those from batch 3.

Table 2 Pathogenic point variants detected by the inherited bone marrow failure syndrome NGS gene panel assay.

Overall detection of CNVs by the NGS panel

Of the 258 patients analyzed by the panel, 168 were not found to have pathogenic point variants. Samples from 165 of the patients without identified pathogenic point variants were subjected to CNV analysis using a computerized software program. Of the 165 patients, 10 (6%) were found to have pathogenic deletions in IBMFS genes (Fig. 1) using the strategy described in the Methods. All identified deletions were validated. Detailed descriptions of the deletions and subjects are presented in the following paragraphs and in Table 3.

Fig. 1

Workflow of cohort analysis and overall rates of pathogenic variant detection.

Table 3 Pathogenic copy number variants identified by the inherited bone marrow failure syndrome NGS gene panel.

Detection of CNVs in patients with previously established clinical diagnosis

The largest group of patients with identified deletions comprised five patients with Diamond–Blackfan anemia. Three of these patients had a heterozygous deletion that encompassed the entire RPS19 gene (Fig. 2a–c and Supplementary Table 3). Despite being siblings, having severe anemia since infancy, markedly reduced bone marrow erythropoiesis and normal erythrocyte adenosine deaminase (eADA) activity levels, these patients had otherwise variable clinical phenotypes and outcomes. Patient 1 had scoliosis, patient 2 had large almond-shaped eyes, and patient 3 had no physical malformations. These patients were treated with prolonged courses of oral corticosteroids with varied responses. Patient 1 responded well to corticosteroid treatment. Patient 2 had only a partial response; his hemoglobin was still lower than the normal level for age and he required a few red blood cell transfusions when he had intercurrent infections. Patient 3 did not respond to corticosteroids and underwent matched unrelated hematopoietic stem cell transplantation (HSCT) but developed graft failure and died from sepsis after a second HSCT.

Fig. 2

Deletions in patients with Diamond–Blackfan anemia. a Representative raw data dispersion with 95% confidence intervals from one of the three patients with RPS19 deletions (Patient 2). The dispersion analysis showed that 97.26% of the reads were within the fitted interval. The purple dots represent data from chromosome 19; the dots corresponding to the RPS19 deletion are clearly separated from the other chromosome 19 dots (also purple) and from the dots of other chromosomes (gray). b Gel electrophoresis shows a clear reduction in the signal of RPS19 exon 3 and exon 4 in the patients compared to controls. In this experiment, FOXP2 and RPL11 were used as internal controls. c Results of band densitometry from the gel electrophoresis. d Raw data dispersion with 95% confidence intervals from Patient 4. In total, 98.57% of the reads were within the fitted interval. The purple dots on the left show clear separation of the RPL5 deletion from the whole data and from the chromosome 1 data in each of the patient samples. e Gel electrophoresis shows a clear reduction in the signal of RPL5 exons 2–7 in the patient compared to controls. In this experiment, FOXP2 and RPL11 were used as internal controls. f Results of band densitometry from the gel electrophoresis. g An RPL11 gene deletion in Patient 5: raw data dispersion with 95% confidence intervals. In total, 98.50% of the reads were within the fitted interval. The purple dots on the left show clear separation of the RPL11 deletion from the whole data and from the chromosome 1 data in each of the patient samples. h Gel electrophoresis shows a clear reduction in the signal of RPL11 exons 2 and 4 in the patient compared to controls. In this experiment, RPL35A exon 7 was used as an internal control. i Results of band densitometry from the gel electrophoresis.
E indicates exon, P indicates a patient number, and C indicates a healthy control subject.

Patient 4 had a one-copy deletion in RPL5 from the promoter region to exon 7 (Fig. 2d–f and Supplementary Table 4). In addition to severe anemia, she had short stature, a large atrial septal defect, sensorineural hearing loss and developmental delay. The patient did not respond to corticosteroid therapy and required chronic red blood cell transfusions.

A one-copy deletion of the entire RPL11 gene was identified in Patient 5 (Fig. 2g–i and Supplementary Table 5). This patient presented with severe macrocytic anemia, reticulocytopenia and high hemoglobin F at 2 years of age. Bone marrow examination demonstrated pure red cell aplasia. Activity of eADA was elevated. He was born with low birth weight and also suffered from failure to thrive, bilateral complete cleft palate and cleft lip, tracheomalacia, subglottic stenosis, thumb abnormality and developmental delay. He had a good response to corticosteroid therapy.

Patients 6 and 7 had a clinical diagnosis of thrombocytopenia with absent radii syndrome. Using the NGS panel assay we identified compound heterozygosity for a deletion on one allele of RBM8A and a single pathogenic point variant on the other allele. The cases have been previously reported,6 and the deletions were also detected by Affymetrix SNP array.

Establishment of a syndromic diagnosis in patients with unclassified IBMFSs based on CNV data

In three cases with an unclassified IBMFS, detection of CNVs led to establishment of a syndromic diagnosis. Patient 8 had MDS. He was found to carry a germline deletion of the entire GATA2 gene (Fig. 3a–c and Supplementary Table 6). He had moderate pancytopenia, high mean corpuscular volume, developmental delay, attention-deficit/hyperactivity disorder, retrognathia and cutaneous warts on his hands. Bone marrow examination revealed a hypocellular specimen with decreased trilineage hematopoiesis and dyserythropoiesis. Bone marrow cytogenetic analysis showed 46,XY,+1,der(1;7)(q10;p10)[23]/47,idem,+8[5]/46,XY[23].

Fig. 3

Deletion of the GATA2 gene in a patient with myelodysplastic syndrome. a Raw data dispersion with 95% confidence intervals from Patient 8. In total, 97.73% of the reads were within the fitted interval. The purple dots on the left show clear separation of the GATA2 deletion from the whole data and from the chromosome 3 data in each of the patient samples. b Gel electrophoresis shows a clear reduction in the signal of GATA2 exons 1, 3, and 5 in the patient compared to controls. In this experiment, RPL11 exon 2 was used as an internal control. c Results of band densitometry from the gel electrophoresis.

Patient 9 had an unclassified IBMFS with pancytopenia and was found to have a homozygous deletion of the FANCA gene from exon 1 to exon 5 (Fig. 4a–c and Supplementary Table 7). The patient presented with severe thrombocytopenia and mild neutropenia at 8 years of age. Evaluation of his disease revealed low birth weight, short stature, café-au-lait spots, absent right kidney, and developmental delay. This patient did not respond to immunosuppressive therapy for aplastic anemia and eventually died after HSCT. No data about chromosome fragility testing and genetic testing were reported to the registry.

Fig. 4

Deletion of the FANCA gene in a patient with unclassified inherited bone marrow failure syndrome. a Raw data dispersion with 95% confidence intervals from Patient 9. In total, 98.30% of the reads were within the fitted interval. The purple dots on the left exhibit a ratio of <0.05 and are clearly separated from the whole data and from the other reads of chromosome 16. b Gel electrophoresis shows the absence of amplified FANCA exons 2 and 5 in the patient compared to controls. In this experiment, FANCA exon 6 and RPL11 exon 2 were used as internal controls. c Results of band densitometry from the gel electrophoresis.

Patient 10 presented with thrombocytopenia at 1 year 6 months of age and then progressed to pancytopenia. This patient had multiple anomalies as described in Table 3 and in our previously published article.21 The patient was initially diagnosed with an unclassified IBMFS. Based on the identification of compound heterozygosity in PARN and on further functional assays, the patient was classified as having dyskeratosis congenita.21 Analysis of the NGS reads by NextGene revealed both the heterozygous deletion and the pathogenic point variant. Compound heterozygosity was confirmed by parental testing.21

Discussion

CNVs are an important cause of many genetic disorders including IBMFSs.22 However, methods to detect CNVs lack sensitivity, and frequently multiple diagnostic approaches and tests are required to detect them. Herein, we investigated the ability to uncover CNVs from a method primarily designed for nucleotide-level analysis. We identified CNVs in a proportion of the cases where pathogenic point variants were not found. Currently, NGS gene panels are the standard method to search for genetic mutations. Ideally, these panels would also be used to detect CNVs at the same time. This strategy would save time and decrease the overall cost of investigation. In the case of IBMFSs, the mutations (large chromosomal changes, smaller CNVs, indels, and other point mutations) and their location (protein and nonprotein encoding genes) are variable.6,10 Consequently, the time and cost of genetic testing, if not done efficiently, could be prohibitive. This is particularly important since many IBMFS patients need treatment urgently. Hence, extracting the maximum amount of information out of each test is critically important. Our study indicates that using an NGS gene panel assay for detection of both point mutations and CNVs increases the information that can be retrieved from the assay.

Results of our study demonstrate that analysis of NGS gene panel assays for CNVs can not only establish a genetic diagnosis (such as identifying RPL5 mutations in patients with DBA), but can also identify the specific diagnosis in unclassified or idiopathic bone marrow failure syndromes. In this study, a diagnosis of Fanconi anemia was made in a patient with an unclassified IBMFS and a diagnosis of GATA2-related disorder in another with idiopathic MDS. The identification of a genetic diagnosis in the case of DBA, for example, is critically important for genetic and family counseling. Similarly, establishing a specific diagnosis of Fanconi anemia in a patient with aplastic anemia is extremely important given that the treatment for Fanconi anemia is completely different from that of idiopathic aplastic anemia. Importantly, although positive chromosome breakage testing is the hallmark of Fanconi anemia, in rare cases it is negative or difficult to interpret due to somatic genetic correction and lymphocyte mosaicism in peripheral blood chromosome fragility testing. Also, establishing a diagnosis of GATA2-related disorder in a patient with MDS would completely change the approach to counseling, screening and selection of family members as donors for bone marrow transplantation for the affected subject.

To our knowledge, this is the first study to analyze an IBMFS gene panel by computerized software to detect small and large CNVs. This software compares the normalized coverage value of a test sample with that of a defined number of controls, which we set at 10. Most of the detected CNVs were successive exons where at least one of them showed a normalized coverage ratio of ≤0.33 in the case of monoallelic deletion and a normalized coverage ratio of ≤0.05 in the case of biallelic deletion. Validation by an additional method is recommended in case of borderline normalized ratios.

Detection of CNVs from NGS reads using the NextGene software has some limitations. Although calls were considered positive only when the combination of low dispersion and low normalized coverage ratios was found in all three sets of control data, some borderline results might still be true positives and may require validation. The combination of developing new algorithms to interpret borderline results and correlating data with clinical phenotype may improve categorization of results as true positive versus false positive. CNV duplications are difficult to identify by the software. Further, some duplicated fragments may not reside in the gene region and may not disrupt the coding sequence.

In summary, NGS gene panels can be used to detect CNVs by careful analysis of normalized coverage values. In our analysis, small CNVs encompassing one to a few exons were detected from an IBMFS NGS gene panel assay. Due to time and cost advantages, we recommend that data from NGS gene panels be routinely analyzed for CNVs before moving on to additional mutation detection strategies such as CNV microarray or whole exome/genome sequencing. Recently, the cost of a clinical whole-exome sequencing (WES) test has dropped substantially (to about $3500 (USA)) and has become closer to that of clinical NGS panels (about $1500 (USA)). Therefore, performing WES as a first-line genomic diagnostic test can be considered, particularly for designing virtual panels and only analyzing genes of interest. In such cases, strategies similar to those described in the present study can be applied to obtain CNV data from WES. The results of our work have significant clinical relevance not only to the field of bone marrow failure, MDS and leukemia, but also to many other disciplines where genetic testing by NGS gene panels has become the standard of care. Our study also provides additional tools for research on the prevalence of CNVs in various genetic disorders and on CNV detection by whole exome/genome sequencing.

Methods

Selection of patients

The patients included in this study were prospectively enrolled in the Canadian Inherited Marrow Failure Registry (CIMFR) and did not have known disease-causing mutations. The CIMFR and the experiments outlined in this manuscript were approved by the Research Ethics Boards of the Hospital for Sick Children and the participating tertiary medical centers in Canada. The study was conducted in accordance with the Declaration of Helsinki, and informed written consent was obtained from all patients or their guardians prior to enrollment. Patients who fulfilled diagnostic criteria for an IBMFS23 in any of the 17 participating centers in Canada have been enrolled in the CIMFR since 2001 by the site research team. The vast majority of children with IBMFSs are treated in one of the participating CIMFR site centers. Patient information was collected at study entry and periodically thereafter.

The eligibility criteria for the CIMFR included evidence of chronic bone marrow failure in addition to either a family history of an IBMFS, or physical malformations, or presentation earlier than one year of age. Patients enrolled in the CIMFR who had positive genetic testing for an IBMFS gene were excluded from this analysis. When possible, each case was assigned a specific syndromic diagnosis by the participating center. Diagnoses were reviewed centrally, and if necessary adjusted based on published diagnostic criteria of specific IBMFSs1,2,3 after verification with the respective center. Cases that fulfilled the eligibility criteria, but did not meet the clinical, laboratory and genetic diagnostic criteria for any known IBMFS subtype were defined as unclassified IBMFSs.24

NGS panel assay

Genomic DNA was extracted from peripheral blood, bone marrow fibroblasts, skin fibroblasts or expanded peripheral blood T-cells. A comprehensive NGS panel of known IBMFS genes was designed. Overall, 72, 77 and 141 genes were included in the first, second, and third analyzed batch of patients, respectively (Supplementary Table 1). The HaloPlex Capture Kit (Agilent Technologies, Santa Clara, CA) was used for DNA library preparation according to the manufacturer's instructions. Briefly, the assay was based on a hybridization oligonucleotide pool, which covers coding regions, 50 bp flanking intronic regions that included splicing sites, 3′-untranslated regions that included potential translation regulatory elements, and upstream promoter regions. The oligonucleotides were 150-mers with 3× tiling and a maximum of 10 bp overlap between oligonucleotides. The panel design was submitted to the Agilent HaloPlex Design Wizard program (http://www.halogenomics.com/haloplex/custom-reagent-kits). Targeted fragments were amplified and sequenced on the Illumina HiSeq2500 platform as previously described.6

Variant analysis and filtering strategy

The algorithm used to filter non-relevant point variants and the software programs and websites used to predict protein damage, conservation and minor allele frequency of pathogenic point variants are as previously described.6 Briefly, variants were defined as “pathogenic” if they had been reported as disease-causing in public databases. Novel variants were considered “most likely pathogenic” if (1) they appeared in an allelic dosage that was consistent with the known inheritance mode of the disease, (2) evolutionarily conserved amino acids were affected, (3) the minor allele frequency was <0.001, and (4) the variant was considered damaging by at least half of the following prediction software programs: PolyPhen2, SIFT, MutationTaster, MutationAssessor, Provean. Splicing variants were assessed by the Human Splicing Finder software program. Variants that were reported in databases as having unknown significance (usually due to only one or two reported cases), but fulfilled the above criteria, were considered “most likely pathogenic”. Variants that fulfilled most but not all of the above criteria remained of unknown significance.
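
The classification rules above can be sketched in code. The following is a minimal Python illustration and not the pipeline used in the study: the `Variant` fields, the `classify` function and the "not prioritized" label are hypothetical names introduced here, predictor verdicts are assumed to be available as booleans, and reading "most but not all" as at least three of the four criteria is an assumption made for this sketch.

```python
from dataclasses import dataclass

# The five prediction programs named in the text.
PREDICTORS = ("PolyPhen2", "SIFT", "MutationTaster", "MutationAssessor", "Provean")


@dataclass
class Variant:
    """Hypothetical container for the evidence used in the filtering rules."""
    reported_disease_causing: bool      # listed as disease-causing in public databases
    dosage_matches_inheritance: bool    # criterion 1
    conserved_residue: bool             # criterion 2
    minor_allele_frequency: float       # criterion 3 (< 0.001)
    damaging_calls: dict                # criterion 4: predictor name -> bool


def classify(variant: Variant) -> str:
    """Apply the four criteria described in the text to a single variant."""
    if variant.reported_disease_causing:
        return "pathogenic"
    n_damaging = sum(bool(variant.damaging_calls.get(p, False)) for p in PREDICTORS)
    criteria = [
        variant.dosage_matches_inheritance,
        variant.conserved_residue,
        variant.minor_allele_frequency < 0.001,
        2 * n_damaging >= len(PREDICTORS),  # at least half of five, i.e. three or more
    ]
    if all(criteria):
        return "most likely pathogenic"
    # Assumption: "most but not all" is taken to mean three of the four criteria.
    if sum(criteria) >= 3:
        return "unknown significance"
    return "not prioritized"  # hypothetical label for variants that are filtered out
```

With five predictors, "at least half" translates to three or more damaging verdicts, which is why the comparison is written as `2 * n_damaging >= len(PREDICTORS)`.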

CNV analysis by normalized coverage values

Patients who were found to have no pathogenic point variants by the NGS gene panel assay were analyzed for CNVs. We used the NextGene software program, CNV Tools, to compare the coverage ratio of specific regions in a test sample to ten samples of control projects, which were samples of sex-matched healthy control subjects or patients with other disorders that are not expected to have mutations in the analyzed gene. The beta-binomial model was used to evaluate dispersion. The Hidden Markov Model was used to calculate the normalized count ratio and to classify specific regions as CNVs. To minimize false positive results, we selected calls with a dispersion of ≤0.01, minimal normalized read counts of 100, a minimal region length of 50 bp and a percentage of regions in which CNV calls are expected to be made of ≤5%. After the test sample and 10 sex-matched control samples were loaded, a comparison was made first to the average coverage of all 10 controls; second, to the median coverage of all 10 controls; and third, to the coverage of the one control subject whose coverage was closest to the test sample. Only regions that were deemed deleted in all three comparisons were considered true deletions. Short variant calls (<50 bp) were excluded since most often they represent random background noise.
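
The three-way comparison to controls can be illustrated with a simplified Python sketch. It does not reproduce the NextGene CNV Tools algorithm (no beta-binomial dispersion fit and no Hidden Markov Model); the function names, the normalization by total read depth and the definition of the "closest" control as the one with the smallest summed absolute difference in normalized coverage are all assumptions made for illustration.

```python
from statistics import mean, median


def normalize(depths):
    """Scale per-region read depths so that they sum to 1 (assumed normalization)."""
    total = sum(depths)
    return [d / total for d in depths]


def coverage_ratios(test, controls):
    """Return per-region test/reference ratios for three references:
    the control mean, the control median and the single closest control."""
    t = normalize(test)
    c = [normalize(ctrl) for ctrl in controls]
    n_regions = len(t)
    mean_ref = [mean(ctrl[i] for ctrl in c) for i in range(n_regions)]
    median_ref = [median(ctrl[i] for ctrl in c) for i in range(n_regions)]
    # Assumption: "closest" means the smallest summed absolute difference to the test sample.
    closest_ref = min(c, key=lambda ctrl: sum(abs(a - b) for a, b in zip(t, ctrl)))

    def ratio(ref):
        return [a / b if b > 0 else float("nan") for a, b in zip(t, ref)]

    return {"mean": ratio(mean_ref), "median": ratio(median_ref), "closest": ratio(closest_ref)}


def deleted_in_all(ratios, threshold):
    """Indices of regions whose ratio is at or below the threshold in all three comparisons."""
    n_regions = len(ratios["mean"])
    return [i for i in range(n_regions) if all(r[i] <= threshold for r in ratios.values())]
```

For example, with ten controls of uniform depth across 20 regions and a test sample whose depth drops from 100 to 25 in one region, `deleted_in_all(coverage_ratios(test, controls), 0.33)` returns only the index of that region.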

Candidate heterozygous deletions were considered and selected for validation studies if they fulfilled the following criteria: (1) the raw data dispersion was ≥95% and the normalized coverage ratio was ≤0.33 in all three types of comparisons to controls as indicated above; (2) multiple successive exons had raw data dispersion of >95% and at least one of the exons showed a normalized coverage ratio of ≤0.33 in all three types of comparisons to controls as indicated above.

Results were considered candidate homozygous deletions and were selected for validation if they fulfilled the following criteria: (1) the raw data dispersion was >95% and the normalized coverage ratio was <0.05 in all three types of comparisons to controls as indicated above; (2) the raw data dispersion was >95% and, among multiple successive exons, at least one showed a normalized coverage ratio of <0.05 in all three types of comparisons to controls as indicated above. Determination of CNV frequency in the general population and degree of overlap with previously reported CNVs was done automatically by the software using the Database of Genomic Variants,25 and manually using the following databases: Human Gene Mutation Database (http://www.hgmd.cf.ac.uk/), ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/), DatabasE of Chromosomal Imbalance and Phenotype in Humans using Ensembl Resources (https://decipher.sanger.ac.uk/) and Genome Aggregation Database (gnomAD, https://gnomad.broadinstitute.org/).
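
The thresholds used to select candidate deletions for validation can be summarized in a short Python sketch. The function and constant names are hypothetical, the percentage of reads within the fitted interval is passed in as a precomputed number, a strict >95% cut-off is assumed throughout, and criteria (1) and (2) are treated as alternatives; none of this reproduces the NextGene software itself.

```python
HET_MAX_RATIO = 0.33        # heterozygous: ratio <= 0.33 in all three comparisons
HOM_MAX_RATIO = 0.05        # homozygous: ratio < 0.05 in all three comparisons
MIN_FITTED_PERCENT = 95.0   # raw data dispersion must exceed this value (assumed strict)

HOM = "candidate homozygous deletion"
HET = "candidate heterozygous deletion"
NONE = "no call"


def classify_region(ratios, fitted_percent):
    """Classify one region from its three normalized coverage ratios
    (versus the control mean, the control median and the closest control)."""
    if fitted_percent <= MIN_FITTED_PERCENT:
        return NONE
    worst = max(ratios)  # the threshold has to hold in every comparison
    if worst < HOM_MAX_RATIO:
        return HOM
    if worst <= HET_MAX_RATIO:
        return HET
    return NONE


def classify_exon_run(exon_ratios, fitted_percent):
    """Classify a run of successive exons: the run qualifies if at least
    one exon meets the threshold in all three comparisons."""
    calls = [classify_region(r, fitted_percent) for r in exon_ratios]
    for label in (HOM, HET):
        if label in calls:
            return label
    return NONE
```

Taking the maximum of the three ratios is a compact way to require that the threshold is met in every comparison: if the largest ratio passes, the other two do as well.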

Polymerase chain reaction

DNA was amplified using customized primers flanking the regions that were found to be deleted by the NGS panel. The primer sets are described in Supplementary Table 2. Amplified DNA fragments were separated by agarose gel electrophoresis and were visualized by ultraviolet light. Band densitometry was determined using the ImageJ software.
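
One common way to interpret band densitometry values is to normalize the target band to an internal control amplicon and then to healthy controls. The text does not specify how the ImageJ measurements were normalized, so the following Python sketch is only an assumed illustration; the function name and the input values are hypothetical.

```python
from statistics import mean


def relative_dosage(patient_target, patient_reference, control_pairs):
    """Target/internal-control band intensity in the patient, expressed
    relative to the mean of the same ratio in healthy control subjects.

    control_pairs is a list of (target_intensity, reference_intensity) tuples.
    """
    patient_ratio = patient_target / patient_reference
    control_ratio = mean(t / r for t, r in control_pairs)
    return patient_ratio / control_ratio
```

Under this assumed normalization, a value near 0.5 would be consistent with a one-copy deletion and a value near 0 with a deletion of both copies.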

Affymetrix SNP array 6.0

DNA was processed, hybridized to the Affymetrix Genome-Wide Human SNP array 6.0 (Affymetrix Inc., Santa Clara, CA, USA) and scanned as previously described.21 Genotyping calls were determined using the Birdseed v.2 algorithm as described.6 CNVs were considered novel if they did not appear in healthy controls from The Center of Applied Genomics (Hospital for Sick Children, Toronto, Canada).

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.