Chromosomal microarray testing (CMT) has been established as a diagnostic tool for detecting genomic copy number variations (CNVs). In 2010, CMT was suggested as a first-tier screening test for patients with developmental delay (DD) of unknown etiology.1 The identified CNVs should be validated through several steps.2 First, the identified CNVs are checked for any relationship to known chromosomal disorders or recurrently identified genomic deletion/duplication syndromes. If the identified CNVs do not correlate with any known syndromes, further analyses must be performed via PubMed (https://www.ncbi.nlm.nih.gov/pubmed), CNV databases such as dbVar (https://www.ncbi.nlm.nih.gov/dbvar/), and the CNV morbidity map of DD (https://www.ncbi.nlm.nih.gov/dbvar/studies/nstd100/) to evaluate whether the same CNVs have been previously reported. If the pathogenicity of CNVs is difficult to define, the test laboratories can deposit the genotype/phenotype data in online databases such as DECIPHER (https://decipher.sanger.ac.uk/) so that the pathogenicity of those CNVs can be curated gradually. However, the publicly available information on genomic variants, especially rare/unique variants, may not always be accurate. Thus, the pathogenicity of genomic variants should be reviewed periodically.

If the same CNVs cannot be found in the databases or the literature, parental examination is recommended to evaluate whether the CNVs were inherited or de novo.3 However, de novo occurrence per se cannot be regarded as evidence of pathogenicity. Inherited CNVs, in turn, are more complicated to interpret because of phenotypic variability, incomplete penetrance, and recessive modes of inheritance. The most frequently observed 22q11.2 deletions and duplications are often shared among relatives, possibly due to incomplete penetrance.4,5 Thus, a clear classification of CNVs as pathogenic or benign is not always easy in such cases.

To date, we have analyzed ~2,000 samples from patients with DD and their relatives for the possible presence of CNVs. Through this analysis, definite pathogenic CNVs were detected in 17% of these samples.4 CNVs that are frequently observed and already indexed in existing databases can be easily interpreted; however, rare/private CNVs are not indexed and hence cannot be cross-referenced in such databases. In some cases, benign CNVs are first detected in healthy parents. Furthermore, the pathogenicity of CNVs is related to their size. Larger CNVs are typically suspected to be pathogenic, whereas smaller CNVs are usually shared within families and are considered benign.2,6,7 However, there are exceptional cases of large but benign CNVs.

In this study, we analyzed the characteristics of large CNVs that were identified in healthy individuals to better understand their role in the genome. For this purpose, unique CNVs larger than 1.5 Mb that were only identified in a single family or healthy individuals were selected, since most CNVs larger than 1.5 Mb are usually pathogenic (affected versus control=52/1,782; odds ratio=20.3) and most benign CNVs are less than 1.5 Mb.6 The criteria used for the selection of CNVs include (1) larger than 1.5 Mb, (2) identified in healthy individuals, (3) unique and not repeatedly detected in our laboratory, and (4) previously not reported as pathogenic. This study was performed in accordance with the Declaration of Helsinki, and permission was obtained from the ethical committee of our institution. Blood samples were collected from patients and their families after obtaining written informed consent. CMT was then performed using the Agilent microarray 60 K/105 K (Agilent Technologies, Santa Clara, CA, USA) as described previously.8 CNVs were detected using the Agilent Genomic Workbench (Agilent Technologies), and the CNV sizes were measured using the report function. Upon identification of unique CNVs, a trio analysis via microarray or fluorescence in situ hybridization (FISH) was performed using the parental samples. The identified CNVs were then indexed into a database to be evaluated for their frequency and cross-referenced with the GRCh37/hg19 genome assembly via the UCSC genome browser (https://genome.ucsc.edu/).
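
The four selection criteria above can be read as a simple filter over CNV records. The following is only a minimal sketch under stated assumptions: the record layout, the field names, and the example records are hypothetical and are not taken from the laboratory database used in this study; only the 1.5-Mb threshold comes from the text.

```python
from dataclasses import dataclass

MIN_SIZE_MB = 1.5  # size threshold stated in the text


@dataclass
class CnvRecord:
    # All field names are hypothetical and chosen for illustration only.
    label: str
    size_mb: float
    carrier_is_healthy: bool
    families_observed: int      # unrelated families carrying the CNV in-house
    reported_pathogenic: bool   # previously reported as pathogenic


def meets_selection_criteria(cnv: CnvRecord) -> bool:
    """Apply criteria (1)-(4): large, healthy carrier, unique, not known pathogenic."""
    return (
        cnv.size_mb > MIN_SIZE_MB
        and cnv.carrier_is_healthy
        and cnv.families_observed == 1
        and not cnv.reported_pathogenic
    )


# Made-up example records, not study data.
candidates = [
    CnvRecord("example-A", 3.0, True, 1, False),   # passes all four criteria
    CnvRecord("example-B", 0.8, True, 1, False),   # too small
    CnvRecord("example-C", 2.0, True, 14, True),   # recurrent and known pathogenic
]

selected = [c.label for c in candidates if meets_selection_criteria(c)]
print(selected)
```

In this sketch, "unique" is approximated as being observed in exactly one family; a real pipeline would need its own definition of recurrence.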

As a result, five deletions were confirmed to fulfill our criteria (Table 1). Clinical information of the subjects is summarized in Supplementary Information. The number of RefSeq (https://www.ncbi.nlm.nih.gov/refseq/) and OMIM (http://omim.org/) genes included in the deleted regions was evaluated. The dosage effects (haploinsufficiency scores) were also checked through ClinGen Dosage Sensitivity Curation Page (https://www.ncbi.nlm.nih.gov/projects/dbvar/clingen/). The largest deletion identified via the 105 K array (the other four deletions were identified through the 60 K array) was a 6.63-Mb deletion in 6q12–q13 (Figure 1a). This deletion partially overlaps with a previously reported pathogenic deletion.9 The second largest deletion identified was a 6.23-Mb deletion in 3q11.2–q12.2 (Figure 1b). Although similar deletions in 3q12 have been previously reported, only a partial overlap was observed with this interstitial deletion.10 The third largest deletion was a 5.77-Mb deletion in 13q31.1–q31.3 (Figure 1c). Previously reported deletions in 13q were fairly large; hence, smaller deletions were thought to be quite rare.11,12 Additionally, a previously reported deletion was very similar to this deletion; however, it had been identified in a fetus through non-invasive prenatal testing.13 The fourth was a 4.43-Mb deletion in 18q12.2–q12.3 (Figure 1d). An overlapping deletion has been previously reported in a fetus with unknown pathogenicity.14 The fifth largest deletion was a 2.39-Mb deletion in 18q22.1 (Figure 1e). A similar CNV has been reported in a patient with a late-presenting diaphragmatic hernia and microphthalmia; however, the CNV was maternally inherited.15
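
Whether a deletion "partially overlaps" a previously reported one can be quantified from genomic coordinates on the same assembly. The sketch below is an illustration only: the function names are hypothetical, intervals are assumed to be half-open (start inclusive, end exclusive), and the coordinates are invented for the example rather than taken from Table 1 or from the cited reports.

```python
def overlap_bp(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of base pairs shared by two half-open intervals on one chromosome."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def overlap_fraction(a_start: int, a_end: int, b_start: int, b_end: int) -> float:
    """Fraction of interval A that is covered by interval B."""
    length = a_end - a_start
    if length <= 0:
        raise ValueError("interval A must have positive length")
    return overlap_bp(a_start, a_end, b_start, b_end) / length


# Invented coordinates (same chromosome, same assembly assumed).
observed = (10_000_000, 16_000_000)   # deletion found in the present analysis
reported = (14_000_000, 20_000_000)   # previously reported deletion

shared = overlap_bp(*observed, *reported)
fraction = overlap_fraction(*observed, *reported)
print(shared, round(fraction, 2))
```

Both intervals must be expressed on the same genome build (hg19 in this report) before such a comparison is meaningful.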

Table 1 Summary of the deletions larger than 1.5 Mb identified in healthy individuals
Figure 1

Chromosomal microarray testing results. The identified deletions are shown in Gene View created by the Agilent Genomic Workbench (Agilent Technologies). (a) subject 1, (b) subject 2, (c) subject 3, (d) subject 4, and (e) subject 5. Previously reported deletions that overlapped with the deletions reported in this study are depicted by blue rectangles or pentagons for comparison. Directions suggested by the pentagons indicate large deletions beyond the edge. All genomic positions are unified to the hg19 assembly.

The five deletions larger than 1.5 Mb identified in this study were from healthy individuals. The fifth deletion in the 18q22.1 region was first identified in a healthy individual, whereas the other identified deletions were shared between the patients with DD and their parents. We cannot rule out the possibility that these deletions may be related to any clinical feature with incomplete penetrance or autosomal recessive inheritance. Even if the pathogenicity of these deletions cannot be clearly determined, it is still important to index these findings into the public databases.

The most characteristic feature among these five CNVs was the sparse distribution of genes in the corresponding regions. The human genome is approximately 3,000 Mb in size, and the OMIM database contains descriptions for 15,394 genes (https://omim.org/statistics/entry). Hence, we expected an average density of approximately 5 OMIM genes per 1 Mb of genomic region. For example, typical deletions in 22q11.21 include 12 OMIM genes within a 2.52-Mb region (Table 1). However, the density of OMIM genes within the deleted regions identified in this study was lower than the expected average. Furthermore, none of the five deletions identified in this study includes a gene with a haploinsufficiency score associated with an autosomal dominant trait (evaluated using the ClinGen Dosage Sensitivity Curation Page; Table 1; Supplementary Information). These results were last evaluated at the time of manuscript submission. Because most of the genes within the relevant regions have not been curated yet and the database is updated periodically, these results may change in the future. In contrast, the typical deletions in 22q11.21 include a gene with a haploinsufficiency score associated with an autosomal dominant trait (Table 1). These findings suggest that it may not be possible to evaluate observed CNVs without first checking OMIM genes and their haploinsufficiency scores, even if those CNVs are larger than 1.5 Mb.
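
The expected gene density quoted above follows from simple division, which the short sketch below reproduces. It uses only figures stated in the text (15,394 OMIM genes, a genome of about 3,000 Mb, and 12 OMIM genes in a 2.52-Mb 22q11.21 deletion); the function name is hypothetical, and the per-deletion gene counts from Table 1 are not reproduced here.

```python
GENOME_SIZE_MB = 3000      # approximate human genome size, as stated in the text
OMIM_GENE_COUNT = 15394    # OMIM gene descriptions, as stated in the text


def genes_per_mb(gene_count: int, size_mb: float) -> float:
    """Average number of genes per megabase for a region of the given size."""
    if size_mb <= 0:
        raise ValueError("region size must be positive")
    return gene_count / size_mb


expected_density = genes_per_mb(OMIM_GENE_COUNT, GENOME_SIZE_MB)
typical_22q11_density = genes_per_mb(12, 2.52)

print(f"genome-wide: {expected_density:.2f} OMIM genes per Mb")      # 5.13
print(f"22q11.21:    {typical_22q11_density:.2f} OMIM genes per Mb")  # 4.76
```

The same function could be applied to each deletion in Table 1 to compare its density against the genome-wide value of roughly 5 genes per Mb.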

As mentioned above, two deletions similar to those identified in this study (deletions in 13q31.1–q31.3 and 18q12.2–q12.3) were first identified through prenatal screening. Such rare CNVs identified via prenatal screening often present diagnostic challenges owing to the limited availability of clinical information about the fetus. Therefore, we suggest that indexing these potentially benign but rare CNVs would help researchers evaluate CNVs of unknown significance. However, compiling a complete list of such potentially benign CNVs would be a never-ending task because benign CNVs are continuously being reported in large numbers. This study also reaffirmed that trio analysis is necessary for the evaluation of individual rare and private CNVs.

Additional Information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.