The application of array CGH, or chromosomal microarrays, is causing a revolutionary change in clinical genetics and especially cytogenetics, as it enables the genome-wide identification of submicroscopic copy number variations (CNVs).1 Given the significant increase in diagnostic yield compared with conventional karyotyping in patients with intellectual disability (ID) and the technical ease of use, the technique is now recommended as a first-tier diagnostic test for patients with ID and/or multiple congenital anomalies (MCA).2, 3 Arrays enable detection of disease-causing CNVs not only in patients with ID/MCA, but also in patients with isolated heart defects, neurological diseases and psychiatric disorders. Therefore, besides pediatricians and clinical geneticists, a growing number of other medical specialists request array analysis.4, 5, 6, 7 In addition, array CGH is rapidly being implemented in prenatal diagnosis.8, 9, 10, 11

For a small subset of CNVs, the association with an ID/MCA phenotype is beyond doubt. However, many CNVs detected using high-resolution arrays remain private, or the functional relationship with the phenotype is, at best, vague. To enable genotype–phenotype correlations, databases collecting phenotypes and genotypes have been established.3, 12, 13 Although those databases have been successful in establishing a functional relationship for recurrent CNVs, a genotype–phenotype relationship has yet to be established for a majority of rare CNVs. Variable expressivity and reduced penetrance often confound the detection of significant associations, which is especially challenging for rare variants.1 Because of the difficulty of associating CNVs with a phenotype, several reports provide guidelines on how clinical laboratories can interpret array results. If a clear association between the phenotype under investigation and the CNV is lacking, a series of steps guides the interpretation of clinical significance. In all guidelines, a rule of thumb is that de novo CNVs that do not occur in normal individuals are considered causal for the abnormal phenotype.1, 3, 14, 15, 16

During our screen of patients with mental retardation and developmental disorders, we identified several private de novo CNVs. Previously, we reported a 250 kb de novo deletion in C20orf133, now known as MACROD2, in a patient with Kabuki syndrome.17 A highly conserved region of C20orf133, likely to have a role in chromatin or chromosome biology, was deleted, and the gene was shown to be expressed in mice in the tissues affected in Kabuki syndrome. Although screening of 62 Kabuki syndrome patients failed to identify mutations in C20orf133, the disorder was hypothesized to be genetically heterogeneous.17, 18 Recently, mutations were identified in MLL2 in 33 out of 50 Kabuki syndrome patients.19 Sequencing of this gene in the patient with the C20orf133 deletion showed the presence of a de novo mutation in MLL2.20

Most recently, we had a similar experience with another de novo CNV. During a screen of patients with ID and eye disorders, we identified a de novo 86.5 kb deletion in a patient referred because of an eye malformation associated with mental retardation. The deletion harbored only a single gene, AMBRA1. As the gene is expressed in the neural retina and brain, and mouse knockouts result in exencephaly,21 the deletion was considered a likely cause of the observed phenotype. To further establish the clinical relationship between the gene and the phenotype, morpholino knockdowns were performed in zebrafish, which resulted in eye coloboma as well as equilibrium defects. On the basis of these observations, and during the preparation of a manuscript, a more detailed phenotypic description of the patient was requested. The patient presented with bilateral coloboma, anosmia, disturbance of equilibrium and ID, a CHARGE-like phenotype. Once the diagnosis of CHARGE syndrome was raised, mutation analysis of CHD7 was initiated.22 A de novo splice-site mutation disrupting the CHD7 gene was identified.

Although smaller modifier effects of the deletions on the phenotype cannot be excluded, the overall phenotype can in both cases be explained by de novo point mutations rather than by de novo CNVs. Note that CNV detection in both cases was performed on DNA extracted from blood and not on cell cultures. The latter are known to accumulate chromosomal rearrangements, and hence their use for CNV detection should be avoided in clinical diagnosis. We believe that these two case reports are representative of a larger number of misinterpretations that are currently made in diagnostic laboratories offering array CGH testing. Two important messages for the human (cyto)genetics community can now be drawn.

First, despite the flurry of schemes suggesting that de novo CNVs can be interpreted and counseled to patients as causal for the investigated phenotype,1, 3, 14, 15, 16 such an interpretation is premature. These guidelines are based on the (unwritten) assumption that the co-occurrence of a rare phenotype with a rare mutational event is statistically unlikely. For large, microscopically visible CNVs harboring multiple genes, this assumption is true. This is the type of rearrangement that (cyto)geneticists have become acquainted with over the last 50 years. However, for smaller CNVs, the chance may be significantly higher. Family studies have directly estimated the genome-wide CNV mutation rate to be in the range of 1.2 × 10⁻² CNVs per haploid genome per transmission at a median resolution of 150 kb, amounting to about 2.5 CNVs/100 live births.23, 24, 25 One study also demonstrated that larger imbalances (>500 kb) are significantly more enriched than smaller imbalances in the autism spectrum population as compared with a control cohort of normal individuals, which is not surprising, as larger imbalances harbor more genes.26 On the basis of the fourfold increased incidence of de novo CNVs in this autism population, as compared with the unaffected siblings, the number of CNVs in the size range of 60–500 kb wrongly classified as causative can be estimated at about one in five. At present, a direct estimate of the mutation rate for imbalances <60 kb is lacking, but it is conceivable that the frequency of de novo mutation events not leading to developmental anomalies exceeds the number of events that do hit genes causing such disorders.
The indirect estimate of the de novo rate for CNVs <500 bp is a minimum of 6 × 10⁻²/diploid genome per generation.24, 26 This estimate is an extrapolation of the frequency with which CNVs cause Duchenne muscular dystrophy, leading to an estimate of one deletion every eight generations and one duplication every 50 generations.27 As a consequence, smaller de novo imbalances cannot automatically be classified as likely causal for the investigated phenotype in the absence of strong evidence from other data sources. Recently, novel bioinformatic approaches have been developed to aid clinical interpretation. These include the incorporation of structural and functional genomic features to distinguish pathogenic from benign CNVs.28, 29 Although those approaches will greatly improve interpretation, they remain indirect. The extensive collection of CNVs and associated phenotypes in common databases will be a prerequisite for proper clinical interpretation of CNVs.
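
As an illustration only, the conversion from the directly estimated mutation rate quoted above (1.2 × 10⁻² CNVs per haploid genome per transmission) to a figure per live birth can be sketched as follows. The function names are hypothetical, and the Poisson model used for the probability of at least one de novo CNV is our assumption for this sketch, not something taken from the cited studies.

```python
import math


def expected_de_novo_cnvs_per_birth(rate_per_haploid_transmission: float) -> float:
    """Expected number of de novo CNVs in one child.

    Each child receives two haploid genomes (one per parent), so the
    per-transmission rate is counted twice.
    """
    return 2 * rate_per_haploid_transmission


def prob_at_least_one(expected_count: float) -> float:
    """Probability of one or more de novo CNVs in a child.

    ASSUMPTION: de novo CNVs arise independently, so their number per
    child is modelled as Poisson-distributed with the given mean.
    """
    return 1 - math.exp(-expected_count)


# Rate quoted in the text (median array resolution of 150 kb).
rate = 1.2e-2
expected = expected_de_novo_cnvs_per_birth(rate)

print(f"Expected de novo CNVs per 100 live births: {expected * 100:.1f}")
print(f"P(at least one de novo CNV per child):     {prob_at_least_one(expected):.4f}")
```

Doubling the haploid rate gives 2.4 CNVs per 100 live births, in line with the figure of about 2.5 given above. Under these assumptions, roughly one child in 40 carries a de novo CNV detectable at this resolution irrespective of phenotype, which is why the co-occurrence of such a CNV with an unrelated causal point mutation is not as improbable as the guidelines implicitly assume.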

A second observation is that a molecular diagnosis can only be as good as the clinical diagnosis. In the second patient, arrays were requested before a thorough clinical work-up had been performed, and an apparently causal CNV was detected. However, once the clinical information was complete, it directed the genetic testing, which subsequently enabled a proper molecular diagnosis to be made. It can be anticipated that in the near future, full exome or genome sequencing will be offered as a clinical diagnostic test.30, 31 A genomic sequence will offer the apparent security that a full genome is analyzed. However, a genomic analysis of all variants will enable the causative variants to be identified only in relation to a well-defined clinical question.