
Clinical genetics investigates the mutations underlying complex associations of phenotypic features, such as multiple congenital malformations and mental retardation (MCAMR). These conditions can be caused by a mutation in a specific gene (e.g., Lesch-Nyhan syndrome) or by a genome rearrangement. Genome rearrangements, which include aneuploidies, aneusomies, and structural chromosome rearrangements and may be present either in all cells of an individual or in mosaic form, have classically been analyzed by karyotyping.1,2 During the past decades, techniques such as fluorescence in situ hybridization (FISH) have allowed detection of small aneuploid segments of one or several chromosomes. Such segments appear to underlie clinically recognizable conditions such as Wolf-Hirschhorn syndrome, Williams-Beuren syndrome, Smith-Magenis syndrome, Charcot-Marie-Tooth disease type 1A, and DiGeorge syndrome, which are collectively termed Genomic Disorders.3 Detection of microdeletions and microduplications by FISH or by multiplex ligation-dependent probe amplification (MLPA) has been a major contribution to clinical genetic diagnosis.4–6

Both FISH and multiplex ligation-dependent probe amplification can investigate only a few loci at a time. Immobilizing probes on a solid support (e.g., a glass slide) made it possible to interrogate large numbers of genomic loci simultaneously.7 Because of its much higher resolution than classical karyotyping, array-based comparative genomic hybridization (array-CGH) has produced both a dramatically increased detection rate and the discovery of much smaller aneuploid segments in the human genome. These findings subsequently allowed identification of specific genes for syndromes such as CHARGE (coloboma, heart anomalies, choanal atresia, retardation, genital and ear anomalies),8 Peters Plus,9 del(17)(q21),10,11 Pitt-Hopkins,12 recurrent 1q21.1 rearrangements,13 and thrombocytopenia-absent radius (TAR) syndrome14 (for a review, see Ref. 15). In addition, a large number of “unique” cases of de novo segmental aneuploidy have been identified (for reviews, see Refs. 2 and 16). Used as two distinct and complementary approaches to analyzing genome rearrangements, karyotyping and array-based segmental aneuploidy profiling have considerably widened our understanding of the mutations underlying congenital malformations and psychomotor retardation. However, new problems have emerged, and a revision of our approach to genome rearrangements and imbalances in patients with these conditions seems warranted.

ASSESSMENT OF THE CLINICAL SIGNIFICANCE OF COPY NUMBER CHANGES IS NOT STRAIGHTFORWARD

As outlined in Table 1, several mechanistic relationships between segmental aneuploidies and phenotypic features may exist, and several may operate simultaneously. This complicates the straightforward implementation of array-based aneuploidy profiling in clinical genetics. Like karyotyping, array-based aneuploidy profiling uncovers, in addition to pathogenic imbalances, apparently phenotypically neutral polymorphisms, albeit at a much higher rate.17–22 As of February 1, 2010, the Database of Genomic Variants (DGV), compiling data from 35 publications, listed 49,988 entries, comprising 29,133 copy number variants (CNVs) at 8,410 CNV loci.23 A CNV is defined as a DNA segment longer than 1 kb whose copy number varies relative to a reference genome.21 Because array-based aneuploidy profiling compares the fluorescence intensities of a patient DNA sample with those of a reference sample after hybridization to an array of bacterial artificial chromosome (BAC) or oligonucleotide probes, these assays actually measure DNA copy number changes (CNCs). “Variants” are generally regarded as alterations or uncommon forms of no clinical significance. To account for this, and to avoid confusion, the term CNC rather than CNV has been proposed.24,25 The detection of substantial numbers of CNCs in healthy individuals has confronted us with the challenge of distinguishing phenotypically neutral CNCs from those contributing to the patient's phenotype.21,22
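Because the assay compares a patient sample with a reference sample, each segment yields a relative intensity, usually expressed as a log2 ratio. The sketch below illustrates how such ratios might be turned into CNC calls. It assumes idealized intensities (a one-copy loss gives a ratio of 1/2, a one-copy gain 3/2) and uses hypothetical call thresholds of ±0.3; real platforms show compressed ratios and calibrate their thresholds empirically. Only the >1 kb size criterion comes from the CNV definition cited above; all names and coordinates are illustrative.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Hypothetical call thresholds on log2(patient / reference).
# Idealized values: one-copy loss -> log2(1/2) = -1.0; one-copy gain -> log2(3/2) ~ 0.585.
LOSS_THRESHOLD = -0.3
GAIN_THRESHOLD = 0.3
MIN_CNV_LENGTH_BP = 1_000  # CNV definition cited in the text: segment longer than 1 kb


@dataclass
class Segment:
    chrom: str
    start: int                  # 0-based start, bp
    end: int                    # exclusive end, bp
    patient_intensity: float    # mean fluorescence of the patient sample
    reference_intensity: float  # mean fluorescence of the reference sample

    @property
    def length(self) -> int:
        return self.end - self.start

    def log2_ratio(self) -> float:
        return math.log2(self.patient_intensity / self.reference_intensity)


def call_cnc(seg: Segment) -> Optional[str]:
    """Classify a segment as 'loss', 'gain' or 'no change' relative to the reference.

    Returns None for segments too short to meet the >1 kb CNV size criterion.
    """
    if seg.length <= MIN_CNV_LENGTH_BP:
        return None
    ratio = seg.log2_ratio()
    if ratio <= LOSS_THRESHOLD:
        return "loss"
    if ratio >= GAIN_THRESHOLD:
        return "gain"
    return "no change"


if __name__ == "__main__":
    segments = [
        Segment("chr4", 1_000_000, 1_200_000, 500.0, 1000.0),      # ratio 1/2
        Segment("chr15", 30_000_000, 31_500_000, 1500.0, 1000.0),  # ratio 3/2
        Segment("chr1", 5_000_000, 5_000_500, 500.0, 1000.0),      # only 500 bp long
    ]
    for s in segments:
        print(s.chrom, round(s.log2_ratio(), 3), call_cnc(s))
```

Whatever the thresholds, the output is a change relative to the chosen reference sample, which is why the term CNC fits what these assays actually report.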

Table 1 Pathogenic mechanisms and diagnostic consequences of genomic imbalances

SIZE AND GENE CONTENT OF CNCs

Some features of these CNCs, such as their size, the number of protein-encoding genes involved, and their prevalence in the general population, have been proposed as tools to differentiate among these possible interpretations. Vermeesch et al.26 and de Vries et al.27 suggested that polymorphisms may be smaller than pathogenic rearrangements. Thienpont et al.28 suggested that not size per se but rather the number of genes contained within a CNC may determine how plausibly it contributes to the patient's phenotype. However, a single critical gene in a small CNC may be sufficient to cause a specific disorder. When we analyzed 278 patients and 48 healthy parents with a 3,783-clone bacterial artificial chromosome array,29 we found de novo segmental aneuploidies in 20 patients, ranging in size from 0.11 to 8.16 Mb, whereas familial polymorphisms ranged between 0.22 and 4.70 Mb.30 Classical karyotyping has likewise detected an appreciable number of phenotypically neutral, transmitted unbalanced chromosomal rearrangements and euchromatic variants (for a compilation, see Barber, The Chromosome Anomaly Collection, http://www.ngrl.org.uk/wessex/collection/index.htm).17 In a recent study of 2,500 healthy subjects, Itsara et al.22 found phenotypically neutral variants >500 kb in 5–10% and variants >1 Mb in 1–2% of individuals. Given this extensive overlap in size and the high frequency of such variants in healthy individuals, pathogenic segmental aneuploidies and transmitted benign familial CNCs cannot be distinguished on the basis of size alone.22
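The size argument can be made concrete with the ranges reported above for our cohort. The sketch below (function names are ours, chosen for illustration) intersects the de novo and familial size ranges and checks whether any single size cutoff could place every de novo imbalance above every familial one. It works only with the reported minimum and maximum sizes, so it demonstrates the range-level argument rather than analyzing individual CNCs.

```python
from typing import Optional, Tuple

SizeRange = Tuple[float, float]  # (smallest, largest) size in Mb

# Size ranges reported in the text for the 278-patient cohort.
DE_NOVO_MB: SizeRange = (0.11, 8.16)   # de novo segmental aneuploidies (20 patients)
FAMILIAL_MB: SizeRange = (0.22, 4.70)  # familial polymorphisms


def range_overlap(a: SizeRange, b: SizeRange) -> Optional[SizeRange]:
    """Sizes shared by both ranges, or None if the ranges are disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def size_cutoff_separates(pathogenic: SizeRange, benign: SizeRange) -> bool:
    """True only if some single cutoff places every pathogenic CNC above every benign one."""
    return pathogenic[0] > benign[1]


if __name__ == "__main__":
    print("shared size range (Mb):", range_overlap(DE_NOVO_MB, FAMILIAL_MB))
    print("separable by size alone:", size_cutoff_separates(DE_NOVO_MB, FAMILIAL_MB))
```

Because the familial range lies entirely within the de novo range, no cutoff separates the two classes in this cohort.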

PREVALENCE AND DE NOVO FORMATION RATES OF CNCs

Recent studies of microdeletions flanked by low copy repeats (LCRs) have revealed substantial prevalences in both affected and healthy individuals.31–34 As a result, the relative risk for a phenotypic feature (e.g., epilepsy) conferred by such a microdeletion lies well below the level needed for a dominant negative effect.35,36 This further complicates clinical interpretation when such a microdeletion is detected. In family studies of such cases, transmission of a segmental aneuploidy from a healthy parent is usually taken as evidence that it has no phenotypic effect.37–39 Consequently, only segmental aneuploidies arising de novo in a sporadic patient are taken into account, an approach that implicitly assumes that de novo formation of CNCs is a rare event. To date, only a single genome-wide estimate of the rate of de novo CNC formation has been published.40 Using representational oligonucleotide microarray analysis (ROMA), Sebat et al.40 found two de novo CNCs in a population of 196 healthy individuals, which amounts to a de novo CNC generation rate of approximately 0.01 per individual. A different estimate has been derived from the de novo deletion and duplication rates of the Duchenne muscular dystrophy (DMD) gene; extrapolated to the entire genome, these rates would amount to 0.14 insertion/deletion events per generation.41,42 This estimate is indirect and does not take into account the highly variable rates of de novo aneuploidy formation at different loci. Recently, de novo deletion/duplication rates at four genomic disorder loci were assayed directly in single sperm cells. Frequencies of de novo rearrangements ranging from 6 × 10⁻⁵ at the hereditary neuropathy with liability to pressure palsies (HNPP)/Charcot-Marie-Tooth disease type 1A (CMT1A) locus to 2 × 10⁻⁶ at the Smith-Magenis syndrome locus were found.43 For the recurrent 22q11.21 deletion associated with velocardiofacial syndrome, the mutation rate was estimated to be as high as 2.5 × 10⁻⁴.44 All of these loci are known to undergo recurrent rearrangements by nonallelic homologous recombination (NAHR). However, the total number of such loci is not known. Moreover, no de novo rates have yet been determined for rearrangements that may arise through microhomology-mediated processes.45 In contrast, the average de novo rate of single-nucleotide substitutions in the human genome is estimated at 2.5 × 10⁻⁸ per nucleotide per generation.46 Thus, per locus, de novo structural rearrangements may occur at a considerably higher frequency than point mutations in the general population.47 Accordingly, the per-locus mutation rate for genomic rearrangements has been estimated to be 100- to 10,000-fold the rate of point mutations.48
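The per-locus comparison can be reproduced from the figures quoted above. The sketch below divides each locus-specific rearrangement rate by the per-nucleotide point-mutation rate and recomputes the naive genome-wide estimate from Sebat et al. (2 de novo CNCs among 196 individuals). Like the estimate it illustrates, it compares a per-locus rate with a per-nucleotide rate; the function names are ours.

```python
# Rates quoted in the text (events per generation).
POINT_MUTATION_RATE_PER_NT = 2.5e-8  # single-nucleotide substitutions, per nucleotide

LOCUS_REARRANGEMENT_RATES = {
    "HNPP/CMT1A": 6e-5,
    "Smith-Magenis": 2e-6,
    "22q11.2 (velocardiofacial)": 2.5e-4,
}


def fold_over_point_mutation(locus_rate: float,
                             point_rate: float = POINT_MUTATION_RATE_PER_NT) -> float:
    """How many times higher a per-locus rearrangement rate is than the per-nucleotide point-mutation rate."""
    return locus_rate / point_rate


def de_novo_rate(n_events: int, n_individuals: int) -> float:
    """Naive genome-wide estimate: observed de novo events per individual screened."""
    return n_events / n_individuals


if __name__ == "__main__":
    for locus, rate in LOCUS_REARRANGEMENT_RATES.items():
        print(f"{locus}: {fold_over_point_mutation(rate):,.0f}-fold")
    print(f"Sebat et al.: {de_novo_rate(2, 196):.4f} de novo CNCs per individual")
```

The ratios range from about 80-fold (Smith-Magenis locus) through 2,400-fold (HNPP/CMT1A) to 10,000-fold (22q11.2), broadly in line with the 100- to 10,000-fold range cited above.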

DELETIONS REVEALING RECESSIVE MUTATIONS

One consequence of such a high frequency of de novo CNC formation is that losses of one or several protein-encoding genes may render a gene hemizygous considerably more often than point mutations render it heterozygous. When a heterozygous gene mutation coincides with loss of the allele on the homologous chromosome (in trans), the carrier is in a state equivalent to compound heterozygosity. As has been pointed out,3,49 hemizygous losses may “unmask” autosomal recessive mutations. This is probably a frequent, albeit underreported, mechanism through which recessive mutations act.50 In a patient with a hemizygous loss of the 4p16 region, such unmasking of an autosomal recessive mutation (in the WFS1 gene) on the structurally normal chromosome 4 has been shown to contribute to the complex clinical phenotype of combined Wolfram and Wolf-Hirschhorn syndromes.51 Thus, patients harboring a deletion may have inherited from their other parent a mutation in a phenotypically relevant gene within the deleted region, resulting in an autosomal recessive disorder. In this way, the Hardy-Weinberg equilibrium, assumed to underlie allele frequencies in the general population, may become distorted. A systematic investigation of such a distortion, and of the resulting lower frequency of pathogenic gene mutations compared with a putatively higher frequency of CNCs conveying hemizygous gene losses, has yet to be undertaken.
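Under Hardy-Weinberg assumptions, the probability that the single remaining allele of a deleted gene carries a pathogenic recessive variant equals that variant's allele frequency q, whereas a person with two copies of the gene carries two pathogenic alleles only with a probability of about q². The sketch below illustrates this for a multigene deletion. The allele frequencies are hypothetical and independence between genes (no linkage disequilibrium) is assumed; it only shows why deletions can expose recessive alleles far more often than biallelic mutations arise, and is not a risk calculator.

```python
import math  # math.prod requires Python 3.8+
from typing import Sequence


def p_unmasked(allele_freqs: Sequence[float]) -> float:
    """Probability that, for at least one gene in a hemizygous deletion, the
    remaining allele is pathogenic.

    allele_freqs: hypothetical combined frequencies q_i of pathogenic recessive
    alleles for each deleted gene. Assumes Hardy-Weinberg equilibrium and
    independence between genes.
    """
    p_all_remaining_alleles_normal = math.prod(1.0 - q for q in allele_freqs)
    return 1.0 - p_all_remaining_alleles_normal


def p_biallelic_without_deletion(allele_freqs: Sequence[float]) -> float:
    """Same question for a person with two copies of every gene: probability of
    two pathogenic alleles (q_i squared) at one or more of the genes."""
    p_none = math.prod(1.0 - q * q for q in allele_freqs)
    return 1.0 - p_none


if __name__ == "__main__":
    # Hypothetical 20-gene deletion, each gene with a pathogenic allele frequency of 0.005.
    freqs = [0.005] * 20
    print(f"with deletion:    {p_unmasked(freqs):.4f}")
    print(f"without deletion: {p_biallelic_without_deletion(freqs):.6f}")
```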

SEGMENTAL ANEUPLOIDIES OVERLAPPING WITH KNOWN CNVs

A further complication in distinguishing pathogenic from phenotypically neutral CNCs is that probably pathogenic segmental aneuploidies in patients frequently encompass, or at least overlap with, one or even several CNCs that have also been detected in healthy individuals. This suggests that haploinsufficiency for a gene contained within such a CNC is not necessarily pathogenic in patients whose losses involve these CNCs. Recent studies of recurrent losses and gains flanked by segmental duplications in regions such as 1q21, 15q11.2, and 15q13.3, in cohorts of patients with epilepsy,13,31,34 schizophrenia,33,52 mental retardation,13,38,39 and autism spectrum disorders,32,37 have shown a wide range of phenotypes in carriers of these segmental aneuploidies, including “normal.” Interestingly, in families with a patient carrying a 15q13.3 deletion, healthy individuals outnumber patients by almost 3 to 1.38 To evaluate a possible contribution to the patient's phenotype, criteria such as cosegregation with the clinical phenotype in the patient's family should be taken into account.21

Considering only CNCs that have emerged de novo and cosegregate with the clinical disorder in the patient's pedigree may provide a rigorous, albeit conservative, approach to identifying genes that may be causally related to the phenotype of the patient under study.3,21,53 This does not necessarily mean that haploinsufficiency, or increased gene dosage (three copies) resulting from the CNC, is the only pathogenic mechanism underlying the patient's phenotype. For instance, analyses of pedigrees of families with multiple patients with autism spectrum disorder suggest that unaffected carrier mothers may transmit a mutation to their sons, who then manifest several aspects of this complex disorder, up to full-blown autism.54 Recent studies of large cohorts of subjects with 15q13.3 microdeletions have demonstrated such a mechanism in roughly half of the families.31,32,34 Because such parent-of-origin effects may operate in families in which affected children have inherited a CNC, this mechanism should also be considered.

POSITION EFFECTS OF SEGMENTAL ANEUPLOIDY

In addition, the loss or gain of a stretch of DNA devoid of protein-encoding genes should not automatically be dismissed as phenotypically irrelevant.55–57 As has been shown recently in patients with Williams-Beuren syndrome, deletions may influence the expression levels of adjacent, nondeleted genes.58 Castermans et al.59 showed that the AMYSIN gene, located 300 kb from a breakpoint of a ring chromosome 14, was silenced, which may account for the autistic phenotype of the patient. These examples indicate that mechanisms other than a mere alteration of gene copy number, such as gene silencing, may determine clinical phenotypes and should therefore be taken into account when interpreting the results of array-based aneuploidy profiling.
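When listing candidate genes for a CNC, these examples argue for including genes near the breakpoints as well as those inside the imbalance. The sketch below partitions genes into those overlapping a CNC and those lying within a flanking window. The 300 kb default merely echoes the AMYSIN example and is an assumption, since position effects have no fixed range; all gene names and coordinates are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def genes_in_and_near(cnc_chrom: str, cnc_start: int, cnc_end: int,
                      genes: Iterable[Gene],
                      flank_bp: int = 300_000) -> Tuple[List[str], List[str]]:
    """Split genes into (inside the CNC, within flank_bp of its breakpoints).

    Genes overlapping the CNC by at least 1 bp count as inside; genes that do not
    overlap it but fall within the flanking window are candidates for position effects.
    """
    inside, flanking = [], []
    for g in genes:
        if g.chrom != cnc_chrom:
            continue
        if g.start < cnc_end and g.end > cnc_start:
            inside.append(g.name)
        elif g.start < cnc_end + flank_bp and g.end > cnc_start - flank_bp:
            flanking.append(g.name)
    return inside, flanking


if __name__ == "__main__":
    genes = [
        Gene("GENE_A", "chr14", 50_100_000, 50_150_000),  # within the CNC
        Gene("GENE_B", "chr14", 50_700_000, 50_750_000),  # 200 kb beyond the CNC end
        Gene("GENE_C", "chr14", 51_600_000, 51_650_000),  # 1.1 Mb beyond the CNC end
        Gene("GENE_D", "chr1", 50_100_000, 50_150_000),   # different chromosome
    ]
    print(genes_in_and_near("chr14", 50_000_000, 50_500_000, genes))
```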

It is conceivable, particularly for large, gene-rich CNCs, that several of the aforementioned mechanisms (Table 1) operate simultaneously, which complicates the interpretation of CNCs in clinical practice even further.

DISTINGUISHING POLYMORPHIC AND PATHOGENIC CNCs

The interpretation of CNCs in clinical genetics aims to answer two basic questions. First, does the detected aneuploidy (i.e., the CNC) contribute to explaining the clinical phenotype of the patient? Second, what is the risk of recurrence for this CNC and the resulting clinical phenotype?

The characteristics of CNCs described above clearly complicate clinical interpretation. In 2007, Lee et al.21 listed a number of characteristics associated with pathogenic CNCs. According to these, a CNC that is inherited from an affected parent or similar to one found in an affected relative, does not overlap with a CNV listed in the DGV, is rich in protein-encoding genes, and contains OMIM genes is more likely to be pathogenic than a CNC inherited from a healthy parent or similar to one found in a healthy relative. Translating these characteristics into a workflow would require full knowledge of the CNC complement of both parents and, if available, of multiple relatives. Although a full family examination may be preferable, mandatory array-CGH of all relevant family members before interpretation (as recommended by Lee et al.21) may be costly and time-consuming. Thus, a workflow based on extensive family investigation, however justifiable from a clinical genetic point of view, may be difficult to achieve in daily practice.

Recently, streamlined workflows that base the evaluation of CNCs on their presence or absence in online databases have been proposed.60–65 These approaches all rely heavily on anonymous compilations of large population-based studies in which the DNA donors were assumed to be healthy but had not been exhaustively phenotyped. These data sets may therefore have been “contaminated” by CNCs with a phenotypic impact. As discussed by Pinto et al.,66 compilations based on HapMap samples suffer from the limitation that no medical information has been obtained. Moreover, DNA for these samples was isolated from Epstein-Barr virus-transformed lymphoblastoid cell lines, which may harbor transformation-induced genomic rearrangements.66,67 In addition, the boundaries of the CNVs in, for instance, the DGV have been mapped with array platforms of differing resolution.49 This makes it difficult to decide whether a CNC found in a given patient overlaps with a reported CNV. On the other hand, repeated observation of the same CNV in multiple unrelated healthy individuals decreases the likelihood that this CNV is associated with a significant deleterious phenotype. To avoid relying too heavily on such databases, and to rapidly discern putatively contributing CNCs from those that are likely to be phenotypically neutral, we propose a gene-centered approach (see also Fig. 1).
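Whether a patient CNC "overlaps" a database CNV depends on how overlap is defined, which matters all the more when boundaries were mapped at different resolutions. One convention used in CNV studies, assumed here rather than taken from the workflows cited above, is reciprocal overlap: the shared length as a fraction of each interval, keeping the smaller fraction. The sketch below applies a hypothetical default threshold of 50%; coordinates are illustrative and chromosomes must match for any overlap to count.

```python
from typing import List, Tuple

# (chromosome, start, end): 0-based, half-open coordinates in bp
Interval = Tuple[str, int, int]


def reciprocal_overlap(a: Interval, b: Interval) -> float:
    """Shared length as a fraction of each interval; returns the smaller fraction."""
    if a[0] != b[0]:
        return 0.0
    shared = max(0, min(a[2], b[2]) - max(a[1], b[1]))
    return min(shared / (a[2] - a[1]), shared / (b[2] - b[1]))


def matching_cnvs(cnc: Interval, known_cnvs: List[Interval],
                  min_fraction: float = 0.5) -> List[Interval]:
    """Database CNVs whose reciprocal overlap with the patient CNC meets the (hypothetical) threshold."""
    return [cnv for cnv in known_cnvs if reciprocal_overlap(cnc, cnv) >= min_fraction]


if __name__ == "__main__":
    patient_cnc = ("chr1", 1_000_000, 1_400_000)  # 400 kb
    known = [
        ("chr1", 1_200_000, 1_900_000),  # shares 200 kb: 50% of the CNC, ~29% of the CNV
        ("chr1", 900_000, 1_500_000),    # contains the CNC: 100% and ~67%
        ("chr2", 1_000_000, 1_400_000),  # same coordinates, different chromosome
    ]
    print(matching_cnvs(patient_cnc, known))        # default 0.5 threshold
    print(matching_cnvs(patient_cnc, known, 0.25))  # more permissive threshold
```

Lowering the threshold from 0.5 to 0.25 changes whether the first database entry counts as overlapping, which is exactly the kind of ambiguity that arises when CNV boundaries come from platforms of differing resolution.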

Fig. 1

Flowchart outlining the three-step workflow procedure (for explanation, see text).

  1. As a first step in evaluating each CNC, available databases (such as ECARUCA and DECIPHER65; for URLs, see Appendix) should be searched to determine whether the same CNC, or one overlapping with the CNC under scrutiny, has previously been found in a patient with a similar or overlapping phenotype. Although ECARUCA and DECIPHER currently appear to be the most useful databases, others (such as Genopedia/Phenopedia, KEGG, and OMIM) may also be of help. Because genome-wide aneuploidy profiling is a relatively new, discovery-based approach, a large number of novel, previously unreported CNCs are likely to continue to emerge in the near future. In addition, because data compilation is an ongoing process, these databases are not yet an exhaustive source of information. A further complication is the lack of consistency in descriptions of clinical phenotypes; recently, a web-based tool aimed at systematic and comprehensive phenotypic description has been developed.68 Given these limitations, an entirely “novel” (i.e., not yet described) CNC should not be discarded but interrogated further as outlined below; for such CNCs, we proceed to step 2.

  2. As a second step, previously undescribed CNCs should be evaluated for their plausibility as contributors to the clinical phenotype. To this end, the functions and tissue transcription patterns of every protein-encoding gene contained within a given CNC should be evaluated. In this step, genes in the regions flanking the CNC, whose transcription levels may be affected, should also be considered.57 A systematic evaluation of gene-prioritization strategies is not yet available; within DECIPHER, however, a tool for ranking the likelihood that genes within a CNC contribute to the phenotype has been incorporated.65 First priority should be given to genes known to be susceptible to haploinsufficiency, which may thus explain (part of) the patient's phenotype. However, a gene involved in an autosomal recessive condition should not be discarded outright, because it may also explain the phenotype when the deletion unmasks a mutation on the other allele.51,69 Second, the temporal and spatial expression patterns of the gene should in principle be commensurate with the pathogenic process leading to the clinical phenotype. Thus, for a gene to contribute to a developmental disorder, it should be expressed during formation of the relevant tissue or organ. This criterion can neither exclude nor prove a pathogenic role of a CNC; it merely evaluates plausibility based on existing information on the genes within that CNC. Accordingly, CNCs containing genes that are not transcribed in tissues involved in the clinical phenotype (e.g., in the developing heart of a patient with congenital heart disease) are less likely to contribute. However, such a CNC may contain a hitherto undiscovered microRNA that may in the future be linked to a disease phenotype.70 For now, these CNCs are classified as “less likely to contribute” (see Fig. 1) and may be revisited at a later stage.

  3. The third step investigates whether a CNC has arisen de novo or cosegregates with an inherited phenotype in a family (van Daalen et al., unpublished data). In a sporadic patient, a de novo CNC is more likely to be causal than an inherited one. However, familial occurrence, i.e., inheritance from an apparently unaffected parent, does not necessarily imply that the CNC is unrelated to the phenotype.54 For instance, unmasking of a recessive mutation (see step 2; Refs. 51, 69), an alteration in the size of the imbalance,24 position effects,56–58 and subclinical, unrecognized manifestations in the carrier parent are all consistent with a pathogenic contribution of a given CNC. A further complication is the recent description of highly variable phenotypes in subjects with microdeletions in regions such as 1q21.1 and 15q13.3.13,31 If such CNCs are found both in a patient and in healthy family members, the search for other potentially contributing CNCs or other mutations should continue.53 In this way, recurrent microdeletions,31–39 which are currently viewed as CNVs of uncertain clinical significance, may in some cases be found to contribute to the patient's phenotype.
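The three steps can be summarized as a small decision function. The sketch below is a hypothetical reading of the text, not a reproduction of Fig. 1: the evidence fields and outcome labels are illustrative names, the database search of step 1 and the gene review of step 2 are reduced to yes/no inputs, a matching prior report is assumed to count as supporting evidence, and step 2 is simplified to require a dosage-sensitive or recessive-disease gene (in or near the CNC) that is expressed in the affected tissue. It does not, for example, distinguish losses from gains.

```python
from dataclasses import dataclass
from enum import Enum


class Inheritance(Enum):
    DE_NOVO = "de novo"
    COSEGREGATES = "cosegregates with phenotype"
    FROM_UNAFFECTED_PARENT = "inherited from apparently unaffected parent"
    NOT_TESTED = "parents not (yet) tested"


@dataclass
class CNCEvidence:
    # Step 1: same or overlapping CNC reported in a patient with a similar phenotype
    # (e.g., after an ECARUCA or DECIPHER search).
    reported_in_similar_patient: bool
    # Step 2: gene-centered review of the CNC and its flanking regions.
    has_haploinsufficient_gene: bool
    has_recessive_disease_gene: bool  # could be unmasked by a deletion
    expressed_in_affected_tissue: bool
    # Step 3: family studies.
    inheritance: Inheritance


def evaluate_cnc(ev: CNCEvidence) -> str:
    # Step 1: a matching prior report is treated as supporting evidence.
    if ev.reported_in_similar_patient:
        return "likely contributing (previously reported)"

    # Step 2: plausibility from gene content and expression.
    dosage_relevant = ev.has_haploinsufficient_gene or ev.has_recessive_disease_gene
    if not (dosage_relevant and ev.expressed_in_affected_tissue):
        return "less likely to contribute (revisit later)"

    # Step 3: origin of the CNC.
    if ev.inheritance in (Inheritance.DE_NOVO, Inheritance.COSEGREGATES):
        return "likely contributing"
    if ev.inheritance is Inheritance.FROM_UNAFFECTED_PARENT:
        # Not excluded: unmasking, position effects, or variable penetrance remain possible.
        return "possibly contributing; search for additional CNCs or mutations"
    return "plausible; interpretation can proceed while family studies are pending"
```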

Sequential application of these criteria should, first, allow us to rapidly select the CNCs most likely to contribute to the clinical phenotype of the patient. Second, the number of costly and time-consuming follow-up studies, sometimes chasing familial polymorphisms,53 would be minimized. Third, the clinical interpretation of the patient's aneuploidy profile for genetic counseling can proceed even before all relevant family members have been analyzed.

MECHANISMS OF ORIGIN OF CNCs AND DETERMINATION OF THEIR RECURRENCE RISK

Once the CNCs likely to contribute to the phenotype have been identified, the second core question in clinical genetics can be addressed: what is the risk of recurrence for this mutation and the resulting clinical phenotype? The recurrence risk for a genomic rearrangement depends on whether the mutation has arisen de novo or results from a transmitted chromosomal structure predisposing to the formation of CNCs. A de novo interstitial or terminal loss or gain may arise through homology-driven nonallelic homologous recombination (NAHR), through microhomology-driven processes such as fork stalling and template switching (FoSTeS), or through nonhomologous end joining (NHEJ) during repair of a double-strand break.16,45,71,72 Rearrangements mediated by NAHR generally result in a recognizable syndrome, such as the del(17)(q12) syndrome, which presents with a combination of renal disease, diabetes, epilepsy, and characteristic facial dysmorphisms.73 This type of segmental aneuploidy represents the classical “genomic disorder” as defined by Lupski et al.3,74

Most of the CNCs detected in genome screening studies appear to be more or less “unique” cases,2,16,30,75,76 with breakpoint regions that contain neither LCRs, clusters of olfactory receptor genes, nor any other known chromosomal architectural feature, and whose breakpoints therefore do not coincide with those of other cases.2,30 These CNCs have most likely arisen through nonhomologous end joining of spurious double-strand breaks or through a microhomology-driven process.16,45,71 Consequently, their breakpoints are not as clearly defined as those underlying genomic disorders. The resulting variability in the size of the imbalances may explain why the phenotypes of these patients are often not readily recognizable clinically. In addition, given the incidental nature of spurious double-strand breaks, their recurrence at exactly the same site is likely to be much rarer than that of LCR-based Genomic Disorders. Patients with this kind of genomic rearrangement are often the most puzzling, because their phenotypes are highly complex and variable.30,78,79

CNCs may also result from a de novo or familial reciprocal translocation giving rise to an unbalanced chromosome complement. For a familial translocation, the recurrence risk can be determined unambiguously. A recent report of such a familial translocation, leading to a 5.0 Mb deletion of region 1q44 in a patient with a Dandy-Walker variant and agenesis of the corpus callosum, provides a case in point.80 The presence of this translocation implies a recurrence risk of 1 in 4 for the same clinical constellation and of 1 in 2 for carrying the translocation without clinical signs. Without knowledge of the translocation, the recurrence risk for the segmental aneuploidy would have been estimated to be vanishingly small, and the second possibility would not even have been considered.80 A second point is that this particular translocation escaped routine karyotyping; it was detected only after array-CGH revealed a segmental loss of 1q44, which prompted subsequent FISH analysis. This case, and many others, emphasize the continuing need to confirm the chromosomal structures underlying the imbalances by karyotyping or FISH.2,80

IMPACT OF A CNC DISCOVERY-BASED DIAGNOSTIC WORKUP ON GENETIC COUNSELING

The majority of MCAMR patients carry genome imbalances that do not conform to the criteria for single-gene syndromes or other well-circumscribed conditions, such as recurrent Genomic Disorders resulting from the specific architecture of the human genome, that could have been detected by a targeted approach.3,11,81 Therefore, genome-wide aneuploidy profiling may be the most fruitful first step of genetic investigation in such patients.2,82 Precise breakpoint mapping using high-resolution array-based platforms and long-range PCR will allow differentiation between cases with Genomic Disorders, which are flanked by LCRs, segmental duplications, or other known architectural features, and cases with “unique” breakpoints.45 In addition, information on the structure of the genome rearrangement is essential, because the recurrence risk depends on this structure, as outlined above.2,80

The approach outlined above prompted us to invert the diagnostic workup of patients with idiopathic MCAMR. Instead of trying to fit the patient's phenotype to a preconceived notion of either a (consistent) syndrome or a genomic disorder, we should use high-resolution genome-wide screening techniques to detect novel CNCs as the preferred initial genetic investigation in MCAMR patients.2,82 These CNCs then need to be evaluated for their power to explain the patients' phenotypic features. Following the criteria outlined above may help to rapidly identify the CNCs contributing to the phenotype and thus allow us to fully realize the potential of this discovery-driven approach to clinical genetic diagnostics. These efforts need to be complemented by a systematic and comprehensive ontology of clinical phenotypes.68

The recent attempts to develop decision trees for determining the potential phenotypic contribution of novel CNCs,63–65 including the one outlined above, clearly indicate that such tools are needed in the current era of genome-wide screening methods. This need may become even more pressing in view of the large amounts of data currently being generated by exome and whole-genome sequencing efforts.83,84 The strategies adopted in the published decision trees differ remarkably. Although one may, on theoretical grounds, argue for or against certain approaches (see above), ultimately only large, possibly multicenter studies, such as that attempted by Buysse et al.,64 will allow the relative merits and drawbacks of each approach to be judged. In such an evaluation, minimizing the number of CNCs or other mutations of “uncertain significance”64 may be a particularly important parameter for weighing the clinical utility of the respective strategies.

In summary, in the era of genome-wide aneuploidy screening, newly discovered CNCs may lead us to invert the approach to clinical genetic diagnosis: novel classes of clinically relevant genomic imbalances will be detected and used to pinpoint genes more or less likely to contribute to the clinical phenotype. Integrating precise information on the breakpoint regions and on the structure of the rearrangement, as determined by classical karyotyping and FISH, will allow us to determine the mutational mechanism. These data should then be interpreted in terms of the patient's pathology and used to estimate the recurrence risk.