Main

Chromosomal imbalances and rearrangements have long been implicated in the etiology of mental retardation (MR) and congenital anomalies, beginning with the recognition in 1959 that an extra copy of Chromosome 21 is the cause of Down syndrome.1 Standard karyotype analysis can detect microscopically visible chromosome rearrangements, which include trisomies, monosomies, supernumerary marker chromosomes, unbalanced translocations, and large (roughly >5 Mb) deletions and duplications; such large events are responsible for 10% to 15% of severe MR,2 a common diagnosis for which karyotype studies are performed. In the early 1980s, improvements in cytogenetic techniques including high-resolution karyotyping and fluorescence in situ hybridization (FISH) resulted in the identification of smaller cytogenetic rearrangements. Examples include the common deletions responsible for clinically recognizable syndromes such as Prader-Willi and Angelman (15q11-q13 deletions)3 and Smith-Magenis (17p11.2 deletions)4 syndromes as well as the subtelomeric rearrangements that underlie another 2.5% to 5% of MR.5–7

Along with improved cytogenetics and the recognition that deletions of specific genomic regions result in common phenotypic features came molecular diagnosis of several syndromes including Prader-Willi,8 Angelman,9 Smith-Magenis,10 velocardiofacial,11 and Williams-Beuren12 syndromes. However, the molecular tests available (primarily FISH) were targeted and largely used for molecular confirmation of a specific suspected clinical diagnosis, a “phenotype-first” approach. Efforts to fine map the breakpoints of deletions responsible for these disorders revealed that each breakpoint lies within large blocks of duplicated sequence.8–12 The same flanking structure had been noted for reciprocal duplication and deletion events on Chromosome 17p12 that cause Charcot-Marie-Tooth type IA13 and hereditary neuropathy with liability to pressure palsies,14 two neurologic disorders. Together, these discoveries suggested nonallelic homologous recombination (NAHR) as a possible mechanism for rearrangement, and the term genomic disorder was coined to describe conditions due to recurrent rearrangements that occur because of regional genomic architecture.15

More recent advances in technology for detecting copy number changes, most notably array comparative genomic hybridization (CGH) and single nucleotide polymorphism (SNP) microarrays, enable genome-wide detection of submicroscopic deletions and duplications (referred to here as microdeletions and microduplications). Since the introduction of these technologies, the rate of discovery of submicroscopic rearrangements in both affected and unaffected individuals has increased dramatically. Genome-wide studies of copy number variation were first performed in control cohorts16–24 to catalog the copy number changes that the human genome can tolerate without apparently deleterious consequences. Almost simultaneously, discovery of microdeletions and microduplications underlying various disorders, including MR and developmental delay,25–30 autism,31–36 and congenital anomalies,37–40 was underway. Several large studies reported apparently pathogenic deletions or duplications detected by array CGH in 5% to 15% of individuals with MR,25,26,29,41–43 and results from large diagnostic laboratories performing clinical array CGH are similar.28,44 Although initial reports of novel genomic disorders primarily described MR syndromes, studies of copy number variation have now expanded beyond phenotypes traditionally thought to have a chromosomal basis (i.e., MR, congenital anomalies, and autism) to include schizophrenia,45–49 epilepsy,50–52 amyotrophic lateral sclerosis,53 autoimmune diseases,54 craniosynostosis,55 and many other disorders.

Of course, the microdeletions and microduplications that are identified by these methods include both nonrecurrent and recurrent events. Although both types are clearly important to disease etiology, this review will focus on recurrent events, defined here as those that are found in multiple individuals, are the same size, and have the same breakpoints each time they occur. As described earlier, recurrent events occur (and recur) because of a specific underlying genomic architecture in which a unique stretch of DNA is flanked by large, highly homologous segmental duplications that facilitate NAHR.56,57 Later, the importance of genomic architecture, the implications of a genotype-first approach and the burden of proving pathogenicity are discussed. Finally, I will review the findings associated with four of the genomic disorders that have been described primarily since the widespread introduction of whole-genome technologies and that exhibit a wide range of phenotypes.

Importance of genomic architecture for recurrent rearrangements

All genomic disorders described to date share a common genomic landscape: flanking duplications tend to be >10 kb in size, >95% identical at the sequence level, and separated by 50 kb to 10 Mb of DNA.57 Importantly, in order for NAHR to result in deletion or duplication, the flanking duplications must be in direct orientation; if they lie in inverted orientation, the result of NAHR is inversion of the intervening sequence. Given the large size and high degree of sequence identity between flanking duplications, it is reasonable to hypothesize that recombination events might take place anywhere within the flanking duplications. However, detailed studies in some disorders have revealed recombination hotspots ranging from 500 bp to 12 kb within the larger segmental duplication blocks.58–64 Therefore, although the large blocks of duplicated sequence may facilitate misalignment at meiosis, there are likely additional local sequence requirements for NAHR to occur.
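The architectural criteria above can be written as a simple filter. The following is a minimal sketch only: the record type, field names, and coordinates are hypothetical, the thresholds are the approximate ones quoted in the text, and it does not model the local recombination hotspots that appear to impose additional requirements.

```python
from dataclasses import dataclass
from typing import Optional

# Approximate thresholds quoted in the text; constant names are illustrative.
MIN_DUP_LENGTH = 10_000      # flanking duplications > 10 kb
MIN_IDENTITY = 0.95          # > 95% sequence identity
MIN_SPACER = 50_000          # separated by 50 kb ...
MAX_SPACER = 10_000_000      # ... to 10 Mb


@dataclass
class DuplicationPair:
    """Hypothetical record for two paralogous segments on one chromosome."""
    proximal_start: int
    proximal_end: int
    distal_start: int
    distal_end: int
    identity: float           # fraction of identical bases, 0 to 1
    direct_orientation: bool  # True if both copies lie in the same orientation


def predicted_nahr_outcome(pair: DuplicationPair) -> Optional[str]:
    """Return the rearrangement NAHR would be expected to produce, or None
    if the pair does not meet the size, identity, and spacing criteria."""
    shortest_copy = min(pair.proximal_end - pair.proximal_start,
                        pair.distal_end - pair.distal_start)
    spacer = pair.distal_start - pair.proximal_end
    if shortest_copy <= MIN_DUP_LENGTH or pair.identity <= MIN_IDENTITY:
        return None
    if not (MIN_SPACER <= spacer <= MAX_SPACER):
        return None
    # Direct repeats yield deletion or duplication; inverted repeats yield inversion.
    return "deletion or duplication" if pair.direct_orientation else "inversion"


# Made-up coordinates for illustration only.
direct = DuplicationPair(1_000_000, 1_030_000, 2_500_000, 2_530_000, 0.98, True)
inverted = DuplicationPair(1_000_000, 1_030_000, 2_500_000, 2_530_000, 0.98, False)
too_short = DuplicationPair(1_000_000, 1_005_000, 2_500_000, 2_530_000, 0.98, True)
```

Meeting these criteria marks a region as a candidate for recurrent rearrangement; it does not show that rearrangement occurs there or that it is pathogenic.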

Using the criteria described earlier (segmental duplications >10 kb, >95% identical, and separated by 50 kb–10 Mb), Bailey et al.56 mined the sequenced human genome (NCBI Build 34) and identified 130 regions of the genome with this architecture that were predicted to be susceptible to NAHR. Several of these genomic “hotspot” regions were already known to be associated with disease, but the majority were not. Using a bacterial artificial chromosome array targeted to the hotspots, they investigated copy number variation in controls22 as well as individuals with MR and/or congenital anomalies.29,39 Interestingly, in a study of 290 individuals with idiopathic MR and 360 controls, six hotspots were deleted or duplicated only in affected individuals and all have since been confirmed as pathogenic rearrangements. These include deletions of 17q21.31,26,29,30,64 15q13.3,51,52,65–71 15q24,33,72,73 and 17q1239; and reciprocal deletions and duplications of 16p13.1174–76 and 1q21.1,20,68,69,77–80 some of which are discussed in further detail later.

Another structural feature that seems to contribute to the susceptibility of some regions to NAHR is polymorphic inversion. In Williams-Beuren syndrome, for example, 25% to 30% of parents who transmit a de novo deletion to their child have a 1.5-Mb inversion encompassing the commonly deleted region; the same inversion is present in only 6% of the general population.60,81 Similarly, mothers who transmit a de novo deletion causing Angelman syndrome are more likely to carry a polymorphic inversion of 15q11-q13 than those in the general population.82 Two more striking examples are found in Sotos syndrome63 and the 17q21.31 microdeletion syndrome.26,29,30,64,83 In each of these disorders, every parent studied to date in which a de novo microdeletion arises carries an inversion of the same region. Because they are balanced rearrangements, inversions are more difficult to detect than copy number variations. Sequence-based studies have been used to detect inversions, however, and it is interesting to note that polymorphic inversions are also present at several additional sites of recurrent rearrangement including 1q21.1, 3q29, 15q13.3, 15q24, and 17q12.84–86 Further studies will be required to determine whether any (or all) of these inversions are predisposing factors for disease-causing rearrangements. Whether or not they increase the risk of rearrangement, it is clear that inversions can be markers of vulnerable genomic regions.

Implications of a “genotype-first” approach

The use of genome-wide assays—or even technologies that are targeted but evaluate multiple genomic regions—has several important implications. In contrast to targeted FISH tests, which require clinical suspicion of a specific disorder, genome-wide studies can be (and often are) performed without a suspected diagnosis. In the clinical setting, this can be advantageous in that an individual may be diagnosed before he has developed all of the features of a given disorder, perhaps allowing earlier intervention as well as recurrence risk counseling for the family. In another case, a given individual may not have the classic features associated with a known disorder but the molecular diagnosis will clarify the clinical diagnosis. Importantly, the ability to perform a single test in a patient with nonspecific findings and end the “diagnostic odyssey” can be very beneficial to both families and physicians. Finally, it can be argued that such an approach, being available to the nonspecialist, can expedite diagnosis by primary care providers if such technologies are more widely—but carefully and appropriately—applied and interpreted.

Conversely, casting such a wide net inevitably leads to unexpected findings. These can include detecting an unexpected pathogenic change unrelated to the patient's phenotype (e.g., a 17p12 duplication predicting Charcot-Marie-Tooth disease in a patient being evaluated for congenital heart disease [CHD] and developmental delay). Another troubling finding is the identification of a deletion or duplication of “uncertain significance”: one that has not previously been associated with disease but has also not been reported as a benign copy number variant. Such a finding can prolong the diagnostic odyssey because of evaluation of parents and other family members, which may or may not help clarify the significance of the aberration, or create new diagnostic dilemmas with attendant problems.

In both the research and clinical settings, the unbiased whole-genome approach has led to unprecedented discovery of novel genomic disorders. Since the introduction of array CGH and SNP microarrays circa 2005, 18 new genomic disorders involving 12 regions of the genome have been described, more than doubling the number of disorders described in the previous 20 years (see Refs. 88–90 for more thorough discussion of these). New syndromes identified by this genotype-first approach are often defined by a genomic location and rearrangement type (e.g., 15q13.3 microdeletion syndrome), and clinical features are compared among patients after a common rearrangement is identified. Because more diverse phenotypes are being evaluated with similar platforms, it is now clear that for some deletions and duplications, the associated phenotypes are so diverse that a phenotype-first approach to identify affected individuals would never be successful. In addition to phenotypic diversity, there is also incomplete penetrance associated with many of the recently identified genomic disorders. As described in the next section, these features complicate the process of proving pathogenicity, not to mention genetic counseling and clinical care.

Determining pathogenicity

The criteria for determining whether a given deletion or duplication is pathogenic have traditionally been the following: (i) the event is observed in affected individuals; (ii) the same event is not found in unaffected parents or control populations; and (iii) individuals who have the same event have the same or similar phenotypic features. Examples of microdeletion syndromes that generally follow these “rules” include Smith-Magenis syndrome, Prader-Willi and Angelman syndromes, and Williams syndrome. Many of the recently identified microdeletion or duplication “syndromes,” however, tend to bend these rules. Deletions and duplications can no longer be dismissed because they are inherited from an apparently unaffected parent.87 One must consider other factors including size, gene content, overlap with known benign copy number changes, and frequency of the same event in controls, although none of these are necessary or sufficient to unambiguously determine pathogenicity. For the novel genomic disorders discussed in the next section, deletions or duplications can be inherited or de novo; most exhibit incomplete penetrance; and affected individuals may have any one or more of a wide range of neuropsychiatric conditions. As a result, new rules are in play for determining the pathogenic significance of any given event. In most cases, statistical arguments have been used to determine whether a given event is pathogenic. That is, the deletion or duplication is deemed pathogenic if it is clearly enriched in affected individuals compared with unaffected individuals. This task is easier for recurrent rearrangements because of the simple fact that they recur; as a result, far fewer cases and controls need to be evaluated to identify multiple individuals with the same event.
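The enrichment argument can be made concrete with a small calculation. The sketch below assumes a one-sided Fisher exact test on carrier counts; the text does not say which test the cited studies used, so this is an illustration rather than a reproduction of their analyses. The function name is hypothetical, and the counts are the ones quoted elsewhere in this review for 15q13.3 deletions in idiopathic generalized epilepsy (12 of 1223 cases vs. 0 of 3699 controls) and for 16p13.11 deletions in schizophrenia (3 of 1013 cases vs. 0 of 1084 controls).

```python
from math import comb


def enrichment_p_value(case_carriers: int, case_total: int,
                       control_carriers: int, control_total: int) -> float:
    """One-sided Fisher exact test: probability of seeing at least
    `case_carriers` carriers among cases if carrier status were
    unrelated to affection status (hypergeometric upper tail)."""
    carriers = case_carriers + control_carriers
    total = case_total + control_total
    numerator = 0
    for k in range(case_carriers, min(carriers, case_total) + 1):
        # Ways to place k carriers among cases and the remainder among controls.
        numerator += comb(carriers, k) * comb(total - carriers, case_total - k)
    return numerator / comb(total, case_total)


# Counts quoted in this review (not re-derived from the primary studies).
p_15q13_3_epilepsy = enrichment_p_value(12, 1223, 0, 3699)
p_16p13_11_schizophrenia = enrichment_p_value(3, 1013, 0, 1084)

print(f"15q13.3 deletion, IGE:            p = {p_15q13_3_epilepsy:.2e}")
print(f"16p13.11 deletion, schizophrenia: p = {p_16p13_11_schizophrenia:.3f}")
```

Under these assumptions, the first comparison gives a very small p-value, whereas the second does not reach the conventional 0.05 threshold, in line with the description of that finding as not statistically significant. A recurrent event contributes many identical observations to a single test like this one, which is why fewer cases and controls are needed than for nonrecurrent events.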

There are several excellent reviews describing new genomic disorders identified during the past few years,88–91 some of which cause clinically recognizable syndromes. Later, I will review the findings associated with four recently described genomic disorders that exhibit extreme phenotypic diversity and elude syndromic classification. These genomic disorders cannot be recognized based on shared clinical features, but the rearrangements (and genes within) underlying each disorder clearly play an important role in development.

Recurrent rearrangements of 15q13.3

Microdeletions of 15q13.3 were first described in a series of six probands and three affected family members with intellectual disability and minor dysmorphic features,29,67 and the deletion was clearly enriched in affected individuals compared with controls. Seven of the nine affected individuals also had seizures or abnormal electroencephalography findings. Since that first report, there have been several additional series of patients with MR (with and without seizures) or autism.65,66,70,71 The severity of MR phenotypes in all of these studies ranges from mild to severe; the frequency of seizures and of autistic features differs among series; and there are no consistent dysmorphic features (if any) among affected individuals. Although some of the deletions reported in these studies were de novo, in many cases the deletion is inherited, suggesting incomplete penetrance in addition to variable expressivity.

Studies that have focused on patients with neuropsychiatric conditions other than MR or autism reveal that the phenotypes associated with deletions of 15q13.3 are even more variable. Two large studies independently reported enrichment of the 15q13.3 deletion in patients with schizophrenia compared with controls (0.2% vs. 0.02% combined).68,69 Investigating yet another phenotype, Helbig et al.51 followed up on the finding that 7 of 9 patients reported by Sharp et al.67 suffered from seizures. In their study, 12 of 1223 (1%) patients with idiopathic generalized epilepsy (IGE) were found to carry deletions of 15q13.3 compared with 0 of 3699 controls. Unlike patients from previous studies, the majority of individuals in the IGE series had normal intellect. Two studies have confirmed the association with epilepsy50,52 in independent IGE cohorts, and one study estimated an odds ratio of 68 for carriers of the deletion to develop epilepsy.52 The frequency of the deletion in epilepsy probands (1%) is notably higher than in patients who present with MR or autism (0.3%)65,67 or schizophrenia (0.2%).68,69 Remarkably, the 15q13.3 deletion is now the most prevalent known genetic risk factor for epilepsy.

Candidate genes

There are seven genes within the commonly deleted region on 15q13.3, one of which is CHRNA7, encoding a subunit of the nicotinic acetylcholine receptor. Through linkage studies, CHRNA7 has been identified as a candidate gene for both epilepsy92,93 and schizophrenia.94 In addition, mutations in several other acetylcholine receptor subunit genes have been shown to cause epilepsy. Although emphasis has been placed on CHRNA7, one or more of the other genes in the region is likely to play a role as well.

Reciprocal rearrangements of 16p13.11

A similar story is emerging for rearrangements of the 16p13.11 region. Deletions of 16p13.11 have now been reported in patients with a wide range of neuropsychiatric conditions. In studies of patients presenting with MR or autism,74–76 affected individuals with deletions had a range of phenotypes that include mild to severe MR, autistic features, brain anomalies, and epilepsy. Deletions of 16p13.11 may also be a risk factor for schizophrenia. Need et al.49 identified deletions of this region in 3 of 1013 probands with schizophrenia and 0 of 1084 controls. Although not statistically significant, this finding is intriguing, and evaluation of additional cohorts will be required to confirm it. Like deletions of 15q13.3, deletions of 16p13.11 are also significantly enriched in probands with IGE50 and may be as important as 15q13.3 deletions in determining genetic risk for IGE. Although occasionally inherited from mildly affected or unaffected parents, the deletion is often de novo, is rarely reported in control cohorts,68,74,95 is significantly enriched in affected individuals, and appears to be highly (though not fully) penetrant.

The reciprocal duplication on 16p13.11 is also a risk factor for neurocognitive disease, albeit with reduced penetrance. Ullmann et al.76 first reported four male patients with severe autistic features from three families who all carried duplications of this region. In this study, several mildly affected or unaffected family members had the same duplication and the frequency of the duplication in controls was not reported. A subsequent study found duplications of 16p13.11 in both affected and unaffected individuals, suggesting that the duplication was either incompletely penetrant or a benign variant.74 In yet another study of patients with unexplained MR, we found the duplication in 1% of affected individuals, representing a significant enrichment compared with controls.75 Interestingly, duplications of 16p13.11 have also been reported in patients with schizophrenia.48,68 Although not statistically significant, the data suggest that duplications of this region may be a risk factor for schizophrenia, too. Taken together, these data suggest that duplications of 16p13.11 are likely to be pathogenic and contribute to the genetic etiology of neuropsychiatric illness but exhibit reduced penetrance.

Candidate genes

The 16p13.11 deletion region contains 15 RefSeq genes, at least two of which are plausible candidate genes for some of the phenotypes associated with imbalances of this region. The NDE1 gene is predominantly brain expressed and interacts with both LIS1,96 mutations in which cause lissencephaly, and DISC1, an important gene in schizophrenia.97,98 NTAN1 is also brain expressed, and mice lacking the gene exhibit abnormal social behavior and learning phenotypes.99

Recurrent reciprocal rearrangements of 1q21.1

Deletions of 1q21.1 are also associated with a broad range of phenotypes. Early and relatively small studies reported deletions of this region in a few patients with disparate phenotypes including CHD,78 unexplained MR,29 schizophrenia,47 autism,34 and mullerian aplasia.100 Much larger studies in patients with various phenotypes have confirmed that all of these initial associations are likely valid. Two large studies of probands with schizophrenia independently reported a significant enrichment of 1q21.1 deletions in affected individuals (0.26% of affected, combined).68,69 In a large study of 5000 patients ascertained for MR, autism, or congenital anomalies and 5000 controls, we confirmed that deletions of 1q21.1 are indeed pathogenic and result in a range of phenotypes,80 a finding that was replicated in a second large study77 with similar patient ascertainment. Cognitive phenotypes of affected individuals in these studies range from normal intellect to severe MR; dysmorphic features are reported in some but not in all patients; and other anomalies in some patients include CHD, microcephaly, cataracts, and seizures. Deletions have also been reported in patients with isolated IGE without cognitive delays50; this finding was not a significant enrichment compared with controls and further studies of copy number variation in epilepsy will be required to confirm an association. Reciprocal duplications of the same region on 1q21.1 have also been reported in patients with MR and developmental delay,77,80 autism,34,80 and isolated CHD.77,79 In a recent study of 512 patients with isolated tetralogy of Fallot, one patient carried a 1q21.1 deletion, consistent with previous reports of CHD with 1q21.1 deletions.78 Interestingly, four additional patients in this study had the reciprocal duplication,79 which represented a statistically significant enrichment compared with controls and a higher frequency of 1q21.1 duplications than that reported in MR or autism. 
In most of the studies cited earlier, both de novo and inherited rearrangements of 1q21.1 have been reported, again suggesting incomplete penetrance.

Candidate genes

The 1q21.1 critical region contains eight RefSeq genes with several good candidates for some of the phenotypes reported to date. GJA8 has previously been implicated in schizophrenia101 and cataracts102,103 and may very well influence these phenotypes. GJA5 is highly expressed in the heart, and null mice have a high incidence of cardiac malformations.104 Although these genes are plausible candidates, given the very diverse phenotypes reported so far, other genes in the region and elsewhere likely play a role.

Rearrangements of 16p11.2

The genetics of autism is clearly complex, and large efforts have been made to identify genetic risk factors contributing to this condition. An exciting advance came with the discovery by several groups that recurrent microdeletion of 16p11.2 is present in 0.5% to 1% of affected individuals.32–34,36 Early studies hinted at phenotypic diversity. The deletion was also present in individuals with language or psychiatric disorders, although at a lower frequency (0.1%), as well as in the general population (0.01%).36 Subsequent studies have confirmed that the deletion is not specific to autism but is also enriched in patients with MR, developmental delay, or schizophrenia without autistic features.75,95,105 Reciprocal duplications of 16p11.2 have also been reported, but the pathogenic significance is less clear. Duplications are often inherited from an unaffected parent and are found at a higher frequency in the population than deletions (0.03%). Even larger studies may be required to determine whether the duplication is a low penetrance risk factor or a benign variant.

Candidate genes

The 500-kb critical region on 16p11.2 is gene rich, with >25 genes. Pathway analysis has shown that nearly half of the genes map to a single genetic network involving cell–cell interactions.32 There has been one follow-up study to look for association or mutation of individual genes within the region in patients with autism.106 This study failed to find strong association, but additional studies are sure to follow.

What are the modifiers and how do we find them?

These studies all suggest a common genetic and developmental etiology for MR, autism, schizophrenia, and epilepsy. Given the extreme variability in phenotypes among individuals with the same rearrangements, there must be other factors (so-called modifiers) that contribute to phenotypic outcome. Several possibilities are discussed briefly below. One explanation for variable phenotypes could be that the deletions (or duplications) are not really identical and that differences in size account for differences in phenotype. We have performed very high-density oligonucleotide array CGH to evaluate the breakpoints of 1q21.1 and 15q13.3 deletions in patients with different phenotypes and cannot detect appreciable differences in the breakpoints.51,80,88 Given the complexity of the segmental duplications in many of the breakpoint regions, it is also possible that there are sequences not represented in the current assembly that contribute to phenotypic diversity but have not yet been evaluated. Genetic modifiers could include differences in the sequence of one or more genes on the nondeleted allele (in the case of deletions). For example, sequence variants in the COMT gene within the 22q11 deletion region have been shown to influence cognitive and psychiatric phenotypes in individuals with 22q11 deletions.107,108 Comprehensive studies of sequence variation on the nondeleted allele have not been published for any of the disorders described earlier. Of course, sequence variation influencing phenotypic outcome need not be restricted to the region of genomic rearrangement. SNPs or mutations in genes that interact with deleted or duplicated genes (or even in genes in the same developmental pathway) could potentially influence phenotype. With improvements in sequence capture and sequencing technologies,109–112 both targeted (e.g., the nondeleted allele or all genome-wide exonic sequence) and whole-genome studies of sequence variants in affected individuals will soon be feasible.
Other copy number variants could also contribute to phenotype diversity. Because most copy number studies are now carried out on whole-genome platforms, analysis of copy number changes that are common among affected individuals should be possible. Epigenetic changes may also play a role. An extreme but well-known example of this is imprinting in Prader-Willi (paternally derived deletions) and Angelman (maternally derived deletions) syndromes, which accounts for very different outcomes associated with the same deletion. In the genomic disorders discussed here, there has been no clear difference in phenotype associated with parent of origin, whether de novo or inherited. However, as with sequence variation, no large studies have been performed to evaluate epigenetic changes at the site of rearrangement or genome wide. Finally, it must be remembered that nongenetic and nonepigenetic factors such as the environment may be critically important in influencing the penetrance of an underlying genomic rearrangement. Clearly, there are many studies that can and should be performed to answer the question of why there is such a range of phenotypes associated with each of the rearrangements described earlier. One of the challenges will be to collect a large cohort of well-phenotyped patients for each deletion and duplication to detect significant differences among subphenotypes. This will require large collaborative studies with standardized phenotyping evaluations. These studies are essential and will undoubtedly increase our understanding of genes and pathways critical for normal development. Furthermore, understanding the modifying factors will allow better recurrence risk counseling and hopefully facilitate interventions to prevent or ameliorate the adverse outcomes associated with the deletions and duplications described.

Implications for medical genetics and conclusions

With the introduction of whole-genome technologies to identify copy number changes, the rate of discovery of pathogenic rearrangements, including novel recurrent genomic disorders, has rapidly increased. In many ways, discovery of rearrangements in diverse cohorts of patients has outpaced our ability to interpret those changes in the clinical setting. Although statistical arguments have been helpful at the level of case-control studies for determining pathogenic significance, they are not as comforting in the setting of trying to counsel an individual family. Although we have learned a great deal in the past few years, careful genetic counseling of patients and families will remain critical as this emerging field has a greater and greater impact on medical care. The translation of such information in the clinic will remain difficult until we have a much better understanding of the other factors—genetic, epigenetic, or environmental—that influence phenotypic outcomes.