INTRODUCTION

Small supernumerary marker chromosomes (sSMCs) are, by definition, additional centric chromosome segments that are too small to be unambiguously characterized by conventional chromosome banding. Although sSMCs are found in approximately 0.043% of newborn children, sSMCs are 10 times more common in individuals with mental retardation (0.426%) and 4 times more common in the subfertile population (0.165%).1 Because most sSMCs are found in only a small percentage of cells, detecting and characterizing sSMCs in a diagnostic setting is problematic without screening large numbers of cells and using molecular cytogenetic techniques.

The chromosomal origins of some sSMCs have been identified and associated with known syndromes, such as isochromosome 12p [i(12p)] and Pallister-Killian syndrome (OMIM #601803), isochromosome 18p [i(18p)] syndrome,2 supernumerary-derivative chromosome 22 [der(22)t(11;22)(q23;q11.2)] syndrome,3 and inverted duplication 22 [inv dup(22q)] and cat eye syndrome (OMIM #115470). However, most sSMCs (approximately 30–60%) have yet to be accurately characterized1 because of variations in euchromatic DNA content, different degrees of mosaicism, uniparental disomy of the chromosomes homologous to the sSMC, and technical limitations of fluorescence in situ hybridization (FISH) and G-banding that do not allow for accurate detection of sSMCs at high resolution.4 This has resulted in a lack of genotype/phenotype correlation for most sSMCs.

A wide variety of molecular cytogenetic techniques are now available to characterize sSMCs, including locus-specific FISH, whole-chromosome painting (WCP), microdissection coupled with reverse painting, and centromere-specific and subcentromere-specific multicolor FISH (cenM-FISH and subcenM-FISH, respectively).4,5 However, most of these techniques are extremely labor intensive and are not practical without prior knowledge or clinical suspicion of an sSMC.

Recently, microarray-based comparative genomic hybridization (array CGH) has emerged as a rapid and highly sensitive technique for the characterization of copy number imbalances throughout the genome at high resolution.6 The application of array CGH technology to clinical diagnostics alleviates some of the problems associated with conventional cytogenetics in that it uses genomic DNA extracted from uncultured peripheral blood and can detect low-level mosaicism, although detecting mosaicism at levels <20% can be problematic.12 Even though some targeted arrays contain limited coverage of the pericentromeric regions,6,7 little effort has been put forth to develop an array that interrogates all unique pericentromeric regions at high resolution. We herein report the construction and validation of a microarray containing 974 BAC clones that cover approximately 5 Mb of the most proximal unique sequence on all 43 unique human pericentromeric regions (excluding the acrocentric short arms). The utility of BAC-based array CGH in the clinical diagnosis of chromosomal alterations of the human pericentromeric regions is illustrated by the identification and characterization of sSMCs in 15 cases.

MATERIALS AND METHODS

BAC clone identification

The University of California Santa Cruz (UCSC) Genome Browser (July 2003 draft) was used (http://genome.ucsc.edu/cgi-bin/hgGateway) to identify the most proximal unique sequence BAC clones adjacent to the centromere on all 43 unique pericentromeric regions of the human genome (excluding the acrocentric short arms). Clones were purchased from Invitrogen, Inc. (Carlsbad, CA) and used as probes in systematic FISH experiments to determine the most proximal unique sequence BAC clone that hybridized specifically to each unique pericentromeric region. Once identified, these clones served as anchors for the construction of contigs of BAC clones spanning approximately 5 Mb distal to our anchors. Because BAC clones that hybridize to multiple locations in the genome are of limited utility in diagnostic array CGH, clones containing sequences >1 kb in length with >90% identity to another region of the human genome were identified using UCSC's BLAT alignment program (http://genome.ucsc.edu/cgi-bin/hgBlat) and were excluded from the 5-Mb pericentromeric coverage.
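The exclusion rule described above can be illustrated with a minimal Python sketch. It assumes each clone's secondary alignments are already available as simple records; the `Alignment` record, the clone names, and the numbers are hypothetical and do not reflect BLAT's actual output format or the pipeline used in this study.

```python
from dataclasses import dataclass

# Thresholds taken from the text: sequences >1 kb with >90% identity elsewhere.
MIN_LENGTH_BP = 1000
MIN_IDENTITY_PCT = 90.0


@dataclass
class Alignment:
    """One hypothetical alignment of a clone to a second genomic location."""
    clone: str
    length_bp: int
    identity_pct: float


def is_duplicated(aln: Alignment) -> bool:
    """True when the secondary alignment is both long and highly similar."""
    return aln.length_bp > MIN_LENGTH_BP and aln.identity_pct > MIN_IDENTITY_PCT


def unique_clones(clones, secondary_alignments):
    """Return the clones that have no disqualifying secondary alignment."""
    flagged = {a.clone for a in secondary_alignments if is_duplicated(a)}
    return [c for c in clones if c not in flagged]


# Illustrative (made-up) input.
clones = ["clone_A", "clone_B", "clone_C"]
alignments = [
    Alignment("clone_A", 800, 99.0),   # similar but too short to count
    Alignment("clone_B", 2500, 95.5),  # long and similar: excluded
    Alignment("clone_C", 3000, 88.0),  # long but not above 90% identity
]
kept = unique_clones(clones, alignments)
print(kept)
```

Both conditions must hold for a clone to be excluded, so a short high-identity match or a long low-identity match does not disqualify a clone in this sketch.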

FISH analysis and clone confirmation

Systematic FISH analysis using each BAC clone as a probe was performed to confirm the chromosomal location of each BAC and to verify its hybridization specificity under standard, uniform conditions as previously described.6 In all of these confirmatory FISH experiments, two BAC clones from each chromosome were hybridized to metaphase spreads derived from peripheral blood cultures from a single chromosomally normal male. Short-arm probes labeled with digoxigenin-dUTP (Roche Diagnostics, Indianapolis, IN) and long-arm probes labeled with biotin-dUTP (Roche) were cohybridized to the same metaphase spread. In the case of acrocentric chromosomes, a separate long-arm control probe was used to verify that hybridization was occurring on the correct chromosome.

Microarray construction

The pericentromeric microarray was constructed according to methods described in Bejjani et al.6 with some minor modifications. Briefly, 10 μg of each BAC DNA was filter-purified using Millipore Montage PCR96 Filter Plates (Millipore, Billerica, MA) on a Biomek FX liquid handler (Beckman Coulter, Fullerton, CA) and hydrated with sterile water to a concentration of 1.25 μg/μl. Purified BAC DNAs were then sonicated using a Misonix 3000S sonicator (Misonix, Inc., Farmingdale, NY) to generate fragments of DNA between 0.5 and 20 kb. Before printing, sonicated DNAs were diluted to a final concentration of 0.625 ng/μl with DMSO containing nitrocellulose as previously described.8 To prevent naturally contiguous clones from being printed next to each other on the microarray, each prepared clone was selectively placed in a 384-well microtiter plate. Each clone was printed in four separate quadrants on the microarray, and the entire microarray was duplicated on the top and bottom of the same slide using an Omnigrid 300 Microarrayer (Genomic Solutions, Ann Arbor, MI). Microarrays were printed on low-autofluorescence glass slides (Schott, Elmsford, NY) coated with aminosilane (Sigma-Aldrich, Sheboygan Falls, WI). Printed slides were baked at 80°C from 4 hours to overnight and stored protected from light in a desiccator cabinet at room temperature.

Cases and controls

The 20 cases included in this study were originally referred by physicians from across the United States and abroad for testing with the SignatureChip® targeted microarray at Signature Genomic Laboratories, LLC (Spokane, WA). The most common clinical presentations of the cases referred for testing were mental retardation, developmental delay, or multiple congenital anomalies. Thirteen of the 20 cases had previously identified sSMCs, and the referring physician had requested further characterization of the marker by array CGH. Seven cases had a marker chromosome(s) identified after targeted microarray analysis, which includes a minimum of 3–6 overlapping BAC clones at the most proximal end of the pericentromeric region for each chromosome arm (excluding the short arms of the acrocentric chromosomes). All cases were further analyzed using the high-density pericentromeric array. DNAs from a chromosomally normal male and a chromosomally normal female were used as reference controls for all array CGH hybridizations.9

Microarray hybridization

Blocking of the microarrays was accomplished using bovine serum albumin fraction V (Sigma, St. Louis, MO) and salmon sperm DNA (Invitrogen, Carlsbad, CA) as previously described.6 Genomic DNA isolation from whole blood of cases and reference controls was performed using a Puregene DNA isolation kit (Gentra Systems, Minneapolis, MN). Genomic DNAs were subsequently sonicated to produce 0.5- to 4.0-kb fragments. We performed dye-swap experiments in which 500 ng of the test and reference DNAs were labeled with Cyanine 3 (Cy3) and Cyanine 5 (Cy5), respectively, and cohybridized to the microarray on the top half of one slide. The test and reference DNAs were then oppositely labeled and cohybridized to the microarray on the bottom half of the same slide. Genomic DNAs were labeled, hybridized, and washed as previously described.6

Microarray analysis

Microarrays were scanned using a GenePix Autoloader or 4000B dual-laser scanner (Axon Instruments, Union City, CA). All features on the array were analyzed using GenePix Pro 6.0 and Acuity 4.0 imaging and analysis software (Axon Instruments) as previously described.6 Briefly, the average ratio (Cy5:Cy3) of fluorescent intensity obtained from cohybridized test and reference DNA at each of the four features for each clone was calculated and normalized using ratios obtained from reference features on the same slide. The average ratios of the four features for each case were converted to a log2 scale and plotted in Microsoft Excel. The results of the dye-swap experiments were plotted together on the same plot. The theoretical log2 conversions for ratios (case/control) of 1/2, 2/2, 3/2, 4/2, and 6/2 are approximately −1, 0, 0.58, 1, and 1.58, respectively. In practice, the actual values never reach their theoretical limits. For single-copy losses (1/2) and single-copy gains (3/2), we used thresholds of approximately −0.3 and 0.3, respectively. For two-copy gains (4/2) and four-copy gains (6/2), we typically observed values of approximately 0.6 and 0.9, respectively.
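The log2 conversion and thresholds described above can be reproduced with a short Python sketch. It assumes the four feature ratios for a clone have already been normalized; the function names and the example ratios are illustrative, and the analysis in this study was carried out in GenePix Pro, Acuity, and Excel rather than with this code.

```python
import math

# Approximate calling thresholds stated in the text.
GAIN_THRESHOLD = 0.3
LOSS_THRESHOLD = -0.3


def theoretical_log2(case_copies, control_copies=2):
    """Theoretical log2 ratio for a given case/control copy number."""
    return math.log2(case_copies / control_copies)


def clone_log2(feature_ratios):
    """Average the (already normalized) feature ratios, then convert to log2."""
    mean_ratio = sum(feature_ratios) / len(feature_ratios)
    return math.log2(mean_ratio)


def call_clone(log2_ratio):
    """Classify a clone as a gain, a loss, or normal using the thresholds."""
    if log2_ratio >= GAIN_THRESHOLD:
        return "gain"
    if log2_ratio <= LOSS_THRESHOLD:
        return "loss"
    return "normal"


# Theoretical values for 1, 2, 3, 4, and 6 copies against a 2-copy control.
theory = {copies: round(theoretical_log2(copies), 2) for copies in (1, 2, 3, 4, 6)}
print(theory)

# Made-up feature ratios for two clones.
print(call_clone(clone_log2([1.5, 1.5, 1.5, 1.5])))
print(call_clone(clone_log2([1.0, 1.05, 0.95, 1.0])))
```

Because observed ratios are compressed relative to theory, the thresholds sit well inside the theoretical values of −1 and 0.58.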

RESULTS

Construction of a high-density pericentromeric microarray

Of 1839 BAC clones analyzed by BLAT for the presence of sequences duplicated in another region of the genome, 1386 (approximately 75%) were considered sufficiently unique to proceed with FISH confirmation of chromosomal location. FISH analysis identified 30 BAC clones that did not map to the correct chromosome or chromosomal location as designated by the UCSC genome browser. In addition, 306 BAC clones cross-hybridized to locations other than the primary chromosomal location as designated by the UCSC genome browser. Another 27 clones did not hybridize or showed poor hybridization signals under uniform FISH conditions. Thus, from a total of 1386 BAC clones that were evaluated, 1023 (73.8%) were deemed adequate for use on the microarray. Of this number, 974 clones were selected to comprise three-clone contigs spaced approximately 0.5 Mb apart for the final version of the array. Table 1 and Figure 1 summarize the location and coverage of this pericentromeric BAC clone set.
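The clone attrition reported above can be checked with a few lines of Python. All counts come directly from the text; the variable names are illustrative only.

```python
# Counts reported in the text.
screened_by_blat = 1839
passed_blat = 1386
mismapped = 30          # wrong chromosome or chromosomal location by FISH
cross_hybridized = 306  # signal at additional chromosomal locations
poor_signal = 27        # no or poor hybridization signal
printed_on_array = 974

rejected_by_fish = mismapped + cross_hybridized + poor_signal
usable = passed_blat - rejected_by_fish


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(rejected_by_fish, usable)
print(pct(passed_blat, screened_by_blat))  # share passing the BLAT screen
print(pct(usable, passed_blat))            # share of FISH-tested clones kept
print(pct(usable, screened_by_blat))       # share of all screened clones kept
```

The three FISH rejection categories sum to 363 clones, leaving 1023 usable clones, of which 974 were printed.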

Table 1 Genomic coverage of the pericentromeric microarray with respect to the centromere for each chromosome arm based on the UCSC May 2004 draft of the human genome
Fig. 1

Euchromatic DNA coverage of the pericentromeric clone set for each unique pericentromeric region. The pericentromeric regions are listed in the left column. Each column represents a 0.5-Mb interval of euchromatic DNA. For example, the 0 Mb interval encompasses 0–0.5 Mb of euchromatic DNA, the 0.5 Mb interval encompasses 0.5–1.0 Mb of euchromatic DNA, and so forth. Light gray boxes indicate that no unique sequence clones were identified in this interval; therefore, this region is not represented on the microarray. Dark gray boxes indicate intervals in which unique sequence clones were identified and are therefore represented on the microarray.
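The 0.5-Mb interval scheme used in Figure 1 can be sketched in Python as follows. The sketch assumes half-open intervals (a clone at exactly 0.5 Mb falls in the 0.5 Mb interval), which the caption does not specify, and the clone positions are made up for illustration.

```python
INTERVAL_MB = 0.5


def interval_start(distance_mb):
    """Start (in Mb) of the 0.5-Mb interval that contains a clone position."""
    return (distance_mb // INTERVAL_MB) * INTERVAL_MB


def coverage_row(clone_positions_mb, n_intervals):
    """One row of the figure: True (dark gray) where an interval holds a clone."""
    row = [False] * n_intervals
    for pos in clone_positions_mb:
        index = int(pos // INTERVAL_MB)
        if index < n_intervals:
            row[index] = True
    return row


# Hypothetical clone positions, in Mb of euchromatic DNA from the centromere.
row = coverage_row([0.2, 0.3, 1.7], n_intervals=4)
print(row)
```

In this row, the first and fourth intervals would be drawn dark gray and the two intervals between them light gray.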

Identification and characterization of sSMCs by array CGH

Once the high-density pericentromeric microarray was constructed, genomic DNAs from 20 cases known to have sSMCs were screened by array CGH to characterize their chromosomal content more fully. For 15 cases, pericentromeric array CGH not only identified the chromosomal origin of the sSMC(s), but also distinguished between the involvement of the short arm and/or the long arm of each chromosome and uncovered complex rearrangements or multiple sSMCs in single individuals. In these 15 cases, 18 sSMCs were identified, 16 of which were unique. However, for five cases in which sSMCs of unknown origin were previously identified by conventional chromosome banding, sSMCs could not be detected by this assay. This may be the result of very low-level mosaicism (<20%). Indeed, for three of the five cases in which sSMCs were not detected by array CGH, previous chromosome analyses had identified marker chromosomes in only 9 of 50 (18%), 2 of 20 (10%), and 2 of 50 (4%) cells. However, for one of the five cases, previous chromosome analysis had identified a mosaic marker in 19 of 31 (61%) cells. Our results suggest that this marker may be very small and may not contain detectable euchromatin. For the remaining case, there was no information on the level of mosaicism for the marker. Table 2 summarizes the analysis of the 15 cases in which sSMCs could be identified and characterized by array CGH and indicates the chromosomal origin, percent mosaicism, approximate euchromatic content, parental origin (if known), and any previous cytogenetic analyses known to us at the time of this study. The coverage of this pericentromeric microarray was sufficient to define the euchromatic content of 12 of 18 (67%) of the sSMCs included in this study, but for 6 of 18 (33%) sSMCs characterized in this study, the euchromatic content extended beyond the approximately 5-Mb coverage of this microarray.
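The mosaicism levels quoted above for four of the five undetected cases can be recomputed with a short Python sketch. The cell counts are from the text; the 20% value is the approximate detection level cited for array CGH, used here as a simple cutoff for illustration only.

```python
# Approximate level of mosaicism below which array CGH detection is problematic.
DETECTION_LEVEL_PCT = 20.0

# (cells with marker, cells examined) for four of the five undetected cases.
cell_counts = [(9, 50), (2, 20), (2, 50), (19, 31)]


def mosaic_pct(marker_cells, total_cells):
    """Percentage of examined cells carrying the marker."""
    return 100.0 * marker_cells / total_cells


levels = [round(mosaic_pct(m, t)) for m, t in cell_counts]
below_detection = [mosaic_pct(m, t) < DETECTION_LEVEL_PCT for m, t in cell_counts]
print(levels)
print(below_detection)
```

Three of the four cases fall below the cutoff, so only the 61% mosaic marker needs a different explanation, such as a lack of detectable euchromatin.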

Table 2 Summary of 15 cases in which small supernumerary marker chromosomes were identified and characterized by array CGH using a high-density pericentromeric microarray

Of the 18 sSMCs characterized by array CGH, 4 were derived from chromosome 22. For three of these cases (12, 14, and 15), a mosaic sSMC of unknown chromosomal origin had been previously identified by chromosome banding. In two cases (12 and 14), extensive FISH studies were previously unable to confirm the chromosomal origin but suggested that these sSMCs were dicentric derivatives of either chromosome 14 or 22, although FISH with a 22q11.2 TUPLE1 probe was normal (Table 2). Array CGH analysis of cases 12, 14, and 15 all showed patterns consistent with tetrasomy of the 22q11.1-q11.21 region (approximately 2–3 Mb) including the cat eye syndrome critical region, whereas array CGH analysis of case 13 identified what seemed to be only a single-copy gain of all pericentromeric clones on 22q11.1-q11.23 (>7.5 Mb) including the cat eye syndrome and DiGeorge syndrome critical regions (Fig. 2 A–C). Confirmatory FISH analyses of all four cases with sSMCs derived from chromosome 22q identified two copies of the 22q pericentromeric region on each marker, consistent with tetrasomy 22q (Fig. 2 D and E). In addition, FISH with a chromosome 22 centromere probe suggests that the sSMC in case 13 is monocentric, whereas the sSMCs in cases 12, 14, and 15 are dicentric (Fig. 2 D and E, insets). Confirmatory FISH analysis of case 13 identified an sSMC in only 57% of cells, consistent with the array CGH observation of log2 ratios for all clones being just under 0.3 (rather than approximately 0.6).
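The effect of mosaicism on the log2 ratio in case 13 can be illustrated with a simple theoretical calculation in Python. The sketch assumes the measured signal reflects the average copy number across all cells and ignores the signal compression seen in practice, so it gives theoretical values rather than the observed values of approximately 0.6 and 0.3; the function name is illustrative.

```python
import math


def expected_log2(extra_copies, mosaic_fraction, normal_copies=2):
    """Theoretical log2 ratio when only a fraction of cells carry extra copies."""
    mean_copies = normal_copies + extra_copies * mosaic_fraction
    return math.log2(mean_copies / normal_copies)


# Tetrasomy (two extra copies) in every cell versus in 57% of cells.
nonmosaic = expected_log2(extra_copies=2, mosaic_fraction=1.0)
mosaic_57 = expected_log2(extra_copies=2, mosaic_fraction=0.57)
print(nonmosaic, round(mosaic_57, 2))
```

Under these assumptions, a tetrasomy present in 57% of cells gives a theoretical log2 ratio of about 0.65 rather than 1.0. That is close to the theoretical value for a single-copy gain (0.58), which is why the array result alone resembled a single-copy gain.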

Fig. 2

Pericentromeric array CGH plots and fluorescence in situ hybridization (FISH) images for small supernumerary marker chromosomes (sSMCs) derived from chromosome 22. Array CGH data for the 22q pericentromeric region clones represented on the microarray are displayed with the most proximal clone on the left and the most distal clone on the right. The blue line is a plot of the data from the first array CGH experiment from the top half of the slide (reference Cy5/patient Cy3). The pink line is a plot of the data from the second array CGH experiment from the bottom half of the same slide in which the dyes have been reversed (patient Cy5/reference Cy3). A. A normal array CGH plot for the chromosome 22q pericentromeric region. B. Case 13 showing a gain of all clones across approximately 7.57 Mb of 22q pericentromeric DNA (22q11.1-q11.23) including the cat eye and DiGeorge syndrome critical regions. The log2 ratio for all clones is just under 0.3 (rather than approximately 0.6) because the marker is found in only 57% of cells. C. Case 15 (the plots for cases 12 and 14 were essentially identical) showing a gain of approximately 2.2 Mb across the proximal pericentromeric region (22q11.1-q11.21) including the cat eye syndrome critical region but not the DiGeorge syndrome critical region, which is located approximately 3.7 Mb from the centromere. The log2 ratio for the six most proximal clones is just over 0.6 indicative of tetrasomy for the region. This marker 22q was found in 100% of cells examined in this patient. D. FISH image of an sSMC 22q in case 13 (arrow). Clone RP11-1037C4 (cat eye region clone) is labeled in red with a 22q telomere clone (RP11-676E13) labeled in green as a control. The two red signals on the marker suggest that the marker contains two copies of the cat eye region. This is consistent with tetrasomy 22q in this patient. 
FISH using the same RP11-1037C4 probe labeled in red and the Cytocell 14/22 centromere probe labeled in green identified only one green centromeric signal on the marker (inset) consistent with a monocentric marker 22q. E. FISH image of an sSMC 22q identified in case 15 (arrow). Clone RP11-1037C4 (cat eye region clone) is labeled in red with a 22q telomere clone (RP11-676E13) labeled in green as a control. FISH using the Cytocell 14/22 centromere probe as described in D identified a dicentric marker 22q (inset).

Four of the 18 sSMCs characterized by array CGH were identified as derivatives of chromosome 15. Cases 9 and 10 showed copy number gains consistent with tetrasomy 15q (Fig. 3, A–C), whereas case 8 showed an even higher copy number gain (Fig. 3D). Furthermore, array CGH with this high-density pericentromeric array indicated that three sSMCs contained different sizes of euchromatic DNA from 15q ranging from approximately 7 Mb to >9.5 Mb, all of which contained the Prader-Willi/Angelman critical region genes of SNRPN and UBE3A. Confirmatory FISH experiments identified structurally distinct sSMCs in all three cases: a single monocentric 15q sSMC in case 10 (Fig. 3E); a single dicentric 15q sSMC in case 9 (Fig. 3F); and two monocentric 15q sSMCs in case 8 (Fig. 3G).

Fig. 3

Pericentromeric array CGH plots and fluorescence in situ hybridization (FISH) images for small supernumerary marker chromosomes (sSMCs) derived from chromosome 15. Array CGH data for the 15q pericentromeric region clones represented on the microarray are displayed with the most proximal clone on the left and the most distal clone on the right. The pink and blue line plots are the same as described for Figure 2. A. A normal array CGH plot for the chromosome 15q pericentromeric region. B. Case 10 showing a gain of approximately 7.26 Mb of 15q pericentromeric DNA (15q11.2-q12) including the Prader-Willi/Angelman (PWA) syndrome loci SNRPN and UBE3A. The log2 ratio for all abnormal clones is approximately 0.6, indicative of tetrasomy across the region. C. Case 9 showing a copy number gain across the entire 9.67-Mb coverage of the 15q pericentromeric region (15q11.2-q13.1) including the PWA syndrome loci SNRPN and UBE3A. The log2 ratio for all clones is approximately 0.6, indicative of tetrasomy across the region. D. Case 8 showing a gain of 7.61 Mb of 15q pericentromeric DNA (15q11.2-q13.1) including the Prader-Willi/Angelman (PWA) syndrome loci SNRPN and UBE3A. The log2 ratio for most abnormal clones is approximately 0.9, indicative of a four-copy number gain across the region. E. FISH image of the marker 15q found in case 10 (arrow). Clone RP11-701H24 (SNRPN-containing clone) is labeled in red with a 15q telomere clone (RP11-14C10) labeled in green as a control. FISH using the same RP11-701H24 (SNRPN-containing clone) labeled in red and the Cytocell chromosome 15 centromere probe labeled in green identified a monocentric marker 15q (inset). F. FISH image showing the sSMC derived from 15q identified in case 9 (arrow). Probes are labeled as in E. FISH using the Cytocell chromosome 15 centromere probe as described in E identified a dicentric marker 15q (inset). G. FISH image showing the two sSMCs derived from 15q identified in case 8 (arrows). 
Clone RP11-125E1 is labeled in red with a 15q telomere clone (RP11-14C10) labeled in green as a control. FISH using the Cytocell chromosome 15 centromere probe as described in E identified two monocentric markers with a tandem duplication of the 15q pericentromeric region.

Another three sSMCs were found to be derivatives of chromosome 8. In case 3, array CGH identified an sSMC consisting only of euchromatin from the 8p pericentromeric region (>5.5 Mb) (Fig. 4, A and B), whereas case 5 had a mosaic sSMC that contained euchromatin from both 8p (>5.5 Mb) and 8q (>6.5 Mb) (Fig. 4C). Case 4 was more complex in that array CGH identified a two-copy gain of the more proximal 8p pericentromeric region (3.5 Mb) and a single-copy gain of at least 0.5 Mb of more distal pericentromeric DNA (Fig. 4D). FISH analysis confirmed the array CGH results for cases 3 and 5 (data not shown) and identified a single sSMC containing two copies of proximal 8p and one copy of the more distal 8p region in case 4 (Fig. 4, E and F).

Fig. 4

Pericentromeric array CGH plots and fluorescence in situ hybridization (FISH) images for small supernumerary marker chromosomes (sSMCs) derived from chromosome 8. Array CGH data for the pericentromeric region clones of chromosome 8 represented on the microarray are displayed distal to proximal on the left side of the plot for the 8p12-p11.1 clones and proximal to distal for the 8q11.21-q11.23 clones on the right. The pink and blue line plots are the same as described for Figure 2. A. A normal array CGH plot for the pericentromeric regions of chromosome 8. B. Case 3 showing a gain of >5.55 Mb of 8p pericentromeric DNA. C. Case 5 showing a gain of >5.55 Mb of 8p pericentromeric DNA and a gain of >6.49 Mb of 8q pericentromeric DNA. D. Case 4 showing a two-copy gain of the most proximal approximately 3.53 Mb of the 8p pericentromeric region (arrow) followed by a single-copy gain of approximately 0.5 Mb of the more distal 8p pericentromeric region (arrowhead). E. FISH image identifying an sSMC derived from chromosome 8 in case 4 (arrow). 8p pericentromeric clone (RP11-598P20) is labeled in red with an 8q telomere clone (RP11-1143I12) labeled in green as a control. F. FISH image identifying the complex nature of the sSMC derived from chromosome 8 in case 4 (arrow). 8p pericentromeric clone (RP11-598P20) from the region showing two-copy gain is labeled in red and 8p pericentromeric clone (RP11-359E19) from the region showing single-copy gain is labeled in green. Note the two red signals and only one green signal on the marker chromosome.

In three cases (1, 6, and 8), two marker chromosomes were identified by array CGH and FISH. Case 1 was found to carry two mosaic markers of chromosome 2q origin, and array CGH on case 6 identified one sSMC derived from chromosome 11q and one sSMC derived from 17p. Similarly, case 8 carried two sSMCs derived from 15q, as discussed above. In three additional cases, array CGH identified a nonmosaic sSMC derived from 14q (case 7), a 68% mosaic sSMC derived from chromosome 18 (case 11) containing euchromatin from both 18p and 18q, and a low-level mosaic marker chromosome 7 (case 2) containing euchromatin from both 7p and 7q that was found in only 30% of cells (Fig. 5).

Fig. 5

Pericentromeric array CGH plot and fluorescence in situ hybridization (FISH) image for a small supernumerary marker chromosome (sSMC) derived from chromosome 7 (case 2). Array CGH data for the pericentromeric region clones of chromosome 7 represented on the microarray are displayed distal to proximal on the left side of the plot for the 7p12.2-p11.1 clones and proximal to distal for the 7q11.21-q11.22 clones on the right. The pink and blue line plots are the same as described for Figure 2. A. A normal array CGH plot for the pericentromeric regions of chromosome 7. B. Case 2 showing a gain of 0.49 Mb of 7p pericentromeric DNA and approximately 1.42 Mb of 7q pericentromeric DNA (7p11.1-q11.21). Note the log2 ratio for all abnormal clones is <0.3, indicative of a low-level mosaic copy number gain across the region. C. FISH image identifying an sSMC derived from chromosome 7 in case 2 (arrow). 7q pericentromeric region clone RP11-587O11 is labeled in red with a 7p telomere clone (RP11-449P15) labeled in green as a control. In a separate FISH experiment, a 7p pericentromeric clone (RP11-1324A7) also hybridized to the sSMC (data not shown).

DISCUSSION

Development of a pericentromeric BAC clone set identifies unique sequence islands within highly duplicated genomic regions

We have identified a set of unique BAC clones spanning, on average, 5.3 Mb of the most proximal unique sequence in the pericentromeric regions of the genome (excluding the short arms of the acrocentric chromosomes). We have used this clone set to construct a high-density CGH microarray for the detection and characterization of sSMCs in clinical diagnostic specimens.

The construction of a human pericentromeric microarray presented some unique challenges because of the complex and repetitive nature of these regions. Before the sequencing of the human genome, the model for the organization of the pericentromeric regions was relatively simple with tandem repeats of higher-order alpha-satellite DNA forming large array structures nearest the centromere and blocks of alpha-satellite DNA lacking higher-order structure and other pericentromeric satellite DNA sequences mapping to the periphery.9 This model suggested a clear boundary between the pericentromeric region of alpha-satellite repeats and the euchromatic unique sequence region containing expressed genes.

Sequencing of the human genome has revealed a more complex model of human pericentromeric organization than originally appreciated. In addition to the alpha-satellite repeats present at the centromere, most human pericentromeric regions contain blocks of segmental duplication. In fact, interchromosomal duplications in the pericentromeric regions are six times more common than intrachromosomal duplications. This suggests that more than one third of all segmental duplications between chromosomes occur within the first 5 Mb of the centromere.10 Interestingly, these segmental duplications seem to be more prevalent in close proximity to the centromere where, within the first 500 kb, a clear gradient is observed. This gradient of increasing segmental duplications moving toward the centromere is accompanied by a decline in exon content and transcriptional diversity.10

Although the general model for pericentromeric organization suggests that these regions are enriched for duplications, variations in the amount of duplicated material and in the relative distinctiveness of the boundary between the euchromatic unique sequence and the alpha-satellite DNA allow human pericentromeric regions to be categorized into three broad groups.10 The first group consists of eight pericentromeric regions (4q11, 5p11, 6q11, 8p11, 16q11, 18q11, 19q11, and Xp11) that have a duplication content below the genome average (<5.2%) and show a relatively abrupt boundary between the unique and alpha-satellite DNA. The second group consists of 16 pericentromeric regions (1p11, 3p11, 3q11, 4p11, 5q11, 8q11, 11q11, 12p11, 12q11, 14q11, 17q11, 20q11, 19p11, 20p11, Xq11, and Yq11) that show an intermediate level of duplication between the genome and pericentromeric average (5.2–32.2%); whereas the third group consists of 19 pericentromeric regions (1q11-1q12, 2p11, 2q11, 6p11, 7p11, 7q11, 9p11, 9q11, 10p11, 10q11, 11p11, 13q11, 15q11, 16p11, 17p11, 18p11, 21q11, 22q11, and Yp11) that show extensive regions of duplication ranging from 500 kb to 5.5 Mb in length.
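The three groups listed above can be captured as a small lookup table in Python, which also confirms that they account for all 43 unique pericentromeric regions. The region names are copied from the text; the group labels and the `group_of` helper are illustrative.

```python
# Pericentromeric regions by duplication group, as listed in the text.
GROUPS = {
    "low duplication (<5.2%)": [
        "4q11", "5p11", "6q11", "8p11", "16q11", "18q11", "19q11", "Xp11",
    ],
    "intermediate duplication (5.2-32.2%)": [
        "1p11", "3p11", "3q11", "4p11", "5q11", "8q11", "11q11", "12p11",
        "12q11", "14q11", "17q11", "20q11", "19p11", "20p11", "Xq11", "Yq11",
    ],
    "extensive duplication (500 kb to 5.5 Mb)": [
        "1q11-1q12", "2p11", "2q11", "6p11", "7p11", "7q11", "9p11", "9q11",
        "10p11", "10q11", "11p11", "13q11", "15q11", "16p11", "17p11",
        "18p11", "21q11", "22q11", "Yp11",
    ],
}


def group_of(region):
    """Return the duplication group label for a pericentromeric region."""
    for label, regions in GROUPS.items():
        if region in regions:
            return label
    raise KeyError(region)


sizes = [len(regions) for regions in GROUPS.values()]
print(sizes, sum(sizes))
print(group_of("19q11"))
print(group_of("22q11"))
```

The group sizes are 8, 16, and 19, which sum to the 43 regions covered by the array.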

Although sophisticated analysis of the human genome sequence suggests a complex and variable structure within the pericentromeric regions of the genome, we set out to physically identify, characterize, and bring together a unique set of pericentromeric region probes suitable for use in diagnostic array CGH. The paucity of unique sequence clones in the pericentromeric regions of the genome is evident in that of the 1839 BAC clones that we analyzed, only 1386 (75%) were considered sufficiently unique to proceed with FISH confirmation of chromosomal location. Another 363 BAC clones were subsequently rejected because they cross-hybridized, did not map correctly, or hybridized poorly under standard uniform conditions. Thus, only 56% (1023 of 1839) of the pericentromeric clones originally identified from the July 2003 draft of the human genome were considered sufficiently unique to be included on a diagnostic microarray.

As expected, our sequence analysis of the pericentromeric regions identified islands of proximal unique sequence clones interspersed with duplicated sequences of various sizes (Fig. 1). However, the identification of the most proximal unique sequence clones was particularly challenging for 2p, 9p, 9q, and 16p. Because we did not identify more proximal unique sequence clones, our coverage for these pericentromeric regions begins 4.8, 7.3, 4.7, and 8.0 Mb from the centromeric heterochromatin, respectively (Table 1 and Fig. 1). This is consistent with the extensive zones of duplication found from sequence analysis of these pericentromeric regions.10 In contrast, our coverage for other pericentromeric regions, such as 19q, begins essentially adjacent to the centromeric heterochromatin (Table 1 and Fig. 1) as expected for this pericentromeric region, which has very few duplicated sequences.10 Although the actual distance between the centromeric heterochromatin and the most proximal unique sequence euchromatic clone in this pericentromeric clone set varied for each pericentromeric region, the average distance is only 1.6 Mb (Table 1). In addition, the average coverage of this probe set over each unique pericentromeric region is approximately 5.3 Mb (Table 1). Because the chromosomal location of each clone was confirmed by FISH, this pericentromeric clone set promises to be a powerful tool for the identification and characterization of sSMCs and other pericentromeric imbalances by array CGH.

Pericentromeric array CGH facilitates the detection, identification, and characterization of sSMCs of various chromosomal origins and sizes in clinical diagnostic specimens

Identification of the chromosomal origin of sSMCs by pericentromeric array CGH

Autosomal sSMCs can be broadly grouped into two distinct classes based on whether they are derived from either acrocentric or nonacrocentric chromosomes. Acrocentric sSMCs are the most commonly occurring sSMCs and account for approximately two-thirds of all autosomal sSMCs (68%), whereas nonacrocentric sSMCs make up the remaining one-third (32%).11 Of the acrocentric sSMCs, more than half (51%) are derived from chromosome 15, with SMC(13/21), SMC(14), and SMC(22) accounting for 18%, 17%, and 13% respectively.11 Aside from sSMCs derived from 12 and 18, which are associated with known syndromes, derivatives of 1, 8, and X are among the most commonly reported nonacrocentric sSMCs.1

In this study, sSMCs originating from acrocentric chromosomes were common (9/18) and were derived from chromosomes 14 (1/9), 15 (4/9), and 22 (4/9). In our limited sample, sSMCs derived from chromosome 8 were the most frequently observed nonacrocentric sSMCs (3/9). Surprisingly, no X-derived sSMCs were identified. This may represent a bias of ascertainment, as these sSMCs are often found in patients with a 45,X,+mar karyotype associated with a Turner syndrome-like phenotype and may therefore be readily detectable by FISH without being sent to a laboratory for array CGH. Interestingly, one sSMC identified by array CGH was derived from chromosome 11, which is one of the least commonly observed sSMCs.1

Although the frequencies of specific sSMCs in our small sample are, in general, consistent with previously published results, we predict that array CGH adopted in a routine clinical laboratory setting may identify rare sSMCs that have been previously underappreciated. We have recently reported the ability of array CGH to consistently detect and characterize low-level mosaic chromosome aberrations at levels of approximately 20%.12 This finding is again supported by the detection of an sSMC derived from chromosome 7 in 30% of cells in case 2 (Fig. 5).

Although the chromosomal origins of 18 sSMCs were successfully identified in 15 cases, the sSMCs in 5 cases known to carry markers based on cytogenetic analysis were not detected by this microarray. These markers may contain little or no euchromatic DNA, or they may be low-level mosaic sSMCs present below the level of mosaicism routinely detected by array CGH (20%).12 We did not repeat karyotype analysis on these cases to reassess the level of mosaicism. Alternatively, they may be neo-centromeric markers derived from nonpericentromeric DNA that is not represented on the array.

The value of array CGH in characterizing sSMCs is also apparent in that multiple markers of the same or different chromosomal origins can be identified in a single individual in a single assay (e.g., sSMCs of 11q and 17p in case 6 and two sSMCs of 2q in case 1). Thus, pericentromeric array CGH would be particularly useful in cases in which multiple markers of unknown origin are identified in a single individual by standard cytogenetic techniques.13–15

Characterization of euchromatic DNA content of sSMCs by pericentromeric array CGH

Many of the molecular cytogenetic assays currently used to identify the chromosomal origin and euchromatic content of sSMCs can detect only the presence or absence of one or a few probes in the centromeric or pericentromeric regions of the genome (e.g., centromere-specific single FISH probes, cenM-FISH, and subcenM-FISH). Other methods such as WCP or multicolor banding (MCB) can be more inclusive but are best suited for larger marker chromosomes and can still lead to ambiguous interpretation.16–18 Array CGH with a high-density pericentromeric array provides higher resolution than most techniques in that it is limited only by the size of the BACs on the array and the genomic distance between clones. Thus, copy number gains can be directly linked to their precise genomic location, allowing for a more accurate interpretation of the euchromatic DNA content of sSMCs.

The utility of a pericentromeric clone set spanning approximately 5 Mb of euchromatic DNA is evident from the more precise characterization of the euchromatic DNA content of 18 sSMCs that it allowed (Table 2). In this study, we identified sSMCs from a variety of chromosomes with a wide range in the size of euchromatic DNA retained on the marker. Although gaps in clone coverage, primarily caused by the presence of large duplicated sequences within the pericentromeric regions, make precise sizing difficult, the amount of euchromatic DNA present on the sSMCs identified in this limited study ranged from approximately 0.5 to 9.5 Mb (Table 2). However, 33% of the sSMCs in this study contained more euchromatic DNA than that present on this pericentromeric microarray. Thus, extending coverage, perhaps to 10 Mb, may be even more valuable.
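The effect of coverage gaps on sizing can be sketched as follows. This is an illustrative example under stated assumptions, not the procedure used to generate Table 2. It assumes a single chromosome arm, clone positions expressed as distance from the centromere, and a gain that extends distally from the centromere. The `Clone` class, the `breakpoint_interval` function, and all coordinates are hypothetical and do not represent clones on the array.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Clone:
    name: str     # hypothetical clone label
    start: int    # proximal edge, bp from the centromere (assumed convention)
    end: int      # distal edge, bp from the centromere
    gained: bool  # True if the clone shows a copy number gain


def breakpoint_interval(clones: List[Clone]) -> Optional[Tuple[int, Optional[int]]]:
    """Return (lower, upper) bounds on the distal breakpoint of the gain.

    lower is the distal edge of the most distal gained clone. upper is the
    proximal edge of the next clone without a gain, or None when the most
    distal clone on the array is itself gained, meaning the marker extends
    beyond the array coverage. Returns None when no clone is gained.
    """
    ordered = sorted(clones, key=lambda c: c.start)
    gained_positions = [i for i, c in enumerate(ordered) if c.gained]
    if not gained_positions:
        return None
    last = gained_positions[-1]
    lower = ordered[last].end
    if last == len(ordered) - 1:
        return (lower, None)
    return (lower, ordered[last + 1].start)


# Made-up coordinates with a large coverage gap between clones B and C.
example = [
    Clone("A", 0, 150_000, True),
    Clone("B", 400_000, 560_000, True),
    Clone("C", 2_100_000, 2_250_000, False),
    Clone("D", 4_800_000, 5_000_000, False),
]
print(breakpoint_interval(example))
```

In this made-up example the breakpoint can be placed only somewhere between 0.56 and 2.1 Mb from the centromere, which is why the sizes reported for individual markers are approximate wherever clone coverage is sparse.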

Implications for understanding the mechanism(s) of sSMC formation

Among the nonacrocentric markers, most (6 of 9) were derived from single chromosome arms. These results are consistent with ring chromosome formation from an interstitial deletion, in which one break occurs within the centromeric alpha-satellite repeats.1 We hypothesize that at least one of the chromosome 8 sSMCs identified in this study (case 4) may be a ring chromosome. Our results are consistent with a ring chromosome derived from a chromosome 8 that has undergone an asymmetric breakage of the p-arm sister chromatids and lost all or most of the q-arm. Fusion of asymmetric sister chromatids would generate a ring chromosome similar to the one we observed showing a proximal two-copy gain and a distal single-copy gain. However, the presence of a small amount of heterochromatin or other proximal sequences not represented on this microarray from both chromosome arms cannot be excluded without additional molecular studies.

Confirmatory FISH using centromeric probes on the acrocentric markers derived from chromosomes 15 and 22 identified at least three different structural rearrangements. The identification of monocentric sSMCs in cases 10 and 13 is consistent with ring chromosome formation as the result of an interstitial deletion with one break occurring within the centromeric alpha-satellite repeats. Unlike the asymmetric ring chromosome in case 4, these ring chromosomes seem to have been generated by symmetric breakage of sister chromatids. The sSMCs in cases 9, 12, 14, and 15 seem to be isodicentric chromosomes, whereas the sSMCs in case 8 are apparently monocentric markers with tandem duplications of 15q.

The three sSMCs identified in this study that contain euchromatin from both chromosome arms (an sSMC 7 in case 2, an sSMC 8 in case 5, and an sSMC 18 in case 11) may represent markers or ring chromosomes derived from different mechanisms.1 Although further molecular studies are needed to determine the molecular basis of these and other sSMCs, pericentromeric array CGH provides a rapid method for the initial characterization of sSMCs that should facilitate more detailed molecular studies to elucidate these mechanisms.

CONCLUSION

We developed a set of unique-sequence BAC clones surrounding the centromeres of all human chromosomes (excluding the acrocentric short arms). We identified islands of unique sequence within the pericentromeric regions of the genome that can be assayed to identify and characterize sSMCs and other novel pericentromeric imbalances. We used this set of pericentromeric clones to construct a microarray and to demonstrate the utility of pericentromeric array CGH in identifying the chromosomal origins and extent of euchromatin of sSMCs in diagnostic specimens. We anticipate that this diagnostic tool will enable better genotype/phenotype correlations for some sSMCs. This is especially important in the prenatal setting, as this information may benefit genetic counseling for prenatally detected sSMCs.