Introduction

Copy number variations (CNVs) are defined as DNA changes >1 Kb that differ among individuals. They include duplications or deletions and can involve large DNA regions.1, 2, 3 CNVs have an important role in modulating the phenotypic spectrum both in single gene and multigenic diseases.3 They are not randomly distributed across the genome, and their presence correlates with chromosomal structural features (eg, inverted repeats, duplications, highly homologous regions and so on).2, 4, 5, 6, 7, 8

In the past few years, DNA microarrays have increased capabilities for the detection of both pathological and neutral polymorphic variations by analyzing the entire genome with a resolution depending on the number, length and type of probes.1, 9, 10, 11, 12, 13, 14 Genome tiling microarrays are now available, allowing DNA analysis at an intermediary resolution between cytogenetic techniques (≥5 Mb) and conventional DNA sequence analysis (1–800 bp).9

Mental retardation (MR) is a relatively frequent disorder affecting about 3% of the general population. An etiological diagnosis is usually made in fewer than half of the affected individuals.15 Genetic causes account for 17.4–47.1% of cases, with reported frequencies also varying on the basis of the different techniques used for analysis.15, 16 Several years ago, microarray platforms were used for the first time in clinical genetics, showing that abnormalities in genomic copy number account for about 10–15% of patients affected by MR, especially when it is associated with multiple congenital anomalies (MCA) and/or dysmorphism.17 Microarrays represent a robust and high-resolution method of analysis. The recent increase in commercially available microarray platforms has encouraged their application in diagnostic settings, and is expected to improve procedures for establishing the etiology of genetically determined MR.18 Among commercial platforms, SNP arrays offer the highest resolution using approximately two million probes. These arrays were originally designed for genome-wide genotyping, but they can also be used to detect CNVs.19 This approach requires only one DNA sample to be hybridized on the chip. Software is then used to infer copy number at each locus by quantifying the fluorescent signal intensity and comparing it with HapMap controls (International HapMap Project, 2005; http://www.hapmap.org/). One limitation of these arrays is that probe selection is not homogeneous, as it reflects the distribution of SNPs in the genome, and SNPs are very sparse in genomic segments containing low-copy repeats and segmental duplications.9 SNP arrays frequently offer a better detection of CNVs in gene-poor regions.20 To overcome this restriction, the latest generation high-resolution SNP array platforms were generated by adding nonpolymorphic sequences to cover the SNP-poor regions (Illumina, San Diego, CA, USA and Affymetrix, Santa Clara, CA, USA).
For example, the Affymetrix 6.0 GeneChip consists of both 906 600 SNP sequences and about 900 000 nonpolymorphic oligonucleotides, with an overall spacing of 1 sequence per 700 bp along the entire genome.21 The theoretical resolution of a microarray depends on the length, number and genomic density of probes; however, many other parameters can determine the practical resolution, including the level of experiment noise and the sensitivity of copy number measurements.9, 22, 23 Thus, in oligonucleotide arrays as in SNP arrays, the reduced length of probes tends to generate a lower signal-to-noise ratio of hybridization, and analyses need to average data from several consecutive spotted sequences, thus diminishing the overall resolution.9

In this study, we analyzed 70 karyotyped patients affected by MR/MCA, who were previously analyzed with an oligonucleotide array with an average spacing of 32 kb. CNVs detected by SNP array were selected following the workflow as previously suggested,24 and they were verified systematically by other techniques (fluorescence in situ hybridization (FISH) and/or quantitative-PCR (qPCR)). The purpose of this study was to assess Affymetrix GeneChip 6.0 array sensitivity in CNV detection to establish the effective resolution of this platform and to explore the role of CNVs in the definition of as-yet-unexplained MR patients. In particular, we intended to evaluate the potential diagnostic usefulness of this technique coupling a more sensitive chip with strict clinical criteria.

Materials and methods

Patients

A total of 70 patients were evaluated. They were selected from the cohort of subjects referred to our diagnostic laboratory for CNV analysis in the 2006–2007 2-year period. All individuals were karyotyped (>450 band resolution): 64 had a normal karyotype, 5 had apparently balanced structural rearrangements and 1 patient had a subtelomeric rearrangement detected by FISH analysis. Diagnostic screening was performed using an oligonucleotide-based array (44K – AMADID 014950; Agilent Technologies, Waldbronn, Germany). A total of 19 patients were positive in this initial screening. This group of individuals (SET I) was used for validation purposes. The remaining 51 patients (SET II) were analyzed to assess whether the higher resolution of GeneChip 6.0 resulted in an effective higher sensitivity in detecting smaller pathogenic CNVs in these individuals. These two platforms were comparable in terms of the cost and timetable of a single experiment. All studies were approved by the Institutional Review Board committee and conducted with patients’ informed consent. All patients were affected by moderate-to-severe MR (IQ<50), showed dysmorphic features and/or developmental delay and/or had at least one major severe congenital malformation and/or multiple mild congenital anomalies. A commercially available NucleoSpin Blood kit (Macherey-Nagel GmbH & Co, Düren, Germany) was used to extract genomic DNA from 2 ml of peripheral blood.

Controls

A data set of genotype and copy number calls on 270 control samples from the International HapMap Project was used (http://www.hapmap.org/). The SNP 6.0 copy number calls from these HapMap samples were provided by Affymetrix as a reference model file for use in Genotyping Console software (Affymetrix).

Affymetrix GeneChip mapping 6.0 microarray

The GeneChip 6.0 platform consists of about 906 600 SNP sequences and about 900 000 nonpolymorphic oligonucleotides, which cover the whole genome with an average spacing of 0.7 Kb. In two separate reactions, genomic DNA (250 ng) was digested with NspI or StyI (New England Biolabs, Ipswich, MA, USA), as recommended by the manufacturer (Affymetrix). After digestion, an adaptor was linked to the restricted fragments, the reaction was diluted 4 × and the fragments were amplified by PCR. After purification using Magnetic Beads (Agencourt Bioscience Corporation, Beverly, MA, USA), 90 μg of PCR products was fragmented and end labeled using 30 U/μl of terminal deoxynucleotidyl transferase, and then hybridized for 16–18 h to the Affymetrix 6.0 chip at 49°C in a GeneChip Hybridization Oven 640 (Affymetrix). The chips were washed, stained in GeneChip Fluidic Station 450 (Affymetrix) and scanned with Scanner 3000 7G (Affymetrix).

Genotype and copy number calls

Affymetrix CEL files were loaded into Genotyping Console (GTC) 3.0.1 for genotype and copy number analysis. The GTC Contrast Quality Control (QC) metric was used to filter out low-quality samples using the default threshold of 0.4. Copy number analysis, using the Affymetrix HapMap270 Reference Model File for comparison, was performed with the default settings in GTC, including regional GC correction. Detailed descriptions of the algorithms implemented by GTC copy number analysis are provided in Korn et al.25 Briefly, GTC applies the BRLMM-P Plus algorithm, which uses a one-dimensional Gaussian mixture model to establish the intensity of each genotype and copy number probe at discrete levels relative to the reference model file. A Hidden Markov Model is then applied for smoothing the copy number data on the basis of neighboring values in a region.

Copy number segment analysis

The GTC segment-reporting tool was used to identify regions in the genome with a default setting of at least 25 markers showing consensus for gain or loss spanning at least a 75-kb region. Data for regions meeting these criteria were exported for further analysis in Microsoft Excel. The results of the segment analysis were used to compare aberrant regions against known copy number polymorphisms and to select copy number variations for validation. The copy number aberrations were further filtered to select CNVs more likely to be pathogenic on the basis of the size and number of markers in the region. A CNV was scored when it encompassed >75 Kb and included at least 80 markers. Considering that the average spacing of the targets of this array is 0.7 Kb, one would expect roughly 107 markers in a 75-kb region ((1 marker/0.7 kb) × 75 kb≈107 markers). The smallest CNVs were considered only when at least 75% of the CNV length was covered by markers.
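The scoring rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Segment` record and its field names are hypothetical and do not correspond to the GTC export format, and only the three numeric thresholds (75 kb, 80 markers, 0.7 kb spacing) come from the text.

```python
from dataclasses import dataclass

MIN_SIZE_KB = 75.0     # size threshold stated in the text
MIN_MARKERS = 80       # marker-count threshold stated in the text
AVG_SPACING_KB = 0.7   # average target spacing stated in the text


@dataclass
class Segment:
    """Hypothetical record for one exported copy number segment."""
    chrom: str
    start_bp: int
    end_bp: int
    n_markers: int
    state: str  # "gain" or "loss"

    @property
    def size_kb(self) -> float:
        return (self.end_bp - self.start_bp) / 1000.0


def expected_markers(size_kb: float, spacing_kb: float = AVG_SPACING_KB) -> float:
    """Number of markers expected in a region at uniform average spacing."""
    return size_kb / spacing_kb


def passes_filter(seg: Segment) -> bool:
    """Score a segment only if it is >75 kb and carries at least 80 markers."""
    return seg.size_kb > MIN_SIZE_KB and seg.n_markers >= MIN_MARKERS
```

With these definitions, `expected_markers(75.0)` is about 107, so the 80-marker minimum corresponds to roughly 75% of the markers expected in the smallest scored region.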

Validation of CNVs

The Database of Genomic Variants (http://projects.tcag.ca/variation/), UCSC Genome Browser (http://www.genome.ucsc.edu/) and DECIPHER (http://www.sanger.ac.uk/PostGenomics/decipher) were used to identify known genomic variants, genes and MR-associated regions. We attempted to confirm by other methods each CNV that contained known genes and/or miRNA sequences and was not found in controls. FISH analysis was used to confirm deletions >150 Kb and duplications >400 Kb, whereas a custom quantitative-PCR assay was used to confirm smaller CNVs.
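The size-based choice of confirmation technique can be written as a small helper. This is a minimal sketch, assuming the two thresholds are applied as strict cut-offs; the function name and arguments are hypothetical. It gives a default route only: the Figure 3 legend reports qPCR confirmation for imbalances of about 0.5 Mb, so the rule was evidently not applied rigidly.

```python
FISH_MIN_DELETION_KB = 150.0     # deletions above this size go to FISH (from the text)
FISH_MIN_DUPLICATION_KB = 400.0  # duplications above this size go to FISH (from the text)


def confirmation_method(kind: str, size_kb: float) -> str:
    """Return the default confirmation technique for a CNV.

    `kind` must be "deletion" or "duplication"; any other value is rejected.
    """
    if kind == "deletion":
        threshold = FISH_MIN_DELETION_KB
    elif kind == "duplication":
        threshold = FISH_MIN_DUPLICATION_KB
    else:
        raise ValueError(f"unknown CNV type: {kind!r}")
    return "FISH" if size_kb > threshold else "qPCR"
```

Duplications have the higher threshold because the stated rule sends only duplications >400 Kb to FISH, compared with >150 Kb for deletions.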

FISH

To confirm new imbalances, appropriate BACs (bacterial artificial chromosomes) were selected from a genome-wide 32K BAC library (BACPAC Resources Center, Children's Hospital Oakland Research Institute, Oakland, CA, USA) (http://www.chori.org/bacpac). Probes were prepared and FISH was performed as previously described.26 An Eclipse 80i fluorescence microscope equipped with a computerized system (Genikon; Nikon, Florence, Italy) was used to analyze and acquire images. To confirm an imbalance, about 30 metaphase spreads and/or nuclei for each sample were counted.

qPCR

Smaller DNA copy number changes were confirmed using an AB 7000 Sequence Detection System (Applied Biosystems, Foster City, CA, USA) and the DNA-binding dye SYBR Green (Invitrogen Corporation, Carlsbad, CA, USA) as described by Carbone et al.27

X-inactivation assay

The X-chromosome inactivation pattern in patient 29 was assessed by studying the methylation state of the androgen receptor gene.28 PCR products were separated on an ABI Prism 3100 DNA sequencer and analyzed using GeneScan version 3.5.1 (Applied Biosystems) and Genotyper software version 2.1.5 (Applied Biosystems).

Results

This study focused on testing for CNVs in patients with MR on a high-resolution SNP array platform consisting of about 1 800 000 probes including polymorphic and nonpolymorphic sequences. A total of 70 MR/MCA patients were hybridized on a GeneChip 6.0 platform and data were analyzed using Genotyping Console software (GTC v3.0.1). SNP array analysis detected the presence of CNVs in all patients with a highly variable number ranging from 10 to 313 per subject.

Using filters described in Materials and Methods, an average of 21 CNVs were detected in each sample (range 8–41), with a size range of 0.76–16.314 Mb. This number was reduced further, following the recommendations of Koolen et al.24 CNVs were not considered as pathogenic if the rate of overlap with known nonpathogenic polymorphic CNVs was >50–70% (81% of CNVs and 100% of CNVs smaller than 100 kb), or if the CNV did not involve known genes (4.4% of nonpolymorphic CNVs). CNVs with a probe density <2 kb were excluded as well. These rules were not considered for CNVs affecting regions known to be involved in microdeletion/microduplication syndromes and for X-linked CNVs occurring in male patients. All selected CNVs were confirmed using other techniques and, when possible, this test was expanded to parents to distinguish between de novo-generated and inherited CNVs. Clinical features, SNP array results, standard karyotypes and array-CGH data of patients are summarized in Table 1. The number of probes present in each affected chromosome region, their density, degree of overlap with known common CNV regions, confirmation technique and CNV origin are also listed.
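The overlap-based exclusion described above can be illustrated as follows. This is a sketch under assumptions, not the procedure actually used: intervals are taken as half-open base-pair coordinates on one chromosome, the cut-off is fixed at the lower bound of the stated 50–70% range, and the function and parameter names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start_bp, end_bp), half-open, same chromosome


def overlap_fraction(cnv: Interval, known: List[Interval]) -> float:
    """Fraction of the CNV length covered by the union of known polymorphic CNVs."""
    start, end = cnv
    if end <= start:
        raise ValueError("CNV must have positive length")
    # Clip known intervals to the CNV and drop those that do not intersect it.
    clipped = sorted(
        (max(s, start), min(e, end)) for s, e in known if s < end and e > start
    )
    covered = 0
    cursor = start  # right edge of the bases already counted
    for s, e in clipped:
        s = max(s, cursor)  # avoid double-counting overlapping known intervals
        if e > s:
            covered += e - s
            cursor = e
    return covered / (end - start)


def is_candidate(
    cnv: Interval,
    known: List[Interval],
    n_genes: int,
    syndrome_region: bool = False,
    x_linked_male: bool = False,
    max_overlap: float = 0.5,  # assumption: lower bound of the 50-70% range
) -> bool:
    """Keep a CNV as potentially pathogenic according to the rules in the text."""
    if syndrome_region or x_linked_male:
        return True  # the exclusion rules were not applied to these CNVs
    if n_genes == 0:
        return False
    return overlap_fraction(cnv, known) <= max_overlap
```

Merging the known intervals before measuring coverage matters because database entries often overlap each other; summing their lengths directly would overstate the overlap rate.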

Table 1 Clinical and molecular features of patients with positive SNP array analysis

SET I

This set included the 19 patients previously shown to be positive for copy number changes using oligonucleotide array analysis. SNP array analysis confirmed the presence of all CNVs previously detected: 14 deletions ranging in size from 0.257 to 6.494 Mb and 9 duplications of 0.583–7.930 Mb. In total, seven patients (1, 4, 7, 8, 13, 16 and 18) displayed a pathogenic deletion, including the subtelomeric deletion of 1p, 2q and 9q. A total of five patients (2, 9, 10, 12, 15) had a duplication including the MECP2 duplication, and duplication of the proximal 15q region associated with autism in two subjects. Patient 6 was shown to be the carrier of two losses on both analyses: deletion of 4.9 Mb at 3q23q24, near the break point of a de novo reciprocal translocation (3; 21), and deletion of about 3.2 Mb at 11p11.2. Patient 11 had a terminal deletion and associated duplication of two contiguous regions at 10pter, whereas Patient 17 harbored a terminal deletion associated with a contiguous duplication on chromosome 1 (Figure 1).

Figure 1

Patient 17: (a) Subtelomeric deletion of the 1p36.33 region disclosed by fluorescence in situ hybridization (FISH) analysis (arrow). (b) Single-nucleotide polymorphism (SNP) array analysis shows the deletion (red bar and arrow), followed by the duplication of a contiguous segment (blue bar and arrow). Although copy number segment analysis indicates a simple gain, the CN state (pink line) suggests that a part of the duplicated region could present a triplication, possibly because of the high copy number variability of this region. (c) FISH analysis on interphase nuclei shows the inverted duplication of the 1p36.33p36.32 region (arrows).

In four patients (3, 5, 13 and 19), SNP array analysis showed additional CNVs not detected by the oligonucleotide array. Patient 3 had a 1.4-Mb deletion of 3p25.3p25.2 on both platforms, whereas the SNP array disclosed an additional 0.871-Mb duplication of 18q22.1, which proved to be maternally inherited by FISH analysis. Patient 5 disclosed the duplication of two separate but neighboring regions (3.494 and 6.091 Mb) on chromosome 5q, confirmed by FISH (Figure 2a and b). These duplications were shown to derive from the malsegregation of a complex balanced paternal rearrangement involving chromosomes 3, 5 and 11 (Figure 2c). SNP arrays also showed a duplication of 0.140 Mb at Xq13.1, considered to be a false-positive result, as qPCR performed on PJA1 disclosed only two copies similar to that in controls. Patient 13 showed a 0.373-Mb subterminal deletion at 9q34.3 on both analyses, in addition to a paternally inherited gain of 0.684 Mb at 1q43 (ZP4 gene) on SNP arrays. Patient 19 showed a de novo gain of 0.188 Mb at 5q14.3, encompassing the CETN3 gene, in addition to a deletion at 1p35.3p35.2 disclosed by both platforms.

Figure 2

(a) Duplications at 5q31 and 5q34 (patient 5) identified by single-nucleotide polymorphism (SNP) arrays and indicated by blue bars and arrows. (b) Fluorescence in situ hybridization (FISH) analysis shows that duplicated segments map to chromosomes involved in a paternal translocation: in particular, the 5q31 region is partially translocated onto the derivative chromosome 3 (not shown) and is partially translocated onto the derivative chromosome 11 at 11q23.3 (top); 5q34 is translocated onto the derivative chromosome 3 at 3q25.3 (bottom). (c) Left: FISH analysis shows a complex translocation involving chromosomes 3, 5 and 11 in the patient's father as described in ideograms (right).

SET II

This set included 51 samples previously found to be negative for copy number changes using oligonucleotide arrays. A total of 48 patients showed no CNV change after SNP array analysis. Three samples (29, 54 and 66) were positive on SNP arrays, disclosing CNVs ranging in size from 0.352 to 0.583 Mb. In patient 29, the SNP array showed a duplication of 0.583 Mb at Xp11.4 containing two genes (TSPAN7 and MID1IP1) (Figure 3a). Although the parents’ DNA was not available to assess the origin of this imbalance, an X-inactivation assay showed a random inactivation of X chromosomes (68%) (Figure 3a). Patient 54 showed a deletion of 0.545 Mb at 9q33.1 involving TRIM32 and ASTN2 (Figure 3b), which was not present in his mother, whereas the father's DNA was not available to elucidate the origin of the CNV. In patient 66, the SNP array disclosed a de novo duplicated region of about 0.35 Mb containing three known genes (ANKRD56, SEPT11 and CCNI) at 4q21.1 (Figure 3c).

Figure 3

(a) Patient 29: gain at Xp11.4, 0.583 Mb in size, including the TSPAN7 gene (blue bar and arrow). Top right: duplication was confirmed by qPCR on TSPAN7 (blue and yellow bars are male and female controls, respectively, whereas the violet bar represents the patient). Center and below: random X inactivation was shown on the androgen receptor gene (AR), showing two alleles (275 and 281), on the basis of the ratio of areas of two alleles after digestion. (b) Patient 54: deletion of about 0.5 Mb at 9q33.1 (red bar and arrow), including ASTN2 and TRIM32 genes. TRIM32 is indicated by the red square, as this gene is included in the ASTN2 sequence. Deletion was confirmed by qPCR on TRIM32 (blue and yellow bars are controls, whereas the violet bar represents the patient). (c) Patient 66: 0.3 Mb duplication at 4q21.1 (blue bar and arrow) detected by single-nucleotide polymorphism (SNP) array and confirmed by interphase fluorescence in situ hybridization (FISH) (arrows).

Discussion

Microarray-based platforms to be applied in diagnostic practice require a high level of robustness, reproducibility and effective resolution in detecting CNVs. Hehir-Kwa et al performed a statistical power analysis to compare four different genomic platforms and showed that the actual performance of a microarray depends not only on the spacing of probes but also on the noise and sensitivity of copy number variation calls. In particular, the number of consecutively spotted sequences needed to detect a reliable CNV affects the resolution power.22 Coe et al29 introduced the term ‘functional resolution’ to indicate the actual performance of a microarray-based platform, which is also estimated considering the uniformity of element spacing on the array and the sensitivity of each platform to single-copy alterations.

In our study, the sensitivity of the high-resolution SNP array GeneChip 6.0 (Affymetrix) in identifying copy number changes was tested by analyzing 19 patients previously shown to be positive and 51 patients who were negative on copy number analyses with lower-resolution Agilent arrays. Data were processed using selection criteria, and CNVs smaller than 75 kb or represented by fewer than 80 targets were excluded. These parameters are appropriate, but may become less rigorous with the decreasing size of imbalances. The parameters used could be useful to screen patients with phenotypes suggestive of genomic imbalance, although they very likely underestimate the number of very small CNVs. Using this approach, we found only one false-positive CNV mapping to Xq13.1, spanning 0.140 Mb, which was not confirmed by other techniques (patient 5; Table 1). Moreover, small CNVs with an average length of several Kb cover about 12% of the normal human genome,11 and it is difficult to assess whether they have any clinical consequence.30 To optimize the management of CNV analyses, a workflow for clinical practice was published by Koolen et al.24 In our study, these guidelines were followed to further select the most likely pathogenic CNVs, and it was shown that more than 80% of CNVs detected by high-resolution platforms were polymorphic, nonpathogenic copy number changes. In particular, we found that 100% of CNVs shorter than 100 Kb completely overlapped with polymorphic genomic regions. No false negatives were recorded in our study, as all previously detected CNVs were identified by SNP arrays (SET I). These CNVs were regarded as pathogenic, as they corresponded to known regions involved in microdeletion/microduplication syndromes and/or were quite large imbalances encompassing several genes, suggesting a causal relationship with the phenotype.
The imbalance size and gene content are important parameters to be considered while assessing the pathogenic role of CNVs.24, 31 Moreover, for 12 of our 19 positive patients it was possible to trace the CNV origin, and only 1 was found to be inherited (patient 12). This was a male patient hemizygous for an Xp22.31 duplication inherited from his unaffected mother. The duplication was caused by a nonallelic homologous recombination event among members of the VCX/Y gene family flanking the rearranged segment. The reciprocal deletion of this genomic segment is associated with 80–90% of X-linked ichthyosis/MR cases32 and this duplication was reported in a child with MR,28 although its pathogenicity has not been definitely proved.33

Microarray analysis showed that chromosome anomalies detected by standard and molecular cytogenetic techniques may conceal more complex rearrangements that can be resolved by CNV studies. For example, in patient 5, two duplications of about 6 and 3.5 Mb on chromosome 5q were detected. The patient was affected by developmental delay associated with dysmorphic features, and harbored an apparently balanced translocation involving chromosomes 3q and 11q. As the translocation was inherited from the unaffected father, it was initially not considered to be related to the patient's phenotype. Integrated microarray and FISH analyses showed that the patient was heterozygous for duplications resulting from a malsegregation of a more complex balanced paternal rearrangement, involving chromosomes 3, 5 and 11. In patient 17, deletion of the 1pter region was diagnosed by FISH-based subtelomeric investigation. However, the patient's clinical features only partly overlapped the phenotype of the 1p36 deletion syndrome,34 prompting microarray analysis to further characterize the rearrangement. CNV analysis confirmed a 1.4-Mb deletion of the subtelomeric region of 1p36.33, which was smaller than the one usually associated with the 1p36 deletion syndrome.35 Array-based analysis also disclosed a 1.6-Mb duplication of a contiguous 1p36.32 segment (Figure 1). This duplication was reported previously in association with a 1p36 deletion,36 and it is likely that its frequency has been underestimated for lack of appropriate diagnostic techniques. Both the size of the deleted segment and the co-occurrence of the duplication could well explain the atypical clinical features of this patient, including a distinct facial appearance and the absence of heart defects and seizures.34 Interestingly, the Affymetrix SNP platform detected these imbalances, but the CNV overlapping rate indicated that these copy number changes were neutral polymorphisms (Table 1).
Recent data suggest that the total genomic content of known common human CNVs in databases is overestimated.37 Therefore, caution should be exercised while interpreting the significance of the CNV rate, especially for changes involving large genomic areas particularly rich in small polymorphic CNV segments.

The higher resolution of SNP array platforms increases the ability to detect small CNVs. In our cohort of patients, six additional copy number changes were detected, including four imbalances that may be considered pathogenic, although further studies are required to substantiate this contention. In particular, SNP arrays disclosed likely causative CNVs in three patients regarded as normal on the basis of previous CNV analyses. This means that 6% of patients (3 of 51) belonging to the present SET II were found to be affected by a ‘genomic disorder’. In patient 29, a duplication at Xp11.4 involving the TSPAN7 (OMIM 300096) gene was detected (Figure 3a). TSPAN7 mutations have been associated with X-linked nonsyndromic MR, and a duplication of this gene has been reported in three patients,38 although the pathogenic role of copy number gains remains controversial. In our case, the parents were not available to assess the origin of this CNV, but in females with structural abnormalities of the X chromosome, a random X-inactivation pattern is associated with a severe phenotype.39 SNP array analysis in patient 54 disclosed a loss at 9q33.1 including the ASTN2 and TRIM32 genes. ASTN2 is highly expressed in the central nervous system and a relationship with schizophrenia was suggested recently.40 TRIM32 has been associated with Bardet–Biedl syndrome 11 (BBS11; OMIM 209900). Considering that the inheritance of this disorder was hypothesized to be triallelic, it can be surmised that TRIM32 heterozygotes contribute to the BBS phenotype, whereas a deletion of this gene can result in a distinct phenotype. In patient 66, the 4q21.1 duplication encompassed the ANKRD56, SEPT11 and CCNI genes. Parental FISH and qPCR analyses showed the de novo origin of the CNV. However, the role of these genes is not clear at present, which prevents establishing a correlation between their overexpression and the patient's phenotype.
Notably, the SEPT11 gene was shown to be expressed in CNS, both in cell bodies and dendrites,41, 42 and septins are GTP-binding proteins interacting with developing myelin structures.43

No mosaic genomic alteration was found in our cohort of patients, indicating either that mosaic imbalances are a very rare cause of MR/MCA or that the SNP array platform has low sensitivity for detecting CNVs in samples containing different cell populations. This point was discussed by Zhang et al,44 who compared different high-resolution microarrays and observed that some platforms, including the Affymetrix SNP array, could fail to detect mosaic cases because of data noise.

In summary, genomic microarrays are important tools for studying patients with idiopathic MR. The opportunity of identifying very small imbalances using high-resolution microarray platforms improves our ability to detect the genetic defects responsible for rare forms of MR. However, this opportunity should be weighed against the detection of false positives and small CNVs devoid of any obvious pathogenic effect, including CNVs inherited from unaffected parents. Therefore, stringent parameters must be used for data analysis, and parents’ DNA should be available whenever these genomic platforms are used for diagnostic purposes.