Introduction

It is well known that various genetic imbalances can result in a wide range of pathological conditions such as mental retardation, malformations, developmental delay, neurological disorders and congenital anomalies. Different methodologies have been introduced to clinical practice in order to detect these genetic imbalances and provide prenatal and postnatal diagnosis to numerous groups of patients by revealing the genetic cause and mechanism of their disease. Classical karyotyping allows whole-genome screening for chromosomal aberrations larger than 5 Mb; however, smaller aberrations are not detectable with the existing G-banding analysis. Such small imbalances, called ‘submicroscopic’, require other methodologies that provide high-resolution analysis.

The development of fluorescence in situ hybridization (FISH) methods led to the detection of subtle cryptic chromosomal abnormalities smaller than 5 Mb.1, 2, 3 Comparative genomic hybridization (CGH) using metaphase chromosomes permitted the detection of copy-number changes as small as 3 Mb.4 The resolution of copy-number change detection was dramatically increased with the advancement of array-based CGH using BAC, PAC or cDNA clones,5, 6, 7 achieving a resolution better than 1 Mb.8 To further increase the sensitivity and specificity, the hybridization of genomic representations to CGH arrays has been developed.9 An average resolution of 30 kb throughout the genome was achieved by arraying long oligonucleotide probes and hybridizing with representations from cancer and normal cells.10 In addition, Affymetrix SNP arrays have recently been used for genome-wide detection of chromosomal regions with DNA copy-number alterations,11, 12 and other commercial solutions exist with extreme resolution.13 Currently, tiling-resolution BAC arrays covering the entire human genome with approximately one clone per 100 kb are also available.14

A novel DNA dosage detection method, called multiplex amplifiable probe hybridization (MAPH), enables accurate and reliable detection of changes in DNA copy-number with a theoretical resolution of up to 100 bp within a defined region of interest.15 The basic principle of the method is that the probes can be quantitatively recovered and amplified after hybridization on a solid matrix. In practice, the studied genomic DNA is denatured, immobilized on a membrane and hybridized to a mixture of probes. Each probe is designed to be amplifiable and to target any unique coding or non-coding genomic sequence. As MAPH is fast, cost-effective and relatively easy to perform,16 it has been used for the detection of small DNA copy-number changes.17 A major drawback of MAPH is that the gel-based detection used in this method is only applicable to a limited number of probes. Most of the classical MAPH studies mentioned have included about 30–50 probes per assay.

We have adapted classical MAPH to a novel microarray-based format, aiming to contribute to the advancement in diagnosis of small copy-number changes throughout the human genome. The new array-MAPH method enables accurate and reliable determination of copy-number changes in virtually any targeted locus of the human genome. It combines the flexibility, specificity and sensitivity of MAPH with the potential of high-resolution genomic analysis provided by a microarray format. Here, we describe the detection of large and small genomic imbalances through the fabrication, interrogation and validation of a human chromosome X-specific array with a median probe spacing of 238 kb.

Methods

Array-MAPH methodology is described in Figure 1. The methodology requires (a) probe selection and preparation, (b) microarray preparation and (c) array-MAPH hybridization and data analysis.

Figure 1

A flow diagram of array-MAPH methodology.

Probe selection and preparation

Array-MAPH requires the use of specifically designed hybridization probes. Such hybridization probes should be: unique in the human genome; evenly spaced over the studied genomic region; similar in size (preferably 400–600 bp) and GC content (30–55%) to ensure similar hybridization conditions. In addition, these probes have to be amplified from the human genome, which requires the design of PCR primers suitable for genomic PCR. To design the locus-specific hybridization probes used in the current study, 1500 evenly spaced candidate regions from chromosome X were determined. The sequences (1000 bp) from each region were downloaded from the ENSEMBL database (NCBI Human Genome assembly build 35.1). In the first phase of the hybridization probe design, all candidate sequences were tested for their uniqueness by comparing them against each other with the BLAST2 program.18 Sequences that showed a similarity score higher than 100 bits (identical regions longer than 50 bp) with any other candidate region were rejected. Secondly, PCR primers for amplifying hybridization probes from the candidate sequences were designed with a modified Primer3 program19 using the parameters shown in Supplementary Method 1. Thirdly, all designed PCR primers were tested for their uniqueness in the human genome. A substring consisting of the 18 nucleotides at each primer's 3′ end was extracted, and the occurrences of these substrings in the human chromosome sequences were counted as potential binding sites. If a pair of primers had in sum more than 10 putative binding sites, or if both primers had more than three occurrences in the human genome, they were discarded together with the corresponding probe region. Primers were also rejected if a pair of primers was able to generate more than one PCR product from the human genome. A product was predicted if two primers bound to opposite strands within 1000 bp of each other. In the last step, the remaining candidate probes were tested conclusively for their uniqueness in the human genome using the BLAST2 program. The DUST filter within the BLAST2 algorithm was switched off with the -F F option; otherwise, some repeated regions in the human genome could remain unnoticed. Probes that showed a similarity score higher than 120 bits (identical regions longer than 60 bp) with regions other than their own correct binding site in the genome were removed.
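The primer uniqueness test described above can be illustrated with a minimal Python sketch. It is not the authors' software: the function names, the plain string search, and the choice to count matches on both strands are assumptions made for illustration, and only the thresholds (18-nucleotide 3′ substring, more than 10 sites in sum, more than three sites for both primers) come from the text.

```python
# Illustrative sketch only; names and strand handling are assumptions.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_occurrences(pattern: str, sequence: str) -> int:
    """Count occurrences of pattern in sequence, allowing overlaps."""
    count = 0
    start = sequence.find(pattern)
    while start != -1:
        count += 1
        start = sequence.find(pattern, start + 1)
    return count


def binding_sites(primer: str, genome: str, tail_len: int = 18) -> int:
    """Count putative binding sites of the primer's 3' tail on both strands."""
    tail = primer[-tail_len:]
    sites = count_occurrences(tail, genome)
    rc_tail = reverse_complement(tail)
    if rc_tail != tail:  # avoid double counting palindromic tails
        sites += count_occurrences(rc_tail, genome)
    return sites


def primer_pair_passes(forward: str, reverse: str, genome: str,
                       tail_len: int = 18, max_total: int = 10,
                       max_each: int = 3) -> bool:
    """Apply the two rejection rules quoted in the text to one primer pair."""
    f_sites = binding_sites(forward, genome, tail_len)
    r_sites = binding_sites(reverse, genome, tail_len)
    if f_sites + r_sites > max_total:
        return False  # more than 10 putative sites in sum
    if f_sites > max_each and r_sites > max_each:
        return False  # both primers occur more than three times
    return True
```

For a whole genome, a plain string scan like this would be slow; an indexed search would be the practical choice, but the filtering logic would stay the same.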

From the designed candidate probes, 558 probes from human chromosome X were finally selected on the basis of uniform spacing, maintaining 150–350 kb between probes and resulting in a median distance of 238 kb. Another 107 probes from human autosomal chromosomes were designed as normalization controls. The selected probes, the primers for their amplification, the probe locations in the human genome (according to the NCBI Human Genome assembly build 35.1) and their lengths are presented in Supplementary Table 1.
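The text does not say how the spacing-based selection was carried out. As a purely hypothetical sketch, a greedy pass over the sorted candidate positions could enforce the 150–350 kb window and report the resulting median spacing; the `target_gap` parameter and the fallback rule for empty windows are assumptions introduced here.

```python
from statistics import median


def select_spaced(positions, min_gap=150_000, max_gap=350_000,
                  target_gap=250_000):
    """Greedily pick probe positions (bp) so that consecutive gaps fall
    within [min_gap, max_gap] where possible. Hypothetical procedure."""
    candidates = sorted(positions)
    if not candidates:
        return []
    selected = [candidates[0]]
    i = 1
    while i < len(candidates):
        last = selected[-1]
        remaining = candidates[i:]
        window = [p for p in remaining if min_gap <= p - last <= max_gap]
        if window:
            # prefer the candidate whose gap is closest to the target gap
            nxt = min(window, key=lambda p: abs((p - last) - target_gap))
        else:
            # no candidate inside the window: take the first one beyond min_gap
            beyond = [p for p in remaining if p - last >= min_gap]
            if not beyond:
                break
            nxt = beyond[0]
        selected.append(nxt)
        i = candidates.index(nxt) + 1
    return selected


def median_spacing(selected):
    """Median distance (bp) between consecutive selected probes."""
    gaps = [b - a for a, b in zip(selected, selected[1:])]
    return median(gaps)
```

With real data, `positions` would hold the start coordinates of the candidate probes that passed the uniqueness filters.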

Probe and microarray preparation

The candidate probes were amplified by PCR from a normal 46,XY source of genomic DNA (Supplementary Method 2a) and checked for their expected size, amplification efficiency and specificity. Amplicons were then cloned into the pCR2.1 vector using a TOPO TA cloning kit (Invitrogen Co., Carlsbad, CA, USA), allowing archiving and rapid amplification of each probe. Universal vector-derived primers were used to confirm cloning efficiency (Supplementary Method 2b). Clone identity was confirmed for every probe by PCR amplification using the probe-specific primer sets (Supplementary Method 2c). The successfully cloned candidate probes were cultured in LB medium with 40 μg/ml ampicillin and cryopreserved with 10% glycerol at −80°C. Array-MAPH amplifiable probes were developed by PCR amplification using PZA and PZB universal primers flanking the vector-cloning site (Supplementary Method 2b).

Array-MAPH target sequences were amplified for spotting onto the array by PCR using specific unique primers for every cloned probe (Supplementary Method 2c). The array-MAPH target sequences were then dissolved in 25% DMSO at a final concentration of 30 ng/μl and spotted in duplicate onto Genorama™ SAL-1 microarray slides (Asper Biotech Ltd, Tartu, Estonia) using a VersArray ChipWriter Pro arrayer (BioRad Laboratories, Hercules, CA, USA). All probes were arrayed in random order to minimize the possibility that a spatial artifact during array hybridization would be incorrectly interpreted as an aberration.

Array-MAPH hybridization and data analysis

Two micrograms of human test genomic DNA was immobilized on Hybond+ filters (GE Healthcare, Piscataway, NJ, USA) as described previously.15 The filters were then hybridized with the array-MAPH amplifiable probes and washed to remove unbound and nonspecifically bound probes (Supplementary Method 3a). Each filter was placed into a separate tube and all bound probes were recovered from the filters by denaturing and quantitative PCR amplification (Supplementary Method 3b). The recovered probe mixture was purified and labeled by nick translation using aminoallyl-dUTP followed by treatment with amino-reactive Cy3 dye (GE Healthcare) (Supplementary Method 3c). The labeled probe mixture was hybridized to the microarray for quantification using an HS-400 automated hybridization station (Tecan Austria GmbH, Grödig/Salzburg, Austria) (Supplementary Method 3d). The microarrays were scanned using an Affymetrix 428 microarray scanner (Affymetrix Inc., Santa Clara, CA, USA). Raw signal intensities were extracted with the BaseCaller module of the Genorama™ Genotyping Software 4.2 Package (Asper Biotech Ltd). As each target sequence was present in duplicate, the average pixel fluorescence intensity of the two spots was used for further analysis with software specifically designed for array-MAPH, called MAPH-Stat, which is available on request for academic use. The same program was also used for re-sorting the probes into their genomic order.

Microarray signals were normalized using the MAPH-Stat program (between-slide normalization) with respect to the median of the autosomal control probe-specific signals from the given microarray. To bring the raw data closer to a normal distribution, which is required for correct calculation of confidence intervals (CIs), signal intensity values were converted to a logarithmic scale by taking their base-2 logarithm. Then, the average (AVERAGE) and CI 90% values were calculated for each probe i using data from the control panel, which contained signals from five DNA samples of cytogenetically controlled, phenotypically normal individuals. This was carried out separately for the male and female control panels. The use of CI 90% gives a theoretical false-positive rate of 0.5%, where a false positive is defined as two consecutive probes deviating from the CI values in the same direction.
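The normalization and control-panel statistics can be sketched as follows. This is not MAPH-Stat: it assumes that normalization means dividing by the autosomal-control median and that "CI 90%" is the interval mean ± 1.645 SD expected to contain 90% of normally distributed control values. Under that reading, and assuming independent probes, the quoted 0.5% follows from 2 × 0.05 × 0.05 = 0.005. All function names are hypothetical.

```python
import math
from statistics import mean, median, stdev

Z_90 = 1.645  # two-sided 90% normal quantile (assumed definition of CI 90%)


def normalize_log2(signals, control_probes):
    """Divide each signal by the median autosomal-control signal of the same
    array, then take log2. `signals` maps probe name -> raw intensity."""
    ref = median(signals[p] for p in control_probes)
    return {probe: math.log2(value / ref) for probe, value in signals.items()}


def control_statistics(panel):
    """Per-probe (average, lower, upper) from a list of normalized control
    samples, each a dict mapping probe name -> log2 value."""
    stats = {}
    for probe in panel[0]:
        values = [sample[probe] for sample in panel]
        avg = mean(values)
        half_width = Z_90 * stdev(values)
        stats[probe] = (avg, avg - half_width, avg + half_width)
    return stats


def theoretical_false_positive_rate(coverage=0.90):
    """Chance that two consecutive, independent probes both fall outside the
    interval on the same side (both above or both below)."""
    tail = (1.0 - coverage) / 2.0
    return 2.0 * tail * tail
```

In use, `control_statistics` would be run once per sex-matched panel of five control arrays, and the resulting bounds reused for every patient array.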

The signal intensity values from the studied DNA were then transformed relative to the per-probe averages calculated from the control panel: CONVERTED_VALUEi = VALUEi/AVERAGEi. CI values for each probe were converted using the same formula. An example of data analysis is shown in Supplementary Table 2.

At least two adjacent probes with signal intensities above or below the CI 90% values were considered indicative of a potential copy-number change in the analyzed region.
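The two-adjacent-probe rule can be expressed as a short run-detection routine. The sketch below is an illustration rather than the MAPH-Stat implementation; it assumes the probe values and their per-probe CI bounds are on the same scale and already sorted into genomic order, and the function name and the `gain`/`loss` labels are hypothetical.

```python
def call_imbalances(values, lower, upper, min_run=2):
    """Return (first_index, last_index, 'gain' | 'loss') for every run of at
    least `min_run` adjacent probes lying outside their CI on the same side."""

    def state(i):
        if values[i] > upper[i]:
            return "gain"
        if values[i] < lower[i]:
            return "loss"
        return None

    calls = []
    run_start, run_state = None, None
    # iterate one step past the end so a run touching the last probe is closed
    for i in range(len(values) + 1):
        current = state(i) if i < len(values) else None
        if current != run_state:
            if run_state is not None and i - run_start >= min_run:
                calls.append((run_start, i - 1, run_state))
            run_start, run_state = i, current
    return calls
```

Single-probe deviations are deliberately ignored with `min_run=2`; lowering it to 1 would list them as candidates for follow-up by PCR.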

FISH and PCR confirmation analyses

FISH analyses were used to confirm the duplication on chromosome X of patient A-2879 (Supplementary Figure 1) and PCR analyses were used to confirm the deletions in patients 220728, 22467 and A045 (Supplementary Method 4). For confirmation with FISH, BAC and PAC clones were obtained from the clone library of The Wellcome Trust Sanger Institute and cultured according to the provided instructions (www.sanger.ac.uk). The STS gene-specific probe for FISH analysis was obtained from Vysis Inc. (Downers Grove, IL, USA). The protocol for BAC and PAC extraction was obtained from the BACPAC Resources Center Home Page (www.bacpac.chori.org). Labeling and hybridization were performed using the Nick Translation Kit (Vysis Inc.) according to the provided standard protocol. The confirmation with PCR analyses is described in Supplementary Method 4.

Results

The first step towards the application of MAPH to a microarray format was the development of a chromosome X-specific array containing 558 probes spanning almost the entire chromosome X with a median spacing of 238 kb, together with 107 control probes representing all autosomal chromosomes. To simplify the in silico probe design process and to make it reproducible, we prepared dedicated software and tools for it. We also made a web interface called MAPHDesigner (http://bioinfo.ebc.ee/MAPH) for the developed programs. As a web-based service, MAPHDesigner is freely available to all academic and non-academic users.

Probes were spotted onto microarray slides and used as hybridization targets. The chromosome X-specific microarray was tested with the new array-MAPH methodology to analyze normal male and female DNA samples, as well as DNA samples from patients carrying known and unknown genomic imbalances. Each microarray with patient data was compared against a control panel of five normal individuals, each analyzed on an individual microarray. The signals from each array were normalized before comparison using the median signal intensity from the autosomal control probes from the various experiments. Normalized signal values of the analyzed male and female samples were compared with the CI 90% values of the respective signal intensities for each of the array-MAPH probes from the control experiments.

Initial validation of array-MAPH was performed with normal DNA samples. All experiments were carried out in four replicates and gave comparable results. Replicate experiments are required to minimize possible biases related to experimental variations. DNA from a normal 46,XX control female subject was studied by comparing it against a control panel of five normal female subjects. As expected, the vast majority of the fluorescence intensities of chromosome X-specific probes remained within the calculated CI 90% values. From a similar assay with four normal individuals, we estimated our empirical rate of false positives. This was carried out by counting all deviations of two consecutive probes from the CI 90% values in the same direction, and the rate was found to be ca. 1% on average and 3% in the worst case. Secondly, DNA from a phenotypically normal male subject was compared to a control panel of DNA samples from five normal female subjects. In this case, the normalized ratios for chromosome X-specific probes were clearly differentiated from those of the autosomal probes, as they remained below the CI 90%, demonstrating that the method reliably detects differences between one and two chromosomal copies.

To further validate the array-MAPH method, we used patient DNA samples with known chromosome X abnormalities. For the various experiments, female patient DNA samples were compared to female control panels and male patient DNA samples to male control panels. Patient A-2879 is a female in whom the initial cytogenetic analysis showed a chromosome Xp rearrangement. The first FISH analyses, using locus-specific probes for the STS and SHOX genes, the DXZ1 locus and a whole chromosome X paint, revealed a complex aberration with a deletion of the distal-most part of Xp22.3 and a proximal Xp22.31–p22.32 duplication. Array-MAPH was used to identify the size of the duplication. Our data showed the same duplication found by FISH; moreover, the duplicated area appeared to span a greater region than expected, extending to chromosomal bands Xp22.32–p22.12 (Figure 2). A second FISH analysis using the STS gene-specific probe, six BAC clones and one PAC clone confirmed the extended duplication identified by array-MAPH (Supplementary Figure 1). Our current probe list does not cover the distal-most part of Xp, so in this study we could not analyze the small deleted region from the initial FISH finding. A detailed description of the patient, clinical data and additional cytogenetic analysis will be reported elsewhere.

Figure 2

Array-MAPH profile of patient A-2879.

DNA samples of patients 220728 and 22467 were kindly provided by Dr J Vermeesch (Center for Human Genetics, Catholic University of Leuven, Belgium). Patient 220728 is a male carrying a deletion initially estimated to cover a region of at most 7.5 Mb on the short arm of chromosome X, detected by array-CGH and confirmed by FISH (Dr J Vermeesch, personal communication). The patient DNA was used in a blind test to validate the array-MAPH approach. Array-MAPH determined a 5.45–6 Mb Xp terminal deletion, spanning from probe X020 (2 771 275 bp) to X63 (8 443 336 bp) (Figure 3). Deletion breakpoints were confirmed by probe-specific and locus-specific PCR. The proximal breakpoint was estimated to lie between 8 220 853 bp and 8 443 336 bp, whereas the distal breakpoint was between 2 445 055 bp and 2 771 275 bp (Figure 4). The discrepancy between the array-CGH and array-MAPH results could be explained by the difference in resolution of the two platforms.
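The 5.45–6 Mb range quoted for patient 220728 follows directly from the breakpoint intervals: the minimum size is the distance between the outermost positions shown to be deleted, and the maximum is the distance between the nearest flanking positions shown to be present. A small sketch of that arithmetic, with hypothetical function and parameter names:

```python
def deletion_size_bounds(flank_present_low, deleted_low,
                         deleted_high, flank_present_high):
    """Return (min_size, max_size) in bp for a deletion whose lower breakpoint
    lies between flank_present_low and deleted_low, and whose upper breakpoint
    lies between deleted_high and flank_present_high."""
    min_size = deleted_high - deleted_low
    max_size = flank_present_high - flank_present_low
    return min_size, max_size


# Coordinates for patient 220728 as given in the text (bp)
min_bp, max_bp = deletion_size_bounds(2_445_055, 2_771_275,
                                      8_220_853, 8_443_336)
print(f"{min_bp / 1e6:.2f}-{max_bp / 1e6:.2f} Mb")
```

The bounds are 5 449 578 bp and 5 998 281 bp, which round to the 5.45–6 Mb range reported above.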

Figure 3

Array-MAPH profile of patient 220728.

Figure 4

Probe-specific PCR confirmation of the 5.67 Mb deletion for patient 220728. Probes X5, X58, X60 and locus X:8220853, covering the region between 2 771 275 bp and 8 220 853 bp, give no product when amplified from the patient's DNA, whereas probe X63 and locus X:8528549 proximal to the breakpoint and loci X015 and X020 distal to the breakpoint give normal PCR products.

Patient 22467 is a male carrying a 1.5 Mb deletion on the short arm of chromosome X (Dr J Vermeesch, personal communication). Array-MAPH showed a deletion between probes X34 (5 952 842 bp) and X58 (8 000 728 bp) (Figure 5), which was confirmed by probe-specific PCR (Figure 6).

Figure 5

Array-MAPH profile of patient 22467.

Figure 6

Probe-specific PCR confirmation of 1.5 Mb deletion for patient 22467. Probes X49, X52, X53 and X55, covering the region between 7 103 805 bp and 7 700 590 bp give no product when amplified from patient's DNA, whereas probes X58 and X60 proximal to the breakpoint and probes X34 and X35 distal to the breakpoint give normal PCR products.

Screening of 20 clinically affected male individuals from X-linked mental retardation families (unpublished data) resulted in the detection of a deletion of up to 500 kb on Xq25 in patient A045 (Figure 7). The deletion was confirmed with probe-specific PCR (Figure 8a). Additional primers were designed within the deleted region using the existing methodology for developing array-MAPH probes (Supplementary Table 3), and the breakpoints were refined by PCR (Figure 8b), estimating the deletion to be 23 kb in size, spanning from 121 018 669 bp to 121 041 883 bp.

Figure 7

Array-MAPH profile of patient A045.

Figure 8

(a) Probe-specific PCR confirmation of the 500 kb deletion for patient A045. Probe X01200 at 121 037 494 bp gives no product when amplified from the patient's DNA, whereas probes X1159 and X1162 proximal to the breakpoint and probes X1167 and X1170 distal to the breakpoint give normal PCR products. (b) Amplification with additional primers narrows down the deletion to 23 kb between 121 018 669 bp and 121 041 883 bp. Probes located proximal to 121 018 669 bp and distal to 121 041 883 bp give normal PCR products, whereas the probe located at 121 030 378 bp gives no product.

Discussion

We describe the detection of large and small genomic imbalances using the new array-MAPH methodology. Array-MAPH was validated using a chromosome X-specific array with 558 probes, providing a median spacing of 238 kb over the entire human chromosome X. The 154-Mb human chromosome X was chosen because it is one of the best characterized chromosomes with complete sequence information available and provides an excellent model for studying X-linked genetic diseases. Initial validation of array-MAPH approach revealed certain limitations and advantages of the method.

Among the limitations of array-MAPH is the relatively large amount and concentration of starting DNA material needed. It was empirically established that best results can be obtained with 2 μg of genomic DNA, whereas only 0.5–1 μg is needed for gel-based MAPH,15 0.5 μg for BAC/PAC array-CGH20 and 250 ng for Affymetrix SNP arrays.11 However, this amount is less than the 6 μg needed for some oligonucleotide-based array platforms.21 The necessary DNA quantity must be contained in a limited volume (less than 10 μl) to enable spotting on small (2 × 3 mm) filters. Therefore, DNA concentration should be at least 200 ng/μl. Nevertheless, one should consider that in case of array-MAPH the 2 μg of DNA are required for preparing a filter and the same filter can be used for multiple analyses.

Array-MAPH involves two hybridization steps: the first on filters and the second on arrays. As a result, possible errors from both hybridizations accumulate in the final analysis. This problem is partially solved by the uniform size and GC content of all probes, which allows maximum optimization of the PCR and hybridization steps, avoiding errors and biases. In addition, the uniform size and GC content make it possible to overcome a potential hybridization issue of classical MAPH, where the probes have to be designed with different sizes for gel electrophoresis-based identification and quantification.

A potential drawback of single-sample hybridization might be an increased influence of microarray-caused artefacts and variances. Although normalization and analysis are performed in a way that minimizes this type of experimental bias, it is still important to use automated hybridization systems to obtain hybridization efficiency that is as uniform as possible. At the same time, an automatic hybridization station will increase sample throughput.

A limiting factor for the clinical application of array-MAPH is that array-MAPH probes, similarly to oligonucleotides used for array-CGH, are extremely sensitive to DNA heterogeneity, which might cause an alteration of signal ratios. Therefore, one would expect that cancer cell line analysis might present certain challenges for array-MAPH, as tumor biopsies are often contaminated with normal tissue. This is also true for other samples with possible DNA contamination, such as amniotic fluid or fetal blood and mosaics.

A considerable advantage of the new approach is the nature of the specifically designed probes it uses. In silico probe development excludes repetitive or even partially redundant sequences and ensures the specificity and sensitivity of array-MAPH, owing to 100% probe identity to the target sequence. Also, the uniqueness of the probes makes it possible to avoid the pseudoautosomal region on chromosome X, which assures that the probes used in such a study will not result in incorrect copy-number assessment for the corresponding chromosome X sequences. A similar probe development approach has also been used by other groups to design gene-specific, strictly sequence-defined microarrays for detection of diagnostically significant gene copy-number aberrations.22, 23 Comparison of array-CGH with genomic and PCR-based targets has shown that the drawbacks of the genomic clone-based approach, caused mainly by the presence of redundant DNA sequences in the genomic clones, can be circumvented by using short PCR-based microarray probes.24 Probes for array-MAPH have the flexibility to target virtually any locus in the genome and cover large-scale copy-number variations (LCVs).25, 26 The relatively small size of the probes (400–600 bp) allows detection of copy-number changes in small regions, facilitating the design of exon-specific and region-specific high-resolution arrays. This is also in good agreement with data from others,23 who have demonstrated that the shortest stretch of unique sequence that allows reliable detection of single copy-number changes is 490 bp when fluorescently labeled total human DNA is used in the hybridization experiments.

Furthermore, one should consider the low complexity of the DNA material hybridized to the microarray in the case of array-MAPH and the consequently increased concentration of probe complementary to the target sequences. This is critical for obtaining reliable data,9, 27, 28 as it significantly improves hybridization kinetics and increases the signal-to-noise ratio. Like the genomic representations used in ROMA technology,10 array-MAPH also provides a means to amplify the quantity of the studied loci, thereby increasing the signal-to-noise ratio during hybridization to arrayed probes.

Unlike array-CGH, where two different fluorophores are used, array-MAPH labeling involves a single fluorophore, with reference signal values for comparison. Data from patient material is directly compared to CIs of the control data set. Recent large-scale studies show the presence of LCVs in several genomic regions of phenotypically normal individuals.25, 26 When searching for aberrations causing abnormal phenotypes, comparison of studied DNA against a panel composed of different pooled normal DNAs will enable more reliable analysis. However, LCVs can be investigated as well by limiting the heterogeneity of DNAs included in the control panel. In routine applications, a control panel will reduce the need for repeated analysis of normal DNAs, saving on materials and analysis time.

In contrast to clone-based technologies, the method does not rely on clone availability from BAC, PAC or other libraries, as probes can be selected rapidly and almost without restriction from any location in the genome. With all probe-specific primers available, it is easy to carry out PCR-based confirmation procedures for any copy-number alteration detected in the studied DNA. Moreover, it is easy to confirm or reject candidate copy-number changes and further refine the breakpoints by designing more primers using the existing software tools developed for MAPH probe design. Using this approach, we confirmed the deletion and its breakpoints for patient A045. In general, single-probe deviations, as in the case of A045, are considered ‘false positives’, and at present we rely on two or more consecutive probes to warrant a copy-number change; nevertheless, there is always the option of simultaneously investigating 10–20 loci that showed ‘false positive’ ratios. Therefore, in some cases, even single-probe alterations classified as ‘false positive’ can be studied further and smaller deletions or duplications detected.

Our current data demonstrate the ability of array-MAPH followed by locus-specific PCR to detect deletions and duplications as small as 23 kb (patient A045). Nevertheless, the array-MAPH platform itself carries the potential of even higher resolution considering the 500 bp probe size and the flexibility to vary probe spacing according to the scientific question to be answered.

Array-MAPH can be useful for several applications in medical genomics. Even though array-MAPH can easily be applied to diagnosis of known genetic syndromes caused by copy-number alterations, it is a particularly useful tool for high-resolution screening of patients with abnormal phenotype and no genetic findings, detection of microdeletions, microduplications, genotype–phenotype correlations, gene identification and studying genetic variation. Further development and optimization of the technique may result in improved cost-effectiveness, reproducibility and dynamics, thereby revealing the full potential of array-MAPH platform.