Introduction

One of the hallmarks of cancer is copy number alteration (CNA) across the entire genome. CNA can affect the development or progression of human malignancies by altering the expression of cancer-related genes. Microarray-based comparative genomic hybridization (array-CGH) has emerged as an efficient, higher-resolution technology for the genome-wide identification and characterization of CNAs (Pinkel et al., 1998; Yim and Chung, 2004). Improved resolution enables the identification of submicroscopic chromosomal alterations that are unlikely to be detected by conventional cytogenetic tools. These small chromosomal changes, including repeatedly altered regions (RAR) in various cancers, are thought to contain cancer-related genes (Albertson and Pinkel, 2003; Kim et al., 2006). In addition, a whole-genome approach can help to comprehensively understand the contribution of CNAs to tumorigenesis and to tumor behaviors such as metastasis.

In spite of technical advances, the precise identification of CNAs is still challenging and needs further improvement. Most currently available BAC arrays provide a resolution of ~1 mb, which makes it difficult to detect CNAs smaller than 1 mb (Brennan et al., 2004; Carvalho et al., 2004). Oligonucleotide microarrays (oligoarrays) provide higher resolution and sensitivity in detecting submicroscopic CNAs. In particular, long oligoarrays (>60 mers) give increased signal intensity thanks to improved hybridization kinetics, which enables more reliable copy number analysis (Ylstra et al., 2006).

Since CNAs are thought to affect the expression of genes by changing genomic dosage or gene structure, it is important to interpret copy number status together with gene expression profiles (Lynch, 2002). For example, genomic amplifications of 8q24, 11q13 and 17q12 are commonly associated with overexpression of MYC, CCND1 and ERBB2, respectively, in diverse cancers (Croce, 2008). Recently, integrated genome and transcriptome analyses of breast and lung cancers have been reported, in which different platforms were used separately for copy number and expression profiling (Dehan et al., 2007; Vincent-Salomon et al., 2008). However, no array platform is currently available for integrated CNA-expression analysis.

To achieve high-resolution copy number analysis integrated with expression analysis on the same platform, we established a human 30k oligoarray-based genome-wide copy number-expression analysis system using a platform originally designed for expression analysis. In this platform, 60-mer oligo probes covering over 17,500 genes are distributed across all chromosomes, which is essentially the same design as the long oligoarrays used for whole-genome copy number analysis (Brennan et al., 2004; Carvalho et al., 2004). In this study, we explored the applicability of this system to integrated genome and transcriptome analysis using a cancer cell line.

Results

Comparison of CNA detection ability between oligoarray and BAC array

First, to see how accurately a single-copy difference would be detected by the two platforms, we hybridized normal male and female genomic DNA onto the oligoarray (30k, 100 kb resolution on average) and the BAC array (3k, 1 mb resolution on average). Although both platforms successfully identified the single-copy difference of the X chromosome between male and female, the oligoarray detected the difference more accurately than the BAC array. In the oligoarray-CGH results, the mean log2 signal intensity ratio of the chromosome X probes in the female-versus-male hybridization was 0.78 (standard deviation (SD) = 0.29) and that of autosomes was 0.0027 (SD = 0.21). In the BAC array-CGH results, the mean log2 ratio of the chromosome X probes was 0.51 (SD = 0.16) and that of autosomes was -0.0020 (SD = 0.06). Female-versus-male array-CGH plots from both platforms are available in Supplemental Data Figure S1.

To examine the sensitivity of the two arrays for detecting regional CNAs, we analyzed the same cancer cell line using both platforms (Figure 1). All the chromosomal alterations identified by BAC array-CGH were consistently detected by the oligoarray (Table 1). The mean size of the CNAs identified by both arrays was 18 mb. However, many of the smaller CNAs were detected only by the oligoarray. In total, the oligoarray identified 122 alterations that were not detected by the BAC array, with a mean size of 1.79 mb. Detailed information on the 122 alterations is available in Supplemental Data Table S1.

Figure 1

Whole-genome array-CGH profiles of MDA-MB-231 using 2 different arrays. (A) Profile by 30k oligoarray and (B) profile by 3k BAC array. The X axis represents individual chromosomes and the Y axis represents the signal intensity ratio (tumor/normal) in log2 scale. Red dots represent probes with ratios above zero and green dots represent probes with ratios below zero.

Table 1 Genomic alterations identified by both BAC and oligoarrays in MDA-MB-231

Most of the high copy changes, i.e. amplifications (AMP) and homozygous deletions (HD), were identified only by the oligoarray. In total, 18 AMPs and 14 HDs (mean size = 0.88 mb) were detected by the oligoarray (Table 2). Among them, only 1 AMP (17p12) and 1 HD (9p24) were detected by both the BAC array and the oligoarray. All the others were detected only by the oligoarray under our criteria.

Table 2 High copy alterations identified by oligoarray in MDA-MB-231. (a) bp, base pair. *High copy alterations identified in both BAC and oligoarrays

Validation of CNAs identified by oligoarray-CGH

We first selected a copy number gained region (17p12) consistently identified by both the oligoarray and the BAC array. Both FISH and genomic qPCR demonstrated the copy number gain in this region, in agreement with the array-CGH results (Figure 2). In FISH analysis, the median number of red signals targeting 17p12 (3 per cell, n = 107) was significantly higher than the number of green signals targeting 2q35, a diploid control region (2 per cell, n = 107) (P < 0.0001). In genomic qPCR, the signal intensity ratio of 17p12 was approximately two times higher than that of the diploid control region.

Figure 2

Validation of the 17p12 copy number gain identified by oligoarray-CGH. (A) FISH analysis of the MDA-MB-231. The signal number of 17p12 (red arrow) is higher than the signal number of 2q35 where no CNA was identified by oligoarray-CGH (green arrow). (B) Genomic qPCR analysis of MDA-MB-231. The signal intensity ratio of the test DNA (MDA-MB-231) is 1.9 (SD = 0.09).

Next, to further validate the reliability of oligoarray-CGH, we performed genomic qPCR targeting 7 CNA regions that were detected by the oligoarray only (gains on 1q23.3, 16q23.1, 18q21.1 and 19q13.43, and losses on 6p12.3, 10q26.13 and 22q13.31) and one non-CNA region (2q35) (Figure 3). The genomic qPCR profiles of all 7 CNAs and the non-CNA region were consistent with the oligoarray-CGH results under our criteria.

Figure 3

Genomic qPCR results of 7 CNA regions identified by the oligoarray only (gains on 1q23.3, 16q23.1, 18q21.1 and 19q13.43, and losses on 6p12.3, 10q26.13 and 22q13.31), and a non-CNA region (2q35).

Integrated analysis of copy number and expression profiles

Combined interpretation of DNA copy number and RNA expression profiles can provide new insights into the biological effects of copy number alterations. To explore this possibility, we performed global gene expression analysis of the same cell line using the same oligoarray platform. Overall, mean DNA copy number levels were significantly correlated with mean RNA expression levels (R² = 0.92) (Figure 4A). Figure 4B and C show examples of the correlation at an amplified region (17p12-p11.2) and a deleted region (9p22.1-p21.2). At the peak of the amplified region (17p12) lies a putative oncogene, PMP22, whose RNA expression was 7.6 times higher than in normal breast tissue. The deleted region on 9p22.1-p21.2 contains the CDKN2A and CDKN2B tumor suppressor genes, whose RNA expression was 6.8 times lower than in normal breast tissue.

Figure 4

Integrated analysis of copy number and expression profiles. (A) Correlation analysis of copy number status and expression levels. The X axis represents the array-CGH signal intensity ratio (tumor/normal) in log2 scale and the Y axis represents the expression signal intensity ratio (tumor/normal) in log2 scale. Tumor, MDA-MB-231; Normal, normal female genomic DNA. (B) Example of the correlation at an amplified region (17p12-p11.2). The arrow indicates the highest value of the expression signal intensity ratio (7.6 in log2 scale, PMP22 gene). (C) Example of the correlation at a deleted region (9p22.1-p21.2). The arrow indicates the lowest value of the expression signal intensity ratio (-6.8 in log2 scale, CDKN2A/B genes). In B and C, the upper box represents the whole chromosome arm plot and the lower box represents the copy number-expression signal intensity ratios in the region selected from the upper box. Both the DNA copy number signal intensity ratio (solid bar) and the RNA expression signal intensity ratio (gray bar) of each probe are represented in log2 scale. The X axis represents the chromosomal position in mb and the Y axis represents the signal intensity ratio in log2 scale.

Discussion

In this study, we established an oligoarray-based, whole-genome copy number-gene expression analysis system and evaluated its accuracy and reliability. As a first step, we compared the CNAs detected by this 30k oligoarray with those detected by the 3k BAC array, because the validity of the BAC array in identifying CNAs has been well established in various tumors (Kim et al., 2005, 2006, 2008). The oligoarray identified the single-copy difference more accurately than the BAC array, and all CNAs detected by the BAC array were also detected by the oligoarray, which supports the reliability of the oligoarray we used.

The seventeen CNAs detected by both platforms are the major genomic aberrations in MDA-MB-231. Many of these CNAs, such as gains of 5p15.33-13.1, 8q11.22-8q21.13 and 17p11.2, and losses of 1p32.3, 8p23.3-8p11.21 and 9p21, were consistently identified in previous studies on breast cancer (Han et al., 2006; Jones et al., 2008). Interestingly, the other 122 CNAs, detected by the oligoarray only, were approximately 10 times smaller than those detected by the BAC array. Among the 32 AMPs or HDs detected by the oligoarray, only 2 were detected by the BAC array, which reflects the higher resolution of the oligoarray platform as reported previously (Carvalho et al., 2004; Brennan et al., 2004; Ylstra et al., 2006).

We further validated the CNAs identified by the oligoarray using genomic qPCR and FISH. We selected 9 regions (8 CNA regions and 1 region without any CNA) for validation. Of the 8 CNAs, 1 was identified by both platforms (6 mb in size, 17p12) and the other 7 were detected only by the oligoarray. For the CNA detected by both arrays, both genomic qPCR and FISH analysis demonstrated a consistent copy number gain in this region. Genomic qPCR targeting the 7 oligoarray-only CNAs and the 1 non-CNA region also gave results consistent with the array-CGH results. This suggests that the higher-resolution oligoarray can accurately detect smaller CNAs that would be missed by the BAC array.

A key advantage of this system is that DNA copy number and RNA expression profiles can be interpreted together. If genomic DNA and cDNA from the same sample are used for copy number analysis and transcriptome analysis, respectively, on the same platform, systematic errors can be minimized when exploring the relationship between copy number status and gene expression. When we assessed this possibility using MDA-MB-231, mean DNA copy number and RNA expression levels showed a highly significant correlation. This result agrees with previous reports demonstrating the correlation between DNA copy number alterations and gene expression in diverse cancers, although all previous studies assessed copy number alterations and expression patterns separately using different arrays (Heidenblad et al., 2005; Chin et al., 2006; Dehan et al., 2007; Vincent-Salomon et al., 2008). In this study, we used normal Korean breast tissue RNA as the reference for MDA-MB-231 RNA expression because RNA matching the Promega DNA was not available; the reference DNA for array-CGH and the reference RNA for expression profiling of MDA-MB-231 were therefore from different sources. To test whether the oligoarray-CGH profiles of MDA-MB-231 obtained with Promega genomic DNA and with DNA from the normal breast tissue are consistent with each other, we performed oligoarray-CGH of MDA-MB-231 using the normal breast DNA as reference in addition to oligoarray-CGH of MDA-MB-231 versus Promega DNA. The genome-wide CNA profiles obtained from MDA-MB-231 versus Promega DNA and from MDA-MB-231 versus normal breast DNA were largely consistent, and the CNA-RNA expression correlation was also highly significant (Supplemental Data Figures S2 and S3).

There are several limitations in this study. First, we did not perform qPCR validation for every CNA detected by the oligoarray only. Although all the qPCR results from the regions we selected were consistent with the oligoarray-CGH calls, we cannot assume that all the other, unvalidated CNAs are true. Further validation will be required to determine the exact validity of this oligoarray. Second, since we did not compare the performance of this oligoarray system with currently available higher-resolution oligoarray chips such as the Agilent 244k or NimbleGen 1M array, we may have missed even smaller CNAs. However, since the main purpose of this study is to establish a reasonable tool for screening chromosomal alterations in cancer and for combined interpretation of copy number and expression profiles using the same array, the 100 kb probe interval of this oligoarray is sufficient to detect most CNAs, given that most chromosomal alterations in cancer are mb-sized. In addition, the very high-resolution arrays mentioned above cannot be directly applied to gene expression analysis and, due to their cost, are not affordable for researchers analyzing large sets of cancer samples. Third, although we improved the array-CGH conditions to obtain higher signal-to-noise ratios and lower SDs in this study, the mean SDs of the oligoarrays were still higher than those of the BAC arrays. Therefore, a more conservative approach to defining CNAs, such as increasing the detection threshold, might be preferable.

In conclusion, this 30k oligoarray-CGH system can be a reasonable approach for analyzing whole genome CNAs and RNA expression profiles at an affordable cost.

Methods

Cell culture and DNA extraction

We used the MDA-MB-231 breast cancer cell line obtained from the American Type Culture Collection (ATCC, Manassas, VA). The cell line was maintained in RPMI1640 medium containing 10% FBS (Hyclone, Logan, UT) under 5% CO2. Genomic DNA was extracted using the DNeasy Blood & Tissue Kit (Qiagen, Hilden, Germany) and quantified using a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, DE). Female genomic DNA (Promega, Madison, WI) was also prepared as a normal control for array-CGH analysis.

Oligo Array-CGH

We used 30k whole-genome human oligoarrays with approximately 100 kb resolution (Human OneArray™, Phalanx Biotech, Palo Alto, CA). In brief, 2 µg of genomic DNA from the MDA-MB-231 cell line was labeled with Cy3-dCTP and female control DNA (Promega, Madison, WI) was labeled with Cy5-dCTP using the BioPrime Labeling Kit (Invitrogen, Carlsbad, CA) according to the manufacturer's instructions. Dye-labeled DNA was purified with BioPrime spin columns (Invitrogen, Carlsbad, CA) and precipitated with 100 µg of human Cot-1 DNA (Roche, Mannheim, Germany). The labeled DNA pellet was dissolved in 50 µl of DIG hybridization buffer (Roche, Mannheim, Germany), to which 600 µg of yeast tRNA (Invitrogen, Carlsbad, CA) was added. The DNA was denatured for 10 min at 70℃ and incubated for 1 h at 37℃ before being applied to the oligoarray slide, which had been pre-hybridized for 2 h at 37℃ with 50 µl of DIG hybridization buffer containing 540 µg of herring sperm DNA. The labeled DNA solution applied to the array was incubated in a MAUI hybridization station (BioMicro Systems, Salt Lake City, UT) for 48 h at 37℃. The slides were washed serially in solution I (2× SSC, 0.1% SDS) for 4 min (1 time) at room temperature, in solution II (0.1× SSC, 0.1% SDS) for 3 min (2 times) and in solution III (0.1× SSC) for 30 s (3 times), followed by a rinse in DW for 10 s. Finally, the slides were spin-dried for 3 min at 900 rpm.

BAC Array-CGH

To validate the copy number analysis results, an independent array-CGH was performed using a large insert clone array covering the entire human genome at 1 mb resolution. All the array-CGH procedures, including DNA labeling, pre-hybridization and hybridization using the MAUI hybridization station (BioMicro Systems, Salt Lake City, UT), were performed as described elsewhere (Kim et al., 2006).

Scanning and CNA data analysis

Arrays were scanned using a GenePix 4000B scanner (Axon Instruments, Sunnyvale, CA) and features were extracted using GenePix Pro 6.0. Data processing, normalization, and re-aligning of raw array-CGH data were performed using CGHscape software (http://www.ircgp.com/software/CGHscape) (Jeong et al., 2008). Each oligo probe and BAC clone was mapped according to its genomic location in the Ensembl Homo_sapiens 43_36e build. Information on the whole oligo probe set is available on the Phalanx Biotech homepage (http://www.phalanxbiotech.com/Support/Files.html). We performed standard deviation (SD)-based CNA identification as described previously, with some modifications (Kim et al., 2006, 2008). In brief, chromosomal gain or loss was assigned when the normalized log2 intensity ratio of a data point was higher than +5SD or lower than -5SD, with the SD derived from normal control hybridizations. Regional copy number change was defined as a DNA copy number alteration that stretches over 2 or more consecutive large insert clones, but not across an entire chromosomal arm. High-level amplification of clones was defined as an intensity ratio higher than 1.0 in log2 scale, and homozygous deletion as an intensity ratio lower than -1.0 in log2 scale.
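The SD-based calling rule described above can be illustrated with a minimal Python sketch. It is not the CGHscape implementation; the function names, the toy log2 ratios and the normal-control SD of 0.05 are hypothetical, and labeling a run as AMP or HD when any probe in the run passes the high-level cutoff is an assumption made only for illustration. The exclusion of whole-arm changes is not modeled.

```python
def call_probes(log2_ratios, normal_sd, k=5.0):
    """Label each probe 'gain', 'loss' or 'none' using a +/- k*SD cutoff."""
    cutoff = k * normal_sd
    calls = []
    for ratio in log2_ratios:
        if ratio > cutoff:
            calls.append("gain")
        elif ratio < -cutoff:
            calls.append("loss")
        else:
            calls.append("none")
    return calls


def regional_cnas(log2_ratios, normal_sd, k=5.0, min_run=2, high=1.0):
    """Return (start, end, label) for runs of >= min_run consecutive calls.

    Indices are 0-based and inclusive. A gained run is labeled 'AMP' if any
    probe exceeds +high, a lost run 'HD' if any probe is below -high
    (an illustrative assumption, not the authors' exact rule).
    """
    calls = call_probes(log2_ratios, normal_sd, k)
    regions = []
    start = 0
    for i in range(1, len(calls) + 1):
        # Close the current run when the call changes or the data ends.
        if i == len(calls) or calls[i] != calls[start]:
            if calls[start] != "none" and i - start >= min_run:
                segment = log2_ratios[start:i]
                if calls[start] == "gain":
                    label = "AMP" if max(segment) > high else "gain"
                else:
                    label = "HD" if min(segment) < -high else "loss"
                regions.append((start, i - 1, label))
            start = i
    return regions


# Hypothetical toy data: ten probes ordered along one chromosome.
toy_ratios = [0.02, 0.31, 0.40, 1.20, -0.01, -0.30, 0.05, -0.45, -1.30, 0.00]
toy_regions = regional_cnas(toy_ratios, normal_sd=0.05)
print(toy_regions)
```

In this toy example the isolated single-probe loss is discarded because it does not span two consecutive probes, which mirrors the regional criterion in the text.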

Expression microarray analysis

Total RNA from the MDA-MB-231 breast cancer cell line and from normal breast tissue was isolated using Trizol (Invitrogen, Carlsbad, CA) according to the manufacturer's instructions. The quantity and quality of the extracted RNA were assessed using a Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA). First-strand cDNA was synthesized from total RNA using SuperScript III Reverse Transcriptase (Invitrogen, Carlsbad, CA). We labeled 5 µg of each cDNA using the BioPrime Labeling Kit (Invitrogen, Carlsbad, CA) and co-hybridized them onto the same type of oligoarray used for array-CGH. Hybridization, slide washing and data processing were essentially the same as for array-CGH analysis. Gene expression analysis was performed using CGHscape software (Jeong et al., 2008).

Genomic quantitative PCR analysis

For the quantitative PCR (qPCR) validation, we chose four CNA regions and one region without CNA. A genomic region which showed no genomic alteration in the array-CGH data (13q32.1) was used as the internal control. Primer sequences for the target regions and the internal control are shown in Supplemental Data Table S1. Genomic qPCR was performed using the Mx3000P qPCR system and MxPro Version 3.00 software (Stratagene, La Jolla, CA). Each 20 µl real-time qPCR mixture contained 20 ng of genomic DNA, SYBR Premix Ex Taq™ II (TaKaRa Bio, Japan), 1× ROX, and 10 pmol of primers. HS6ST3 was used as the internal control in each procedure. Thermal cycling conditions consisted of one cycle of 30 s at 95℃, followed by 45 cycles of 5 s at 95℃, 10 s at 55-60℃ and 30 s at 72℃. After amplification, melting curve analysis was performed to confirm specific amplification. Relative quantification was performed by the ΔΔCT method (Livak and Schmittgen, 2001). When the mean genomic dosage ratio of the region between MDA-MB-231 and female control DNA (calculated from the ΔΔCT of target and internal control) was >1.2 or <0.8, we defined the region as a copy number gain or loss, respectively.
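As a worked illustration of the ΔΔCT calculation (Livak and Schmittgen, 2001), the following Python sketch converts CT values into a genomic dosage ratio and applies gain and loss cutoffs. The function names and CT values are hypothetical, amplification efficiency is assumed to be close to 100%, and the loss cutoff is taken here as a ratio of 0.8, which is an assumption about the intended threshold.

```python
def dosage_ratio(ct_target_test, ct_control_test, ct_target_ref, ct_control_ref):
    """Relative genomic dosage by the 2^(-ddCT) method.

    'test' is the tumor DNA, 'ref' is the normal reference DNA, and
    'control' is the internal control region. Assumes ~100% PCR efficiency.
    """
    delta_test = ct_target_test - ct_control_test  # dCT in the test DNA
    delta_ref = ct_target_ref - ct_control_ref     # dCT in the reference DNA
    delta_delta = delta_test - delta_ref           # ddCT
    return 2.0 ** (-delta_delta)


def classify(ratio, gain_cutoff=1.2, loss_cutoff=0.8):
    """Call a region from its dosage ratio (cutoffs are assumptions)."""
    if ratio > gain_cutoff:
        return "gain"
    if ratio < loss_cutoff:
        return "loss"
    return "no change"


# Hypothetical CT values: the target amplifies one cycle earlier in the
# test DNA than in the reference DNA, relative to the internal control.
example_ratio = dosage_ratio(24.0, 25.0, 25.0, 25.0)
print(example_ratio, classify(example_ratio))
```

A one-cycle shift corresponds to a two-fold dosage difference, which is why a ratio near 2 is read as a copy number gain.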

FISH

For FISH analysis, target-specific probes (17p12) and control-specific diploid probes (2q35) were labeled with the Dig nick translation mix kit and the Biotin nick translation mix kit (Roche, Mannheim, Germany). The labeled probes were mixed with salmon sperm DNA and human Cot-1 DNA in hybridization mixture (50% formamide, 10% dextran sulfate, 2× SSC). After denaturing at 75℃ for 10 min, the labeled probes were hybridized onto the denatured chromosomes and incubated overnight at 37℃. The slides were washed in 50% formamide in 2× SSC at 45℃ for 30 min and in 2× SSC for 5 min. After a rinse in 4× SSC/0.1% Tween20, they were incubated under cover slips with fluorescein avidin DCS (Vector Laboratories Inc., Burlingame, CA) and Anti-Digoxigenin-Rhodamine (Roche, Mannheim, Germany) at 37℃ for 1 h. After a rinse in 4× SSC/0.1% Tween20 for 15 min, the slides were washed sequentially and counterstained with 4',6-diamidino-2-phenylindole (Vector Laboratories Inc.). FISH images were observed with a DMRXA2 microscope (Leica Microsystems, Wetzlar, Germany).

Statistical analysis

Differences in FISH signal numbers between target-specific probes and control-specific probes were examined using the Mann-Whitney test because copy number is a discrete variable. For the CNA-expression correlation analysis, we used the average CGH and expression level for each ten-percentile group of probes. Stata version 10 (Stata, College Station, TX) was used for all statistical analyses, and a P value lower than 0.05 was considered significant.
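The ten-percentile averaging used for the CNA-expression correlation can be sketched as follows. This is only an illustration in Python rather than the Stata procedure actually used; it assumes that probes are ranked by their CGH log2 ratio before being split into ten equal-sized groups, and the function names and toy data are hypothetical.

```python
def decile_means(cgh, expr, n_bins=10):
    """Average CGH and expression log2 ratios within each percentile group.

    Probes are ranked by CGH ratio (an assumption) and split into n_bins
    groups of nearly equal size.
    """
    pairs = sorted(zip(cgh, expr))
    n = len(pairs)
    binned = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:  # skip empty groups when there are fewer probes than bins
            mean_cgh = sum(x for x, _ in chunk) / len(chunk)
            mean_expr = sum(y for _, y in chunk) / len(chunk)
            binned.append((mean_cgh, mean_expr))
    return binned


def r_squared(points):
    """Squared Pearson correlation of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical toy data: expression follows copy number with alternating noise.
toy_cgh = [i / 10 for i in range(-50, 50)]
toy_expr = [1.5 * x + (0.3 if i % 2 else -0.3) for i, x in enumerate(toy_cgh)]
bins = decile_means(toy_cgh, toy_expr)
print(len(bins), round(r_squared(bins), 3))
```

Averaging within groups suppresses probe-level noise, so a correlation computed on group means is expected to be higher than one computed on individual probes.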