Array-based karyotyping is a powerful new technique for assessing chromosomal copy number changes that provides information not previously obtainable by fluorescent in situ hybridization (FISH) or conventional cytogenetics, which can be both a blessing and a challenge. In this issue, Gunn et al.1 present atypical 11q deletions, identified by array-based karyotyping of chronic lymphocytic leukemia (CLL), that may be missed by the FISH panels used for prognostic stratification of this disease. Array-based karyotyping is gaining acceptance as a clinical tool, and physicians should be prepared to interpret results from these platforms judiciously. The advantages of array-based karyotyping are many and vary somewhat with the type of array used, but they include high-resolution, genome-wide copy number assessment in one assay; a permanent, numeric result that does not fade over time like the fluorescent signals of FISH; the ability to karyotype formalin-fixed paraffin-embedded tissues; and, with a single-nucleotide polymorphism (SNP)-based array, simultaneous capture of loss-of-heterozygosity (LOH) status. However, because arrays can display the genome at high resolution, it is becoming apparent that the individual molecular lesions identified earlier by FISH are actually heterogeneous in both genomic length and copy number. The clinical meaning of these different subtypes of genetic lesions, if any, is yet to be determined. In addition, these platforms identify many genomic changes of uncertain clinical significance, and there are no standards for reporting or archiving such lesions so that reports can be amended as our knowledge of them evolves.

Array-based karyotyping can be carried out with several different platforms, both laboratory developed and commercial. The arrays themselves can be genome-wide, with probes distributed over the entire genome; targeted, with probes for genomic regions known to be involved in a specific disease or group of diseases; or a combination of both. Furthermore, array-based karyotyping can be carried out with ‘copy number only’ arrays or with SNP arrays, which can provide both copy number and LOH status. The probe types used for ‘copy number only’ arrays include cDNA, BAC clones and oligonucleotides (for example, Agilent, Santa Clara, CA, USA; Nimblegen, Madison, WI, USA). Commercially available SNP arrays can be solid phase (Affymetrix, Santa Clara, CA, USA) or bead-based (Illumina, San Diego, CA, USA). Some arrays, such as the Affymetrix SNP 6.0 array, contain both polymorphic (SNP-containing) and non-polymorphic (copy number only) probes. The actual resolution of the virtual karyotype will depend primarily on the probe density, the probe performance, the quality of the DNA and the analysis software. Despite the diversity of platforms, ultimately they all use genomic DNA from disrupted cells to recreate a high-resolution karyotype in silico. The end product does not yet have a consistent name and has been called virtual karyotyping,2, 3 digital karyotyping,4 molecular allelokaryotyping5 and molecular karyotyping.6 Other terms used to describe the arrays used for karyotyping include SOMA (SNP oligonucleotide microarrays)7 and CMA (chromosome microarray).8, 9 Some consider all platforms to be a type of array comparative genomic hybridization (arrayCGH), whereas others reserve that term for two-dye methods, and still others set SNP arrays apart because they generate more, and different, information than two-dye arrayCGH methods.

Regardless of the name of the assay or the probe types used, array-based karyotyping is becoming the standard of care for many genetic applications and is now on the verge of bursting into clinical oncology. CLL is an ideal neoplasm to study with copy number arrays because (1) the genetic lesions with known clinical relevance are chromosomal gains and losses rather than balanced translocations and inversions, (2) DNA from a fresh sample is generally available, making the analysis more straightforward than for DNA obtained from formalin-fixed paraffin-embedded tissue, (3) the tumor burden is known from the flow cytometry results and can help guide downstream analysis, (4) the tumor burden tends to be relatively high in the peripheral blood and (5) enrichment for B cells or CLL cells is simple, cost effective and amenable to routine clinical use, which minimizes the effect of ‘normal clone contamination.’ Several groups have recently published studies that use copy number or SNP arrays to characterize CLL or to validate these arrays for clinical use,5, 10, 11, 12, 13 and the technique has been successfully applied to several solid2, 3, 4, 14, 15 and liquid16, 17, 18, 19 tumors.

In CLL, it is typical to perform a standard FISH panel to determine copy number at key regions of the genome with well-established clinical significance, including 6q, 11q, chromosome 12, 13q14 and 17p. In this issue, Gunn et al.1 use a clinically validated array customized to interrogate all known CLL prognostic loci (179 probes) and 914 FISH-mapped linearly distributed clones for whole-genome coverage at an average resolution of approximately 2.5 Mb. Their manuscript highlights four atypical 11q deletions. These 11q lesions were considered atypical because they did not include the ATM gene, and two of the deletions were missed by the commercial FISH probe used for this locus (11q22.3 Vysis LSI ATM probe, Abbott Molecular, Des Plaines, IL, USA). The loss of the ATM tumor suppressor gene (TSG), located at 11q22, is often implicated in the pathogenesis of CLL. However, it is unlikely to be the sole cause of the 11q abnormality, as the minimally deleted region of 11q houses other potential candidate TSGs, such as RDX and FDX1 genes,20 and not all CLL patients with deletions of 11q have evidence of an ATM mutation of the remaining allele.21 Other groups have reported data suggesting that there is a slightly more telomeric but overlapping region that does not include ATM.21, 22, 23 This finding was corroborated by Lehmann et al.5 who also observed a second commonly deleted region at 11q that is telomeric to the ATM gene when they used SNP arrays to karyotype CLL samples. It is postulated that this region may harbor another TSG associated with the development of CLL, and that concurrent deletion of ATM and this other TSG may contribute to a poor prognosis. It is therefore likely that a single FISH probe cannot capture all clinically relevant lesions at this locus. 
This article underscores how copy number arrays not only allow us to detect lesions missed by FISH, but also enable us to further refine the break points of 11q deletions and help to determine the clinical relevance, if any, of the subtypes of 11q lesions.

Aside from variation in the genomic length of the 11q deletions described by Gunn et al.,1 array-based karyotyping has elucidated other types of genetic heterogeneity at clinically relevant loci in CLL. Sargent et al.10 report clinical validation data for a custom CLL oligonucleotide array and highlight length heterogeneity at the 13q14 locus. Patel et al.24 report their clinical validation of a custom CLL BAC array and also underscore the length heterogeneity seen at the 13q14 locus. Ouillette et al.25 used 50 K Affymetrix SNP arrays to subtype 13q14 lesions by copy number (biallelic or monoallelic loss) and break points, and they propose a subclassification of 13q14 lesions based on the length and gene dosage heterogeneity of this locus.

Although the lesions detected by FISH have generally been thought of as monolithic, most physicians can readily conceptualize length heterogeneity of these deletions. However, high-resolution karyotyping with arrays is bringing less easily conceptualized genetic lesions into play, such as copy neutral LOH (acquired uniparental disomy (UPD)) and regional copy number heterogeneity. UPD refers to a chromosomal region in which both copies are derived from the same parent, resulting in a copy number of two but with LOH (hence, copy neutral LOH). It can arise through mitotic recombination, chromosomal nondisjunction, or loss of one parental chromosome with duplication of the other. Like a deletion, copy neutral LOH can act as the ‘second hit’ of Knudson's two-hit hypothesis of tumorigenesis, removing the remaining wild-type allele of a TSG.26 Copy neutral LOH is reported to account for 20–80% of the LOH seen in human cancers, both solid and liquid.16, 27, 28, 29, 30 Using Affymetrix SNP arrays, Pfeifer et al.12 and Lehmann et al.5 identified copy neutral LOH at clinically relevant loci in CLL. This is noteworthy because this type of lesion cannot be detected by conventional cytogenetics, FISH or ‘copy number only’ arrays.
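To make the distinction concrete, the minimal Python sketch below labels a segment from two quantities a SNP array can provide: total copy number and the fraction of SNPs genotyped as heterozygous. The `Segment` class, the `classify_segment` function and the 0.05 cutoff are hypothetical illustrations, not the algorithm of any particular analysis package; real software uses considerably more elaborate statistical models.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    copy_number: int               # integer total copy number estimated for the segment
    het_fraction: Optional[float]  # fraction of SNPs called heterozygous; None on a copy-number-only array


LOH_CUTOFF = 0.05  # hypothetical threshold, not taken from any published pipeline


def classify_segment(seg: Segment) -> str:
    """Label a segment from total copy number plus (if available) heterozygosity."""
    if seg.copy_number == 0:
        return "homozygous deletion"
    if seg.copy_number == 1:
        # A single remaining copy cannot be heterozygous, so LOH is implied.
        return "heterozygous deletion"
    # Without SNP genotypes (het_fraction is None), LOH cannot be assessed at all.
    loh = seg.het_fraction is not None and seg.het_fraction < LOH_CUTOFF
    if seg.copy_number == 2:
        return "copy neutral LOH" if loh else "normal"
    return "gain with LOH" if loh else "gain"
```

Passing `het_fraction=None`, as a copy-number-only array effectively does, makes a two-copy segment with UPD come out as "normal", which is exactly the blind spot described above.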

Array-based karyotyping can also readily detect heterogeneity of copy number state and gene dosage within a particular chromosomal region, so-called ‘regional copy number heterogeneity.’ For example, a region of 13q14 can be deleted on both chromosomes (homozygous deletion), whereas an adjacent, intervening or overlapping genomic region is deleted on only one chromosome (heterozygous deletion). Figure 1a depicts clonal evolution of a disease-associated locus, such as 13q14, showing a single chromosome pair in each cell. In the unevolved cells, there is a small heterozygous deletion (light blue) that involves only a few genes and underlies the FISH probe site. The cells that have undergone clonal evolution have lost a larger region of the other chromosome (heterozygous deletion, light blue), encompassing dozens of genes; this region overlaps the original deletion and converts it into a region of homozygous deletion. Although FISH can show the presence of a mixed population of heterozygously deleted cells (the unevolved population) and homozygously deleted cells (cells that have undergone clonal evolution), it cannot determine which genes changed dosage in the original deletion or during clonal evolution. By defining the break points of each lesion, array-based karyotyping can establish, for example, that an evolved clone has lost both copies of SETDB2, RCBTB1 and ARL11 while still retaining one copy of DLEU1, DLEU2, mir-16-1 and mir-15a. An example of a virtual karyotype showing regional copy number heterogeneity of the 13q14 locus in CLL is shown in Figure 2b (unpublished data). The karyotype was generated from peripheral blood DNA using the Affymetrix 250 K Nsp SNP array and CNAG3.0 analysis software.31 The dark blue bar in the Hidden Markov Model track indicates a region of homozygous deletion (centromeric end), and the light blue band indicates a region of heterozygous deletion (telomeric end) within the same genetic lesion at 13q14.
FISH for this sample showed a heterozygous 13q14 deletion in 13% and a homozygous deletion in 43% of interphase cells, a pattern that most likely represents clonal evolution. However, regional copy number heterogeneity can arise either from clonal evolution or from two separate clones in the sample (Figure 1b), and neither FISH nor array-based karyotyping can distinguish between these possibilities. Regional copy number heterogeneity has been reported by others,5, 13 and it can be either terminal, as in this example, or interstitial. The genetic heterogeneity revealed by array-based karyotyping is exciting, but the clinical relevance of this heterogeneity (in genomic length, UPD or regional copy number) has yet to be established.
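The gene dosage reasoning can be sketched in Python under a deliberately idealized assumption: that each deletion can be assigned to one homolog of a diploid locus. An array actually reports only the aggregate copy number of each segment across all cells, so this is a thought experiment rather than a calling method. The function names, gene names and coordinates below are hypothetical, positions are in arbitrary units, and a gene is counted at its lowest copy number if any part of it is deleted.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in arbitrary genomic units


def copies_at(pos: int, deletions_by_homolog: List[List[Interval]]) -> int:
    """Copies left at pos in a diploid locus: 2 minus the homologs deleted there."""
    lost = sum(
        any(start <= pos < end for start, end in homolog)
        for homolog in deletions_by_homolog
    )
    return 2 - lost


def gene_dosage(genes: Dict[str, Interval],
                deletions_by_homolog: List[List[Interval]]) -> Dict[str, int]:
    """Lowest copy number over each gene's span (a partial deletion counts as loss)."""
    return {
        name: min(copies_at(p, deletions_by_homolog) for p in range(start, end))
        for name, (start, end) in genes.items()
    }


# Hypothetical example: homolog 0 carries a small original deletion; homolog 1
# carries a larger deletion, acquired during clonal evolution, that overlaps it.
deletions = [[(40, 50)], [(30, 90)]]
genes = {"gene_A": (10, 20), "gene_B": (42, 48), "gene_C": (60, 70), "gene_D": (95, 99)}
dosage = gene_dosage(genes, deletions)
```

In this toy case gene_B, which lies inside both deletions, is at zero copies, gene_C retains one copy, and gene_A and gene_D are untouched: the same kind of distinction drawn above between homozygously and heterozygously deleted genes at 13q14.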

Figure 1

Cellular basis of the regional heterogeneity seen with array-based karyotyping. Heterozygous deletions are shown in light blue and homozygous deletions in dark blue within the chromosome pair of each cell. (a) Clonal evolution; (b) two separate clones. Both scenarios result in apparent regional copy number heterogeneity on array-based karyotyping.

Figure 2

13q14 regional copy number heterogeneity and efficacy of B-cell enrichment. (a) Single-nucleotide polymorphism (SNP) array-based karyotype of the 13q14 locus of a chronic lymphocytic leukemia (CLL) sample with 17% CD5+/CD19+ CLL cells by flow cytometry. (b) Array-based karyotype of the same sample as in (a), but processed with a density-based B-cell enrichment step before DNA extraction, revealing a 13q14 aberration that appears partially heterozygously deleted (light blue) and partially homozygously deleted (dark blue). HMM, Hidden Markov Model. Dark blue indicates a copy number of zero, light blue a copy number of one and yellow a copy number of two. The log2 ratio plot is shown as a smoothed average over 10 SNPs.

This new way of assessing copy number raises the question, ‘What is the gold standard for copy number detection?’ Each technique has inherent strengths and weaknesses, and in many instances they complement rather than replace one another. Conventional cytogenetics has very coarse resolution but can identify a multitude of structural lesions, provided they are large enough and fresh tissue is available. Copy number arrays can be used on fresh or formalin-fixed paraffin-embedded samples, but cannot detect balanced translocations, inversions or lesions in regions of the genome not represented on the array. In addition, many copy number analysis programs used to generate array-based karyotypes falter when the sample contains less than 25–30% tumor cells. However, this limitation can be minimized by tumor enrichment strategies and/or software optimized for oncology samples. The analysis algorithms are evolving rapidly, and some are even designed to thrive on ‘normal clone contamination,’31 so this limitation is expected to continue to dissipate. FISH, on the other hand, can reportedly detect abnormalities present in approximately 5–7% of cells,10 but it can assess only the specific region of the genome it targets and cannot provide information about the length of a deletion or duplication. Moreover, it is not uncommon for arrays to detect copy number changes in regions that FISH called normal/diploid, because the lesions fell completely or partially outside the region covered by the FISH probe.5, 24, 25 Whether array-based karyotyping will be carried out in lieu of FISH or conventional cytogenetics remains to be seen and will most likely have to be decided on a tumor-by-tumor or case-by-case basis.

Studies using array-based karyotyping to evaluate CLL have consistently reported high concordance with FISH panel results, and instances of non-concordance were explained by low tumor burden (<25–30% CLL cells in the sample), the presence of small subclones, the relatively low resolution of the arrays used and/or differences in the cell populations used for each assay. As noted above, normal cells admixed with the tumor population dilute the signal from the tumor. Without tumor enrichment steps, genotyping and copy number algorithms will fail in the face of ‘normal clone contamination.’ The exact point of failure, in terms of the minimal percentage of neoplastic cells, depends on the particular platform and algorithms used. Enrichment strategies for CLL include fluorescence-activated cell sorting (FACS) of CD5+/CD19+ cells,32 magnetic bead separation and density separation of B cells24 (Stem Cell Technologies, Vancouver, BC, USA). In our hands, enrichment for B cells not only levels the playing field when comparing log2 ratios between samples, but also resolves lesions in samples with low tumor burden or small subclones. Figure 2 shows virtual karyotypes of chromosome 13 from a sample with 17% CD5+/CD19+ cells by flow cytometry. In the unenriched sample (Figure 2a), the 13q14 deletion was barely discernible and was not called by the Hidden Markov Model or segment reporting tools. After density-based B-cell enrichment (Figure 2b), the lesion is readily evident both to the software's automated calling tools and to the eye of the observer. In this sample, 90% of the B cells were CLL cells, which is often the case in CLL. We recommend that samples with less than 30% CLL cells be either enriched before DNA extraction or triaged directly to FISH. Additionally, flow cytometry sorting may be needed for hematopoietic neoplasms with multilineage involvement, such as myelodysplastic syndromes.
We have recently shown, using SNP arrays on flow-sorted marrow samples, that multiple distinct clones may co-exist in different lineages in myelodysplastic syndromes.33
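The dilution effect can be quantified with a simple mixture model, assuming a diploid normal background, a single clonal tumor copy number for the segment, and an array that reports the ideal ratio without the signal compression real platforms typically show. Under those assumptions, the expected log2 ratio for tumor fraction f and tumor copy number c is log2((f·c + (1 − f)·2)/2). The Python function below is a sketch of that formula, not the model used by any specific analysis software.

```python
import math


def expected_log2_ratio(tumor_fraction: float, tumor_copy_number: int,
                        normal_copy_number: int = 2) -> float:
    """Idealized log2(sample/reference) ratio for a tumor/normal cell mixture.

    Assumes a diploid reference, one clonal tumor copy number for the segment,
    and no array signal compression.
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must lie between 0 and 1")
    # Average copies per cell: tumor cells carry tumor_copy_number,
    # admixed normal cells carry normal_copy_number.
    mixed = tumor_fraction * tumor_copy_number + (1.0 - tumor_fraction) * normal_copy_number
    if mixed == 0:
        return float("-inf")  # pure homozygous deletion: no signal at all
    return math.log2(mixed / normal_copy_number)
```

Under these assumptions, a heterozygous deletion (c = 1) present in 17% of cells gives an expected log2 ratio of only about −0.13, the same deletion in 90% of cells gives about −0.86, and a pure tumor sample gives −1. These fractions are illustrative inputs to the formula, not measurements from the sample in Figure 2.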

As expected, genome-wide copy number arrays see not only the loci interrogated by the FISH probes but also the rest of the genome. Consequently, genetic lesions of ‘uncertain clinical significance,’ as well as copy number polymorphisms, are often detected, which can be both a blessing and a curse. The higher the density of the array, the greater the number of lesions of uncertain significance and copy number polymorphisms that will be detected. In a research setting, this can be a rich source of data for an eager graduate student (or it can be like drinking from a fire hose). However, as copy number arrays move into routine clinical use for CLL and other tumors, the medical professionals performing these assays and signing out the reports will have to decide which ‘lesions’ to include in the final report, and the ordering physicians will have to decide what to do, if anything, with such information. At this time, there is no consensus about which ‘lesions’ to include in a clinical report for oncology samples studied by copy number arrays. Some may advocate reporting only lesions with well-established clinical relevance, whereas others may advocate listing all genetic aberrations identified in the sample. In CLL, listing the clinically relevant lesions in the diagnostic line and the others in the body of the report is a reasonable approach. This strategy becomes less tenable for paraffin-embedded solid tumors, which will have many more genetic lesions and more false-positive ‘chatter’; such reports could become quite unwieldy. Regardless, it is imperative for laboratories to annotate and archive all genomic aberrations identified by the arrays, whether or not they are included in the final report.
This is critical not only for the purpose of novel biomarker research, but also to be able to readily recall specific lesions, as our knowledge about them advances from ‘uncertain clinical significance’ to association or lack thereof with prognosis, diagnosis, or response to therapy, and to issue amended reports as indicated. At the time of this writing, the commercial tools available for this purpose are scarce and quite primitive.
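As an illustration of the archiving requirement, the Python sketch below stores every call, reported or not, and retrieves the cases whose unreported calls overlap a region whose significance has changed. The class, field and method names are hypothetical, coordinates are in arbitrary units, and a laboratory system would also need persistent storage and versioned annotations; this shows only the core lookup.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Aberration:
    case_id: str
    chrom: str
    start: int                      # half-open [start, end)
    end: int
    call: str                       # e.g. "loss", "gain", "copy neutral LOH"
    significance: str = "uncertain"
    reported: bool = False          # whether it appeared in the signed-out report


@dataclass
class AberrationArchive:
    records: List[Aberration] = field(default_factory=list)

    def add(self, aberration: Aberration) -> None:
        # Archive every call, including those left out of the final report.
        self.records.append(aberration)

    def overlapping(self, chrom: str, start: int, end: int) -> List[Aberration]:
        """All archived calls overlapping [start, end) on chrom, reported or not."""
        return [a for a in self.records
                if a.chrom == chrom and a.start < end and start < a.end]

    def cases_to_amend(self, chrom: str, start: int, end: int) -> List[str]:
        """Case IDs with unreported calls in a region whose significance has changed."""
        return sorted({a.case_id for a in self.overlapping(chrom, start, end)
                       if not a.reported})
```

A simple linear scan is used for clarity; a real archive holding many cases would more likely index calls by chromosome and position.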

Lastly, each type of array and analysis algorithm has inherent strengths and weaknesses, and the result obtained can differ depending on variables such as array density, probe type and the assumptions underlying the algorithms. These considerations are unfamiliar territory for most oncologists and pathologists. For example, UPD of key genomic regions in CLL has been reported at 13q14,5, 12 and 17p,5 and a SNP-based array can detect UPD, whereas arrayCGH cannot. One must therefore consider whether a negative result at a given locus reflects the true absence of a deletion or UPD, or simply that the laboratory used an array that cannot detect UPD. But be careful what you wish for: detecting UPD at a clinically relevant locus raises further questions. Acquired UPD may represent two copies of a mutated TSG or two unmutated copies, and additional testing, such as sequence analysis, would be needed to answer that question definitively.

Importantly, clinicians need to be aware of the strengths and limitations of the different types of copy number arrays in clinical use. Although a platform performance comparison using CLL samples has been published,34 conclusions about algorithms are often out of date by the time they go to press, and new arrays continually come to market. In this rapidly evolving field, both the arrays and the analysis software continue to mature. Several platforms have been clinically validated and are amenable to routine clinical use for detecting copy number alterations in CLL. The complexities of in silico karyotyping may fluster many physicians, but array-based karyotyping is likely to become an increasingly used tool in CLL and other neoplasms, and it is worth the effort to become familiar with these platforms. These arrays are a seemingly endless source of potential novel biomarkers to investigate and are inching us ever closer to personalized medicine.