Analysis of differential gene expression in colorectal cancer and stroma using fluorescence-activated cell sorting purification

Tumour stroma gene expression in biopsy specimens may obscure the expression of tumour parenchyma, hampering the predictive power of microarrays. We aimed to assess the utility of fluorescence-activated cell sorting (FACS) for generating cell populations for gene expression analysis and to compare the gene expression of FACS-purified tumour parenchyma to that of whole tumour biopsies. Single cell suspensions were generated from colorectal tumour biopsies and tumour parenchyma was separated using FACS. Fluorescence-activated cell sorting allowed reliable estimation and purification of cell populations, generating parenchymal purity above 90%. RNA from FACS-purified and corresponding whole tumour biopsies was hybridised to Affymetrix oligonucleotide microarrays. Whole tumour and parenchymal samples demonstrated differential gene expression, with 289 genes significantly overexpressed in the whole tumour, many of which were consistent with stromal gene expression (e.g., COL6A3, COL1A2, POSTN, TIMP2). Genes characteristic of colorectal carcinoma were overexpressed in the FACS-purified cells (e.g., HOX2D and RHOB). We found FACS to be a robust method for generating samples for gene expression analysis, allowing simultaneous assessment of parenchymal and stromal compartments. Gross stromal contamination may affect the interpretation of cancer gene expression microarray experiments, with implications for hypotheses generation and the stability of expression signatures used for predicting clinical outcomes.

research and treatment. Gene expression profiles have been used to classify tumours (Bittner et al, 2000;Ramaswamy et al, 2001;Selaru et al, 2002;Shipp et al, 2002), study the biology of tumour progression and metastasis (Birkenkamp-Demtroder et al, 2002;Ramaswamy et al, 2003), predict clinical outcomes (Huang et al, 2003;Iizuka et al, 2003), classify drug resistance (Hofmann et al, 2002;Chang et al, 2003) and identify novel drug targets (Marton et al, 1998). As the technology matures, it is pushing towards mainstream clinical application (Gershon, 2004;Jarvis and Centola, 2005;Cardoso et al, 2008). Surmountable challenges remain, such as standardisation and validation of the competing platforms and analysis techniques. However, the heterogeneity of clinical tumour samples remains a fundamental problem that must be addressed (Winegarden, 2003). All cell subpopulations in a sample contribute to the gene expression profile. Relatively homogenous cell populations yield optimal expression data, and increasing ratios of stromal cells may obscure the gene expression of parenchymal cancer cells (Emmert-Buck et al, 1996;Ross et al, 2000;Butte, 2002;Sugiyama et al, 2002). Stromal gene expression may cause misinterpretation of data, with important subtle cancer gene changes being masked by contaminating RNA, and may increase the potential for attributing incorrect functional gene associations (Smith et al, 2003). However, interactions between stromal cells and tumour parenchyma are increasingly recognised as an important factor in tumour biology and clinical outcome, and the delineation and retention of stromal gene expression has considerable value (Fukino et al, 2007;Patocs et al, 2007).
Laser capture microdissection (LCM) allows the selection of specific cells from tissue and could potentially circumvent many of these problems (Emmert-Buck et al, 1996). Sugiyama et al (2002) examined the expression profile of LCM dissected tissue and whole tissue. They demonstrated significant differences in expression profiles, finding that the overall difference in the gene expression profile was related to levels of stromal contamination. However, LCM is a costly, laborious and highly skilled procedure that yields small quantities of RNA, rendering clinical application impractical (Liu, 2007). It has also been demonstrated by Michel et al (2003) that the LCM process introduces a systematic bias into gene expression profiles. Another problem encountered in the histological estimation of stromal content, and therefore in both LCM and macrodissection, is the 'reference trap'. A two-dimensional microscopic view of a complex three-dimensional structure, such as a tumour, leads to irreversible qualitative and quantitative loss of information (Nyengaard, 1999). This means that fractions of cells can be grossly under- or overestimated if unbiased sampling methods (i.e. stereological methods) are not used.
Fluorescence cytometry (FC) and fluorescence-activated cell sorting (FACS) allow simultaneous quantitation and multiparametric assessment of the phenotype of cells by staining with fluorochrome-conjugated antibodies (Afanasyeva et al, 2004). These are generally used to examine or sort peripheral blood samples (Waguri et al, 2003), to identify tumour cells in malignant effusions, to isolate clones and infrequently for the separation of specific cell populations from solid tissue (Afanasyeva et al, 2004;Suzuki et al, 2004). Advances in laser technology, speed of sorting and range of fluorochromes, make FACS a potentially useful method for identifying and purifying cell populations from solid tumours for analysis. Fluorescence-activated cell sorting overcomes many of the problems associated with LCM and macrodissection, by allowing systematic sampling of a large number of parenchymal tumour cells, allowing confirmation of the purity of targets and potentially, a better average of gene expression in cancer cells. We aimed to evaluate the feasibility of using FACS for producing homogenous cell populations for gene expression microarray analysis of colorectal tumour samples. Specifically, we wished to compare the differences in gene expression profiles elicited from whole tissue and sorted cells.

Colorectal carcinoma tissue
Colorectal carcinoma (CRC) tissue samples were obtained, with informed consent, from patients undergoing curative bowel resection (Table 1). All patients were Irish Caucasians. None of the patients received preoperative chemotherapy or radiotherapy. Three primary cancers were used to compare the gene expression of sorted cells and whole tumour samples, after optimisation of our FACS methodology. Collection of tissue was approved by the Clinical Research Ethics Committee of the Cork Teaching Hospitals.
Generation of a single cell suspension from colorectal cancer tissue
Tumour samples were washed in Dulbecco's modified Eagle's medium (DMEM; BioWhittaker, Wokingham, UK) and macroscopic necrotic tissue was excised with a scalpel and washed in DMEM. A portion of the biopsy was immediately snap frozen in liquid nitrogen and stored at −80°C until RNA extraction. Approximately 1 g of tumour tissue was mechanically disaggregated, and subsequently enzymatically digested with bovine collagenase II, IV (Sigma-Aldrich, Dublin, Ireland) and DNAse I (Roche, Clarecastle, Ireland) at concentrations of 2 and 1 mg ml⁻¹, respectively, at room temperature (22°C) for 45-60 min. This mix was then filtered through 70 μm pore mesh (Becton Dickinson, Oxford, UK). All suspensions were generated under standardised environmental conditions, being kept at 4°C, except for the enzyme digestion stage.

Flow cytometry
Dissociated cells in suspension were incubated on ice with mouse anti-human epithelial antigen (HEA) monoclonal antibody (mAb) conjugated with FITC (clone BER-EP4; Dako, Glostrup, Denmark), anti-CD14 mAb conjugated with phycoerythrin (PE) and anti-CD45 conjugated with PerCP mAb (both from BD Pharmingen, Erembodegem-Aalst, Belgium) or relevant isotype controls, for 45 min. The labelled cells were analysed and separated using FACS Vantage with CellQuest Pro software (Becton Dickinson). Establishment of the gates was based on the staining profiles of the negative controls, positive controls (SW-620 cells, labelled with HEA) and to eliminate low forward scatter signal events, eliminating debris, red cells and apoptotic cells.
The mAb BER-EP4 binds to a partially formol-resistant epitope on the protein moiety of two glycopolypeptides of 34 and 39 kDa on human epithelial cells. It does not bind to any non-epithelial cells (Latza et al, 1990). Specifically, it does not bind to mesenchymal or lymphoid tissue. However, in large cell populations antibodies can bind in a non-specific manner. To control for this, we blocked antibodies with 1% fetal calf serum and used isotype control antibodies as negative controls. There is also a possibility of immune cells expressing the HEA antigen after ingestion of apoptotic cells. As immune cell infiltrate is a large component of stromal tissue, we decided to use antibodies that allowed us to quantify and negatively select immune cells, to avoid contamination of our sorted epithelial fraction. Phagocyte numbers have been found to increase 1.5- and 2.5-fold in Duke's B and C tumours, respectively, and T cells by 1.4-fold in colorectal tumours Hogg, 1985, 1987). We decided that the ideal combination would be antibodies binding to CD14 and CD45. CD14 is the LPS receptor and is expressed strongly on the surface of monocytes, weakly on the surface of granulocytes and by most tissue macrophages. CD45 is a tyrosine phosphatase that is a critical requirement for T- and B-cell antigen receptor-mediated activation, and is expressed, typically at high levels, on all haematopoietic cells (expression is at a higher density on lymphocytes, where approximately 10% of surface area is CD45). It has previously been demonstrated that positive selection of HEA-expressing cells and negative selection of CD45- or CD14-expressing cells using immunomagnetic cell sorting leads to high yields of epithelial cells from cell solutions (Zigeuner et al, 2000;Guo et al, 2004).

Cell sorting and confirmation of cell phenotype
A one-step, three-colour sorting approach was used. Our goal was to positively select colorectal parenchyma and negatively select for stromal cells. A diagram illustrating the method is shown in Figure 1. Sorting gates were set for positive selection of HEA+ CD14− CD45− cells and negative selection of HEA− CD14+ CD45+ cells. Unstained cells and cells stained with isotype controls were used as negative controls for all samples, and SW-620 cells were used as a positive control for CRC cells. At least 7 million HEA+ cells were sorted, as below this level we found RNA quantity was variable. Cells were sorted into BD polypropylene flow tubes coated with 4% bovine serum albumin (Sigma-Aldrich). Purity of the sample was checked after sorting by reanalysing the HEA+ CD14− CD45− fraction on the same machine, after a full cleaning protocol. Purity greater than 90% was deemed acceptable. After sorting, cells were confirmed to be colorectal tumour cells by microscopic examination: cells were cytospun on to Superfrost Plus microscope slides (BDH Laboratory Supplies, Poole, UK), fixed in ethanol and stained with Rapi-Diff (Cytocolor, Hinckley, OH, USA) or by immunocytochemistry. Immunocytochemistry was performed using Dako MNF-116 anti-pan-cytokeratin antibody, using the standard EnVision kit protocol (Dako).
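The gating rule and purity check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Event` class, the `THRESHOLDS` values and the function names are hypothetical, and in practice the positivity thresholds would be set from the isotype and positive controls rather than hard-coded.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One cytometer event; intensities are in arbitrary units (hypothetical)."""
    hea: float   # FITC signal (FL-1)
    cd14: float  # PE signal (FL-2)
    cd45: float  # PerCP signal (FL-3)


# Hypothetical positivity thresholds; real gates come from the control stains.
THRESHOLDS = {"hea": 100.0, "cd14": 100.0, "cd45": 100.0}


def is_parenchyma(event, thr=THRESHOLDS):
    """True for an HEA+ CD14- CD45- event (the positively selected gate)."""
    return (event.hea > thr["hea"]
            and event.cd14 <= thr["cd14"]
            and event.cd45 <= thr["cd45"])


def parenchymal_fraction(events, thr=THRESHOLDS):
    """Fraction of events falling in the HEA+ CD14- CD45- gate."""
    if not events:
        raise ValueError("no events to analyse")
    return sum(is_parenchyma(e, thr) for e in events) / len(events)


def purity_acceptable(events, minimum=0.90, thr=THRESHOLDS):
    """Apply the 'purity greater than 90%' acceptance rule to re-analysed cells."""
    return parenchymal_fraction(events, thr) > minimum
```

Running `parenchymal_fraction` on the pre-sort events gives the estimated parenchymal content, and `purity_acceptable` on the re-analysed sorted fraction mirrors the post-sort check.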

Figure 1 Panel titles: Step 1, gating out debris and identification of HEA positive population (regions R1 and R2); Step 2 (panels A-D); Step 3, R3 is set as target gate for cell sorting.
This figure demonstrates our method for separating tumour stroma and parenchymal cells using fluorescence-activated cell sorting. Briefly, debris is gated out, target populations identified and positively (parenchyma) or negatively selected (stroma). Dot plots represent 10 000 events, and show side scatter and forward scatter plots in Step 1 and fluorescence plots in Steps 2 and 3. Cells have been simultaneously stained with mouse anti-human anti-HEA, anti-CD14 and anti-CD45 monoclonal antibodies conjugated with FITC, PE and PerCP, respectively, which are resolved on FL-1, FL-2 and FL-3.
Step 1 demonstrates our method of gating out debris and identification of the HEA positive population with scatter plots.
Step 2 demonstrates the identification of CD45 (A) and CD14 (B) positive populations, followed by gating of the HEA+ CD45-population (C) and gating of the HEA+ CD14-CD45-population (D).
Step 3 is the identification and the estimation of the pre-sorting parenchymal content (HEA+ CD14-CD45-population, R3 region), which is estimated at only 43% of cells in this sample, and the selection of the R3 region as a cell sorting gate. After cell sorting the flow cytometer is cleaned, and the post-sorting populations evaluated.
Step 4 shows a histogram demonstrating the post-sorting populations of cells. The histogram plots the fluorescence of HEA+ cells on FL-1 (region M2) against counts on the y-axis. There is clear separation of the tumour parenchyma (M2 region) and the stromal (M1 region) cells. We estimate the populations have a purity of over 90%.
We have also applied this method to successfully purify tumour parenchyma in CRC liver metastases, primary breast tumours, and with modification, to sort breast cancer bone marrow micrometastases.

RNA isolation
RNA was extracted from three corresponding whole tumour samples and FACS-purified tumour parenchyma using a modification of the Tri Reagent (Molecular Research Center, Cincinnati, OH, USA) protocol (Curtin and Cotter, 2004). RNA with an absorbance ratio A260/240 >1.8 and no evidence of RNA degradation by gel electrophoresis was accepted. We then checked the RNA quality using Agilent Bioanalyzer (Agilent, Santa Clara, CA, USA) runs. We used RNA with a RIN (RNA integrity number) value ≥8 (Schroeder et al, 2006).

cRNA preparation
The labelling of the total RNA was performed according to the 'Small Sample Labeling Protocol vII' (Affymetrix, Santa Clara, CA, USA). Total RNA (100 ng) was used as starting material for the first round of cDNA preparation. The first and second strand cDNA synthesis was performed using the Superscript II system (Invitrogen, Dublin, Ireland) according to the manufacturer's instructions except using an oligo-dT primer containing a T7 RNA polymerase promoter site. The first round of in vitro transcription (IVT) was performed using the MEGAscript T7 kit (Ambion, Warrington, UK). The second round of cDNA preparation was done as first round except now random hexamers replaced the oligo-dT primer.
Labelled cRNA was prepared using the BioArray High Yield RNA Transcript Labeling Kit (Enzo, Farmingdale, NY, USA). Biotin labelled CTP and UTP (Enzo) were used in the reaction together with unlabelled NTPs. During the labelling, the IVT product and also the fragmented IVT product were checked by gel electrophoresis. Following the IVT reaction, the unincorporated nucleotides were removed using RNeasy columns (Qiagen, Crawley, UK).

Oligonucleotide array hybridisation and scanning
Fragmented cRNA was loaded onto the GeneChip HU133 Plus 2.0 probe array cartridge (Affymetrix). The washing and staining procedure was performed in the Affymetrix Fluidics Station 450 (Affymetrix). The biotinylated cRNA was stained with a streptavidin-PE conjugate, and the probe arrays were scanned at 560 nm using a confocal laser-scanning microscope (Affymetrix Scanner 3000; Affymetrix). After hybridisation and scanning, we checked several quality parameters: scaling factor ≤3-fold difference within a study; 3′/5′ ratio for probe sets for GAPDH ≤3; present (P) calls in the same range for all samples in the study; and RawQ below 100. All of our arrays passed all stages of the quality control. The readings from the quantitative scanning were analysed by the Affymetrix Gene Expression Analysis Software (Affymetrix).
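The quantified array quality criteria listed above can be expressed as a simple check. The sketch below is illustrative only: the dictionary keys and function name are hypothetical, the input values would come from the array QC report, and the P-call criterion is left out because the text gives no numerical range for it.

```python
def arrays_pass_qc(arrays, max_sf_fold=3.0, max_gapdh_ratio=3.0, max_rawq=100.0):
    """Check a study's arrays against the quantified QC thresholds.

    `arrays` is a list of dicts with the (hypothetical) keys
    'scaling_factor', 'gapdh_3_5_ratio' and 'rawq'.
    """
    if not arrays:
        raise ValueError("no arrays supplied")
    scaling = [a["scaling_factor"] for a in arrays]
    # Scaling factors must lie within a 3-fold range across the whole study.
    scaling_ok = max(scaling) / min(scaling) <= max_sf_fold
    # Per-array checks: GAPDH 3'/5' ratio at most 3, RawQ below 100.
    per_array_ok = all(
        a["gapdh_3_5_ratio"] <= max_gapdh_ratio and a["rawq"] < max_rawq
        for a in arrays
    )
    return scaling_ok and per_array_ok
```

Keeping the study-wide criterion (scaling factor range) separate from the per-array criteria makes it easy to see which rule a failing array set has violated.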

Statistical analysis of gene expression data
Affymetrix GeneChip array data were normalised, pre-processed and analysed using R and Bioconductor statistical software (Gentleman et al, 2004). Raw CEL file data from human whole genome Affymetrix U133Plus2 GeneChips of whole primary colon cancer samples (n = 3) and tumour parenchyma samples purified by FACS (n = 3) were imported into R. Initial exploratory data analysis performed using the overview function in the package made4 (Culhane et al, 2005) suggested that the assumption of a constant sum across all microarray samples may not be valid for these data. Moreover, there were significantly more MAS 5.0 P calls in the whole samples than in the FACS-purified samples (paired t-test, P<0.05). Therefore, data were normalised using Li and Wong's invariant set method using the 'expresso' function in the Affy package in Bioconductor (Li and Hung Wong, 2001;Gentleman et al, 2004). Normalised data were log2 transformed and assessed initially using two exploratory data analysis approaches: hierarchical cluster analysis (1-Pearson correlation coefficient distance with average linkage joining) and dimension reduction using correspondence analysis (COA; Eisen et al, 1998;Fellenberg et al, 2001). Figures were created using the made4 package in Bioconductor (Culhane et al, 2005).
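The distance used for the hierarchical clustering (1 minus the Pearson correlation coefficient, computed on log2-transformed values) can be illustrated with a short pure-Python sketch. It is not the made4 or Bioconductor implementation; the function names are hypothetical and the inputs are assumed to be positive normalised intensities for the same probe sets in two samples.

```python
import math


def log2_transform(values):
    """Log2-transform positive normalised intensities."""
    return [math.log2(v) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_distance(x, y):
    """1 - Pearson r: 0 for perfectly correlated profiles, 2 for anti-correlated."""
    return 1.0 - pearson(x, y)
```

Computing `correlation_distance` for every pair of samples yields the distance matrix that an average-linkage clustering routine would then join.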

Detection of genes differentially expressed in purified tumour
Given the low number of replicates in this study, it is challenging to estimate gene mean and variance; therefore, rank-based nonparametric methods may be more efficient in these data. It is reported that rank product performs comparably to or outperforms t-statistic-based methods when replicate numbers are very low (less than five) (Breitling et al, 2004;Jeffery et al, 2006). Rank products analysis is a non-parametric statistic that detects genes that are consistently highly ranked in lists, that is, genes that are consistently upregulated in a number of replicate experiments. Rank products analysis does not require a measure of gene-specific variance and is therefore particularly powerful when only a small number of replicates are available. Rank products analysis was performed using the Bioconductor package RankProd. False discovery rates were estimated using 100 permutations. To aid interpretation of these gene lists, we used DAVID to assess which Gene Ontology biological and functional categories were overrepresented in this list of genes (Dennis et al, 2003). We used the highest stringency level; for other analyses we used an EASE of 0.01, and false discovery rate of 1000. Heatmap images of gene expression profiles were generated using the made4 package in Bioconductor. The Human Genome Organisation (HUGO) gene symbols for Affymetrix probe sets were retrieved using the annaffy Bioconductor package and the annotation library hu133plus2 (build Tuesday, 4 October 2005, 20:53:27).
Figure 4 There was a significant difference in genes called as present in the whole-tissue samples (P = 0.0407, Student's paired t-test). These genes are therefore expressed in non-parenchymal (or stromal) tissue.
Figure 5 ... This is despite the fact that they are paired samples. Correspondence analysis also demonstrates that the sorted samples and whole-tissue samples cluster together (B). The first axis (horizontal) splits the whole and FACS-purified samples. The second axis (vertical) splits E1 and E1W from E6, E12, E12W and E6W. Genes that separate the samples are shown with HUGO classification.
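To make the rank product idea concrete, the following Python sketch computes rank products from per-replicate log2 fold changes and a permutation-based expected count for each gene. It is a simplified stand-in, not the RankProd package's algorithm: the function names are hypothetical, ties are broken by gene order, and the permutation step simply shuffles fold changes within each replicate.

```python
import math
import random


def ranks_descending(values):
    """Rank 1 = largest value (most upregulated); ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def rank_products(log_fc):
    """Geometric mean of each gene's rank across replicates.

    `log_fc` is a list of replicates, each a list of per-gene log2 fold changes.
    """
    k = len(log_fc)
    per_replicate = [ranks_descending(rep) for rep in log_fc]
    n_genes = len(log_fc[0])
    return [
        math.exp(sum(math.log(per_replicate[r][g]) for r in range(k)) / k)
        for g in range(n_genes)
    ]


def expected_counts(log_fc, n_perm=100, seed=0):
    """Average number of genes per permutation whose null rank product is at
    most each gene's observed rank product."""
    rng = random.Random(seed)
    observed = rank_products(log_fc)
    totals = [0] * len(observed)
    for _ in range(n_perm):
        shuffled = [rng.sample(rep, len(rep)) for rep in log_fc]
        null = rank_products(shuffled)
        for g, rp in enumerate(observed):
            totals[g] += sum(1 for value in null if value <= rp)
    return [t / n_perm for t in totals]
```

A gene with a small rank product and a small expected count is consistently near the top of every replicate's list, which is the property the method exploits when too few replicates are available to estimate gene-specific variance.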

Patient demographics and FACS
Three matched patient biopsies were taken immediately after resection. No patients received pre-operative chemoradiotherapy.
Metastases (m1) were observed intra-operatively in one patient. The other patients were free from metastases (mx). Histological examination of the E1 and E12 samples demonstrated moderately differentiated adenocarcinomas with a strand-like infiltrative pattern of malignant glands through muscularis propria, interspersed by stroma. The E6 sample demonstrated a well-differentiated adenocarcinoma with closely packed glands invading into muscularis propria, with stroma between the glands. After generation of single cell suspensions from our experimental tumour biopsies, parenchymal content was estimated by FC to range from 37 to 60%, and the parenchymal fraction was sorted to greater than 90% purity as described earlier (Figure 2). The parenchymal component of the tumours was estimated to range from 50 to 80% on histological assessment. The E6 sample demonstrated the biggest discrepancy in estimation of parenchymal content (FC estimate 37 vs 80% histological assessment), which may be explained by the fact that the sample contained a large muscularis propria component, which could account for the high proportion of non-staining FC events. Sorted cells were subsequently confirmed as tumour parenchyma by light microscopy assessment (Figure 3). We found the sorted cell population was homogenous and had morphological appearances consistent with CRC cells after staining with Rapi-Diff and comparison to the SW-620 colorectal cell line. The cells also stained positive for the cytokeratin MNF-116, confirming they were epithelial in nature. Using FC we found that the population of HEA+ cells fluoresced in the same region as cells stained with MNF-116.

Differential gene expression
There were significantly more MAS 5.0 P calls in the whole samples than in the FACS-purified samples (paired t-test, P<0.05; Figure 4). Ordination was used to explore the data, and correspondence analysis was applied to the data using the made4 package in Bioconductor (Culhane et al, 2005). Correspondence analysis is a useful dimension reduction method for observing the χ² associations between genes and samples. The dendrogram showed that the whole and the purified samples could be partitioned into two distinct clusters (Figure 5A). These clusters were also observed on the most variant or first axis of a COA of these data (Fellenberg et al, 2001) (Figure 5B). Interestingly, the second most variant axis (F2, vertical) separated the metastatic (E1) and metastatic-free samples. Although we have few replicates in this study, it appeared that metastatic and metastatic-free tumour samples were more clearly defined in the purified samples than in the whole samples. The discrimination between metastatic and metastatic-free samples accounted for more variance than the difference between tumour stages. Expression of 289 genes was detected in whole-biopsy samples but not in purified samples (P<0.05). Of these, 50 differentially expressed genes were highly significant (P<0.01; Table 2). Expression of 103 genes was detected in purified samples, but significantly downregulated in whole samples (P<0.05), of which 33 were significant at P<0.01 (Table 3). Heatmaps of the highly significant differentially expressed genes were generated, and are displayed in Figure 6.
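The paired t-test on P calls involves only three matched pairs, so it has two degrees of freedom, and for that special case the t distribution has a closed-form cumulative distribution function. The sketch below uses this to compute a two-sided P value without a statistics library. The function name is hypothetical, and any counts passed to it here are illustrative, since the per-sample P-call counts are not given in the text.

```python
import math


def paired_t_test_three_pairs(whole, purified):
    """Two-sided paired t-test for exactly three matched pairs (df = 2).

    Returns (t, p). For df = 2 the t CDF is F(t) = 1/2 + t / (2 * sqrt(t^2 + 2)),
    so the two-sided P value is 1 - |t| / sqrt(t^2 + 2).
    """
    if len(whole) != 3 or len(purified) != 3:
        raise ValueError("this sketch only handles three matched pairs")
    diffs = [w - p for w, p in zip(whole, purified)]
    mean = sum(diffs) / 3
    variance = sum((d - mean) ** 2 for d in diffs) / 2  # sample variance, n - 1 = 2
    if variance == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean / math.sqrt(variance / 3)
    p = 1.0 - abs(t) / math.sqrt(t * t + 2.0)
    return t, p
```

With only two degrees of freedom the test has little power, so a result near the 0.05 threshold should be read with the small sample size in mind.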

Functional annotation
We used DAVID to classify gene function in the whole tumour sample (Table 4). Most functional classes are consistent with stromal as opposed to tumour cell function (e.g., proteinaceous extracellular matrix P = 2.37 × 10⁻⁸, extracellular matrix P = 2.49 × 10⁻⁸, collagen triple helix repeat P = 7.12 × 10⁻⁷), although genes known to be expressed in tumour epithelium were identified (e.g., GREM1). Looking at individual genes, upregulation of connective tissue genes was prevalent (e.g., COL6A3, COL1A2, COL12A1, COL5A2, COL3A1, CTHRC1, SULF1), as were genes involved in extracellular matrix function (e.g., LAMA4, PI3, POSTN, TIMP2) and cell adhesion (e.g., TNS1). Genes involved in endothelial function (e.g., TIMP2) and specifically, colon cancer tumour endothelium, such as the anthrax toxin receptor (ANTXR1), were also upregulated (Liu et al, 2008). In contrast, the 31 highly significantly expressed genes in the FACS-purified cells do not display characteristics of stromal gene expression, and may be representative of tumour parenchyma gene expression. Genes involved in cell signalling, such as SQSTM1, which regulates activation of the nuclear factor-κB (NF-κB) signalling pathway, receptor internalisation, and protein turnover, and RRAD, a member of the Ras/GTPase superfamily, are also overexpressed (Moyers et al, 1997;Seibenhener et al, 2007). Genes known to be expressed in CRC were also significantly upregulated, such as HOX2D and RHOB, which mediate apoptosis in neoplastic cells, and are targets for novel antitumour agents, such as farnesyltransferase inhibitors (Vider et al, 1997;Delarue et al, 2007).
Comparison with Kwong et al expression signature
Kwong et al (2005) examined the expression signature derived from 60 tumours (normal mucosa, adenoma, tumour and liver metastases) and identified an expression profile that was able to differentiate between normal and neoplastic samples, but not individual tumour stages. They suggested that stromal genes may obscure the subtle molecular changes in tumours of differing pathological stage. Examination of their gene list, specifically the 34 upregulated genes in their signature, reveals the expression of extracellular matrix proteins such as collagen, type I, α1 (COL1A1) and fibronectin. The authors believed this represented gene expression derived from infiltrating lymphocytes and other stroma. The gene list they identified shares similarities with the genes expressed in our whole tumour samples (ANTXR1, COL12A1, COL5A2, CTHRC1, POSTN).

DISCUSSION
Our study demonstrates that it is feasible to reproducibly separate and purify tumour parenchyma and other cell populations from a single cell suspension generated from a solid tumour using FACS. We also found that the gene expression profile elicited from the whole tumour was significantly different from that of the purified tumour parenchyma, and that the source of this differential expression may be tumour stroma. When tumour parenchymal purity is necessary, FACS may be an alternative to LCM, in particular in tumours such as CRC, melanoma and other nonsclerotic tumours amenable to the generation of a single cell suspension.
In our samples, we noted a large variance in estimated quantity of tumour parenchyma and stroma. Using FC we estimated that the parenchymal component of the tumours ranged from 37 to 60% and stroma from 40 to 63%. This was enriched to over 90% (range 90-96%) in each case with one sorting run per sample after calibration of the machine and settings for each sample. Verification of cell type is straightforward with standard staining techniques. We believe this level of stroma would grossly affect the gene expression profile taken from the biopsy and would be in keeping with the findings of Sugiyama et al (2002). For publication we specifically used rank products, as this has been shown to be reliable in small sample sizes in microarray experiments (Breitling et al, 2004;Jeffery et al, 2006). We found that the expression profile elicited from the whole tumour also made biological sense. We are confident that there are real differences in the genes identified, which are related to the stromal gene expression, and that this has implications for clinical application of gene expression microarrays in CRC.
Figure 6 Genes detected by rank products analysis (P<0.01). Heatmaps showing the differentially expressed genes detected in whole but not purified samples (A) and purified but not whole samples (B) using rank product analysis (P<0.01) using Z-score normalised values (row centred). Red tiles represent upregulated genes and blue represent downregulated genes.
The MammaPrint assay, which is a clinical application of the van't Veer 70-gene breast cancer expression profile, relies on a single fresh sample of tumour to predict prognosis (Cardoso et al, 2008). The samples are examined, and a stromal content of <50% is deemed acceptable for the test. This would arguably eliminate two of our samples from analysis, which we were able to enrich to >90% purity. However, Wang et al (2004), who derived an expression profile predicting recurrence of Duke's B colorectal carcinoma, included only samples that were enriched to over 85% purity. We believe that in CRC samples the stroma will contaminate the sample, causing problems with patient classification, but that this can be overcome with parenchymal purification. The optimal tumour/stroma ratio for gene expression studies is yet to be determined and may vary depending on the tumour type.
To ensure good quality expression data, we used several layers of quality control, starting with RNA gel electrophoresis and then checking RNA integrity with the Agilent Bioanalyzer (Agilent). Subsequently, after hybridisation and scanning, we checked several quality parameters such as scaling factor, 3′/5′ ratios, P calls and RawQ values. Quantitative RT-PCR was not employed as all of our arrays passed all stages of the quality control, and we do not believe that it will be used in clinical practice. This has been borne out by the current application of the MammaPrint assay and is a similar approach to other clinical microarray studies (Wang et al, 2004;Glas et al, 2006;Ach et al, 2007;Cardoso et al, 2008).
We found a significant difference in the total number of probes called as present using the Affymetrix MAS5 signal algorithm. This shows that the whole tissue expressed a larger number of genes than sorted cells, demonstrating the wide range of genes expressed by non-tumour cells. We used Li and Wong's invariant set method for normalisation of the data sets (Li and Hung Wong, 2001). This method uses a set of non-differentially expressed genes, identified by an iterative procedure, to normalise the data. Gene expression common to the sorted cells and whole tissue should be reasonably uniform. Any genes found to be expressed only in the whole tissue can be presumed to originate from stromal tissue. We then sought to examine how similar the gene expression profile of each sample was. Using hierarchical clustering, all three sorted samples clustered together, as did the whole-tissue samples. Correspondence analysis also showed a clear separation of sorted samples and whole-tissue samples. The E1 and E1W samples also separated from the other samples. This is not surprising as these samples were from a more advanced tumour than the others. The stromal component of the whole-tissue sample was the biggest determinant of differences and could easily separate all samples. This may explain why some studies show very similar gene expression profiles (GEP) throughout tumour stages, and the similarity of some of the genes in our whole tumour samples to those of Kwong et al (Birkenkamp-Demtroder et al, 2002;Kwong et al, 2005). Qualitatively, the expression of the whole tumour samples was consistent with tumour stroma, with genes highly specific for colorectal tumour stroma (e.g., ANTXR1), and DAVID analysis identifying highly significant functional groups involved in extracellular matrix function. Conversely, FACS-purified parenchyma expressed genes specifically associated with colorectal neoplasia, such as HOX2D and RHOB.
The ability to examine the samples in parallel affords increased precision in analysis of tumour fraction gene expression and offers new opportunities to examine tumour-stroma interactions. Fluorescence-activated cell sorting parenchymal purification has several advantages over LCM. Disaggregation of a tumour sample generates a large random sample of tumour cells and may elicit a more representative and relevant gene expression profile than LCM, without the need for RNA amplification. Previous studies have assessed the differences between LCM acquired tissue, macrodissection and whole-tissue samples for microarray studies (Sugiyama et al, 2002;Michel et al, 2003;de Bruin et al, 2005). Sugiyama et al (2002) suggested that LCM should be used if the stromal compartment is >30%, and showed marked differences in GEP in LCM-derived tissue compared with bulk biopsy. Similarly, Michel et al (2003) demonstrated that the LCM process introduces a bias into GEP. Although they found that large expression changes were maintained, changes in genes with lower expression levels may be lost. This is problematic for several reasons, particularly as smaller changes in mRNA expression may have larger effects downstream than larger ones. It also makes comparisons between studies difficult. Although LCM aims to overcome such problems, the very premise it is built on may introduce bias. A two-dimensional microscope view of a complex three-dimensional structure such as a tumour leads to irreversible qualitative and quantitative loss of information (Nyengaard, 1999). This means that fractions of cells can be grossly under- or overestimated if unbiased sampling methods such as stereological methods are not used (the 'reference trap'). At worst it can lead to a gene expression profile of a tiny fraction of tumour being misinterpreted as expression of the whole tumour. Macrodissection is subject to similar compromises.
Fluorescence-activated cell sorting overcomes many of these problems by allowing systematic sampling of cells, providing a large sample of cells, which allows confirmation of purity of targets and also a better average of gene expression in a tumour.
In conclusion, FACS is effective in producing homogenous cell populations for gene expression microarray experiments in solid tumours and is a viable alternative to macrodissection, LCM and whole tumour sampling in microarray experiments. Fluorescence-activated cell sorting overcomes many of the practical and theoretical problems associated with LCM. The gene expression profile of FACS-purified tumour parenchyma is significantly different to that of clinically resected tumour biopsies. Our analysis suggests that stromal gene expression is responsible for the differential expression and makes a significant contribution to the gene expression profile of whole tumour CRC biopsies. Therefore, one should consider a purification strategy when planning solid tumour gene expression microarray experiments. Although many of the sources of technical noise and variation in gene expression microarray technology have been overcome, challenges remain, such as the approach to tumour heterogeneity, which must be addressed before the technology is accepted into clinical practice.