Technology Insight: tuning into the genetic orchestra using microarrays—limitations of DNA microarrays in clinical practice
Ambreen Abdullah-Sayani, Jolien M Bueno-de-Mesquita and Marc J van de Vijver*
Correspondence *Department of Pathology, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
Email m.vd.vijver@nki.nl
Summary
Scientific advances in the field of genetics and gene-expression profiling have revolutionized the concept of patient-tailored treatment. Analysis of differential gene-expression patterns across thousands of genes in a single experiment (as opposed to hundreds to thousands of experiments, each measuring the expression of one gene at a time), and extrapolation of these data to answer clinically pertinent questions, such as those relating to the metastatic potential of a tumor, can help define the best therapeutic regimens for particular patient subgroups. Microarrays provide a powerful technology that allows in-depth analysis of gene-expression profiles. Currently, microarray technology is in a transition phase in which scientific information is beginning to guide clinical practice decisions. Before microarrays qualify as a useful clinical tool, however, they must demonstrate reliability and reproducibility. The high-throughput nature of microarray experiments imposes numerous limitations, ranging from simple issues such as sample acquisition and data mining to more controversial issues relating to the biostatistical methods required to analyze the enormous quantities of data obtained. Recommendations that have been suggested include methods for validating proposed gene-expression profiles and for improving trial designs. This Review focuses on the limitations of microarray analysis that continue to be recognized, and discusses how these limitations are being addressed.
Review criteria
The information for this Review was compiled by searching the PubMed and MEDLINE databases for articles published up to 15 November 2005. Electronic early-release publications were also included. Only articles published in English were considered. The search terms used included "DNA microarray" in association with other search terms: "reviews", "RNA", "DNA", "gene expression profiling", "classifier", "molecular markers", "tumor markers", "cancer biomarker", "prognosis", "prognostic profile", "predictive profile", "multivariate analysis", "classification", and "risk assessment". References were selected based on the best clinical or laboratory evidence, especially if the work had been corroborated by published work from other centers. Priority was given to studies in high impact factor journals when available. When possible, primary sources have been quoted.
Keywords:
breast cancer, classifier, gene-expression profile, limitations
Introduction
The identification of approximately 25,000 human genes following the completion of the Human Genome Project has greatly accelerated research into genotype–phenotype correlations, with the aim of elucidating the functional taxonomy of genes in both normal tissues and disease states. Gene-expression profiling with microarrays is an important adjunct to our current knowledge of genetics, because microarrays allow thousands of genes to be studied simultaneously in a single, standardized, and cost-efficient experiment. Gene expression is a broad term that describes the transcription of information encoded within DNA sequences into mRNA, and the subsequent translation of that mRNA into the proteins that regulate cell function. Gene expression in all cells is a highly dynamic process that changes in response to cellular requirements and environmental changes, and as a consequence of disease. A brief summary of the applications of gene-expression profiling by microarray analysis, and of additional applications of microarray platforms, is provided in Table 1.
Table 1 Examples of current applications of microarray technology.
Gene-expression profiling has been used to study infectious and immunological diseases, but the predominant focus of microarray-based research has been cancer. Microarray analysis has made it possible to identify groups of genes according to their expression pattern or 'genetic signature', which could potentially improve the clinical management of patients. Despite the vast amount of research in this field, a disparity exists between the quantity of publications and the quality of their interpretation: articles that document the discovery of new gene profiles are now matched in number by publications that scrutinize their interpretation.1
In this article we briefly describe the methods of microarray-based gene-expression profiling and highlight the technique's limitations, show the various levels at which both inter-experimental and intra-experimental variability can occur, summarize the suggestions that have been proposed to limit these problems, and describe the current status of genetic profiling in oncological practice.2 The different steps involved in gene-expression profiling using microarray technology, and the major limitations of this method, are depicted in Figure 1.
Figure 1 Key considerations when performing gene-expression profiling using microarray analysis.
The different steps in this process and their limitations are summarized. There are seven major steps in the development of a gene-expression signature using microarrays before it is ready for implementation as a test in clinical practice. The first step consists of defining the scientific question that should be answered by the output data. The accompanying hypothesis will help to construct a solid study design and to determine the type of samples and the applicable inclusion criteria (steps 2 and 3). Step 4 is the actual microarray experiment, in which the gene-expression data are obtained. The statistical analysis of the data generated in the experiment and the construction of the actual gene-expression profile are addressed in step 5. Step 6 is the independent validation of the results, which is indispensable for the transition to step 7, the clinical implementation of the gene-expression profile. Figure modified with permission from reference 21 © (2002) Nature Publishing Group.
DNA microarray technology
A DNA microarray is an ordered arrangement of equidistant microscopic DNA spots attached to a solid surface, such as a glass, plastic or silicon chip. The spots consist of probes immobilized on the solid support; these can be complementary DNAs (cDNAs), oligonucleotides of varying length, or genomic sequences. Hybridization is performed with radioactively or fluorescently labeled nucleic acids derived from the sample, which bind to the probes with complementary sequences (Figure 2). An array containing thousands of spots immobilized at predetermined locations can be generated by applying the DNA (e.g. cDNA or oligonucleotides) to the array using pins3 or inkjet technology,4 or by in situ photolithographic synthesis of oligonucleotides.5 When inkjet technology is used, for example, an adapted color inkjet printer synthesizes the oligonucleotides: each of the four color reservoirs is filled with a solution containing one of the four nucleotides from which DNA is made, and the oligonucleotide sequence for each spot can be specified in a simple text file. The probes of a microarray are thus made of DNA, and they are used to detect the abundance of specific mRNAs (transcripts) from the genes that correspond to the probe sequences.
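To make the inkjet description concrete, the following Python sketch turns a design file of probe sequences into per-layer printing instructions. The tab-separated file format, the function names and the assumption that bases are added in the order they are listed are all hypothetical simplifications; real synthesizers also handle coupling chemistry, deprotection and synthesis direction, none of which is modeled here.

```python
from collections import defaultdict

VALID_BASES = set("ACGT")


def parse_design(text):
    """Parse a hypothetical design file with one 'spot_id<TAB>sequence' entry per line."""
    design = {}
    for line in text.strip().splitlines():
        spot_id, seq = line.split("\t")
        seq = seq.strip().upper()
        if not set(seq) <= VALID_BASES:
            raise ValueError(f"invalid base in sequence for {spot_id}: {seq}")
        design[spot_id] = seq
    return design


def deposition_layers(design):
    """Return one dict per synthesis layer mapping nucleotide -> spots that receive it.

    In layer k, every spot whose sequence has a k-th base receives that base,
    so all four nucleotide 'colours' can be printed in the same pass.
    Bases are assumed to be added in the order they are listed in the file.
    """
    max_len = max(len(seq) for seq in design.values())
    layers = []
    for k in range(max_len):
        layer = defaultdict(list)
        for spot_id, seq in sorted(design.items()):
            if k < len(seq):
                layer[seq[k]].append(spot_id)
        layers.append(dict(layer))
    return layers
```

For example, a design listing spot1 as ACG and spot2 as AT yields three layers: A for both spots, then C for spot1 and T for spot2, then G for spot1 only.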
Figure 2 Gene-expression profiling using microarray analysis.
The main steps involved in microarray analysis, and the points at which limitations may be encountered, are shown. Two stages can be distinguished in microarray experiments: a pre-microarray phase (tissue handling) and the microarray experiment phase (labeling of the RNA and hybridization). The glass slide carrying the DNA of interest plays a key role in a microarray experiment. As can be seen, a solid surface (in this example a glass microscope slide) contains thousands of spots. Each spot contains a large number of identical DNA fragments, all derived from one specific gene. Fluorescently labeled RNA from the samples is subsequently hybridized to the arrays, and the fluorescence of thousands of spots on one microscope slide can be visualized with a fluorescent scanner. For each gene on the array, the amount of fluorescently labeled RNA bound to its spot represents the expression level of that gene in the tumor sample. The intensity of the fluorescent signal can be measured and used in the statistical analysis. Figure courtesy of Dr R Kerkhoven. Abbreviations: DHFR, dihydrofolate reductase; E2F1, E2F transcription factor 1; RB, retinoblastoma; SRC, sarcoma.
The concept of DNA or oligonucleotide arrays began in the mid-1980s. The first DNA arrays comprised nylon filters on a glass slide containing cDNA probes; these filters contained far fewer probes than the arrays currently used and were typically used with radioactively labeled targets. The introduction of pin-based robotic systems made it possible to dispense smaller amounts of DNA (as spots approximately 150 microns in diameter) onto a glass slide, thus enabling a higher-throughput system that represented one of the first microarrays.6, 7 A summary of the developments in microarray-based gene-expression profiling is provided in Table 2.
Table 2 The evolution of microarrays for gene-expression analysis over the years.6
Microarray analysis of cDNAs spotted onto glass slides was developed at Stanford University and has been used to study the expression levels of large numbers of mRNAs in cell lines and tumors.8 In brief, mRNA is isolated from cells and reverse transcribed in the presence of red-fluorescent-labeled nucleotides. The resulting fluorescent cDNA is then mixed with a green-fluorescent-labeled reference cDNA and the mixture is hybridized to the microarray. The reference mRNA is usually prepared from a mixture of cell lines, tumor samples, or normal tissues. Using a fluorescent scanner, the fluorescence level is digitized, and for each cDNA on the microarray the level of gene expression, relative to the reference, is determined and transferred to a linked computer database.
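The ratio calculation described above can be sketched in Python as follows. This is an illustrative minimal version, assuming that a local background intensity is available for each spot and that simple global median-centering is an acceptable normalization (that is, most genes are assumed not to be differentially expressed); the function names are hypothetical, and published pipelines typically use more elaborate, intensity-dependent normalization.

```python
import math
import statistics


def log_ratios(red, green, red_bg, green_bg, floor=1.0):
    """Background-subtracted log2(red/green) ratio for each spot.

    red/green: foreground intensities (sample = red, reference = green).
    red_bg/green_bg: local background estimates for each spot (assumed available).
    floor: minimum net intensity, so the log is never taken of zero or a negative.
    """
    ratios = []
    for r, g, rb, gb in zip(red, green, red_bg, green_bg):
        r_net = max(r - rb, floor)
        g_net = max(g - gb, floor)
        ratios.append(math.log2(r_net / g_net))
    return ratios


def median_center(ratios):
    """Global normalization: shift all log ratios so that their median is 0."""
    m = statistics.median(ratios)
    return [x - m for x in ratios]
```

For three spots with net intensities of 1000/500, 200/400 and 4000/1000 (red/green), log_ratios gives 1, -1 and 2, and median-centering shifts these to 0, -2 and 1. A log ratio of +1 means twofold higher expression in the sample than in the reference.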
A similar methodology is used for some oligonucleotide-based arrays, in which the expression of each gene in a sample is likewise measured relative to the expression of the same gene in the reference mRNA. A different approach is the photolithographic synthesis of oligonucleotides, a procedure developed by Affymetrix (Santa Clara, CA); for each gene, several different oligonucleotides are present on the array.5 In this technique, the array is hybridized with RNA from the sample to be analyzed, without the use of a reference RNA; instead, oligonucleotides containing a single mismatch in their sequence are used to correct for background (nonspecific) hybridization. An array containing 25,000 probes can provide information on the expression of essentially all genes in the genome. For many genes, multiple splice variants have been identified, and an increasing number of probes is being developed to detect them.
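The role of the mismatch probes can be illustrated with a simplified summary: for each probe pair, the mismatch (MM) intensity is subtracted from the perfect-match (PM) intensity, and the differences are averaged over all probe pairs of a gene. This Python sketch is only in the spirit of early summary measures; the function name is hypothetical, and the algorithms actually used for Affymetrix arrays apply more robust estimators that are not reproduced here.

```python
def average_difference(pm, mm):
    """Average of (PM - MM) over the probe pairs of one gene.

    pm[i] and mm[i] are assumed to be the raw intensities of the i-th
    perfect-match/mismatch probe pair. The MM signal serves as an estimate
    of nonspecific (background) hybridization for its PM partner.
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("need equal, non-empty lists of PM and MM intensities")
    diffs = [p - m for p, m in zip(pm, mm)]
    return sum(diffs) / len(diffs)
```

For three probe pairs with PM intensities 500, 700 and 900 and MM intensities 100, 300 and 200, the differences are 400, 400 and 700, giving an average difference of 500. A simple mean is sensitive to a single aberrant probe pair, which is one reason more robust summaries were later adopted.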
Gene-expression profiles can provide an enormous amount of information on cell function; however, it is important to realize that these profiles can only be used to interpret cellular changes that affect mRNA synthesis. After translation of mRNA into protein, many secondary protein modifications also play an important role in cell regulatory processes. Therefore, high-throughput analysis of proteins ('proteomics') will also contribute greatly to cancer research, an issue that was recently reviewed by Gulmann et al.9 Proteomic experiments can be tedious and have a lower throughput than RNA-based experiments;10 however, recent advances in proteomics have greatly increased the throughput of proteomic experiments.
Recently, small RNA molecules that do not code for proteins, such as microRNA, small interfering RNA (siRNA), and small modulatory RNA, have been discovered; these small RNAs have an important role in gene regulation, including through RNA interference.11 Many microRNA genes are located in non-coding regions of the genome, such as introns. Although these RNAs do not code for proteins, they can control gene expression at the post-transcriptional level by degrading mRNA or repressing its translation,11 thus influencing critical functions and biological processes.12 Although microRNA genes account for only 1–5% of human genes (about 1,000 microRNA genes), each microRNA might regulate as many as 200 genes, which implies that over one-third of protein-coding genes are regulated by microRNAs.11, 13 MicroRNA expression profiles can be used to classify human cancers,12 because they reflect the developmental lineage and differentiation state of the tumors. Compared with normal tissue, microRNAs are generally downregulated in tumors and could therefore provide insight into tumor development and recurrence. For example, microRNA profiles are better predictors of the tissue of origin of carcinomas of unknown primary (CUPs) than are conventional mRNA-based gene-expression signatures.14
Statistical analysis of gene-expression data
Although not necessarily based on a gene-specific mechanistic hypothesis, good gene-expression profiling experiments should be planned and conducted with a clear objective, design and statistical analysis strategy.15 The main strategies used to identify categories of tumors by gene-expression profiling are unsupervised and supervised classification (Box 1). An important difference between the two is that supervised classification uses clinical or pathologic information to find correlations with gene-expression patterns, whereas unsupervised methods group tumors on the basis of their gene-expression patterns independently of their clinicopathological status.16, 17 A commonly used unsupervised method for examining gene-expression data is two-dimensional hierarchical cluster analysis, in which both tumors and genes are ordered according to the similarity of their expression patterns, so that tumors with similar gene expression cluster together. This ordering often also clusters genes involved in the same cellular processes, such as proliferation or inflammation. In this manner, several 'signatures' can be analyzed that reveal specific properties that the tumor cells and non-tumor cells contribute to the tumor mass. With time, an increasing number of signatures is expected to be identified, making the correlation between gene-expression signatures in tumors and clinical behavior a powerful prognostic and predictive approach.
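Two-dimensional hierarchical clustering applies the same agglomerative procedure to tumors (columns) and to genes (rows). The Python sketch below clusters one dimension, assuming 1 minus the Pearson correlation as the distance and average linkage as the merging rule; these are common choices but not the only ones, and the function names are hypothetical. Profiles with no variation are rejected, because their correlation is undefined.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles):
    """Agglomerative clustering of expression profiles (e.g. one per tumor).

    Distance = 1 - Pearson correlation; clusters are merged by average linkage.
    Returns the merge history as (members_a, members_b, distance) tuples,
    which is the information a dendrogram draws.
    """
    n = len(profiles)
    dist = [[1 - pearson(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    clusters = [(i,) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = clusters[a], clusters[b]
                # Average of all pairwise distances between the two clusters.
                d = sum(dist[i][j] for i in ca for j in cb) / (len(ca) * len(cb))
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges
```

Applying average_linkage to the transposed matrix clusters genes instead of tumors. For real data sets, an optimized library routine such as SciPy's scipy.cluster.hierarchy.linkage (for example with method="average" and metric="correlation") would normally replace this simple but slow loop.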
Box 1 Definitions of terms used in microarray technology.
Unsupervised clustering (unsupervised classification)
The purpose of any clustering method is to group entities on the basis of similarity of features. In gene-expression data derived from multiple microarray experiments, both genes and tumor samples can be clustered on the basis of similar expression profiles. Hierarchical clustering produces a representation of the data in the shape of a binary tree, in which the most similar patterns are clustered in a hierarchy of nested subsets, so that tumors with similar overall gene-expression profiles are grouped together. These binary trees, also known as dendrograms, are two-dimensional diagrams that illustrate the fusions or divisions made at each successive stage of analysis.
Supervised clustering (supervised classification)
Supervised clustering is a statistical method used to assess similarities between certain tumor types. It can be defined as the grouping of variables (e.g. genes) guided by information about a Y variable (the variable of interest), such as the tumor type of each tissue or the clinical outcome of the disease. Supervised clustering is applied to samples of known class with the objective of identifying clusters that have a high probability of sharing features characteristic of the class of interest.
Metagenes
Metagenes are linear combinations of individual gene-expression values, and have the potential to classify and predict cellular phenotypes resulting from deregulation of oncogenic pathways.
Adjuvant systemic therapy
A treatment that is used in addition to the primary therapy, such as surgery, with the aim of destroying any microscopic disseminated cancer cells. Adjuvant therapy for cancer can include cytotoxic chemotherapy and/or hormone therapy.
To find gene-expression patterns that can predict the clinical behavior of tumors (as in class comparison and class prediction), it is more appropriate to use supervised classification, which distinguishes specimens on the basis of predefined clinical and pathologic information, because the classes are already known. Several methods for supervised classification have been developed, and although these techniques were not specifically designed for the analysis of gene-expression data, they have already proved useful for this purpose.16 Fundamental to each of these techniques is the identification of a pattern of expression of a combination of genes that can predict tumor behavior (e.g. the risk of developing distant metastases, or responsiveness to specific treatments).
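One of the simplest supervised methods is a nearest-centroid classifier: the mean expression profile of each predefined class is computed from the training samples, and a new tumor is assigned to the class whose centroid is closest. The text does not specify which supervised method any particular study used; this Python sketch, with hypothetical function names and class labels such as "good" and "poor" prognosis, only illustrates the principle of learning from known classes.

```python
def centroid(rows):
    """Mean expression profile of a list of samples (gene-wise average)."""
    return [sum(col) / len(col) for col in zip(*rows)]


def fit_nearest_centroid(samples, labels):
    """Compute one centroid per predefined class from labeled training samples."""
    classes = sorted(set(labels))
    return {c: centroid([s for s, l in zip(samples, labels) if l == c]) for c in classes}


def predict(centroids, sample):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda c: sqdist(centroids[c], sample))
```

With training profiles [0, 0] and [1, 0] labeled "good" and [10, 10] and [11, 10] labeled "poor", a new tumor with profile [2, 1] is assigned to "good" and one with [9, 9] to "poor". In practice the centroids would be computed only over a selected set of informative genes.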
A third classification method is the functional annotation of gene-expression data using algorithms based on data obtained from in vitro and in vivo experiments. An example of such an algorithm is the 'wound response signature'. Chang et al. compared the gene-expression profile of fibroblasts exposed to 10% serum with that of fibroblasts growing in low-serum conditions; the resulting gene-expression signature was termed the 'wound response signature' because fibroblasts are exposed to serum only in a wound.18 Based on their expression pattern for this signature, breast carcinomas can be divided into 'wound response signature activated' and 'wound response signature quiescent' tumors. Other gene-expression signatures discovered in vitro can be applied to gene-expression data from malignant tumors in a similar way. Chang et al.19 combined three algorithms (the molecular subtypes or portraits derived from unsupervised hierarchical clustering of breast carcinomas,20 the 'wound response signature'18 and the '70-gene signature'21, 22) to reveal links between wound healing and cancer progression in breast cancer. A gene-expression profile based on the wound response signature provides the basis for prospectively assigning a prognostic score to patients, and can be scaled to suit different clinical purposes. The wound response signature improves risk stratification independently of known clinicopathological risk factors, such as tumor size, histologic tumor grade and nodal status, and of previously established prognostic signatures based on unsupervised hierarchical clustering (i.e. 'molecular subtypes'20) or supervised predictors of metastasis (i.e. the '70-gene signature'21, 22).
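A signature such as the wound response signature can be applied to tumor data by scoring how closely each tumor's expression of the signature genes follows the direction of the in vitro response. The Python sketch below is a simplified illustration and not the procedure published by Chang et al.: it assumes that each signature gene is coded +1 (induced by serum) or -1 (repressed by serum), uses the Pearson correlation between a tumor's values and this coding as the score, and applies a hypothetical threshold of 0 to call a tumor 'activated' or 'quiescent'. The gene names in any example are placeholders.

```python
import math


def pearson(x, y):
    """Pearson correlation; undefined (error) if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def signature_score(expression, signature):
    """Correlate one tumor's expression of the signature genes with their direction.

    expression: dict gene -> expression value (e.g. log ratio) in one tumor.
    signature: dict gene -> +1 (induced) or -1 (repressed); must contain both signs.
    Only genes present in both dicts are used.
    """
    genes = sorted(set(expression) & set(signature))
    x = [expression[g] for g in genes]
    y = [signature[g] for g in genes]
    return pearson(x, y)


def classify(expression, signature, threshold=0.0):
    """Hypothetical rule: a score above the threshold counts as 'activated'."""
    return "activated" if signature_score(expression, signature) > threshold else "quiescent"
```

A tumor that expresses the serum-induced genes above, and the serum-repressed genes below, its own average receives a positive score and is called 'activated'; the mirror-image tumor is called 'quiescent'. Because the score is continuous, the threshold could be moved to suit different clinical purposes.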
Even when distinct subgroups are defined using gene-expression profiles, considerable heterogeneity within these broadly defined groups can confound the ability of a classifier to predict outcomes accurately for individual patients. One strategy to circumvent this heterogeneity is to integrate multiple gene-expression patterns rather than rely on a single pattern, which can provide a more powerful and robust classifier. Such defined sets of genes have been termed 'metagenes' (Box 1), and metagenes can be combined with clinicopathological data to obtain optimal prognostic and predictive classifiers.23 Moreover, this framework provides a mechanism for combining multiple forms of data, both genomic and clinical, to characterize individual patients' profiles and achieve the goals of personalized treatment.
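Because a metagene is a linear combination of gene-expression values (Box 1), it can be computed as a weighted sum of standardized expression values. One common way to choose the weights, assumed here, is the leading principal component of a cluster of co-expressed genes; the Python sketch below computes it by power iteration. The function names are hypothetical, and genes with zero variance are assumed to have been removed beforehand.

```python
import math


def standardize(matrix):
    """Z-score each gene (row) across samples (columns); assumes no constant rows."""
    out = []
    for row in matrix:
        m = sum(row) / len(row)
        sd = math.sqrt(sum((v - m) ** 2 for v in row) / (len(row) - 1))
        out.append([(v - m) / sd for v in row])
    return out


def first_pc_weights(z, iterations=200):
    """Gene weights = leading eigenvector of the gene-by-gene covariance (power iteration)."""
    g, n = len(z), len(z[0])
    cov = [[sum(z[i][k] * z[j][k] for k in range(n)) / (n - 1) for j in range(g)]
           for i in range(g)]
    w = [1.0] * g  # starting vector; its sign fixes the (otherwise arbitrary) PC sign
    for _ in range(iterations):
        w_new = [sum(cov[i][j] * w[j] for j in range(g)) for i in range(g)]
        norm = math.sqrt(sum(v * v for v in w_new))
        w = [v / norm for v in w_new]
    return w


def metagene_scores(matrix):
    """One metagene value per sample: a weighted sum of standardized gene values."""
    z = standardize(matrix)
    w = first_pc_weights(z)
    return [sum(w[i] * z[i][k] for i in range(len(z))) for k in range(len(z[0]))]
```

For example, metagene_scores([[1, 2, 3], [2, 4, 6], [10, 20, 30]]) gives the three perfectly correlated genes equal weight and returns approximately [-1.732, 0.0, 1.732]. The resulting per-sample scores could then be entered, alongside clinicopathological variables, into a prognostic model.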
Limitations of microarray technology
The complexity of biological systems
Gene-expression profiling studies of human malignancies are highly complex, and the successful execution of such a study requires close collaboration between surgeons, pathologists, molecular biologists and bioinformaticians.24, 25 The high-throughput nature of this technology, combined with the expected plethora of data, creates many opportunities for error.26 To ensure the accuracy and reliability of the resulting data, it is therefore essential that experiments are tightly regulated and quality controlled.27 At present, the sophistication of microarrays makes this a costly technology that is consequently available only to specialized institutions.6, 28 Microarray technologies are, however, improving rapidly and their costs continue to fall, paving the way for wider access and more general use.
The tissue sample
Frozen tumor tissue is indispensable for microarray experiments, because rapid freezing preserves RNA quality, so that subtle alterations in the levels of differentially expressed genes can be detected accurately. Frozen tumor samples are not widely available, however, because in most hospitals tumor samples are fixed directly in formalin and embedded in paraffin blocks, and storage of tissue in this way has been shown to result in RNA degradation.29 Paraffin-embedded tissue is therefore not ideally suited to the discovery of new gene-expression signatures, but may be more adequate for the validation of gene-expression profiles whose differentially expressed genes are already known.
Experiments that make use of mRNA have distinct problems. To begin with, cells may contain very small amounts of mRNA,30 which can be a significant problem in smaller tumors.28 mRNA is a very fragile molecule that can degrade within minutes of surgical manipulation,31 drastically affecting the interpretation of microarray data.32 Likewise, subtle variations in tissue handling and in the method of RNA extraction can result in different measured levels of gene expression.1 Although there are no systematic studies of this subject, it is clear that measurements of specific mRNAs with a short half-life will be greatly affected by the interval between surgical removal and freezing of the specimen. These technical issues are compounded by the heterogeneity of the tumor,31 and this heterogeneity increases further when we consider the different individuals and populations that harbor the same type of genetically unstable tumor.1 In addition, the expression pattern measured in a tumor sample is also determined by gene expression in the many other cell types present in varying proportions in clinical tumors, such as fibroblasts, endothelial cells, and inflammatory cells. Rigid adherence to experimental technique, consistency in the timing and sampling of tissue,27 careful documentation of the proportion of tumor cells in a sample, and duplication of the experiment using a single reference RNA1 are all means by which sources of error can be reduced. Technological advances also enable standardization of this process.
The search for differentially expressed genes
The hunt for differentially expressed genes involves a number of steps, including chip production, probe hybridization, image quantification, normalization and data interpretation, before the biological question can be addressed.33 Until a consensus is reached to standardize each process, inter-experimental variability will remain commonplace, hindering the transfer of microarrays from the bench to the bedside.33 Pitfalls that clinicians should be aware of include unstable molecular signatures and gene misclassification.
The list of genes included in a molecular signature (based on a training set and the proportion of misclassification seen in one validation set) depends not only on the statistical methods (discussed in the next section) but, to an even larger extent, on the selection of patients in the training set. Michiels et al. found that five of the seven largest published studies addressing cancer prognosis did not classify patients better than chance.34 Another important issue is that results from different microarrays are expressed as levels relative to a nonstandardized reference RNA. It has been recognized that the implementation of standardized units for the level of gene expression will be an important advance towards the regulation of microarray data.2 Not all microarray problems, however, are of such a large scale. Simple experimental errors, such as the mishandling of plates, can introduce errors into the data. The importance of tracking all levels of an experiment to allow evaluation and, if needed, correction has been shown by investigators at Stanford University, who created an algorithm called MuFu (MixupFixup) to facilitate recognition of production errors before hybridization.26 Technical errors in an experiment can give rise to missing data, which will eventually affect the statistical outcome of a study.35 Tools such as LinCmb,35 GMC (Gaussian mixture clustering)36 and BPCA (Bayesian principal component analysis)37 have been developed to impute the missing values of an experiment and thereby improve the power of a study.
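Missing-value imputation can be illustrated with the simpler k-nearest-neighbor (KNN) approach, which is a different and older technique from LinCmb, GMC or BPCA. In this Python sketch (hypothetical function name; missing values represented as None), a missing value for a gene is replaced by the average value, on the same array, of the k genes whose expression profiles are most similar over the arrays on which both were measured.

```python
import math


def knn_impute(matrix, k=2):
    """Fill missing values (None) in a genes x arrays matrix by KNN averaging.

    For gene g missing a value on array j, candidate neighbors are the other
    genes measured on array j. Similarity is the root-mean-square difference
    over the arrays both genes have measured; the k closest neighbors' values
    on array j are averaged. Returns a new matrix; the input is not modified.
    """
    imputed = [row[:] for row in matrix]
    for g, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for h, other in enumerate(matrix):
                if h == g or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other) if a is not None and b is not None]
                if not shared:
                    continue
                d = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((d, other[j]))
            candidates.sort()
            nearest = candidates[:k]
            if nearest:  # leave the value missing if no usable neighbor exists
                imputed[g][j] = sum(v for _, v in nearest) / len(nearest)
    return imputed
```

With k=2, a gene measured as [1.0, 2.0, None] whose two closest neighbors have values 3.1 and 2.9 on the third array receives an imputed value of about 3.0. The choice of k trades off noise (too few neighbors) against bias from dissimilar genes (too many).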
Statistical analysis
The proposed association of a genetic signature with disease outcome has been observed to be substantially stronger in preliminary studies than in subsequent research.38 This can be partially explained by 'overfitting', one of the major limitations of supervised clustering methods. Overfitting occurs when the number of parameters in a model is too great relative to the number of cases or specimens available: because the gene-expression pattern is optimized to predict tumor behavior in the original data, the model will fit those data well but might predict poorly for independent data. Consequently, it is essential to obtain an unbiased estimate of the true error rate, that is, of the predictive accuracy of a gene-expression pattern. Methods for obtaining such estimates include cross-validation, which involves repeatedly splitting the data into a training set containing most of the samples and a test set containing the remaining samples, for many different training-set and test-set partitions. In each partition, a gene-expression predictor for a specific clinical behavior is built on the training set alone, and it is then tested whether this predictor correctly predicts the clinical behavior of the left-out sample(s); in 'leave-one-out' cross-validation, each test set consists of a single sample. The error rates observed for the test sets are then averaged; the predictions for each test set are unbiased estimates of the true prediction error because the test-set samples are not used in developing the model used for their prediction.39 A predictor with a small, properly cross-validated error rate for a collection of tumor specimens is a potentially important finding, but one that still requires further validation.16 In studies with a small sample size (which can be arbitrarily defined as <50 cases), cross-validated error estimates are nearly unbiased, but they tend to have large variance and wide confidence intervals.
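The key requirement of cross-validation, that test samples play no part in building the predictor, extends to the gene-selection step itself. The Python sketch below performs leave-one-out cross-validation in which genes are reselected on the training samples in every fold. The gene-ranking score (absolute difference of class means), the nearest-centroid rule and the function names are simplifying assumptions, and two classes with at least two samples each are assumed.

```python
def mean(xs):
    return sum(xs) / len(xs)


def select_genes(samples, labels, n_genes):
    """Rank genes by absolute difference of the two class means; keep the top n_genes."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch assumes exactly two classes")
    a, b = classes
    scores = []
    for g in range(len(samples[0])):
        ma = mean([s[g] for s, l in zip(samples, labels) if l == a])
        mb = mean([s[g] for s, l in zip(samples, labels) if l == b])
        scores.append((abs(ma - mb), g))
    scores.sort(reverse=True)
    return [g for _, g in scores[:n_genes]]


def nearest_centroid(samples, labels, genes, query):
    """Predict the class whose centroid (over the selected genes) is closest to query."""
    best = None
    for c in sorted(set(labels)):
        members = [s for s, l in zip(samples, labels) if l == c]
        cen = [mean([m[g] for m in members]) for g in genes]
        d = sum((query[g] - cv) ** 2 for g, cv in zip(genes, cen))
        if best is None or d < best[0]:
            best = (d, c)
    return best[1]


def loocv_error(samples, labels, n_genes=10):
    """Leave-one-out error rate, repeating gene selection inside every fold."""
    errors = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        genes = select_genes(train_x, train_y, n_genes)  # left-out sample not used
        if nearest_centroid(train_x, train_y, genes, samples[i]) != labels[i]:
            errors += 1
    return errors / len(samples)
```

Selecting the genes once on all samples and cross-validating only the classifier step would typically give an optimistically biased error estimate, which is one way in which overfitting can escape detection. On pure noise data, a correctly implemented procedure of this kind should not give an error rate meaningfully better than chance.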
A study with a sample size in the thousands is better equipped to capture the degree of biological variation within an experiment,31 and the resulting genetic signature will also have greater statistical significance and clinical applicability.40 It is essential to validate a predictive gene-expression pattern in a sufficiently large independent series of patients. Unfortunately, a major limitation is the unavailability of large numbers of frozen tumor samples. One limitation of unsupervised cluster analysis is that it provides qualitative rather than statistically valid quantitative information on differences in expression level between genes or classes: although unsupervised clustering gives insight into which genes are differentially expressed (quality), it does not quantify the level of upregulated or downregulated expression (quantity). Many authors make the mistake of performing Kaplan–Meier analyses on series of samples that do not come from a cohort of unselected patients. For a sound and thorough validation of prognostic or predictive gene-expression profiles, an unselected patient cohort is indispensable.
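The effect of sample size on the precision of an estimated error rate can be seen from a rough binomial approximation. As a hypothetical example, assume an observed error rate of 20% and treat each prediction as an independent trial. This assumption does not strictly hold for cross-validated estimates, whose true variability tends to be larger, so the intervals below are best read as optimistic.

```latex
\begin{aligned}
\mathrm{SE}(\hat{e}) &\approx \sqrt{\hat{e}\,(1-\hat{e})/n}, \qquad \text{95\% CI} \approx \hat{e} \pm 1.96\,\mathrm{SE}(\hat{e})\\
n = 40:&\quad \mathrm{SE} = \sqrt{0.20 \times 0.80 / 40} = \sqrt{0.004} \approx 0.063
  \;\Rightarrow\; \text{95\% CI} \approx 0.20 \pm 0.124 \ (\text{about } 8\% \text{ to } 32\%)\\
n = 1000:&\quad \mathrm{SE} = \sqrt{0.20 \times 0.80 / 1000} = \sqrt{0.00016} \approx 0.0126
  \;\Rightarrow\; \text{95\% CI} \approx 0.20 \pm 0.025 \ (\text{about } 17.5\% \text{ to } 22.5\%)
\end{aligned}
```

Because the standard error scales with 1/\sqrt{n}, a cohort of 1,000 patients narrows the interval about fivefold compared with 40 patients (\sqrt{1000/40} = 5).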
The applicability of microarrays in clinical practice
Microarrays have been used to study several tumor types, most notably breast,21, 22, 41, 42 ovary,43 colon,44 gastric, leukemias,45, 46 malignant lymphoma,47 prostate,48, 49 lung,50, 51 CUP,52 and malignant melanoma.53 Analyzing and accurately interpreting the currently available data is a challenging task, because numerous inter-experimental variations can significantly influence the interpretation of results. Proper statistical analysis and independent validation in large series of patients are the only way to address this dilemma. To illustrate the current status and relevance of the gene-expression profiles that have been developed, we highlight a few important examples. Because clinical trials evaluating the use of microarrays in the management of patients with breast cancer are ongoing, we discuss this subject in detail in a separate section.
Hematologic malignancies
Acute leukemia is divided into subgroups of clinical relevance, for example by lineage (B-cell or T-cell) and by molecular subtype, that differ with respect to clinical outcome and response to treatment. Current methods of cytogenetic and morphological analysis, however, are tedious and not widely available. Microarrays could revolutionize this diagnostic process, because many relevant abnormalities can be assayed in a more standardized manner.54 Recent studies using microarrays have revealed a more refined classification of disease subtypes than is provided by currently used methods.
Yeoh et al.45 studied 360 cases of pediatric acute lymphoblastic leukemia. Using unsupervised hierarchical cluster analysis, they were able to identify correctly the leukemia subtypes of prognostic significance, namely T-cell acute lymphoblastic leukemia, E2A-PBX1, BCR-ABL, TEL-AML1, mixed-lineage leukemia rearrangement, and hyperdiploid with more than 50 chromosomes. The authors also identified another subgroup with a unique genetic signature, the significance of which is not yet known. The same group developed a class discriminator that had a diagnostic accuracy of 97% upon reanalysis of their data.46 This gene-expression profile holds great promise for clinical practice, and efforts should be made to refine the signature for general application. Gene-expression profiling can not only improve the classification of disease entities but also lead to the identification of novel targets for therapy. For example, mixed-lineage leukemia cells have distinctly elevated levels of the receptor tyrosine kinase FLT3, which therefore represents an opportunity for targeted therapy. Armstrong et al.55 studied the effect of a small-molecule inhibitor of FLT3, and showed that it halted tumor progression.
Gene-expression profiling of diffuse large B-cell lymphoma (DLBCL) has allowed this cancer to be categorized into groups based on cellular origin:47 in one group the gene expression was characteristic of germinal-center B cells (germinal-center B-like DLBCL), whereas the other group expressed genes normally induced during in vitro activation of peripheral blood B cells (activated B-like DLBCL). Patients with germinal-center B-like DLBCL have significantly better overall survival (5-year overall survival 76%) than those with activated B-like DLBCL (5-year overall survival 16%; P <0.01), and this knowledge can allow stratification of patients for clinical management.16 Attempts to validate these results have shown that the BCL6 and HGAL (GCET2) genes are specifically expressed in germinal-center B cells and predict overall survival,56 whereas other differentially expressed genes, such as CD10 (also known as MME), do not predict overall survival.57
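Survival figures such as 5-year overall survival are usually read from Kaplan–Meier curves. The following Python sketch of the Kaplan–Meier estimator is included for illustration only; the function names and example data are hypothetical, and patients censored at a tied time are assumed to remain at risk for the deaths at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up time for each patient.
    events: 1 if the patient died at that time, 0 if censored (alive at last follow-up).
    Returns [(time, S(time))] at each distinct death time, where S is the running
    product of (1 - deaths / number at risk).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored  # everyone observed at t leaves the risk set
    return curve


def survival_at(curve, t):
    """Step-function lookup: estimated survival probability at time t."""
    s = 1.0
    for time, value in curve:
        if time <= t:
            s = value
    return s
```

For six hypothetical patients with follow-up times [1, 2, 2, 3, 4, 5] years and death indicators [1, 1, 0, 1, 0, 1], the estimated survival is 5/6 after year 1, 2/3 after year 2 and 4/9 after year 3 (up to floating-point rounding), and survival_at(curve, 4) returns the year-3 value because the patient lost at year 4 was censored rather than dead.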
In 2004, Lossos et al.58 used reverse transcription polymerase chain reaction (RT-PCR) assays to assess a set of six genes, selected on the basis of microarray studies, that were predictive of outcome in patients with DLBCL. This is the same group that previously defined the activated and germinal-center B-cell prognostic groups; in this study, however, the researchers showed that only two genes from the germinal-center B-cell signature (LMO2 and BCL6) and three genes from the activated B-cell signature (BCL2, CCND2 and CCL3, previously named SCYA3) had any predictive value.
Prostate cancer
The clinical behavior of prostate cancer is extremely difficult to predict. The prevalence of this cancer is very high: in autopsy series, a prevalence as high as 80% has been reported in men over 80 years of age.59 Most of these men will die from causes unrelated to the cancer itself, indicating that many of these tumors are indolent. The introduction of prostate-specific-antigen screening has increased the detection, and therefore the perceived incidence, of early-stage prostate tumors, many of which are destined to remain indolent. Identifying patients with indolent tumors could spare many of them unnecessary radical treatment and its associated morbidity. Yu et al. studied 152 prostate tissue specimens, including frozen tissue samples from tumor, adjacent non-tumor and donor tissue.49 Using an Affymetrix® platform (Affymetrix, Inc., Santa Clara, CA), they analyzed 37,777 probe elements and, using a combination of principal component analysis, supervised hierarchical clustering and 10-fold cross-validation, developed a 70-gene expression profile that predicted tumor aggressiveness with an accuracy of 93%. The results were independently validated in a small group of 23 patients.
In the same year, Lapointe et al. studied 112 frozen tissue samples comprising 62 cases of prostate cancer, 41 normal tissue specimens and 9 specimens of lymph node metastasis.48 Using a GenePix® platform (Astros Acquisition Sub II, LLC, Sunnyvale, CA), they studied 39,711 gene probes and used unsupervised hierarchical clustering to analyze the samples. These investigators were able to distinguish malignant from normal tissue, and to further divide the malignant tumors into groups based on risk of recurrence. They then translated their findings into immunohistochemical analysis of MUC1 in 225 independent prostate tumors, and showed that elevated MUC1 levels correlated with tumor aggressiveness in the training set and with a high risk of recurrence in the independent validation set (P = 0.003).
These data show that gene-expression profiling can not only identify genes that discriminate indolent from aggressive tumors, but can also rapidly highlight candidate genes whose assessment can be adapted to widely available clinical tests such as immunohistochemistry. Although gene-expression profiling of prostate cancer is feasible, it is not yet ready for clinical use because validation is still lacking.
Carcinoma of unknown primary
Accurate diagnosis and targeted management of patients with (adeno)CUP (also called tumors of unknown origin) can sometimes influence patient survival. Standard histopathological methods often fail to establish the origin of these tumors, which makes CUP a very attractive area for gene-expression profiling: this technique could reveal more about the origin of the tumor and help define suitable treatment options. At present, no studies of gene-expression profiling using microarrays in CUP have been published, although recent studies using Serial Analysis of Gene Expression (SAGE) have generated expression patterns that can be used to trace the origin of tumors. Buckhaults et al. used SAGE to select a group of genes that could discriminate between tissues of different origins.52 They developed a five-gene panel that could distinguish between ovarian, breast and colon tumors and pancreatic adenocarcinoma with a predictive accuracy of 81%; this model was validated in an independent series of 62 samples using quantitative real-time PCR. A major drawback of SAGE is the considerable effort required to analyze even a single sample, which limits the applicability of the technique in the clinic. Companies such as Arcturus and Agendia have already developed commercially available gene-expression profiles that use paraffin-embedded material, but rigorous scientific experimentation and validation are required before these tests can be properly placed in clinical practice. It is likely that microarray tests based on microRNAs will be superior to those based on mRNA.
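Distinguishing several tissues of origin is a multiclass classification problem. As a hedged illustration only, and not the method used by Buckhaults et al., the sketch below applies a nearest-centroid rule: each tumor is assigned to the tissue whose average marker-gene profile it most closely resembles. The tissue labels, the three-gene profiles used to exercise it and the Euclidean distance are hypothetical choices.

```python
import math


def class_centroids(training):
    """Mean marker-gene profile for each class (label -> list of profiles)."""
    return {
        label: [sum(values) / len(profiles) for values in zip(*profiles)]
        for label, profiles in training.items()
    }


def predict_origin(profile, centroids):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(profile, centroids[label]))


def accuracy(cases, centroids):
    """Fraction of (profile, true_label) cases that are classified correctly."""
    correct = sum(predict_origin(p, centroids) == label for p, label in cases)
    return correct / len(cases)
```

In practice the accuracy would be estimated on samples not used to build the centroids, as in the independent 62-sample series described above.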
Gastrointestinal tumors
The ability to accurately assess a patient's risk of metastasis can profoundly alter clinical management. This is especially true for intermediate-stage tumors, such as Dukes stage B colon cancers, for which the optimal treatment is not clear. A patient categorized as high risk can be treated with adjuvant systemic therapy (Box 1), whereas patients identified as low risk can be spared the side effects of unwarranted treatment. Eschrich et al. studied 78 colon cancer specimens and developed a 43-gene prognostic classifier that predicted survival at 36 months with 90% accuracy (93% sensitivity; 84% specificity).44 These results hold promise for therapy tailored to the tumor biology of individual patients, which is not currently possible. Unfortunately, in our view not enough work has been done in this important field: very few relevant articles can be found in the literature, and further research in this clinically relevant area is warranted.
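Sensitivity, specificity and accuracy answer different questions, and accuracy depends on the case mix: it equals sensitivity × p + specificity × (1 - p), where p is the proportion of patients who have the event. The sketch below computes all three from a 2×2 table; the counts used to exercise it are hypothetical and are not the data of Eschrich et al.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from a 2x2 table of predicted
    versus observed outcome (tp/fn: patients with the event; tn/fp: without)."""
    sensitivity = tp / (tp + fn)  # events correctly predicted
    specificity = tn / (tn + fp)  # non-events correctly predicted
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy as a prevalence-weighted average of sensitivity and specificity."""
    return sensitivity * prevalence + specificity * (1 - prevalence)
```

With a hypothetical 45 true positives, 5 false negatives, 40 true negatives and 10 false positives, the function returns a sensitivity of 0.9, a specificity of 0.8 and an accuracy of 0.85, the same value as 0.9 × 0.5 + 0.8 × 0.5. The same classifier applied to a cohort with a different event rate would therefore report a different accuracy.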
For patients with esophageal carcinoma, lymph-node metastasis is a determinant of the therapeutic strategy. Current methods such as CT scanning and endoscopic ultrasound, however, do not provide accurate information on lymph-node status. Kan et al.60 studied 28 primary esophageal squamous cell carcinomas and applied a supervised classification technique, artificial neural networks, to develop a gene signature that predicted lymph-node metastasis with 86% accuracy. Tamoto et al.61 performed a similar analysis of 36 esophageal tumor specimens and developed a 44-gene signature that predicted lymph-node metastasis in approximately 90% of cases. Some genes in this signature are known to have a biological role in metastasis. As in many current microarray experiments, studies addressing the same question differ in design because researchers lack a consensus on methodology. In the esophageal cancer studies mentioned above, for example, one pertinent difference is that Kan et al.60 used normal esophageal tissue as the reference RNA, whereas Tamoto et al.61 used tumor tissue. A further problem with current microarray studies is that their results have been obtained from relatively small groups of patients. Studies can yield discordant data because the sample size of the study or reference group is too small to generate a classification scheme that represents the biological complexity of a tumor.
Breast cancer
Results from gene-expression profiling studies of breast cancer are gradually being implemented in the clinic. Because prognostic factors have been used for the past 20 years to guide adjuvant systemic treatment of breast cancer, gene-expression profiling can become an important adjunct to the known prognostic factors. For breast cancer, three relevant gene-expression profiles associated with prognosis have been identified: a 70-gene classifier,21, 22 a 21-gene signature,41 and a 76-gene expression profile.42 Table 3 shows the relevant similarities and differences between the three signatures. New signatures continue to be developed, such as the one recently identified by Pawitan and colleagues.62
Table 3 Comparison of the three prognostic expression profiles: the 70-gene classifier,21, 22 the 76-gene expression profile,42 and the 21-gene signature.41
In 2002, a 70-gene prognostic signature was identified at the Netherlands Cancer Institute in Amsterdam.21, 22 The investigators initially studied frozen samples of 78 tumors on an Agilent® platform (Agilent Technologies, Inc., Palo Alto, CA; 25,000 probes).21 Using supervised classification, they found that in patients under the age of 53 years the expression of a set of 70 genes correlated best with distant metastasis as a first event in node-negative cancer. This 70-gene expression signature was subsequently validated in a second, partially independent series of 295 breast cancer patients from the same institute;22 among the node-negative patients in this validation series, 61 tumor samples overlapped with the training series. The 70-gene signature also correlated with outcome in 144 node-positive breast cancer patients (10-year overall survival rates for the 'poor' and 'good' prognosis signatures were 59.5% and 92.0%, respectively).22 The signature outperformed the traditional clinicopathological features, with a hazard ratio of 4.6 (95% CI 2.3–9.3, P < 0.001) in multivariable analysis. A 'good' prognosis signature was present in 39% of the patients and was associated with a 94.7% probability of freedom from distant metastasis in the first 10 years and an overall survival of 97.4%, independent of nodal status. A 'poor' signature was present in 61% of patients; at 10 years, 60.5% of these patients remained free of distant metastasis and overall survival was 74.1%. A commercial test based on this gene-expression profile, called MammaPrint®, is offered by Agendia (Amsterdam, The Netherlands).
In 2004, the US National Surgical Adjuvant Breast and Bowel Project (NSABP), in cooperation with the company Genomic Health, identified a recurrence score based on 21 genes that quantified the likelihood of distant recurrence in tamoxifen-treated patients with node-negative, estrogen-receptor-positive breast cancer.41 Gene expression was measured in fixed, paraffin-embedded tumor tissue,63 resulting in the development of the Oncotype DX® assay (Genomic Health, Inc., Redwood City, CA). The 21 genes in this assay (16 cancer-related genes and five reference genes) and the recurrence-score algorithm were designed by analyzing data from three independent preliminary studies involving 447 patients and 250 candidate genes identified in earlier studies, including microarray-based studies. The 16 cancer-related genes were selected primarily on the basis of the correlation between their expression and outcome in three trials; one of these is the NSABP B-20 trial, but the other two are not specified.41 Detailed descriptions of how the 16 genes were selected are lacking, and although the weight attributed to each of the 16 genes differs, how the investigators arrived at these weights has not been described in any detail.
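Because the gene weights of the recurrence score have not been described in detail, the following is only a generic sketch of how a multigene RT-PCR score of this kind can be computed: expression of each cancer-related gene is normalized against reference genes, the values are combined as a weighted sum, and the sum is compared with cut-offs to assign a risk group. All gene names, weights and cut-offs below are hypothetical placeholders, not those of the Oncotype DX® assay, and the normalization shown is an assumption.

```python
# Hypothetical gene names, weights and cut-offs, for illustration only.
REFERENCE_GENES = ("REF_A", "REF_B")
WEIGHTS = {"PROLIF_1": 0.5, "PROLIF_2": 0.4, "ER_GROUP_1": -0.3}
CUTOFFS = (1.0, 2.0)  # below 1.0: low; 1.0 to <2.0: intermediate; >= 2.0: high


def normalized_expression(ct_values):
    """Reference-normalize RT-PCR cycle-threshold (Ct) values.

    A lower Ct means more transcript, so expression is taken here as
    mean(reference Ct) - gene Ct (an assumed, commonly used normalization).
    """
    ref = sum(ct_values[g] for g in REFERENCE_GENES) / len(REFERENCE_GENES)
    return {g: ref - ct for g, ct in ct_values.items() if g not in REFERENCE_GENES}


def risk_score(ct_values):
    """Weighted sum of the normalized expression of the cancer-related genes."""
    expr = normalized_expression(ct_values)
    return sum(weight * expr[gene] for gene, weight in WEIGHTS.items())


def risk_group(score):
    """Map a score onto the hypothetical low/intermediate/high bands."""
    low, high = CUTOFFS
    if score < low:
        return "low"
    if score < high:
        return "intermediate"
    return "high"
```

For a hypothetical tumor with reference-gene Ct values of 25 and 27 and cancer-gene Ct values of 24, 25 and 22, the normalized expression values are 2, 1 and 4, and the score is about 0.2, which falls in the hypothetical low-risk band. The sketch makes plain why undocumented weights matter: every weight changes which band a tumor falls into.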
To test the prognostic value of the recurrence score, RT-PCR was successfully performed on 668 paraffin-embedded tumor blocks from a larger population of tamoxifen-treated patients in the NSABP B-14 study.63 Tumor and patient characteristics in this subpopulation were similar to those of the total study population. The patients were of all ages, had pT1–T2, estrogen receptor (ER)-positive tumors, and were treated with tamoxifen. Using the 21-gene recurrence score, 338 patients (51%) had a low-risk, 22% an intermediate-risk and 27% a high-risk profile. Multivariable analysis showed a hazard ratio of 2.81 (95% CI 1.70–4.64, P < 0.001) for the recurrence score. Clinicians in the US are using the 21-gene recurrence score in clinical practice on the basis of validation studies described on the Genomic Health website (www.genomichealth.com). Only one independent validation study, however, has been published in the literature, and ironically this study failed to validate the recurrence-score algorithm.64
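Hazard ratios such as those above are usually reported with a 95% confidence interval computed on the log scale, so a reader can recover the standard error and an approximate P value from the interval alone. The sketch below assumes a Wald-type interval, exp(log HR ± 1.96 × SE); it is a reader's consistency check, not the analysis performed by the original investigators.

```python
import math

Z_975 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def hr_from_ci(lower, upper):
    """Recover the hazard ratio, the standard error of log(HR) and a two-sided
    Wald P value from a reported 95% CI, assuming CI = exp(log HR +/- 1.96*SE)."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    log_hr = (log_lo + log_hi) / 2            # midpoint on the log scale
    se = (log_hi - log_lo) / (2 * Z_975)      # half-width divided by 1.96
    z = log_hr / se
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal tail probability
    return math.exp(log_hr), se, p
```

Applied to the reported intervals, hr_from_ci(2.3, 9.3) for the 70-gene signature gives a midpoint of about 4.6 with a log-scale standard error of about 0.36, and hr_from_ci(1.70, 4.64) for the recurrence score gives about 2.8 with a standard error of about 0.26. In both cases the recovered P value is well below 0.001, consistent with the published figures.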
In 2005, the Erasmus Medical Centre in Rotterdam, in cooperation with the American company Veridex, identified a 76-gene signature that could identify node-negative breast cancer patients at high risk of distant recurrence, who would therefore be eligible for adjuvant systemic therapy.42 In a retrospective study design, the Affymetrix U133A chip, which contains approximately 22,000 probe sets, was used to analyze frozen tumor samples from 286 untreated patients of all ages with node-negative T1–T3/4 breast cancer.42 The prognostic signature was identified in a training series of 115 tumors and consists of two separate profiles, one for ER-positive (60 genes) and one for ER-negative (16 genes) breast carcinomas. The gene-expression levels were analyzed using the log-rank test, and the signature was validated in an independent set of 171 tumors without any overlap with the training set. After 60 months, distant-metastasis-free survival among the 65% of patients with a 'poor' profile was 53%, falling to 49% after 80 months. A 'good' profile was observed in 35% of patients, whose disease-free survival was 93% after 60 months and 88% after 80 months. Patients with a 'poor' profile had an overall survival of 70% after 60 months and 63% after 80 months, whereas for patients with a 'good' profile the overall survival rates at 60 and 80 months were 97% and 95%, respectively.
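Survival proportions at fixed time points, such as those at 60 and 80 months, are typically read from Kaplan–Meier curves, and curves for different signature groups are typically compared with the log-rank test. A minimal Kaplan–Meier (product-limit) estimator is sketched below as a generic illustration; the follow-up data used to exercise it are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times:  follow-up time per patient (e.g. months)
    events: 1 if the event (e.g. distant metastasis) occurred at that time,
            0 if the patient was censored (still event-free when last seen)
    Returns (time, survival) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_removed = 0
        # patients sharing the same time: events and censorings are both
        # counted in the risk set at t, and leave it afterwards
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_removed
    return curve


def survival_at(curve, t):
    """Read the step function: estimated survival at time t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s
```

For six hypothetical patients followed for 10, 20, 20, 30, 40 and 50 months, with events at 10 months, at 20 months (one of the two patients) and at 30 months, the estimated survival is about 0.83 after 10 months, 0.67 after 20 months and 0.44 after 30 months.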
Although the 70-gene profile from the Amsterdam group and the 76-gene profile from the Rotterdam group have only three genes in common, most of the genes are involved in the same regulatory pathways. One reason for the small overlap between the two signatures is that different microarray platforms were used (i.e. Agilent® and Affymetrix®). The two platforms use different DNA probes to detect the expression of specific mRNAs, and they differ in hybridization design: the Agilent® platform uses dual-color hybridization with a reference RNA for each sample, whereas the Affymetrix® platform uses only RNA from the tumor sample.
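In a dual-color (reference) design, each probe yields a ratio of sample to reference intensity, whereas a single-channel array yields an intensity for the sample alone that must be normalized across arrays. The sketch below shows the usual log2-ratio calculation followed by global median centering, one simple normalization among several; it assumes background-subtracted, positive intensities, and the values used to exercise it are hypothetical.

```python
import math
import statistics


def log_ratios(sample_channel, reference_channel):
    """Dual-color design: log2(sample / reference) per probe.
    Intensities are assumed to be background-subtracted and positive."""
    return [math.log2(s / r) for s, r in zip(sample_channel, reference_channel)]


def median_center(values):
    """Global median normalization: shift log ratios so that their median is 0,
    correcting an overall intensity imbalance between the two channels."""
    m = statistics.median(values)
    return [v - m for v in values]
```

For sample intensities of 200, 400, 800 and 1600 against a constant reference of 100, the log2 ratios are 1, 2, 3 and 4, and median centering shifts them to -1.5, -0.5, 0.5 and 1.5. Because every value is relative to the reference, the choice of reference RNA (normal tissue, tumor tissue or a pooled standard) directly shapes the resulting profile.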
There are disadvantages to assessing gene-expression profiles identified from frozen tumor tissue. First, frozen tumor tissue is not widely available in clinical practice; second, the assay is difficult to perform. These factors give the RT-PCR assay of Genomic Health an advantage over the other assays, because it can be performed more easily on readily available paraffin-embedded material. For tests that can be performed on paraffin-embedded tumor tissue, prognostic value can be validated retrospectively using clinical trial material; for tests requiring frozen material, prospective studies are needed for validation. At present, the 70-gene signature21, 22 is being used in The Netherlands in the RASTER trial (a collaboration between the Netherlands Cancer Institute and the Dutch Health Care Insurance Board), and the MINDACT trial (TRANSBIG consortium in collaboration with the European Organisation for Research and Treatment of Cancer) is expected to start using this signature soon.
All three signatures outperform classical clinicopathological parameters in multivariable analysis, but insufficient independent validation has been performed to assess the reproducibility of the results.21, 22, 41, 42 It will be interesting to determine whether the three profiles assign patients to the same prognostic groups. Although gene-expression profiling using microarray technology is very promising, questions remain about its clinical applicability, and it is not yet known whether these profiles will genuinely improve the selection of the patients most likely to benefit from adjuvant systemic treatment.
A key concern is when these gene-expression-based prognostic classifiers will be ready for clinical use. In the past 15 years there have been hundreds, even thousands, of studies attempting to identify novel prognostic factors in breast cancer, often employing molecular techniques. The factors used to predict prognosis and guide adjuvant treatment in most clinical guidelines are nodal status, tumor size, histologic grade, patient age and hormone-receptor status; recently, HER2 status has also been included. None of the other novel factors has been implemented in clinical practice because validation of their prognostic value often failed and few prospective studies were properly planned. Gene-expression-based classifiers appear to be more appealing to clinicians and patients, which creates pressure to implement them in the clinic as soon as possible. Prospective validation of each of these classifiers in sufficiently large, representative patient cohorts, however, is required before these tests can be used in the clinic. Premature use of these tests could lead to inappropriate advice on adjuvant systemic treatment, resulting in both undertreatment and overtreatment of patients with breast cancer.
Future directions
Microarray analysis is a rapidly evolving technology that has yielded enormous quantities of data, which are not yet fully understood. Leading scientific journals require investigators carrying out DNA microarray research to deposit their data in an appropriate international database,65 following a set of guidelines (MIAME guidelines, or Minimum Information About a Microarray Experiment;2 Table 4). This allows other investigators to propose and perform alternative analyses of the same data. Michiels et al. took advantage of such repositories to reanalyze datasets from published studies of gene expression as a predictor of cancer outcome.34 They observed that most molecular signatures, and their misclassification rates, were unstable; their main conclusion was that the list of genes included in a gene-expression profile (based on one training set and the proportion of misclassifications seen in one validation set) depends greatly on the selection of patients in the training set.
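The instability described by Michiels et al. can be explored with a resampling procedure in the same spirit: draw many random training sets, rank genes in each, and record how often each gene enters the top-ranked list. The sketch below is a simplified, hypothetical version; the Welch t statistic, stratified draws of equal size per outcome group and all parameter values are assumptions rather than their published protocol. Expression data are given as a list of genes, each a list of values per sample.

```python
import random
import statistics


def t_stat(values_a, values_b):
    """Welch-type t statistic comparing one gene between two outcome groups."""
    va, vb = statistics.variance(values_a), statistics.variance(values_b)
    se = (va / len(values_a) + vb / len(values_b)) ** 0.5
    return (statistics.mean(values_a) - statistics.mean(values_b)) / se


def top_genes(expr, labels, idx, k):
    """Rank genes by |t| using only the samples in idx; return the top k."""
    scores = []
    for g, row in enumerate(expr):
        a = [row[i] for i in idx if labels[i] == 1]
        b = [row[i] for i in idx if labels[i] == 0]
        scores.append((abs(t_stat(a, b)), g))
    return {g for _, g in sorted(scores, reverse=True)[:k]}


def selection_frequency(expr, labels, n_per_class, k, n_splits, seed=0):
    """Fraction of random training sets in which each gene enters the top-k list."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    counts = [0] * len(expr)
    for _ in range(n_splits):
        # stratified random training set: n_per_class samples from each group
        train = rng.sample(pos, n_per_class) + rng.sample(neg, n_per_class)
        for g in top_genes(expr, labels, train, k):
            counts[g] += 1
    return [c / n_splits for c in counts]
```

On simulated data in which only one gene truly differs between outcome groups, that gene is selected in every draw, whereas pure-noise genes enter the list only sporadically.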
Table 4 Useful websites with URLs for microarray-based research.
Given a recognition of the limitations of microarray analysis and of the measures that can be taken to reduce them, together with an understanding of the power of this technology, there is no reason why microarrays should not become a useful, if not essential, clinical tool. Until gene-expression profiling has been optimized for clinical use, however, the prognostic value of published microarray results in cancer studies should be interpreted with caution.
Technical issues should be addressed before this technology is translated into clinical practice. New methods of data analysis will increase the likelihood of finding reliable prognostic and predictive profiles. Reproducibility and quality control are essential when this technology is used as a standard assay in hospitals. In our opinion, the best way to improve the analysis of microarray data is to start with a clear study objective (e.g. a class comparison, class prediction or class discovery study). If a prognostic or predictive profile is being built (class comparison or prediction), studies should be performed with statistical rigor and reported clearly, using unbiased statistics.66 Use of the REMARK guidelines for reporting tumor-marker studies would be a good start.67 Many studies so far have included only a small number of patients; in future, adequate study size combined with appropriate clinical design, adjustment for known predictors and proper validation will yield more reliable results and less overestimation of the predictive value of a classifier.38, 40 These features are essential if this highly promising technology is to move from bench to bedside.
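The effect of study size on the precision of an accuracy estimate can be made concrete with a confidence interval for a proportion. The sketch below uses the Wilson score interval, one standard choice among several; the counts used to exercise it are hypothetical.

```python
import math


def wilson_interval(correct, n, z=1.959963984540054):
    """Wilson score interval (95% by default) for a proportion such as
    prediction accuracy: correct predictions out of n patients."""
    p = correct / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half
```

For a hypothetical 21 correct predictions out of 23 patients (91%), the 95% interval runs from about 0.73 to 0.98; the same proportion observed in 230 patients (210 correct) narrows it to about 0.87 to 0.94.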
Simon et al. recommend that supervised methods rather than cluster analyses be used for class prediction and class comparison studies, because cluster analyses are less powerful than supervised methods for distinguishing predefined classes and do not provide valid statistical identification of differentially expressed genes.15 If cross-validation is used to estimate prediction accuracy, the entire model-building process, including the selection of informative genes, should be repeated for each cross-validation training set. Even when cross-validation has been performed correctly, independent validation in an unselected patient cohort is indispensable.34 If a separate dataset is used for validation, it should be an unselected cohort large enough to provide meaningful confidence intervals for prediction accuracy. Inadequate validation leads to the publication of overoptimistic results. Finally, investigators should be urged not to make strong claims about the value of new prediction algorithms without comparing them against standard prediction methods.
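The recommendation to repeat gene selection within every cross-validation training set can be sketched as follows. The ranking rule (absolute difference in class means), the nearest-centroid classifier and the interleaved fold assignment are simplifying assumptions; the essential point is that select_genes sees only the training samples of each fold. Selecting genes once on all samples before splitting would let the held-out samples influence the gene list and inflate the estimated accuracy.

```python
import statistics


def select_genes(expr, labels, idx, k):
    """Top-k genes by absolute difference in class means, using samples idx only.
    expr is a list of genes, each a list of values per sample; labels are 0/1."""
    def score(row):
        a = [row[i] for i in idx if labels[i] == 1]
        b = [row[i] for i in idx if labels[i] == 0]
        return abs(statistics.mean(a) - statistics.mean(b))
    return sorted(range(len(expr)), key=lambda g: score(expr[g]), reverse=True)[:k]


def fit_centroids(expr, labels, idx, genes):
    """Class centroids over the selected genes, from training samples only."""
    cents = {}
    for c in (0, 1):
        members = [i for i in idx if labels[i] == c]
        cents[c] = [statistics.mean(expr[g][i] for i in members) for g in genes]
    return cents


def predict(expr, sample, genes, cents):
    """Nearest-centroid prediction (squared Euclidean distance)."""
    x = [expr[g][sample] for g in genes]
    return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))


def cross_validated_accuracy(expr, labels, n_folds, k):
    """k-fold CV in which gene selection is repeated inside every fold."""
    n = len(labels)
    folds = [list(range(f, n, n_folds)) for f in range(n_folds)]
    correct = 0
    for test in folds:
        train = [i for i in range(n) if i not in test]
        genes = select_genes(expr, labels, train, k)  # selection inside the loop
        cents = fit_centroids(expr, labels, train, genes)
        correct += sum(predict(expr, i, genes, cents) == labels[i] for i in test)
    return correct / n
```

Even a cross-validated estimate computed this way only approximates performance in a new, unselected cohort, which is why independent validation remains necessary.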
Conclusion
Microarray technology is a very promising and rapidly developing field that is leading to the discovery of numerous new signatures in a very short time. Although the use of gene-expression profiles in clinical practice is very appealing, we should be very cautious in interpreting high-throughput data and results before any of the identified gene-expression signatures are used in daily clinical practice. Lack of frozen material from treated patients has been an important factor preventing proper validation of reported gene-expression profiles. An ideal way to solve this problem is to incorporate gene-expression profiling into prospective randomized clinical trials. With proper validation and standardization of the microarray process, we expect that tests based on discoveries made through gene-expression profiling will gradually enter the clinic over the next 10 years as an important adjunct to current clinical parameters.
Key points
- Gene-expression microarrays are tools that can be used to simultaneously measure the expression levels of thousands of genes, potentially all genes, within a cell
- Recent advances in technology and science are propelling microarray-based tests into the forefront of medicine
- In the last 5 years, numerous gene-expression profiles predicting prognosis and response to specific therapies for several malignancies have been reported
- Current limitations in the technology as well as in the biostatistical methods of analysis of the large quantities of data generated from microarray tests predispose the results to misinterpretation
- Standardization of protocols and validation of current profiles are needed to ensure that gene-expression profiles are reliable and reproducible, and therefore ready for clinical implementation
Acknowledgments
We would like to thank L Wessel for advice and critical reading of the manuscript. We thank R Kerkhoven for his artwork for Figure 2.
References
- Simon R et al. (2002) Design of studies using DNA microarrays. Genet Epidemiol 23: 21–36
- Brazma A et al. (2001) Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat Genet 29: 365–371
- Schena M et al. (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270: 467–470
- Hughes TR et al. (2001) Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat Biotechnol 19: 342–347
- Lipshutz RJ et al. (1999) High density synthetic oligonucleotide arrays. Nat Genet 21: 20–24
- Bertucci F et al. (1999) Sensitivity issues in DNA array-based expression measurements and performance of nylon microarrays for small samples. Hum Mol Genet 8: 1715–1722
- Van de Goor TA (2005) A history of DNA microarrays. Pharm Discovery 5: 42–45
- DeRisi J et al. (1996) Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat Genet 14: 457–460
- Gulmann C et al. (2006) Array-based proteomics: mapping of protein circuitries for diagnostics, prognostics, and therapy guidance in cancer. J Pathol 208: 595–606
- Lockhart DJ and Winzeler EA (2000) Genomics, gene expression and DNA arrays. Nature 405: 827–836
- Chen CN et al. (2005) Gene expression profile predicts patient survival of gastric cancer after surgical resection. J Clin Oncol 23: 7286–7295
- Lu J et al. (2005) MicroRNA expression profiles classify human cancers. Nature 435: 834–838
- Meltzer PS (2005) Cancer genomics: small RNAs with big impacts. Nature 435: 745–746
- Alizadeh AA et al. (2001) Towards a novel classification of human malignancies based on gene expression patterns. J Pathol 195: 41–52
- Simon R et al. (2003) Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J Natl Cancer Inst 95: 14–18
- Knudsen S (2002) A Biologist's Guide to Analysis of DNA Microarray Data. New York: John Wiley & Sons
- Eisen MB et al. (1998) Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 95: 14863–14868
- Chang HY et al. (2004) Gene expression signature of fibroblast serum response predicts human cancer progression: similarities between tumors and wounds. PLoS Biol 2: E7
- Chang HY et al. (2005) Robustness, scalability, and integration of a wound-response gene expression signature in predicting breast cancer survival. Proc Natl Acad Sci USA 102: 3738–3743
- Perou CM et al. (2000) Molecular portraits of human breast tumours. Nature 406: 747–752
- van 't Veer LJ et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415: 530–536
- van de Vijver MJ et al. (2002) A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 347: 1999–2009
- Nevins JR et al. (2003) Towards integrated clinico-genomic models for personalized medicine: combining gene expression signatures and clinical factors in breast cancer outcomes prediction. Hum Mol Genet 12: R153–R157
- Van 't Veer LJ and De Jong D (2002) The microarray way to tailored cancer treatment. Nat Med 8: 13–14
- Miller LD et al. (2002) Optimal gene expression analysis by microarrays. Cancer Cell 2: 353–361
- Tu IP et al. (2004) A method for detecting and correcting feature misidentification on expression microarrays. BMC Genomics 5: 64
- Forster T et al. (2003) Experiments using microarray technology: limitations and standard operating procedures. J Endocrinol 178: 195–204
- Perez EA et al. (2004) Improving patient care through molecular diagnostics. Semin Oncol 31: 14–20
- Paik S et al. (2005) Technology insight: application of molecular techniques to formalin-fixed paraffin-embedded tissues from breast cancer. Nat Clin Pract Oncol 2: 246–254
- Brown I et al. (2003) From peas to "chips" - the new millennium of molecular biology: a primer for the surgeon. World J Surg Oncol 1: 21
- Ramaswamy S and Golub TR (2002) DNA microarrays in clinical oncology. J Clin Oncol 20: 1932–1941
- Russo G et al. (2003) Advantages and limitations of microarray technology in human cancer. Oncogene 22: 6497–6507
- Pusztai L and Hess KR (2004) Clinical trial design for microarray predictive marker discovery and assessment. Ann Oncol 15: 1731–1737
- Michiels S et al. (2005) Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet 365: 488–492
- Jornsten R et al. (2005) DNA microarray data imputation and significance analysis of differential expression. Bioinformatics 21: 4155–4161
- Ouyang M et al. (2004) Gaussian mixture clustering and imputation of microarray data. Bioinformatics 20: 917–923
- Oba S et al. (2003) A Bayesian missing value estimation method for gene expression profile data. Bioinformatics 19: 2088–2096
- Ioannidis JP et al. (2003) Genetic associations in large versus small studies: an empirical assessment. Lancet 361: 567–571
- Simon R (2004) When is a genomic classifier ready for prime time? Nat Clin Pract Oncol 1: 4–5
- Ntzani EE and Ioannidis JP (2003) Predictive ability of DNA microarrays for cancer outcomes and correlates: an empirical assessment. Lancet 362: 1439–1444
- Paik S et al. (2004) A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med 351: 2817–2826
- Wang Y et al. (2005) Gene expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 365: 671–679
- Schaner ME et al. (2003) Gene expression patterns in ovarian carcinomas. Mol Biol Cell 14: 4376–4386
- Eschrich S et al. (2005) Molecular staging for survival prediction of colorectal cancer patients. J Clin Oncol 23: 3526–3535
- Yeoh EJ et al. (2002) Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell 1: 133–143
- Ross DT et al. (2000) Systematic variation in gene expression patterns in human cancer cell lines. Nat Genet 24: 227–235
- Alizadeh AA et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403: 503–511
- Lapointe J et al. (2004) Gene expression profiling identifies clinically relevant subtypes of prostate cancer. Proc Natl Acad Sci USA 101: 811–816
- Yu YP et al. (2004) Gene expression alterations in prostate cancer predicting tumor aggression and preceding development of malignancy. J Clin Oncol 22: 2790–2799
- Garber ME et al. (2001) Diversity of gene expression in adenocarcinoma of the lung. Proc Natl Acad Sci USA 98: 13784–13789
- Beer DG et al. (2002) Gene expression profiles predict survival of patients with lung adenocarcinoma. Nat Med 8: 816–824
- Buckhaults P et al. (2003) Identifying tumor origin using a gene expression-based classification map. Cancer Res 63: 4144–4149
- Bittner M et al. (2000) Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 406: 536–540
- Ebert BL and Golub TR (2004) Genomic approaches to hematologic malignancies. Blood 104: 923–932
- Armstrong SA et al. (2003) Inhibition of FLT3 in MLL: validation of a therapeutic target identified by gene expression based classification. Cancer Cell 3: 173–183
- Lossos IS et al. (2003) HGAL is a novel interleukin-4-inducible gene that strongly predicts survival in diffuse large B-cell lymphoma. Blood 101: 433–440
- Lossos IS et al. (2001) Expression of a single gene, BCL-6, strongly predicts survival in patients with diffuse large B-cell lymphoma. Blood 98: 945–951
- Lossos IS et al. (2004) Prediction of survival in diffuse large-B-cell lymphoma based on the expression of six genes. N Engl J Med 350: 1828–1837
- Etzioni R et al. (1998) Asymptomatic incidence and duration of prostate cancer. Am J Epidemiol 148: 775–785
- Kan T et al. (2004) Prediction of lymph node metastasis with use of artificial neural networks based on gene expression profiles in esophageal squamous cell carcinoma. Ann Surg Oncol 11: 1070–1078
- Tamoto E et al. (2004) Gene expression profile changes correlated with tumor progression and lymph node metastasis in esophageal cancer. Clin Cancer Res 10: 3629–3638
- Pawitan Y et al. (2005) Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts. Breast Cancer Res 7: R953–R964
- Cronin M et al. (2004) Measurement of gene expression in archival paraffin-embedded tissues: development and performance of a 92-gene reverse transcriptase-polymerase chain reaction assay. Am J Pathol 164: 35–42
- Esteva FJ et al. (2005) Prognostic role of a multigene reverse transcriptase-PCR assay in patients with node-negative breast cancer not receiving adjuvant systemic therapy. Clin Cancer Res 11: 3315–3319
- Ball CA et al. (2002) A guide to microarray experiments-an open letter to the scientific journals. Lancet 360: 1019
- Golub TR et al. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286: 531–537
- McShane LM et al. (2005) REporting recommendations for tumour MARKer prognostic studies (REMARK). Nat Clin Pract Oncol 2: 416–422
- National Center for Biotechnology Information [http://www.ncbi.nlm.nih.gov]
Competing interests
The authors declared no competing interests.

