Main

Over the last decade, gene expression microarray technology has had a profound impact on cancer research. The ability to analyse the expression of thousands of genes in a single experiment has been systematically used to derive prognostic and predictive markers for many cancer types (Shedden et al, 2008; Sotiriou and Pusztai, 2009; Gomez-Raposo et al, 2010; Oberthuer et al, 2010). Many of these ‘signatures’ show good prognostic power, but surprisingly gene-wise overlap between them has been minimal (Ein-Dor et al, 2006; Fan et al, 2006; Chen et al, 2007; Lau et al, 2007), which increases the difficulty of introducing microarrays in clinical practice. Moreover, studies comparing data originating from different microarray platforms have reported poor inter-platform correlations (Kuo et al, 2002; Tan et al, 2003). Nevertheless, multiple studies in breast and non-small-cell lung cancer (NSCLC) have shown that most of these signatures exhibit similar prognostic performance and identify the same patients (Fan et al, 2006; Haibe-Kains et al, 2008). These data suggest that, although gene-wise overlap is small, the signatures track common underlying biology that determines patient outcome. Among others, Weigelt et al (2010) have suggested that proliferation genes drive the prognostic power of these signatures (Whitfield et al, 2006; Desmedt et al, 2008; Haibe-Kains et al, 2008). A large meta-analysis by Wirapati et al (2008) supports this concept.

To determine if this result could be clinically useful, we previously developed a signature based on 104 proliferation genes (Starmans et al, 2008). This signature was derived from two in vitro gene expression data sets. Genes were selected that showed a cycling pattern after synchronisation in one data set and responded to serum stimulation in the other. Our proliferation signature exhibited strong prognostic power in several large transcriptome data sets representing different cancer types (Starmans et al, 2008). Further, the proliferation signature and multiple other signatures identified similar patients as having good or poor prognosis (Starmans et al, 2008). These results substantiate the hypothesis that many published signatures act as surrogates of proliferation.

The clinical applicability of gene expression signatures remains controversial; studies seem to lack consistency and external validation is not straightforward (Michiels et al, 2005; Ein-Dor et al, 2006; Dupuy and Simon, 2007; Boulesteix and Slawski, 2009; Boutros et al, 2010; Subramanian and Simon, 2010). Many gene expression signatures have been developed since the introduction of gene expression microarray technology; so far, however, only two prognostic gene profiles, both in breast cancer, are being tested in large prospective trials (Bogaerts et al, 2006; Sparano, 2006; Wirapati et al, 2008; Weigelt et al, 2010). The dimensionality of gene expression microarrays makes statistical analysis complex, and large numbers of samples are required for reproducible results (Zien et al, 2003; Ein-Dor et al, 2006). An approach that evaluates only a select number of transcripts may therefore provide an efficient alternative to high-throughput expression profiling. The use of a PCR-based test to evaluate the proliferation signature would assist in the application to a clinical setting (Zhou et al, 2010). Furthermore, a PCR-based technique does not necessitate the availability of fresh-frozen tissue, whereas this is recommended for gene expression microarrays (Tumour Analysis Best Practices Working Group, 2004). Many more samples might thus be available to validate classifiers with a PCR-based technique.

Initially, we examined whether it was possible to reduce the number of genes in the proliferation signature without compromising its prognostic value. The original signature consisted of 104 genes, and so reducing this number would make data collection, analysis and transfer to a PCR-based approach simpler and more transparent. To further facilitate translation of this reduced proliferation signature to a PCR platform, a series of in vitro and ex vivo validation experiments was performed. We reduced the proliferation signature to 10 genes and validated it in 1820 breast cancer and 862 NSCLC patients. Lastly, the reduced proliferation signature was applied to another independent, 129-patient breast cancer cohort with qPCR to demonstrate clinical utility.

Materials and methods

Gene size reduction

In our original study (Starmans et al, 2008) we developed a 104-gene proliferation signature and evaluated it in five different microarray data sets representing breast, lung and renal cancers (Beer et al, 2002; van de Vijver et al, 2002; Miller et al, 2005; Wang et al, 2005; Zhao et al, 2006). To reduce the size of the signature, each gene was tested for its univariate prognostic value in each data set. The end point was disease-specific (Miller et al, 2005; Zhao et al, 2006), metastasis-free (Wang et al, 2005) or overall (Beer et al, 2002; van de Vijver et al, 2002) survival, depending on what was reported for the data set. Only genes that had fewer than 25% missing values were included. Expression of each gene was used as a continuous variable as input for receiver operating characteristic (ROC) curve analysis. Genes were ranked within each data set by the area under the ROC curve (Supplementary Data File S1), and then subjected to a rank-based filtering. The filtering criteria depended on the number of data sets that included a given gene, and were:

  • Present in 1/5 data sets: discard the gene

  • Present in 2/5 data sets: select when ranked in top 20 for both data sets

  • Present in 3/5 data sets: select when ranked in top 20 in all three data sets

  • Present in 4/5 data sets: select when ranked in top 20 in 3 out of 4 data sets

  • Present in 5/5 data sets: select when ranked in the top 20 in 4 out of 5 data sets
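The filtering rules above can be sketched in a few lines of Python. This is an illustration only, not the authors' code (the analyses were performed in R): the names `REQUIRED_TOP_HITS`, `passes_filter` and `select_genes` are hypothetical, and phrases such as "in 3 out of 4 data sets" are read here as "in at least 3", which is an assumption.

```python
# Minimum number of top-20 rankings required, keyed by the number of
# data sets in which the gene is present. A gene present in only one
# data set has no entry and is therefore always discarded.
REQUIRED_TOP_HITS = {2: 2, 3: 3, 4: 3, 5: 4}
TOP_N = 20


def passes_filter(ranks):
    """ranks: AUC-based ranks (1 = best), one per data set containing the gene."""
    required = REQUIRED_TOP_HITS.get(len(ranks))
    if required is None:
        return False
    # Assumption: "in 3 out of 4" is interpreted as "at least 3".
    hits = sum(1 for rank in ranks if rank <= TOP_N)
    return hits >= required


def select_genes(ranks_by_gene):
    """ranks_by_gene: {gene symbol: list of ranks}; returns the retained genes."""
    return sorted(gene for gene, ranks in ranks_by_gene.items() if passes_filter(ranks))
```

For example, a gene ranked 1, 5, 30 and 12 in the four data sets that contain it would be retained, whereas a gene measured in a single data set would be dropped regardless of its rank.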

Quantitative PCR

RNA was reverse-transcribed using I-script (Bio-Rad, Veenendaal, The Netherlands) and quantitative PCR was performed on an ABI 7500 system (Applied Biosystems, Bleiswijk, The Netherlands). Gene abundance was detected using power SYBR Green I (Applied Biosystems). Primer sequences are provided in Supplementary Data File S2. Relative abundance of every gene per sample (X_gene,sample) was calculated using standard curves and normalisation to the 18S rRNA signal (Equation 1). This was followed by median scaling per gene for each data set (Barsyte-Lovejoy et al, 2006).
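As a rough illustration of this kind of quantification, the sketch below uses the common standard-curve approach, in which Ct is modelled as a linear function of log10(quantity). Equation 1 is not reproduced in this text, so the exact form used here is an assumption: the ratio to 18S rRNA, the division by the per-gene median, and all function names are hypothetical choices for the example.

```python
from statistics import median


def quantity_from_ct(ct, slope, intercept):
    """Invert a standard curve of the form Ct = slope * log10(quantity) + intercept."""
    return 10 ** ((ct - intercept) / slope)


def relative_abundance(ct_gene, ct_18s, curve_gene, curve_18s):
    """Gene quantity divided by the 18S rRNA quantity of the same sample.

    curve_gene and curve_18s are (slope, intercept) pairs from the standard curves.
    Assumption: normalisation is a simple ratio to the 18S signal.
    """
    return quantity_from_ct(ct_gene, *curve_gene) / quantity_from_ct(ct_18s, *curve_18s)


def median_scale(values):
    """Scale one gene's values across a data set so that their median becomes 1.

    Assumption: median scaling is done by division on the linear scale.
    """
    centre = median(values)
    return [value / centre for value in values]
```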

A multi-gene signature score was subsequently calculated for the reduced proliferation signature (Equation 2), in which N is the number of genes in the multi-gene marker. The parameter gene_expr,n for a sample equals 1 if the sample has a level of gene n above the median for all samples in the data set and −1 otherwise. All data analyses were performed in R (v2.12.1).
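The scoring rule described here can be sketched as follows. The equation itself is not reproduced in this text, so the sketch assumes the score is the plain sum of the per-gene values of 1 or −1; dividing by N would rescale the score but leave any median split unchanged. The function name `signature_scores` and the input layout are hypothetical.

```python
from statistics import median


def signature_scores(expression):
    """expression: {gene: [abundance per sample]}, same sample order for every gene.

    Returns one score per sample. Assumption: the score is the sum over genes of
    +1 (sample above that gene's median across the data set) or -1 (otherwise).
    """
    n_samples = len(next(iter(expression.values())))
    scores = [0] * n_samples
    for values in expression.values():
        cutoff = median(values)
        for i, value in enumerate(values):
            scores[i] += 1 if value > cutoff else -1
    return scores
```

With 10 genes the score under this assumption ranges from −10 (all genes at or below their median) to 10 (all genes above their median).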

In vitro validation

To validate the involvement in proliferation of the genes in the reduced signature, serum starvation experiments were performed in five cancer cell lines (MCF7, HeLa, HT-29, U-2 OS and DU145). Cells were grown either in normal serum containing medium (10% foetal bovine serum, FBS), the control situation, or in low serum containing medium (0.1% FBS, starvation condition) for 48 h. RNA was isolated for both conditions for three biological replicates. The multi-gene signature score (Equation 2) was calculated for each sample. Scores were then compared between normal and serum starvation conditions with a two sample two-tailed unpaired Student’s t-test (R v2.12.1).

Ex vivo validation with qPCR

A large set of xenografts (n=168) was used to assess whether it is feasible to evaluate the reduced proliferation signature in tumour material. Material was isolated from xenografts grown from different cancer cell lines (HeLa, HT-29, U-87, LS 174T, HCT 116 and Hep G2) obtained from previous studies in which tumour volume doubling times (VDTs) were calculated (Oostendorp et al, 2008; Dubois et al, 2009a, 2009b; Rouschop et al, 2010). The multi-gene signature score (Equation 2) was calculated and used to median dichotomise the xenograft samples. Differences in tumour VDTs were then assessed between the two groups with a two sample two-tailed unpaired Student’s t-test (R v2.12.1).
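A minimal sketch of the median split and the pooled-variance (Student's) t statistic is given below. It is illustrative rather than a reproduction of the R analysis: samples with a score exactly equal to the median are assigned to the low group, which is an assumption, and the function names are hypothetical. The sketch stops at the t statistic and degrees of freedom; converting these to a two-tailed P value needs the t distribution, which is not in the Python standard library.

```python
from math import sqrt
from statistics import mean, median, variance


def median_dichotomise(scores):
    """Return True for samples in the high group (score above the median)."""
    cutoff = median(scores)
    return [score > cutoff for score in scores]


def student_t(group_a, group_b):
    """Unpaired two-sample Student's t statistic with pooled variance.

    Returns (t, degrees of freedom). Each group needs at least two values.
    """
    n_a, n_b = len(group_a), len(group_b)
    dof = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / dof
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, dof
```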

Validation in independent microarray data sets

The reduced proliferation signature was further validated in independent public mRNA abundance data sets. Several breast cancer (Pawitan et al, 2005; Bild et al, 2006; Chin et al, 2006; Sotiriou et al, 2006; Desmedt et al, 2007; Loi et al, 2008; Bos et al, 2009; Zhang et al, 2009; Li et al, 2010; Sabatier et al, 2010; Symmans et al, 2010) and NSCLC (Bhattacharjee et al, 2001; Bild et al, 2006; Raponi et al, 2006; Shedden et al, 2008; Lu et al, 2010) data sets were used to assess the prognostic power of the reduced proliferation signature. For NSCLC the data sets reported on adenocarcinoma and/or squamous cell carcinoma. Considering these are completely different disease types, separate analyses were performed per subgroup. When overall survival was provided this was used as end point, otherwise disease-specific survival (or the closest variant available) was used. All data sets used Affymetrix microarrays, which were normalised using the RMA algorithm (Irizarry et al, 2003) (R packages: affy v1.26.1) combined with updated ProbeSet annotations (Dai et al, 2005b) (R packages v12.1.0: hgu95av2hsentrezgcdf, hgu133ahsentrezgcdf, hgu133bhsentrezgcdf and hgu133plus2hsentrezgcdf). Genes were matched across data sets based on Entrez Gene IDs. Median scaling and housekeeping gene normalisation (to the geometric mean of ACTB, BAT1, B2M and TBP levels) were performed (Barsyte-Lovejoy et al, 2006). The multi-gene signature score (Equation 2) was used to median dichotomise the patients in a data set. Patients predicted as having good or poor prognosis in any of the data sets were pooled into different groups. This was done for the breast cancer and lung cancer data sets separately. Prognostic performance of the reduced proliferation signature was evaluated by Cox proportional hazard ratio (HR) modelling followed by the Wald test (R survival package v2.36-2). Fifteen-year survival was used as end point for breast cancer and 5-year survival for NSCLC.
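The housekeeping step can be illustrated with a short sketch. It assumes linear-scale abundances for a single array and divides every gene by the geometric mean of the four housekeeping genes; on log-transformed values (RMA output is on a log2 scale) the equivalent operation is subtracting the arithmetic mean of the housekeeping log values. The function name and input layout are hypothetical, and the order in which this step and median scaling were applied is not specified here.

```python
from statistics import geometric_mean

HOUSEKEEPING = ("ACTB", "BAT1", "B2M", "TBP")


def housekeeping_normalise(sample):
    """sample: {gene symbol: linear-scale abundance} for one array.

    Returns the non-housekeeping genes divided by the geometric mean of the
    housekeeping genes. Assumption: values are positive and on a linear scale.
    """
    reference = geometric_mean([sample[gene] for gene in HOUSEKEEPING])
    return {
        gene: value / reference
        for gene, value in sample.items()
        if gene not in HOUSEKEEPING
    }
```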

Validation in independent patient cohort

The reduced proliferation signature was evaluated using qPCR in a breast cancer patient cohort of the breast tumour bank of the Radboud University Nijmegen Medical Centre (Nijmegen, The Netherlands) as described previously (Span et al, 2004). Patients underwent modified radical mastectomy or a breast-conserving lumpectomy between November 1987 and December 1997. Postoperative radiotherapy was administered to the breast after an incomplete resection or after breast-conserving treatment, or to the parasternal lymph nodes when the tumour was medially localised. Patients did not receive (neo-) adjuvant systemic therapy according to the standard treatment protocol at the time. RNA was available from 129 lymph node-negative breast cancer patients.

Quantitative PCR was carried out to evaluate the reduced proliferation signature in this patient cohort. Subsequently the multi-gene signature score (Equation 2) was calculated and patients were assigned to either the low- or high-expression group. Patients in the low-expression group are predicted to have good prognosis, whereas patients in the high-expression group are predicted to have poor prognosis. Disease-free survival was used as follow-up end point. Univariate and multi-variate Cox proportional HR modelling followed by the Wald test was used to evaluate the reduced proliferation signature (R survival package version 2.36-2). For a subgroup of the cohort histological grade was unknown; median imputation was applied for those patients (R e1071 package v1.5-24). Moreover, multi-variate models with and without the signature were evaluated with the C-index (R survival package v2.36-2).
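To make the C-index concrete, the sketch below implements a simplified version of Harrell's concordance index: among all pairs in which one patient had an event strictly before the other patient's follow-up time, it reports the fraction in which the earlier event also has the higher predicted risk, counting tied risks as half. This is an illustration, not the R survival package's implementation, which treats tied times differently; the function name is hypothetical.

```python
def concordance_index(times, events, risks):
    """Simplified Harrell's C-index.

    times:  follow-up time per patient
    events: 1/True if the event was observed, 0/False if censored
    risks:  predicted risk per patient (higher = worse expected outcome)
    Raises ZeroDivisionError if no comparable pair exists.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so comparing the index for a model with and without the signature shows whether the signature adds discriminative information.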

Results

To reduce the number of genes in the proliferation signature, genes were ranked according to their individual prognostic power in each of the five data sets used in the original study (Supplementary Data File S1) (Starmans et al, 2008). After filtering and gene ranking, the final reduced proliferation signature consisted of 10 genes, which is a reduction of 90% (Table 1).

Table 1 The reduced proliferation signature genes

The original basis of the proliferation signature was in gene expression studies carried out in vitro. To ensure that the remaining genes accurately reflect proliferation status per se, especially when assessed by qPCR, we evaluated the reduced signature both in vitro and ex vivo. First, five different cancer cell lines (MCF7, HeLa, HT-29, U-2 OS and DU145) were cultured in either normal or serum-starved conditions. Figure 1A shows that expression of the reduced proliferation signature was significantly lower upon serum starvation compared with control growing conditions (P=1.52 × 10−11, t-test). Individual genes showed a similar pattern (Supplementary Figure S1).

Figure 1

In vitro validation: difference in reduced proliferation score in normal vs starvation conditions (A). Ex vivo validation: corresponding volume doubling times (VDTs) for a xenograft data set (n=168) dichotomised with the reduced proliferation signature into low- and high-proliferation groups (B).

We then assessed expression of the 10 genes in a panel of tumour xenografts with known VDTs (Oostendorp et al, 2008; Dubois et al, 2009a, 2009b; Rouschop et al, 2010) originating from different cancer cell lines. Xenografts were assigned to either the low- or high-proliferation group based on expression of the reduced proliferation signature. Although proliferation rate is not expected to be the only parameter that influences gross tumour growth (e.g., rates of cell turnover are also important), we hypothesised that VDTs in the group with high proliferation should be reduced compared with xenografts with low proliferation. Figure 1B confirms this hypothesis: a significant difference in VDTs between high- and low-proliferation signature xenografts was observed (P=5.32 × 10−6, fold-change =1.60; t-test).

To demonstrate its prognostic power, the reduced signature was evaluated in two large gene expression-based meta-data sets of 1820 breast and 862 NSCLC patients. None of these data sets were included in the original study; all were fully independent. Patients were stratified based on the reduced proliferation signature and Cox proportional hazards modelling was used to assess performance. Patient classification with the reduced proliferation signature could stratify breast (Figure 2A: HR=1.63; 95% CI: 1.39–1.92; P=1.42 × 10−9 Wald test) and NSCLC patients (Figure 2B: HR=1.35; 95% CI: 1.10–1.66; P=4.47 × 10−3 Wald test) into groups with distinct prognostic profiles. High expression of the reduced proliferation signature correlated with poor survival in both meta-data sets. For NSCLC, subgroup analyses were performed for the adenocarcinoma and squamous cell carcinoma patient groups, as these are distinct disease states. Non-small-cell lung cancer adenocarcinoma patients could be grouped into cohorts with significantly different survival properties (Figure 2C: HR=1.64; 95% CI: 1.30–2.06; P=3.01 × 10−5 Wald test). However, in the squamous cell carcinoma cohort the reduced proliferation signature had no prognostic power (Figure 2D: HR=0.66; 95% CI: 0.41–1.04; P=7.14 × 10−2 Wald test). Kaplan–Meier survival curves for the individual data sets are provided in Supplementary Figures S2 and S3. These data indicate that reduction of the proliferation signature was successful; the reduced signature could stratify patients into groups with significant differences in survival.
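The hazard ratio, its 95% confidence interval and the Wald P value are linked through the standard error of the log hazard ratio, so a reported triple can be sanity-checked with a few lines of Python. The sketch assumes a symmetric normal-approximation interval on the log scale and a two-sided test; because the published values are rounded, only approximate agreement with the reported P values should be expected. The function name is hypothetical.

```python
from math import erfc, log, sqrt


def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Recover the Wald z statistic and two-sided P value from HR and its 95% CI.

    Assumption: the interval is exp(log(HR) +/- z_crit * SE).
    """
    se = (log(upper) - log(lower)) / (2 * z_crit)
    z = log(hr) / se
    p = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return z, p
```

For the breast cancer meta-data set (HR=1.63; 95% CI: 1.39–1.92), `wald_from_ci(1.63, 1.39, 1.92)` gives a z statistic of roughly 5.9, in line with a P value of the order of 10−9.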

Figure 2

Validation of the reduced proliferation signature in a breast cancer (A) and non-small-cell lung cancer (B) meta-data set. For NSCLC a subgroup analysis was performed for adenocarcinoma (C) and squamous cell carcinoma (D). Patients with high proliferation have significantly worse survival than patients in the low proliferation group. Abbreviations: HR=hazard ratio; P=P-value Wald test.

To confirm the prognostic performance of the reduced proliferation signature when evaluated by qPCR, we tested its performance in a further independent cohort of 129 lymph node-negative breast cancer patients. This patient group is distinct from those used for model development and from those in the meta-data set analysis. Table 2 displays patient and treatment characteristics. The reduced proliferation signature stratified the cohort into groups predicted to have either good (low proliferation) or poor prognosis (high proliferation). Figure 3A shows that the patient group predicted to have poor prognosis had significantly worse disease-specific survival than the good prognosis group (HR=2.25; 95% CI: 1.01–4.99; P=4.60 × 10−2 Wald test). As the majority of this cohort were stage I patients, a subgroup analysis was performed. The reduced proliferation signature could stratify stage I patients into two groups with highly significant differences in prognosis (Figure 3B: HR=5.92; 95% CI: 1.62–21.59; P=7.03 × 10−3 Wald test). To investigate whether the signature’s prognostic power was independent of other clinical factors, multi-variate Cox proportional HR modelling was used. In the whole-patient cohort the reduced signature performed comparably to stage (Supplementary Table S1); however, like the other factors included, it did not reach statistical significance (HR=1.73; 95% CI: 0.73–4.12; P=0.215 Wald test). In stage I patients the reduced proliferation signature was the top prognostic factor (Table 3, HR=7.23; 95% CI: 1.65–31.95; P=8.57 × 10−3 Wald test). C-indexes for multi-variate Cox proportional HR models of clinical parameters with and without the reduced proliferation signature were calculated. In both the whole cohort and the stage I patient group, adding the signature increased prognostic power; in the stage I patient group the signature alone outperformed the model comprising clinical parameters (Supplementary Table S2).

Table 2 Baseline demographics of breast cancer patient cohort in low and high-risk group assessed with the reduced proliferation signature (full characteristics were presented previously (Span et al, 2004))
Figure 3

Validation of the reduced proliferation signature with qPCR in a breast cancer patient cohort; high prognostic power is achieved (A), which is most pronounced in the stage I patient group (B). Abbreviations: HR=hazard ratio; P=P-value Wald test.

Table 3 Results of the multi-variate Cox regression model in the stage I patient group (78 patients)

Discussion

We previously reported a microarray-based proliferation signature with high prognostic power in several large microarray data sets encompassing different cancer types. Here, we successfully reduced the number of genes in the proliferation signature to a more appropriate scale for low-throughput technologies. This could greatly facilitate the translation into a clinically applicable test (Zhou et al, 2010). In two large independent gene expression meta-data sets for breast cancer and NSCLC the reduced signature separated the patients into groups with significantly distinct survival properties.

A subgroup analysis for the NSCLC cohort showed high prognostic power in adenocarcinoma patients, whereas in squamous cell carcinoma patients no prognostic power was observed. Earlier studies have shown similar data for other measures of proliferation; high proliferation was significantly associated with incidence of metastasis and worse survival in adenocarcinomas, but not in squamous cell carcinomas (Komaki et al, 1996; Hommura et al, 2000). In summary, decreasing the number of signature genes resulted in a new marker with high performance across different cancer types.

Several genes in the signature have previously been implicated in cancer outcome (Glinsky, 2006; Whitfield et al, 2006; Ryu et al, 2007; Hao et al, 2008; Marie et al, 2008). UBE2C (ubiquitin-conjugating enzyme E2C) expression was correlated with malignant progression in thyroid carcinomas and demonstrated prognostic power in ovarian cancer (Pallante et al, 2005; Berlingieri et al, 2007; van Ree et al, 2010), in which high expression was associated with worse survival. Overexpression of RRM2 (ribonucleotide reductase M2) showed association with chemotherapy resistance (Boukovinas et al, 2008). Furthermore, a large fraction of the published gene expression signatures include clusters of proliferation-associated genes and several of the reduced proliferation genes are represented in these clusters (Whitfield et al, 2002; Dai et al, 2005a; Shedden et al, 2008; Weigelt et al, 2010).

As a last step the reduced proliferation signature was evaluated with qPCR in an independent breast cancer patient cohort. This patient group consisted entirely of patients without axillary lymph node metastases who did not receive systemic adjuvant therapy, making it possible to assess the pure prognostic value of the proliferation signature. The reduced proliferation signature stratified patients into groups with different survival properties and showed high prognostic power, especially in stage I patients. A high disease-specific survival was observed in the stage I patients identified as having low risk. This suggests the reduced proliferation signature might be useful in identifying high-risk stage I breast cancer patients who could benefit from additional therapy such as chemo-radiation or accelerated radiotherapy, whereas the low-risk group would not.

Two large prospective trials have been started to address the predictive performance of two gene expression signatures in early breast cancer (Bogaerts et al, 2006; Sparano, 2006). Both of these signatures include a subset of proliferation genes, and several meta-analyses show evidence that the prognostic value of these signatures is mostly attributable to this process (Wirapati et al, 2008; Weigelt et al, 2010). Therefore, a signature reflecting proliferation alone could be easier to interpret. Furthermore, the prognostic power of the reduced proliferation signature was not limited to breast cancer; it also performed well in an NSCLC adenocarcinoma meta-data set.

Thus, we here show that the array-based proliferation signature could be reduced to 10 genes. This reduced proliferation signature can be applied in small tissue samples, possibly including FFPE material, adding to its clinical applicability. The pure prognostic power of the signature was validated in an independent breast cancer patient cohort, where it was shown to be particularly useful for selecting patients who would benefit from more aggressive therapy. To fully grasp the potential prognostic or predictive role of the signature, it should further be tested in prospective trials and translated from a relative to an absolute measure.