Introduction

Colorectal cancer is the third most common cancer in the world and the second leading cause of cancer-related death in the western world.1,2 Around a quarter of colorectal cancer patients are incurable at diagnosis and half of the patients who undergo potentially curative surgery will ultimately develop metastatic disease.

Chemotherapy, which aims to slow tumor growth, shrink tumors and reduce the likelihood of metastasis, is widely used to treat colorectal cancer. The standard treatment for advanced colorectal cancer is based on fluoropyrimidines (5-fluorouracil (5-FU) or capecitabine) combined with oxaliplatin, the topoisomerase I (TOP1) inhibitor CPT-11 (irinotecan) and the monoclonal antibodies cetuximab, bevacizumab or panitumumab.2,3 Although most patients with advanced colorectal cancer initially respond to combination chemotherapy, they later relapse because of tumor recurrence and the emergence of drug-resistant tumor cells.

Gaining insight into the mechanisms underlying drug resistance is important for developing more effective therapeutic approaches.2 Human cancers may be resistant to therapy at the time of first drug exposure (innate drug resistance), whereas others become resistant after an initial response (acquired drug resistance). Both innate and acquired drug resistance involve multiple mechanisms, such as altering drug metabolite potency, increasing drug efflux, decreasing drug toxicity or inhibiting cell death.4 In colorectal cancer, higher levels of thymidylate synthase were associated with tumor insensitivity to 5-FU-based therapy.5 Conversely, higher TOP1 levels are correlated with greater sensitivity of colon tumors, relative to normal colonic mucosa, to camptothecin derivatives.5 Glucuronidation, which is involved in xenobiotic detoxification, regulates innate resistance to TOP1 inhibitors in colon cancer cell lines and tumors.6 Resistance to oxaliplatin involves decreased drug accumulation, increased detoxification and repair, enhanced tolerance to damage, alterations in pathways involved in cell cycle kinetics, and inactivation of apoptosis.7 In addition, flow cytometry and fluorescence microscopy showed overexpression of specific drug transporters (ABCB1/P-gp, lung resistance-related protein or multidrug resistance-related protein) in human colon adenocarcinoma cell lines resistant to TOP1 inhibitors. Despite these mechanistic findings, the clinical relevance of these biomarkers has not been confirmed. The only biomarker in routine clinical use is K-ras: patients whose tumors harbor a K-ras mutation are excluded from treatment with epidermal growth factor receptor (EGFR) antibodies, as they are less likely to benefit from EGFR-targeted treatment.8

Microarray technology has been widely used in biomarker discovery and clinical outcome prediction.9,10 We analyzed microarray data derived from colon cancer cells resistant to oxaliplatin, SN38 (the active metabolite of irinotecan) or 5-FU. To prioritize the genes regulating cancer cell response to these anticancer drugs, we combined the drug-resistance microarray data with patient survival data. Three gene signatures, one for each drug, were identified, and a score system was developed from each signature. The score systems were able to stratify patients with cancer into good- and poor-survival groups.

Materials and methods

Microarray data

Microarray data derived from oxaliplatin-resistant HCT116 (HCT116-Oxa), 5-FU-resistant HCT116 (HCT116-FU), SN38-resistant HCT116 (HCT116-SN) and the corresponding parent cell lines were downloaded from ArrayExpress (E-MEXP-390 and E-MEXP-1171).11,12 The cell lines have been described previously.11,12 Microarray data with patient survival information were obtained from Gene Expression Omnibus under accession numbers GSE17536 (training data set) and GSE14333 (validation data set).13,14

Gene expression analysis

Microarray files were processed using the Robust Multi-array Average algorithm and imported into Partek software.15 Principal component analysis and sample clustering were used to check for batch effects, outliers and errors. Batch effects were removed using Partek. Differentially expressed genes were identified by two-way analysis of variance, and genes with P<0.005 were further subjected to Gene Ontology analysis. Pathway analysis was performed using Ingenuity software (www.ingenuity.com). To detect survival-related genes, univariate Cox regression analyses of the differentially expressed genes were performed using Partek.
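Partek's univariate Cox screen is not documented here, so as a minimal sketch of what such a per-gene screen computes, the Python below fits a one-covariate Cox proportional hazards model by Newton–Raphson on the partial likelihood (Breslow handling of tied event times) and returns the coefficient, its standard error and a two-sided Wald P value. The function names and toy data are hypothetical, and this is not Partek's implementation; in a screen like the one described, it would be run once per differentially expressed gene.

```python
import math


def _score_and_info(beta, x, time, event):
    """Score U(beta) and information I(beta) of the Cox partial likelihood (Breslow ties)."""
    score = 0.0
    info = 0.0
    for i in range(len(x)):
        if not event[i]:
            continue  # censored patients contribute only through risk sets
        s0 = s1 = s2 = 0.0
        for j in range(len(x)):
            if time[j] >= time[i]:  # patient j is still at risk when patient i dies
                w = math.exp(beta * x[j])
                s0 += w
                s1 += w * x[j]
                s2 += w * x[j] ** 2
        mean = s1 / s0
        score += x[i] - mean
        info += s2 / s0 - mean ** 2
    return score, info


def univariate_cox(x, time, event, max_iter=50, tol=1e-10):
    """Newton-Raphson fit of h(t | x) = h0(t) * exp(beta * x); returns (beta, se, Wald P)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_and_info(beta, x, time, event)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_and_info(beta, x, time, event)  # information at the final estimate
    se = 1.0 / math.sqrt(info)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))  # two-sided normal P value
    return beta, se, p


# Hypothetical toy data: one gene's expression, follow-up time, death indicator.
expr = [2.0, 1.5, 1.8, 0.5, 0.2, 1.0, 0.3, 0.8]
time = [1, 2, 3, 4, 5, 6, 7, 8]
event = [1, 1, 0, 1, 1, 0, 1, 0]
beta, se, p = univariate_cox(expr, time, event)
print(f"beta={beta:.3f}, HR={math.exp(beta):.2f}, P={p:.3f}")
```

A positive coefficient means higher expression is associated with a higher hazard of death. Plain Newton steps are adequate here because the toy data do not perfectly separate early from late deaths; with separable data the estimate diverges and a real analysis would need a safeguard.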

Development and validation of a drug-resistant score system

To generate a score system for drug resistance (Table 1), we subjected the candidate gene lists to BRB-ArrayTools (developed by Dr Richard Simon and the BRB-ArrayTools Development Team) to calculate the regression coefficient of each gene using the training data set. The drug-resistant score is the sum, over the signature genes, of the product of each gene's expression level and its regression coefficient (drug-resistant score = Σ_i (coefficient of gene G_i × expression value of gene G_i)). Patients were dichotomized into high- and low-risk groups using the 50th percentile (median) of the score as the cutoff. The coefficients derived from the training data set were applied directly to the validation data set.
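The score formula above can be written out directly. The sketch below assumes expression values are stored per patient as a gene-to-value mapping and that each data set is split at its own median score; the gene names and coefficients are hypothetical placeholders rather than the Table 1 values, and BRB-ArrayTools may differ in detail (for example, in how ties at the median are assigned).

```python
from statistics import median
from typing import Dict, List, Sequence


def resistance_scores(samples: Sequence[Dict[str, float]],
                      coefficients: Dict[str, float]) -> List[float]:
    """Drug-resistant score = sum over signature genes of coefficient x expression."""
    return [sum(coef * sample[gene] for gene, coef in coefficients.items())
            for sample in samples]


def dichotomize(scores: Sequence[float]) -> List[str]:
    """Split patients at the median (50th percentile) score; ties at the median go to 'low'."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Hypothetical two-gene signature (placeholder names and coefficients).
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
training = [
    {"GENE_A": 2.0, "GENE_B": 4.0},
    {"GENE_A": 6.0, "GENE_B": 0.0},
    {"GENE_A": 1.0, "GENE_B": 2.0},
    {"GENE_A": 4.0, "GENE_B": 4.0},
]
train_scores = resistance_scores(training, coefficients)
train_labels = dichotomize(train_scores)
print(train_scores, train_labels)
# For the validation set, the same `coefficients` fitted on the training set
# are reused unchanged: resistance_scores(validation_samples, coefficients).
```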

Table 1 Coefficient and weight of each gene for 5-FU, oxaliplatin and SN38

Results

Differentially expressed genes derived from the 5-FU-resistant cell line

In total, 363 genes (P<0.005) were differentially expressed between HCT116-FU and the HCT116 parent cell line. One hundred of these genes are known to regulate drug resistance in cancer: 32 upregulated and 68 downregulated. Upregulated genes include IGF1R, NQO1, ABCC3, FOXO3, MLL, LGALS3, TOX, GNAS, IGFBP4, AKR1C2, MSTN, SULF2 and EGR2. Downregulated genes include CFLAR, FANCA, TRPM2, SLC19A1, RRM1, FOXA2, CASR, MTAP, BACE1, ELOVL6, NF2, FBXW7, DPH5 and FOLR1.

Gene Ontology analysis (Figure 1a1) indicated that these 363 genes are enriched in functions involving metabolic process (53.6%), growth (27.7%), pigmentation (12%), viral reproduction (3.8%) and cell proliferation (2.3%). Ingenuity pathway analysis indicated that the top five enriched signaling networks (Figure 1a2) were cellular assembly and organization, cellular movement, gastrointestinal disease; cell death, cellular development, hematopoiesis; cellular growth and proliferation, protein synthesis, cell cycle; cell-mediated immune response, cellular development, cellular function and maintenance; and free radical scavenging, cellular movement, gene expression. The core nodes in the signaling networks include ERK1/2, NFKB complex, Akt, PI3K complex and Ras (Supplementary Data 1, Figure 1).

Figure 1

(a1) The differentially expressed genes between 5-fluorouracil (5-FU)-resistant and control HCT116 cells were subjected to Gene Ontology analysis. These genes can be categorized by function into metabolic process (53.6%), growth (27.7%), pigmentation (12%), viral reproduction (3.8%) and cell proliferation (2.3%). (a2) The differentially expressed genes between 5-FU-resistant and control HCT116 cells were subjected to Ingenuity pathway analysis, and the top five enriched signaling networks were obtained. (b1) Gene Ontology analysis indicated that the differentially expressed genes between oxaliplatin-resistant and control HCT116 cells were enriched in functions involving rhythmic process (31.7%), viral reproduction (27.2%), cellular process (26.2%), cell proliferation (7.7%) and metabolic process (6.1%). (b2) The differentially expressed genes between oxaliplatin-resistant and control HCT116 cells were subjected to Ingenuity pathway analysis, and the top five enriched signaling networks were obtained. (c1) Gene Ontology analysis indicated that the differentially expressed genes between SN38-resistant and control HCT116 cells were enriched in functions involving cell proliferation (39.7%), response to stimulus (16.1%), growth (15.9%), rhythmic process (11%), viral reproduction (8.6%) and cellular process (8.4%). (c2) The differentially expressed genes between SN38-resistant and control HCT116 cells were subjected to Ingenuity pathway analysis, and the top five enriched signaling networks were obtained.

Differentially expressed genes derived from the oxaliplatin-resistant cell line

In total, 373 genes (P<0.005) were differentially expressed between HCT116-Oxa and the HCT116 parent cell line. Of these, 104 genes are known to regulate drug resistance in cancer: 37 upregulated and 67 downregulated. Upregulated genes include CD24, CD80, FANCD2, IL2, MMP2, HLF, CYP1B1, DGAT1, PRKCQ, STAT5B, AKT3, RICTOR, CCKBR and FLT1. Downregulated genes include DHFR, DIABLO, EIF4E, PPP2CB, LYN, ADO, CD59, LDLR, SMAD4, TYMS, ARNT, NR3C1, MINA, MTAP, WEE1, B2M, PRKAR1A, MGST1, RAB18, HOXA9, MCAM, TUBB, ENO1, SMARCA4, BCL2L11, ELP3, CREM, MERTK, CENPV and ARG2.

Gene Ontology analysis (Figure 1b1) indicated that these 373 genes were enriched in functions involving rhythmic process (31.7%), viral reproduction (27.2%), cellular process (26.2%), cell proliferation (7.7%) and metabolic process (6.1%). Ingenuity analysis indicated that the top five enriched signaling networks (Figure 1b2) were cellular assembly and organization, cell morphology, cellular movement; cell cycle, lipid metabolism, small molecule biochemistry; cell cycle, gene expression, cell-to-cell signaling and interaction; cellular assembly and organization, DNA replication, recombination, and repair, connective tissue disorders; and RNA post-transcriptional modification, cell death, protein synthesis. The core nodes in the signaling networks include PI3K complex, ERK1/2, Akt and IL2 (Supplementary Data 1, Figure 2).

Figure 2

Venn diagram analysis. (a) Venn diagram of upregulated genes in 5-fluorouracil (5-FU), oxaliplatin or SN38-resistant HCT116 cells. (b) Venn diagram of downregulated genes in 5-FU, oxaliplatin or SN38-resistant HCT116 cells.

Differentially expressed genes derived from the SN38-resistant cell line

In total, 692 genes (P<0.005) were differentially expressed between HCT116-SN and the HCT116 parent cell line. Of these, 196 genes are known to regulate drug resistance in cancer: 108 upregulated and 88 downregulated. Upregulated genes include HBA1, SRC, ICAM1, CD44, ANXA8, APOE, TRH, GALE, CSF1R, IGF1, NQO1, NCAM1, LOX, ERBB3, SAT1, LDLR, RET, GRP, NOS1, CDK1, IGH and CASP10. Downregulated genes include IRS1, HPRT1, GAL, VEGFA, CYP3A5, MSH3, MTR, GHR, CYP2B6, ZAK, MYB, LDLR, SOD2, DTNB, NCOR1, NR3C1, GCLC, MINA, TFPI, LPP, ADAM17, MDM4, RIPK1 and AQP3.

Gene Ontology analysis (Figure 1c1) indicated that these 496 genes were enriched in functions involving cell proliferation (39.7%), response to stimulus (16.1%), growth (15.9%), rhythmic process (11%), viral reproduction (8.6%) and cellular process (8.4%). Ingenuity analysis indicated that the top five enriched signaling networks (Figure 1c2) were: cell death, cellular compromise, antimicrobial response; cellular function and maintenance, molecular transport, cell signaling; gene expression, DNA replication, recombination, and repair, cell-to-cell signaling and interaction; cell cycle, reproductive system development and function, cell-to-cell signaling and interaction; and cellular growth and proliferation, skeletal and muscular system development and function, lipid metabolism. The core nodes in the signaling networks include NFKB complex, CACNA1A, SMARCA4, PI3K complex and MAPK (Supplementary Data 1, Figures 3, 4, 5).

Figure 3

Kaplan–Meier curves were drawn for conditions as follows. (a) Patients in the training data set were dichotomized into high (H_FU) and low (L_FU) drug-resistant groups based on the resistance score of 5-fluorouracil (5-FU) using the 50th percentile of the score as cutoff. (b) Patients in the training data set were similarly dichotomized into high (H_Oxa) and low (L_Oxa) drug-resistant groups based on the resistance score of oxaliplatin. (c) Patients in the training data set were similarly dichotomized into high (H_SN) and low (L_SN) drug-resistant groups based on the resistance score of SN38.

Figure 4

Kaplan–Meier curves were drawn for conditions as follows. (a) Patients in the validation data set were similarly dichotomized into high (H_FU) and low (L_FU) drug-resistant groups based on the resistance score of 5-fluorouracil (5-FU). (b) Patients in the validation data set were similarly dichotomized into high (H_Oxa) and low (L_Oxa) drug-resistant groups based on the resistance score of oxaliplatin. (c) Patients in the validation data set were similarly dichotomized into high (H_SN) and low (L_SN) drug-resistant groups based on the resistance score of SN38.

Figure 5

Kaplan–Meier curves were drawn for conditions as follows. (a) Patients in the training data set were divided into four groups according to the number of drugs for which they had a high resistance score (3, in the high-resistance group for all three drugs; 2, for two drugs; 1, for one drug; 0, in the low-resistance group for all three drugs). (b) Patients in the validation data set were grouped in the same way.

Difference between the three gene lists

There is limited overlap between the differentially expressed genes derived from the three drug-resistant cell lines (Figure 2). Only one gene (ANO8) was shared between the 134 upregulated genes of HCT116-FU cells and the 361 upregulated genes of HCT116-SN cells, and only two genes (C15orf28 and FBXW2) were shared between the 229 downregulated genes of HCT116-FU cells and the 331 downregulated genes of HCT116-SN cells. Only two genes (LOC257358 and C1orf61) overlapped between the 143 upregulated genes of HCT116-Oxa cells and the 361 upregulated genes of HCT116-SN cells, and only five (MINA, CLASP2, SFRS7, ZBTB20 and C2orf69) overlapped between the 230 downregulated genes of HCT116-Oxa cells and the 331 downregulated genes of HCT116-SN cells. No gene was shared between the 134 upregulated genes of HCT116-FU cells and the 143 upregulated genes of HCT116-Oxa cells, and only one (MTAP) was shared between the 229 downregulated genes of HCT116-FU cells and the 230 downregulated genes of HCT116-Oxa cells. Of these 11 overlapping genes, the functions of 4 (LOC257358, C1orf61, C15orf28 and C2orf69) are unknown, whereas the other 7 have diverse functions.

ZBTB20 exhibits RNA polymerase II core promoter proximal region sequence-specific DNA-binding transcription factor activity involved in negative regulation of transcription, as well as DNA binding.16 SFRS7 is a member of the serine/arginine-rich family of pre-mRNA-splicing factors, which form part of the spliceosome.17 CLASP2 exhibits microtubule plus-end binding and galactoside 2-alpha-L-fucosyltransferase activity.18 MINA is a c-Myc target gene that may have a role in cell proliferation or regulation of cell growth.19 FBXW2 is involved in chaperonin-mediated protein folding, the Wnt signaling pathway and pluripotency.20 MTAP is associated with metabolic pathways, including cysteine and methionine metabolism, and exhibits S-methyl-5-thioadenosine phosphorylase activity.21 ANO8 is related to stimuli-sensing channels and solute-carrier-mediated transmembrane transport, and possesses intracellular calcium-activated chloride channel activity.22
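The overlap counts above, and the Venn diagrams in Figure 2, reduce to set intersections. A minimal Python sketch, using toy downregulated lists that borrow a few of the overlapping gene names reported above alongside hypothetical placeholder genes; these are not the actual gene lists.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def pairwise_overlaps(gene_sets: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Genes shared by each pair of lists (includes any genes shared by all lists)."""
    return {(a, b): gene_sets[a] & gene_sets[b]
            for a, b in combinations(sorted(gene_sets), 2)}


def shared_by_all(gene_sets: Dict[str, Set[str]]) -> Set[str]:
    """Genes present in every list (the central region of the Venn diagram)."""
    return set.intersection(*gene_sets.values())


# Toy downregulated lists: a few overlapping genes named above plus placeholders.
down = {
    "FU": {"MTAP", "C15orf28", "FBXW2", "PLACEHOLDER_1"},
    "Oxa": {"MTAP", "MINA", "CLASP2", "PLACEHOLDER_2"},
    "SN": {"C15orf28", "FBXW2", "MINA", "CLASP2", "PLACEHOLDER_3"},
}
overlaps = pairwise_overlaps(down)
for pair, genes in overlaps.items():
    print(pair, sorted(genes))
print("all three:", shared_by_all(down))
```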

Gene Ontology analysis (Figures 1a1, 1b1 and 1c1) indicated different levels of enrichment across the three cell lines in functions including metabolic process (53.6% for HCT116-FU cells and 6.1% for HCT116-Oxa cells), growth (27.7% for HCT116-FU cells and 15.9% for HCT116-SN cells), viral reproduction (3.8%, 27.2% and 8.6% for HCT116-FU, HCT116-Oxa and HCT116-SN cells, respectively), cell proliferation (2.3%, 7.7% and 39.7%, respectively), rhythmic process (31.7% for HCT116-Oxa cells and 11% for HCT116-SN cells) and cellular process (26.2% for HCT116-Oxa cells and 8.4% for HCT116-SN cells). Network analysis indicated that the PI3K–Akt and MAPK–ERK pathways are altered in all three drug-resistant cell lines and that the NFKB pathway is altered in HCT116-FU and HCT116-SN cells. However, the genes in each signaling network differ entirely between cells resistant to different drugs (Figures 1a2, 1b2 and 1c2).

Genes correlated with colorectal cancer patient survival

We analyzed the differentially expressed genes for their association with patient survival. Of the 363 genes differentially expressed between HCT116-FU and its parent cell line, 10 were significantly associated with patient survival (P<0.01): 6 positively and 4 negatively. The positively correlated genes are CLIC4, CTHRC1, COL3A1, IER5L, IGFBP4 and EGR2. The negatively correlated genes are FERMT1, FN3K, ASB2 and TRPM2. Of the 373 genes differentially expressed between HCT116-Oxa and its parent cell line, 18 were significantly associated with patient survival (P<0.01): 16 positively and 2 negatively. The positively correlated genes are FLT1, CYP1B1, KCNJ8, AKT3, STC1, DOCK4, LTBP1, CD59, RAB31, SERPINH1, MCAM, GLG1, FERMT2, RASSF8, CHST14 and NOG. The negatively correlated genes are SCFD2 and WDR77. Of the 692 genes differentially expressed between HCT116-SN and its parent cell line, 19 were significantly associated with patient survival (P<0.01): 11 positively and 8 negatively. The positively correlated genes are RAB3B, PLAUR, APOE, VEGFA, LDLR, EPYC, RHOQ, LOX, FTH1, NUAK1 and TCEAL3. The negatively correlated genes are LRRC36, NUAK2, DTYMK, TSPAN11, AXIN2, CCDC13, C1orf151 and ANAPC5.

Development of the drug-resistant score system

To assess the contribution of drug resistance-related genes to patient survival, we applied survival prediction analysis in BRB-ArrayTools. Each gene in the respective signature was assigned a coefficient, and a score was calculated for each sample from these coefficients. Patients were then dichotomized into high- and low-drug-resistant groups using the 50th percentile of the score as the cutoff. For all three drugs, survival rates were significantly lower in the group with the higher drug-resistant score (5-FU, 62.5 vs 87.7%, P<2.5E-4; oxaliplatin, 56.9 vs 93.2%, P<5.3E-7; SN38, 52.8 vs 97.3%, P<2E-10) (Figure 3). To validate the score systems, we used an independent data set of 290 patients. The drug-resistant scores were calculated in the same way, and patients were dichotomized using the 50th percentile of the score as the cutoff. The score systems separated patients with good prognosis (high survival rates) from those with poor prognosis (low survival rates) (5-FU, 65.8 vs 87.9%, P<2E-4; oxaliplatin, 66.7 vs 87%, P<4.2E-4; SN38, 65.8 vs 87.9%, P<1.6E-4) (Figure 4).
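The survival comparisons above rest on Kaplan–Meier estimates (Figures 3 and 4). The text does not name the test behind the P values; the sketch below assumes a standard two-group log-rank test and implements it, together with the Kaplan–Meier estimator, in plain Python. The toy cohort is hypothetical, and this is an illustration rather than the BRB-ArrayTools code.

```python
import math
from typing import List, Sequence, Tuple


def kaplan_meier(time: Sequence[float], event: Sequence[int]) -> List[Tuple[float, float]]:
    """(time, survival) at each distinct event time of the Kaplan-Meier estimator."""
    data = sorted(zip(time, event))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times together
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # deaths and censorings both leave the risk set
    return curve


def logrank_p(time, event, group) -> float:
    """Two-sided log-rank P value for two groups coded 0/1 (chi-square, 1 df)."""
    o_minus_e = var = 0.0
    for t in sorted({ti for ti, e in zip(time, event) if e}):
        at_risk = [g for ti, g in zip(time, group) if ti >= t]
        n, n1 = len(at_risk), sum(at_risk)
        dead = [g for ti, e, g in zip(time, event, group) if ti == t and e]
        d, d1 = len(dead), sum(dead)
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    chi2 = o_minus_e ** 2 / var
    return math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df


# Hypothetical toy cohort: group 1 = high drug-resistant score.
time = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
event = [1] * 10
group = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
high = [(t, e) for t, e, g in zip(time, event, group) if g == 1]
print(kaplan_meier(*zip(*high)))
print(logrank_p(time, event, group))
```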

As colon cancer is usually treated with a combination of drugs, we hypothesized that the more drugs the cancer cells are resistant to, the lower the patient's chance of survival. We therefore combined the drug-resistant scores for 5-FU, oxaliplatin and SN38 to test whether patients could be further separated into subgroups with different risks. Patients were categorized into four groups according to the number of drugs (3, 2, 1 or 0) for which they were predicted to be at high risk of resistance. In the 177-patient training data set, patients at high risk of resistance to all three drugs had the lowest survival rate (42.7%), and patients at high risk for none of the drugs had the highest survival rate. In the 290-patient validation data set, patients at high risk of resistance to all three drugs had the lowest survival rate (54.1%), and patients at high risk for none of the drugs had the highest (95.4%) (Figure 5).
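The four-group stratification is a per-patient count of high-score labels across the three drug-specific scores. A minimal sketch, assuming each score has already been dichotomized into "high"/"low" labels listed in the same patient order; the labels shown are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def count_high_risk(labels_by_drug: Dict[str, List[str]]) -> List[int]:
    """Per patient, the number of drugs (0-3) whose score placed them in the high group."""
    per_patient = zip(*labels_by_drug.values())  # one tuple of labels per patient
    return [sum(label == "high" for label in labels) for labels in per_patient]


# Hypothetical high/low labels for four patients, one list per drug score.
labels = {
    "5-FU": ["high", "low", "high", "low"],
    "oxaliplatin": ["high", "low", "low", "low"],
    "SN38": ["high", "high", "low", "low"],
}
counts = count_high_risk(labels)
print(counts, Counter(counts))  # group sizes for the 3/2/1/0 strata
```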

Discussion

Knowing the drug sensitivity of a given tumor to a particular agent could substantially inform decision making and treatment planning. Starting from the gene expression patterns of drug-resistant cells, we developed three drug-resistance-based score systems to rank patients' responses to three first-line anticancer drugs used in colorectal cancer treatment (5-FU, oxaliplatin and irinotecan). Patients with a high drug-resistant score had poor survival, and patients with high scores for all three drugs had the poorest survival.

The gene expression patterns of the three drug-resistant cell lines are distinct, which provides a strong basis for developing drug-specific resistance gene signatures. Because genetic background and mutations in cancer cell lines may contribute to resistance to the tested drugs, we used data derived from a single cell line, HCT116. No gene was shared by all three gene lists. The distinct expression patterns also suggest that drugs with different cytotoxic mechanisms elicit different resistance mechanisms. Indeed, the cytotoxicity of 5-FU is due to misincorporation of fluoronucleotides into RNA and DNA and to inhibition of the nucleotide synthetic enzyme thymidylate synthase.23 Oxaliplatin (1,2-diaminocyclohexane-oxalate platinum) mainly forms intrastrand adducts between two adjacent guanine residues or between guanine and adenine, disrupting DNA replication and transcription.24 Irinotecan interacts with cellular TOP1–DNA complexes and has S-phase-specific cytotoxicity.25

A large portion of the differentially expressed genes are related to drug resistance, indicating that the identified genes recapitulate features of drug resistance. For example, IGF1R, ABCC3 and FOXO3, which are associated with drug resistance,26, 27, 28 were overexpressed in 5-FU-resistant cells. Expression of the drug resistance-related genes CD24, FANCD2, CYP1B1 and STAT5B was increased in oxaliplatin-resistant cells.29, 30, 31, 32 Similarly, SRC, ICAM1, CD44, IGF1, ERBB3, RET and CDK1, which correlate with drug resistance,33, 34, 35, 36, 37, 38, 39 were overexpressed in SN38-resistant cells. Interestingly, although the gene expression patterns of these drug-resistant cell lines are distinct, pathway analysis indicated that they share some core nodes of significantly changed pathways: the PI3K–Akt and MAPK–ERK pathways were altered in all three drug-resistant cell lines, and the NFKB pathway was altered in HCT116-FU and HCT116-SN cells. The PI3K–Akt signal transduction pathway comprises the lipid kinase phosphatidylinositol 3-kinase (PI3K) and the serine/threonine kinase Akt (also known as PKB). Activation of this pathway has a pivotal role in essential cellular functions that underlie the biology of human cancer, such as survival, proliferation, migration and differentiation.40 The Raf/MEK/ERK pathway influences chemotherapeutic drug resistance: ectopic activation of Raf induces resistance to doxorubicin and paclitaxel in breast cancer cells.41 The NFKB transcription factor induces drug resistance in cancer cells through MDR1 expression.42 However, no explicit link has been established between these gene functions and resistance to a particular drug, so the results should be interpreted with caution.

Other molecular signatures have been reported in colon cancer. For example, a molecular signature of oncogenic BRAF has been reported in human colon cancer cells.43 Colon cancer molecular subtypes identified by expression profiling are associated with stroma, mucinous type and different clinical behavior.44 Gene expression profiling of peritoneal metastases from appendiceal and colon cancer demonstrates unique biologic signatures and predicts patient outcomes.45 Intestinal adenomagenesis involves core molecular signatures of the epithelial–mesenchymal transition.46 The newly discovered drug-resistant gene signatures add to this field, and combining them may reveal new mechanisms in colorectal cancer. However, gene signatures derived from a cell line model may not represent all features of clinical samples. The results are preliminary and require further validation in studies involving patient tumor samples. In addition, combining drug-resistant scores derived from single agents might not reflect the situation in patients who receive these drugs together. The gene-outcome correlations should therefore be interpreted with caution.

In conclusion, we identified candidate genes, gene sets and pathways related to drug resistance in colorectal cancer that warrant further investigation. The drug-resistant score systems stratified patients into good- and poor-survival groups, a finding that requires further validation.