Main

With the development of therapies for metastatic colorectal cancer (CRC) targeting the epidermal growth factor receptor (EGFR), such as cetuximab and panitumumab, there has been significant effort to identify rational biomarkers of resistance or susceptibility. Initially, mutations in KRAS codons 12 and 13 were found to be predictive biomarkers of lack of response to anti-EGFR therapies (Amado et al, 2008; Karapetis et al, 2008; Van Cutsem et al, 2009; Bokemeyer et al, 2011), and more recently mutations in exons 3 and 4 of KRAS and in NRAS have also been identified as predictive of resistance to anti-EGFR therapies (Douillard et al, 2013). However, resistance to anti-EGFR therapy exists even among patients whose tumours are wild-type in KRAS and NRAS, demonstrating the need to find and characterise additional biomarkers of resistance.

The predictive impact of other mutations in signalling pathway components downstream of EGFR has been unclear. Among stage IV metastatic CRC patients treated with chemotherapy and cetuximab, BRAF mutation is associated with poor prognosis (De Roock et al, 2010; Tol et al, 2010), but is not necessarily predictive of resistance to anti-EGFR therapy. A pooled analysis of two studies that randomised KRAS wild-type metastatic CRC patients to chemotherapy or chemotherapy plus cetuximab found that BRAF mutant patients had inferior overall survival (OS) compared with BRAF wild-type patients, but the addition of cetuximab was associated with a non-significant trend towards improved survival in the BRAF mutant patients (HR 0.62, 95% CI 0.36–1.06, P=0.076) (Bokemeyer et al, 2012). There are also conflicting results regarding the prognostic effect of PIK3CA mutations or PTEN expression loss on patients treated with anti-EGFR therapy (Laurent-Puig et al, 2009; Loupakis et al, 2009; De Roock et al, 2010; Tol et al, 2010; Karapetis et al, 2014). The role of these mutations remains under investigation.

A well-described biomarker associated with sensitivity to anti-EGFR therapy is increased tumour gene expression of EREG and AREG, which encode the EGFR ligands epiregulin and amphiregulin (Khambata-Ford et al, 2007; Jacobs et al, 2009). EREG and AREG expression is inhibited by blockade of EGFR signalling and is stimulated by treatment with other EGFR ligands, and it has been hypothesised that higher expression of EREG and AREG may indicate tumour cell dependence on an autocrine EGFR-activating loop, and thus predict for increased susceptibility to anti-EGFR therapy (Khambata-Ford et al, 2007). However, mechanisms dictating varying levels of EREG or AREG expression remain unclear.

The site of the primary tumour also appears to be predictive of efficacy of anti-EGFR therapy. Clinical trials in patients with KRAS exon 2 wild-type metastatic CRC reveal that patients with left-sided colorectal primary tumours have superior progression-free survival (PFS) on treatment with cetuximab-based regimens compared with patients with right-sided primary tumours (Von Einem et al, 2014; Brule et al, 2015). Although the mechanism of this distinction is not defined, right-sided primary CRCs are known to have distinct pathobiology and characteristics, including higher rates of BRAF mutation, microsatellite instability (MSI-high), and high CpG island methylator phenotype (CIMP-high) (Yamauchi et al, 2012). CIMP-high tumours are marked by widespread DNA hypermethylation, which can epigenetically silence genes when occurring within promoter loci (Weisenberger et al, 2006). We sought to determine whether there was an association between global methylation status as assessed by CIMP status, the methylation status of CpG islands within the AREG and EREG promoters, the mRNA expression of AREG and EREG, and the site of the primary colorectal tumour in multiple independent data sets collected within the University of Texas MD Anderson Cancer Center (MDACC) and through The Cancer Genome Atlas (TCGA).

Materials and Methods

Patient cohorts and data analysis

A cohort of 179 patients at MDACC with stages I–IV CRC provided informed consent for biomarker analyses on tumour tissue and retrospective analysis of patient records for research purposes, and the study was approved by the institutional review board. Eligible tumour specimens required at least 30% tumour cells on central review by a certified pathologist, and macrodissection of fresh frozen primary tumour tissue was performed. Gene expression analysis was performed using Agilent microarrays (Agilent Technologies, Santa Clara, CA, USA). RNA was isolated and assessed for quality, and RNA of adequate quality was amplified, labelled, and hybridised to the microarray. Data are expressed as z-scores. Methylation profiling was determined using the Illumina Infinium HumanMethylation450 BeadChip Kit (Illumina, San Diego, CA, USA), using bisulphite sequencing of methylated sites to determine methylation status at over 480 000 CpG sites covering 99% of Ref-seq genes.

Two additional cohorts of patients with stages II–IV colon or rectal adenocarcinoma were analysed from data generated by TCGA Research Network (http://cancergenome.nih.gov/): one cohort, ‘TCGA27’ (n=218), had gene expression analysed by the Custom Agilent 244 K gene expression microarray (Agilent Technologies) and expressed as z-scores, and had methylation profiling performed using the Illumina Infinium HumanMethylation27 BeadChip as described (The Cancer Genome Atlas Network, 2012); and the second cohort, ‘TCGA450’ (n=356), had gene expression analysed by RNA sequencing (The Cancer Genome Atlas Network, 2012) and expressed as log2-transformed values, and also had methylation profiling performed using the Illumina Infinium HumanMethylation450 BeadChip. In order to evaluate the correlation of CpG island methylation and gene expression, the TCGA pan-cancer data set was utilised (The Cancer Genome Atlas Research Network et al, 2013).

A final independent cohort of 440 heavily pretreated stage IV CRC patients at MDACC was enrolled on the Assessment of Targeted Therapies Against Colorectal Cancer (ATTACC) protocol between August 2010 and October 2013 for screening and assignment to 10 individual phase I or II companion clinical trials based on testing of banked formalin-fixed, paraffin-embedded tumour tissue with gene sequencing, CIMP testing, and immunohistochemical staining. The patients in the ATTACC protocol provided informed consent for biomarker analyses on archived tumour tissue and retrospective analysis of patient records for research purposes, and the study was approved by the institutional review board. Of the 440 patients enrolled in ATTACC, 198 were KRAS exon 2 wild type by standard of care testing and were successfully tested for CIMP status. From the ATTACC specimens, bisulphite pyrosequencing of six well-defined, traditionally utilised CpG islands (Toyota et al, 1999) was performed, and specimens with 40% methylation of CpG islands were deemed CIMP-high. The PCR primers used at MD Anderson for bisulphite pyrosequencing of CpG islands in p14, p16, MLH1, MINT1, MINT2, and MINT31 are listed in Supplementary Table 1A.

Definition of CIMP status

CIMP status was determined from methylation arrays by using the following two methodologies: the first method was based on assessment of the methylation status of the six CpG islands used to determine CIMP status on clinical specimens in the ATTACC cohort; and the second method was based on clustering of methylation profiles using methodology paralleling that in TCGA (The Cancer Genome Atlas Network, 2012).

To extrapolate the CIMP status based on the six-locus clinical panel from the Illumina Infinium HumanMethylation450 BeadChip results, the PCR primers for the bisulphite-treated sequences for p14, p16, MLH1, MINT1, MINT2, and MINT31 were obtained. A search for the corresponding sequence within the human genome was undertaken using BiSearch (http://bisearch.enzim.hu/) (Tusnady et al, 2005; Aranyi et al, 2006), and the identity of the sequence was verified by ensuring that the sequencing primer was located within the identified region. To identify the corresponding nucleotide location within GRCh37, NCBI Blast was performed (Supplementary Table 1B). Subsequently, the CpG islands from the Illumina Infinium HumanMethylation450 array that were located within the primer regions were determined, and 1–2 CpG islands within the PCR-amplified regions were found for the MLH1, MINT2, and MINT31 primers. For the MINT1, p14, and p16 primers, no CpG islands on the panel were found within the amplified regions, although 1–2 CpG islands located <300 nucleotides away for each of the primer pairs were found (Supplementary Table 1C). The beta-value of each of these CpG islands was transformed to an M-value (Du et al, 2010), and the distribution of M-values for each data set was plotted and found to be bimodal. For each CpG island, a threshold was determined at the point between the two modes to dichotomise the methylation status in the MDACC data. This threshold was applied to the M-value transformations of the TCGA data and was found to also reflect the point between the two modes (Supplementary Table 2A and B). Then, a voting scheme mirroring the clinical CIMP panel was enacted to determine CIMP status (CIMP-high represents >40% or 3/6 markers).
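The array-based CIMP call described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes one probe per marker (the study used 1–2 probes per marker), and the thresholds and beta-values are hypothetical placeholders rather than the values in Supplementary Table 2. The M-value transform is the standard logit form, M = log2(β/(1−β)).

```python
import math

PANEL = ["p14", "p16", "MLH1", "MINT1", "MINT2", "MINT31"]

def beta_to_m(beta, eps=1e-6):
    """Convert a methylation beta-value in [0, 1] to an M-value, log2(beta / (1 - beta))."""
    b = min(max(beta, eps), 1.0 - eps)  # clamp so the logarithm stays finite
    return math.log2(b / (1.0 - b))

def cimp_status(betas, thresholds, min_markers=3):
    """Count markers whose M-value exceeds its threshold; >=3 of 6 is called CIMP-high."""
    votes = sum(1 for marker in PANEL if beta_to_m(betas[marker]) > thresholds[marker])
    status = "CIMP-high" if votes >= min_markers else "CIMP-low/none"
    return status, votes

# Hypothetical thresholds: M = 0 (beta = 0.5) for every marker. In the study each
# threshold was placed between the two modes of that probe's M-value distribution.
thresholds = {marker: 0.0 for marker in PANEL}

# Hypothetical beta-values for one tumour.
sample = {"p14": 0.10, "p16": 0.82, "MLH1": 0.75,
          "MINT1": 0.05, "MINT2": 0.91, "MINT31": 0.30}

status, votes = cimp_status(sample, thresholds)
print(status, votes)
```

Dichotomising on the M-value scale rather than the beta scale spreads out values near 0 and 1, which makes the gap between the two modes easier to locate.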

For the TCGA samples assayed with the Illumina Infinium HumanMethylation27 BeadChip instead of the larger Illumina Infinium HumanMethylation450 BeadChip, it was not possible to align the clinical panel with existing CpG island probes. Instead, a clustering method of methylation profiles was used, following the methods described in the TCGA manuscript (The Cancer Genome Atlas Network, 2012).

Cell line data sets

To determine the effect of hypomethylating agents azacitidine and decitabine on methylation of specific CpG island promoters, raw experimental data were obtained from ArrayExpress database (www.ebi.ac.uk/arrayexpress) using accession number E-MTAB-417. In this study, HCT116 colon cancer cells were treated with 1 μM azacitidine or decitabine for 24 h. Subsequently, the Illumina Infinium HumanMethylation27 BeadChip was used to determine methylation status (Hagemann et al, 2011).

To determine the effect of the hypomethylating agent azacitidine on expression of EREG and AREG, experimental data were obtained from the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) using accession number GSE57341. In this study, a panel of 14 CRC cell lines was treated with 500 nM azacitidine for 72 h. Cells were harvested at 1, 3, 7, 10, 14, 21, or 28 days, and expression was assayed using the Agilent 44K Expression Array (Li et al, 2014).

Statistical analysis

Gene expression microarray data were provided as z-scores. RNA-seq data were provided as raw values that were base 2 logarithmically transformed for statistical analysis. To identify associations between clinical characteristics and expression or methylation levels, the Mann–Whitney U-test was used if variables were dichotomous, and the Kruskal–Wallis test was used if variables had >2 possible values. To perform survival analysis with PFS, the Kaplan–Meier method was used and significance testing was performed using the log-rank test. Additional univariate and multivariate Cox regression analyses were performed using IBM SPSS Statistics version 21 (Armonk, NY, USA). In addition, integrated Bayesian analysis of high-dimensional multiplatform genomics (iBAG) (Wang et al, 2013) was performed using the nonlinear approach (Jennings et al, 2013) to determine the percentage of variability in expression that is explained by methylation. This method involved fitting an additive regression model in which expression was regressed on methylation and copy number as additive nonparametric predictors, with residual error representing variability in expression not explained by methylation or copy number, that is, explained by some other upstream regulator. After model fitting, the percentages of variability explained by methylation, copy number, and other causes were estimated through the corresponding coefficients of determination.
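For readers unfamiliar with the Kaplan–Meier method, the product-limit estimate and the median survival time derived from it can be sketched as below. This is an illustrative pure-Python version with hypothetical follow-up times; the study itself used standard statistical software, and the function names here are our own.

```python
def kaplan_meier(times, events):
    """Product-limit estimate. events: 1 = progression observed, 0 = censored.
    Returns a list of (time, survival probability) at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # progressions observed at time t
        n_leaving = 0  # everyone leaving the risk set at time t (events + censored)
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / n_at_risk
            curve.append((t, surv))
        n_at_risk -= n_leaving
    return curve

def median_survival(curve):
    """First time at which the estimated survival drops to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Hypothetical PFS times in months (not study data).
times = [2, 3, 4, 5, 6, 8, 9, 12]
events = [1, 1, 0, 1, 1, 1, 0, 1]

curve = kaplan_meier(times, events)
print(median_survival(curve))
```

Censored patients contribute to the risk set up to their censoring time but never reduce the survival estimate directly, which is why the patient censored at 4 months only changes the denominator for later events.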

For the MDACC ATTACC cohort, to determine associations with PFS and OS, multiple imputation methodology was used, with predictor variables including age, sex, CIMP-high vs low/none, right- vs left-sided primary tumours, duration of first EGFR regimen, progression status, prior bevacizumab, presence or absence of additional cytotoxic chemotherapy, PTEN status, PIK3CA mutation status, BRAF mutation status, NRAS mutation status, and number of prior chemotherapy regimens. Any missing values were addressed using an assumption of missing at random and multivariate normality using the MCMC method in SPSS to create 20 imputed data sets. Subsequently, Cox proportional hazards models were used to perform univariate and multivariate regression analyses for PFS with the first anti-EGFR regimen or OS.
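The text does not state how estimates from the 20 imputed data sets were combined. A common choice, and the one assumed in this sketch, is Rubin's rules: average the per-data-set log hazard ratios, then add the within-imputation variance to an inflated between-imputation variance. The numbers below are hypothetical and use three imputations only to keep the arithmetic easy to follow.

```python
import math

def pool_rubin(estimates, variances):
    """Pool per-imputation estimates (e.g. log hazard ratios) with Rubin's rules.
    Returns (pooled estimate, total variance)."""
    m = len(estimates)
    q_bar = sum(estimates) / m                                 # pooled point estimate
    within = sum(variances) / m                                # mean within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1.0 + 1.0 / m) * between                 # total variance
    return q_bar, total

# Hypothetical log hazard ratios and their variances from three imputed data sets.
log_hrs = [0.60, 0.70, 0.80]
variances = [0.09, 0.10, 0.11]

pooled_log_hr, total_var = pool_rubin(log_hrs, variances)
pooled_hr = math.exp(pooled_log_hr)
print(round(pooled_hr, 2), round(math.sqrt(total_var), 3))
```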

Results

EREG and AREG expression are strongly inversely associated with methylation of loci within the promoters of EREG and AREG

A cohort of 179 MDACC patients with CRC had primary tumour specimens assayed for EREG and AREG methylation and expression. Characteristics of the patient cohort are described in Supplementary Table 3. Eight out of nine CpG islands located within the EREG gene promoter or body were significantly inversely associated with expression, as were four out of five CpG islands located within the AREG gene and promoter (Supplementary Table 4A and B). The strongest correlation with EREG expression was with the CpG island cg19308222 (ρ=−0.726, P<10⁻⁶; Figure 1A). The strongest correlation with AREG expression was with the CpG islands cg26611070 (ρ=−0.520, P<10⁻⁶) and cg02334660 (ρ=−0.501, P<10⁻⁶; Supplementary Figure 1a and b). These findings demonstrate that there is a significant inverse association between DNA methylation and expression of EREG and AREG, and identify promoter methylation as a significant regulatory mechanism of expression of EREG and AREG.

Figure 1

EREG methylation is inversely associated with expression, and is modulated by hypomethylating agents. (A–C) Scatter plots of methylation β-value at the EREG CpG island cg19308222 compared with EREG expression in the (A) MDACC cohort (n=179), (B) TCGA450 cohort (n=356), or (C) TCGA27 cohort (n=218). (D) Methylation at cg19308222 after treatment with 1 μM azacitidine or decitabine for 24 h. (E and F) Expression of EREG (A_23_P41344), but not AREG (A_23_P249071), is increased after treatment with 500 nM azacitidine for 72 h and cell harvesting after the indicated duration. *P<0.05 compared with control.

To confirm these findings, the association of EREG and AREG methylation and expression was determined in two independent cohorts of colon and rectal adenocarcinoma cancer specimens from the TCGA. In the TCGA450 cohort (n=356; Supplementary Tables 4A and B and 5), the strongest correlation with EREG expression was again with the CpG island cg19308222 (ρ=−0.671, P<10⁻⁶; Figure 1B), and the strongest correlation with AREG expression was again with the CpG island cg02334660 (ρ=−0.457, P<10⁻⁶; Supplementary Figure 1c and d). In the TCGA27 cohort (n=218; Supplementary Table 6), strong inverse correlation was again observed between EREG expression and cg19308222 (ρ=−0.659, P<10⁻⁶; Figure 1C). Because the panel of CpG islands tested for methylation was smaller in this array, there were no CpG islands tested within the AREG promoter.
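The correlation coefficients above are reported as ρ. Assuming this denotes Spearman's rank correlation (the text does not say so explicitly), the statistic can be sketched in pure Python as the Pearson correlation of average ranks. The methylation and expression values below are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

# Hypothetical beta-values at one CpG site and expression z-scores for six tumours.
methylation = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
expression = [1.8, 1.1, 0.9, -0.2, 0.1, -1.5]

rho = spearman_rho(methylation, expression)
print(round(rho, 3))
```

Because the statistic depends only on ranks, it is unaffected by whether expression is supplied as z-scores or as log2-transformed values, which is convenient when comparing cohorts profiled on different platforms.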

To put this strong negative correlation in context, among colon adenocarcinoma specimens in the TCGA, the inverse correlation between methylation at cg19308222 and EREG expression ranked 219th most negative out of 18 945 methylation/gene expression pairs (top 1.2%), and that between methylation at cg02334660 and AREG expression ranked 1801st out of 18 945 (top 9.5%) (Broad Institute TCGA Genome Data Analysis Center, 2014b). Similarly, among rectal adenocarcinoma specimens in the TCGA, the inverse correlation between methylation at cg19308222 and EREG expression ranked 155th most negative out of 19 209 methylation/gene expression pairs (top 0.8%), and that between methylation at cg26611070 and AREG expression ranked 1686th out of 19 209 (top 8.8%) (Broad Institute TCGA Genome Data Analysis Center, 2014a).
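The "top x%" figures follow directly from the reported ranks and totals, as this short arithmetic check shows. The helper name is our own; the ranks and totals are those quoted in the text.

```python
def top_percent(rank, total):
    """Percentile position of a rank within `total` pairs, rounded to one decimal."""
    return round(100.0 * rank / total, 1)

# (rank, number of methylation/gene expression pairs) as reported in the text.
reported = {
    "EREG, colon": (219, 18945),
    "AREG, colon": (1801, 18945),
    "EREG, rectal": (155, 19209),
    "AREG, rectal": (1686, 19209),
}

percentiles = {name: top_percent(rank, total) for name, (rank, total) in reported.items()}
for name, pct in percentiles.items():
    print(f"{name}: top {pct}%")
```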

iBAG analysis

Additional analysis was completed to determine the extent to which variation in expression of EREG and AREG was attributable to methylation, compared with copy number variation or other unspecified upstream regulators, using iBAG (Wang et al, 2013). In the MDACC cohort, 64.5% of EREG expression variation was explained by methylation of cg19308222, compared with 2.9% by copy number variation and 32.7% by other causes. For AREG, if considering cg02334660 alone, 33.8% of expression variation was explained by methylation, compared with 0.5% by copy number variation and 65.7% by other causes. If considering cg26611070 alone, 34.8% of expression variation was explained by methylation, compared with 0.6% by copy number variation and 64.6% by other causes. In the TCGA27 cohort, 57.5% of EREG expression variation was explained by methylation of the cg19308222 locus, compared with 0.9% by copy number variation and 41.7% by other causes. The variation for AREG could not be determined as there was no AREG methylation locus in the Illumina Infinium 27K panel.

By comparison, in a large set of 799 genes analysed by iBAG in two CRC cohorts, both the TCGA27 cohort and the MDACC cohort, we found the median percent of expression explained by variation in methylation was 5.78%. Only 5 of these 799 genes (0.63%) had >60% of expression variability explained by methylation, placing EREG within the top 1% of genes whose expression was predominantly modulated by methylation.

Demethylating agents decrease methylation levels and increase expression of EREG

In order to evaluate whether methylation modulates EREG expression, we utilised RNA arrays and Illumina Infinium HumanMethylation27 BeadChip data sets for CRC cell lines treated with azacitidine and decitabine. EREG methylation at cg19308222 was significantly reduced by treatment with either azacitidine or decitabine (Figure 1D). In contrast, the EREG associated CpG site cg04941721 was not altered with either agent, consistent with the limited correlation of this site with EREG expression. No CpG probes for AREG are present on the Illumina 27K array, limiting evaluation of this target. Expression of EREG was evaluated at various time points after 72 h of treatment with azacitidine in a panel of 14 CRC cell lines, demonstrating a time-dependent increase in expression of EREG, which was not seen in AREG (Figure 1E and F).

Right-sided primary, CIMP-high, MSI-high, and BRAF-mutated cancers are associated with higher levels of gene methylation and lower levels of expression of EREG and AREG

Next, we examined whether additional clinical and pathological characteristics of CRC were associated with significant differences in methylation and expression of AREG and EREG. In the MDACC cohort, right-sided primary tumour, MSI-high status, BRAF V600E mutant status, and mucinous histology were all associated with significantly lower levels of AREG and EREG expression, and significantly higher levels of methylation of the AREG loci cg02334660 and cg26611070, and the EREG locus cg19308222 (Table 1 and Supplementary Figure 2). CIMP was defined by one of two different methods, and CIMP-high status by either method was associated with significantly lower AREG and EREG expression, and significantly higher methylation of EREG cg19308222 (Table 1 and Figure 2A and B). In the TCGA450 cohort, similar to the MDACC results, MSI-high status, BRAF mutant status, mucinous histology, and right-sided primary tumour were significantly associated with lower levels of AREG and EREG expression, and with higher levels of methylation at the AREG loci cg02334660 and cg26611070, and the EREG locus cg19308222 (Table 2 and Supplementary Figure 4). CIMP-high status, determined by clinical method, was again associated with lower AREG and EREG expression, and with higher methylation of the AREG and EREG loci (Figure 2C and D). In the TCGA27 cohort, MSI-high status, BRAF mutant status, mucinous histology, and right-sided primary tumour were all again associated with significantly lower levels of AREG and EREG expression, and significantly higher levels of methylation of the EREG locus cg19308222 (Supplementary Table 7). In addition, CIMP-high status, as assessed by clustering, was also significantly associated with lower levels of AREG and EREG expression, and higher levels of EREG methylation at cg19308222.

Table 1 Univariate analyses of clinical and pathological characteristics from the MDACC cohort (n=179)
Figure 2

CIMP status is associated with EREG methylation and expression, and with the duration of progression-free survival with anti-EGFR therapy. (A and B) BLiP plots comparing (A) z-score of EREG expression and (B) methylation levels at cg19308222 in CIMP-high vs low/none as determined by clinical method in the MDACC cohort (n=179). (C and D) BLiP plots comparing (C) log2-transformed EREG expression and (D) methylation levels at cg19308222 in CIMP-high vs low/none as determined by clinical method in the TCGA450 cohort (n=356). (E–G) Kaplan–Meier curves of PFS with first anti-EGFR regimen among patients in the MDACC ATTACC cohort (E) grouped by BRAF and NRAS mutation status (for NRAS mutant vs BRAF/NRAS WT, P=0.0007; for BRAF mutant vs BRAF/NRAS WT, P=0.0003); (F) grouped by CIMP-high vs CIMP-low/none status among all patients in the cohort (n=167); or (G) grouped by CIMP-high vs CIMP-low/none status among the subgroup of patients known to be wild type in BRAF and NRAS (n=88).

Table 2 Univariate analyses of clinical and pathological characteristics from the TCGA450 cohort (n=356)

Among the subgroup of 84 patients in the MDACC cohort known to be wild-type both in BRAF codon 600 and in KRAS codons 12 and 13, right-sided primary tumour location remained significantly associated with lower AREG and EREG expression, and higher methylation of the AREG loci cg02334660 and cg26611070, and the EREG locus cg19308222 (Supplementary Table 8 and Supplementary Figure 3). In this subgroup, CIMP-high status as determined by the clinical method was significantly associated with higher methylation levels of EREG cg19308222 and was non-significantly associated with lower EREG expression. Similarly, in the subgroup of 163 patients in the TCGA450 cohort who were wild-type both in BRAF and in KRAS codons 12 and 13, MSI-high status, mucinous histology, and right-sided primary tumour remained significantly associated with lower AREG and EREG expression, and higher levels of methylation at the AREG and EREG loci. CIMP-high status remained significantly associated with lower EREG expression and higher levels of methylation at the AREG and EREG loci.

Finally, among the subgroup of 134 patients in the TCGA450 cohort who were wild-type in BRAF, KRAS exons 2–4, and NRAS, MSI-high status, mucinous histology, right-sided primary tumour, and CIMP-high status remained significantly associated with lower AREG and EREG expression, and higher levels of methylation at the AREG and EREG loci (Supplementary Table 9 and Supplementary Figure 5). Similarly, among the 100 patients in the TCGA27 cohort who were wild-type in BRAF, KRAS, and NRAS, a significant association with lower EREG and AREG expression, and higher EREG cg19308222 methylation remained with CIMP-high status, right-sided primary tumour, and MSI-high status (Supplementary Table 7).

CIMP-high status is associated with inferior PFS to anti-EGFR therapy in KRAS wild-type patients

An independent cohort of 198 patients with KRAS wild-type metastatic CRC enrolled in the ATTACC protocol at MDACC was successfully tested for CIMP status (Supplementary Table 10). Of this group, 173 patients had previously been treated with an anti-EGFR therapy, and PFS with the first anti-EGFR regimen was retrospectively determined in 167 patients. Of these 167 patients, 26.3% (44/167) were CIMP-high. Compared with the CIMP-low/none group, the CIMP-high group was significantly more likely to have right-sided primary tumour (45.5% vs 23.6%, P=0.011), MSI-high status (18.5% vs 1.6%, P=0.009), BRAF mutation (42.9% vs 6.9%, P<0.0001), and male sex (77.3% vs 60.2%, P=0.045).

Kaplan–Meier analysis found that among the entire cohort of 167 KRAS wild-type patients, inferior PFS with the first anti-EGFR therapy regimen was significantly associated with CIMP-high status (median PFS 4.0 vs 6.5 mo, P<0.001), BRAF mutation (median PFS 2.8 vs 6.5 mo, P=0.004), NRAS mutation (median PFS 4.4 vs 7.2 mo, P=0.006), and right-sided primary tumour (median PFS 4.7 vs 6.5 mo, P=0.040) (Figure 2E and F and Supplementary Figure 6a). These findings were recapitulated on univariate analysis by Cox proportional hazards regression analysis. On multivariate Cox regression analysis by multiple imputations, NRAS mutation (HR 2.27, 95% CI 1.25–4.13, P=0.007), BRAF mutation (HR 2.50, 95% CI 1.22–5.13, P=0.012), and CIMP-high status (HR 2.00, 95% CI 1.11–3.64, P=0.022) remained significant. Right-sided primary tumour was not significant (HR 1.43, 95% CI 0.86–2.36, P=0.167) (Table 3). In the subgroup with BRAF and NRAS wild-type disease, inferior PFS with the first anti-EGFR regimen remained significantly associated with CIMP-high status (median PFS 5.6 vs 9.0 mo, P=0.023) and trended with right-sided primary tumour (median PFS 5.6 vs 9.0 mo, P=0.053) on Kaplan–Meier analysis (Figure 2G and Supplementary Figure 6b).
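As a consistency check on the multivariate results, the standard error, Wald z statistic, and two-sided P value can be approximately recovered from each reported hazard ratio and its 95% CI. This sketch assumes the interval is symmetric on the log scale and uses a normal reference distribution; pooled multiple-imputation analyses may use a t reference instead, so small discrepancies from the reported P values are expected.

```python
import math

def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Recover (standard error of log HR, z statistic, two-sided P) from an HR and 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return se, z, p

# Hazard ratios and 95% CIs as reported for the multivariate PFS model.
reported = {
    "NRAS mutation": (2.27, 1.25, 4.13),
    "BRAF mutation": (2.50, 1.22, 5.13),
    "CIMP-high": (2.00, 1.11, 3.64),
    "Right-sided primary": (1.43, 0.86, 2.36),
}

p_values = {name: wald_from_ci(*vals)[2] for name, vals in reported.items()}
for name, p in p_values.items():
    print(f"{name}: approximate P = {p:.3f}")
```

The recovered values agree with the reported P values (0.007, 0.012, 0.022, and 0.167) to within rounding of the published intervals.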

Table 3 Univariate Cox regression analysis and multivariate Cox regression analysis after multiple imputations of MDACC ATTACC cohort for PFS

Notably, on univariate analysis of OS among the 198 patients, with 162 events, CIMP-high status (HR 1.53, 95% CI 1.08–2.16), right-sided primary tumour (HR 1.45, 95% CI 1.04–2.01), BRAF mutation (HR 2.46, 95% CI 1.61–3.75), and NRAS mutation (HR 1.70, 95% CI 1.03–2.81) were significantly associated with inferior survival. However, neither CIMP-high status nor right-sided primary tumour was significantly associated with OS on multivariate analysis among the entire cohort or on univariate analysis among the subgroup of 109 patients known to be wild-type in BRAF and NRAS.

Discussion

Our study is one of the first to identify CIMP status, and more specifically methylation of loci within the EREG and AREG promoters, as a determinant of EREG and AREG expression levels and a prognostic biomarker with regard to PFS upon treatment with anti-EGFR therapy in patients with metastatic CRC. Given the association we and others have observed between EREG levels and the site of primary tumour, this provides an additional explanation for the clinical differences observed in right- vs left-sided primary tumours. Although our data are retrospective and are not derived from clinical trials, the use of multiple independent data sets available through the TCGA to validate our findings is a strength of our study.

Although methylation of CpG islands within gene promoters is generally known to be associated with downregulation of expression, it remains unclear which methylated loci are most important in driving differential expression. Although the TCGA Pan-Cancer project has identified for each gene the single CpG island most inversely correlated with expression, our results extend these findings specifically to CRC. For EREG, we found that the CpG island cg19308222, which was most strongly inversely correlated with expression in the Pan-Cancer project, was again the most inversely correlated CpG island in CRC. However, for AREG, we found that the CpG islands cg02334660 and cg26611070 had the strongest inverse correlation with expression in CRC, while in the Pan-Cancer project, cg03244277 had the strongest inverse correlation. Notably, both cg02334660 and cg26611070 are located within the body of the AREG gene. Although gene-body methylation is thought to be associated with increased expression (Yang et al, 2014), body methylation may conversely function to repress activation of intragenic promoters and may act in a tissue-specific manner (Maunakea et al, 2010; Jones, 2012). The tissue-specific effects of methylation at specific loci in specific genes require further investigation, as they may explain the differences in which locus was selected as the most anti-correlated with expression.

The significant contribution of EREG and AREG methylation to gene expression has not been previously well described. A small in vitro study of gastric cancer cell lines revealed inverse correlation between EREG promoter methylation and expression (Yun et al, 2012). In our study, iBAG analysis demonstrated that 57–65% of variation of expression in EREG was attributable to methylation of cg19308222, and 33–35% of variation of expression in AREG was attributable to methylation of the AREG loci cg02334660 or cg26611070. Furthermore, treatment of CRC cell lines with demethylating agents resulted in decreased methylation at the cg19308222 locus and increased expression of EREG. This provides the strongest evidence to date that methylation of these important EGFR ligands is likely a significant mechanism of regulation of expression, and also suggests that this mechanism can be manipulated by treatment with hypomethylating agents.

Several studies have already established that high expression of EREG and AREG is associated with improved outcomes with anti-EGFR therapy in refractory metastatic CRC. A study of tumours from 110 metastatic CRC patients treated with cetuximab found that high EREG or AREG expression was associated with longer PFS (HR 0.47, P=0.0002; and HR 0.44, P<0.0001, respectively) (Khambata-Ford et al, 2007), although it was unclear whether there was an interaction of EREG and AREG expression level with KRAS or BRAF mutation status. Another study in 121 irinotecan-refractory KRAS wild-type metastatic CRC patients who received anti-EGFR therapy on clinical trials found that EREG and AREG expression was significantly associated with response rate, disease control, PFS, and OS with anti-EGFR therapy, with the predictive value of EREG expression superior to that of AREG expression (Jacobs et al, 2009). Analysis from the CO.17 trial of cetuximab in patients with refractory metastatic CRC found cetuximab had a larger effect in improvement of OS and PFS in KRAS wild-type and EREG expression-high patients (Jonker et al, 2014). Similarly, results from the PICCOLO trial of second-line panitumumab and irinotecan in metastatic CRC found significant improvement in PFS in patients with either EREG or AREG expression in the highest tertile (Seligmann et al, 2016). Finally, a retrospective evaluation of expression of 110 candidate genes from 144 primary tumours of KRAS wild-type refractory metastatic CRC patients found that high EREG and AREG expression was strongly associated with improved PFS and disease control rate with cetuximab (Baker et al, 2011). In contrast, two first-line studies failed to demonstrate that EREG or AREG expression is predictive of benefit with cetuximab when combined with multi-agent chemotherapy backbones (Adams et al, 2012; Cushman et al, 2015). 
Moreover, these studies did not further describe additional clinical and pathological variables that we now know may have impacted EREG and AREG expression.

Additional data reveal that patients whose tumours have high EREG and AREG expression are more likely to have clinical and tumour biological characteristics similar to those we found. Indeed, data from 952/1630 samples from the randomised phase III COIN (addition of cetuximab to oxaliplatin-based first-line combination chemotherapy for treatment of advanced CRC) trial showed that high expression of EREG and AREG was significantly associated with wild-type KRAS, wild-type BRAF, left-sided primary colon tumour, and microsatellite stable disease (Adams et al, 2012). An independent study also found that left-sided primary colon carcinomas were more likely to have epiregulin overexpression (Missiaglia et al, 2014). Moreover, data from 331/696 samples from the randomised PICCOLO (Panitumumab, irinotecan, and ciclosporin in CRC) trial in second-line treatment of metastatic CRC similarly found that high expression of EREG and AREG was associated with wild-type BRAF and left-sided primary colon tumour (Seligmann et al, 2016).

In addition, multiple trials have shown that patients with KRAS wild-type left-sided CRC have better outcomes with cetuximab than those with right-sided CRC. In the AIO KRK-0104 trial, 146 patients were randomised to receive first-line CAPOX (capecitabine/oxaliplatin) with cetuximab or CAPIRI (capecitabine/irinotecan) with cetuximab. In this trial, 95/146 patients had KRAS codon 12 and 13 wild-type tumours, with 68 left-sided and 27 right-sided primary tumours. Patients with left-sided KRAS exon 2 wild-type cancers had superior OS (HR 0.63, P=0.016) and PFS (HR 0.67, P=0.02) compared with patients with right-sided tumours. In patients with KRAS/BRAF wild-type tumours (n=79), median PFS was 8.2 vs 5.9 mo (HR 0.81, P=0.47) and median OS was 27.3 vs 16.2 mo (HR 0.60, P=0.11; Von Einem et al, 2014). In the CO.17 study, 399 patients with metastatic CRC were randomised to receive cetuximab or best supportive care. In the cohort of patients with KRAS exon 2 wild-type status, there was a significant interaction between site of primary tumour and PFS benefit with cetuximab (for left-sided tumours, HR 0.28 and P<0.0001; for right-sided tumours, HR 0.73 and P=0.26; P=0.002 for the interaction term) (Brule et al, 2013; Brule et al, 2015).

Notably, although CIMP-high status and right-sided primary CRC were both associated with significantly inferior OS on univariate analysis, neither remained significant after adjusting for NRAS and BRAF status. Because CIMP-high status remained significantly associated with PFS with anti-EGFR therapy after adjusting for these variables, the inferior anti-EGFR PFS of CIMP-high patients does not appear to be solely due to an overall inferior prognosis, at least among the KRAS/NRAS/BRAF wild-type subpopulation. Indeed, several studies have noted inferior overall prognosis among patients with low EREG/AREG expression (Pentheroudakis et al, 2013; Stahler et al, 2016) or right-sided primary CRC (Loupakis et al, 2015). These studies vary in whether they included only patients with KRAS wild-type tumours, whether they limited analysis to patients who received anti-EGFR therapy, whether EREG and AREG were treated as continuous or categorical variables, and how any cut points were defined. Furthermore, none included CIMP status in their multivariable analyses. In addition, recent data from the PICCOLO study show that although high EREG and AREG levels were associated with improved OS among all patients and among the RAS wild-type subgroup, neither was significantly associated with OS among the RAS and BRAF wild-type subgroup. In that study, site of primary tumour was not associated with PFS or OS (Seligmann et al, 2016). Additional studies to confirm these findings in prospectively collected data will be important in determining the predictive vs prognostic role of these variables.

Previously, there were no data supporting a plausible biological explanation for the differential outcomes of left- and right-sided primary CRCs with cetuximab-based treatment, but our findings provide evidence that variation in EREG and AREG methylation and expression contributes to these differences (Figure 3). Although there is ample existing evidence of the association of right-sided primary cancers with CIMP-high status, our analysis identifies a novel association of right-sided and CIMP-high CRCs with increased methylation specifically of the critical CpG islands within EREG and AREG that have the strongest inverse association with EREG and AREG expression. Accordingly, right-sided and CIMP-high CRCs are also associated with low EREG and AREG expression. Several studies have corroborated that low EREG and AREG expression is associated with inferior outcomes with cetuximab therapy, even if there is conflicting evidence on whether EREG and AREG are predictive or merely prognostic. Nevertheless, these data are consistent with our retrospective analysis showing that CIMP-high status is associated with inferior PFS with anti-EGFR-including therapy.

Figure 3

Left- and right-sided primary CRCs have distinct pathobiology, with different rates of CIMP, and contrasting levels of methylation and gene expression of EREG and AREG, providing a unifying explanation for differences in outcomes with anti-EGFR therapy like cetuximab.

Although our findings unify disparate observations and associations between location of primary tumour, expression of EREG and AREG, and differential responses to cetuximab, they are based on retrospective analysis of several different data sets and thus cannot establish causality. Prospective incorporation of these variables into the analysis of adequately powered future randomised controlled trials of anti-EGFR therapies in first-line and refractory metastatic CRC will be necessary to determine whether CIMP status and methylation level of EREG and AREG are truly predictive of response. Further investigation is also warranted into whether differential methylation and expression of additional genes in CIMP-high patients contribute to resistance to anti-EGFR therapy.