Main

Oesophageal adenocarcinoma (OAC) is one of the fastest rising cancers in men in the UK and now accounts for more than 5700 new cases per year (Rouvelas et al, 2005; National Oesophago-Gastric Cancer audit, 2013; Peng et al, 2013). There is an urgent need to identify prognostic subtypes of OAC as despite potentially curative treatment, 5-year survival is only 35–40% and current pathological prognostic markers are unreliable. The systematic identification of molecular prognostic markers would allow for improved prognostic information for the patient and a better understanding of the underlying tumour biology. This will help in the logical development of novel targeted therapies for these patients.

OAC is an area of unmet research need and has been highlighted as a research priority by governments and in the strategies of large funding bodies (Chen et al, 2012). Predicting prognosis after potentially curative surgery for OAC is difficult and inaccurate. It is currently based on internationally accepted tumour staging (Tumour Node Metastasis), with the addition of other important pathological criteria including resection margin status, the presence of vascular or neural invasion and of signet ring histology (Sobin et al, 2010; O’Neill et al, 2013; Schoppmann et al, 2013b; Yendamuri et al, 2013). It is now well recognised that in addition to pathological scoring systems other features of tumours, for example, constituents of the microenvironment, immune infiltration and response to neoadjuvant chemotherapy (NAC), are critical to tumour progression (Courrech Staal et al, 2011). A good response to NAC has been consistently shown to predict for better outcome (Mandard et al, 1994; Fareed et al, 2009; Noble et al, 2013). This is being used by some clinicians to determine adjuvant treatment protocols, but this can only be accurately determined after oesophageal resection, and even in those patients where a poor local tumour response to NAC is observed, a proportion may benefit from systemic treatment by virtue of nodal downstaging (Noble et al, 2013).

Recent advances in OAC have focused on early diagnosis and understanding the genetic landscape of the disease (Kadri et al, 2010; Liu et al, 2014). Next-generation sequencing studies may ultimately lead to molecular phenotype therapeutics in OAC but the widespread application of near-patient whole-genome sequencing at the time of diagnosis is likely to be many years away (Dulak et al, 2013; Weaver et al, 2014). Immunohistochemical (IHC) analysis of differentially expressed proteins is currently superior to DNA-based biomarkers in terms of availability, labour requirements, determining cellular localisation of a marker and takes into consideration post-transcriptional processing. IHC is routinely used in pathology laboratories to differentiate between subtypes of oesophageal cancer and guides targeted therapy with biological agents in a range of cancer types (DiMaio et al, 2012; Ward et al, 2013). A number of IHC-based prognostic biomarkers have been reported in OAC, but none have entered clinical practice (Waterman et al, 2004; Ong et al, 2013).

In this systematic review and meta-analysis, we sought to carefully assess the available published literature on prognostic IHC biomarkers of survival in the resected tumour from patients with OAC. The objective of the study was to identify prognostic markers to provide improved risk stratification in addition to highlighting molecular targets that could offer strategies for the development of novel therapies for patients with OAC.

Materials and Methods

Identification of literature

The aim of the search was to identify all primary literature examining IHC markers of prognosis in OAC. A search strategy combining Plain Text and Medical Subject Heading terms was developed:

(1) (esoph* OR oesoph*) AND (carcinoma OR adenocarcinoma OR cance* OR neoplas* OR tumo*) AND (Prognos* OR Surviv* OR Mortal*) AND (protei* OR marke* OR biomark*)

OR

(2) Oesophageal Neoplasms/AND (prognosis/or disease-free survival (DFS)/OR Survival/OR mortality/or ‘cause of death’/or fatal outcome/or survival rate/) AND (biological markers/or exp antigens, differentiation/or genetic markers/or exp tumour markers, biological/OR genes/or exp genes, neoplasm/) AND Adenocarcinoma/.

The search term was entered into Ovid MEDLINE (1946 to November 2013) without limits and 3059 articles were returned (Figure 1). Existing systematic reviews and reference lists were crosschecked for studies missed by the search term. In cases where studies were derived from the same data set, the more recent or most complete article was retained. Only published results are included in this review. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed (Moher et al, 2009).

Figure 1

PRISMA flow chart illustrating stages of selection of final articles for meta-analysis.

Screening

Two independent reviewers (FN and LMM) examined 3059 study titles. From these study titles, 695 abstracts were brought forward as relevant to this study. On review of the abstracts, potentially eligible full-text articles were retrieved with relevant appendices and Supplementary Information.

Eligibility and data extraction

Full-text articles were reviewed against quality criteria (Table 1) derived from the REporting recommendations for tumour MARKer prognostic studies criteria (REMARK – published guidelines for quality reporting in IHC-based tumour biomarker studies; McShane et al, 2005).

Table 1 Inclusion criteria adapted from REMARK criteria, utilised at eligibility stage of selection

For relevant articles, variables were extracted. These included the following: first author name, IHC target, year of publication, number of cases, primary antibody used, dilution of primary antibody, reference group for statistical analysis, number of positive stained cases, univariate or multivariate analysis, hazard ratio (HR), 95% confidence interval (CI), P-value, location of stain and type of survival (overall survival (OS), cancer-specific survival (CSS) or DFS). Only CSS and OS were pooled in the meta-analysis.

Synthesis and statistical analysis

Both univariate and multivariate results were considered for the meta-analysis, with univariate analysis used preferentially when both were available. Univariate analysis was preferred due to the variability of analysis used (univariate n=3; multivariate n=5; and both n=27). In addition, there was variability in the method and variables used to derive the final multivariate model, making comparative analysis across studies biased (Supplementary Figure 1). Of 36 studies included in the review, 27 (75%) stated HR and CI derived by multivariate analysis. Of these, the method used to make the model was described in only nine (33%). The method used to make the model varied as follows: entering all variables on univariate analysis into the model in 6 (22%); using backward stepwise regression in 2 (7%); and it was impossible to accurately assess the method used in 19 (70%). The number of variables used to create the multivariate model varied between 3 and 13. Where studies considered opposite degrees of expression, the inverse HR and CI were calculated to give results for high expression. For biomarkers analysed in more than one study, HRs and CIs were entered into a random-effects model in Stata Statistical Software, SE 12 (StataCorp LP, College Station, TX, USA). The synthesised HR is reported as the increase in risk of death from OAC relative to the individual study’s reference group, with HR>1 indicating increased risk of death and HR<1 indicating decreased risk of death.
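The inversion and pooling steps described above can be sketched in Python. This is a minimal illustration and not the authors' Stata code: it assumes the DerSimonian-Laird estimator for the random-effects model (the text does not name the estimator), recovers each standard error from the reported 95% CI on the log scale, and uses hypothetical function names (invert_hr, log_hr_and_se, pool_random_effects).

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def invert_hr(hr, lower, upper):
    """Re-express an HR reported for low expression as an HR for high expression."""
    # Taking reciprocals swaps the roles of the two CI bounds.
    return 1.0 / hr, 1.0 / upper, 1.0 / lower


def log_hr_and_se(hr, lower, upper):
    """Recover log(HR) and its standard error from a reported 95% CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    return math.log(hr), se


def pool_random_effects(studies):
    """Pool (hr, lower, upper) tuples; assumes DerSimonian-Laird weighting."""
    effects = [log_hr_and_se(*s) for s in studies]
    y = [e[0] for e in effects]
    v = [e[1] ** 2 for e in effects]
    w = [1.0 / vi for vi in v]
    # Fixed-effect estimate and Cochran's Q are needed to estimate tau^2.
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero; undefined for a single study.
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if len(y) > 1 and c > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return (
        math.exp(pooled),
        math.exp(pooled - Z_95 * se),
        math.exp(pooled + Z_95 * se),
    )
```

For example, pool_random_effects([(1.8, 1.1, 2.9), invert_hr(0.6, 0.4, 0.9)]) pools one study reported for high expression with one reported for low expression. These numbers are illustrative only and are not taken from the included studies.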

The heterogeneity of results between studies was assessed using I² statistics (a measure of consistency of results between studies), with increasing heterogeneity implying less utility in generalising across studies (Higgins et al, 2003). Sensitivity analysis was carried out by removing individual studies from the meta-analysis and assessing the effect on the pooled result. Presence of publication bias was formally evaluated using funnel plots (Figure 2) (Sterne et al, 2001).
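The heterogeneity statistic and the leave-one-out sensitivity analysis can be sketched as follows. This is an illustrative sketch under stated assumptions: I² is computed from Cochran's Q as described by Higgins et al (2003), the pooled estimate assumes DerSimonian-Laird random-effects weighting (the estimator is not named in the text), inputs are log HRs with their variances, and the function names are hypothetical.

```python
def pool_log_hr(y, v):
    """Random-effects pooled log HR (assumes DerSimonian-Laird weights)."""
    w = [1.0 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero; undefined for a single study.
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if len(y) > 1 and c > 0 else 0.0
    w_re = [1.0 / (vi + tau2) for vi in v]
    return sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)


def i_squared(y, v):
    """I^2 as a percentage: the share of variation in Q beyond chance."""
    w = [1.0 / vi for vi in v]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def leave_one_out(y, v):
    """Pooled log HR after removing each study in turn."""
    return [
        pool_log_hr(y[:i] + y[i + 1:], v[:i] + v[i + 1:])
        for i in range(len(y))
    ]
```

A pooled result that changes direction or loses significance when one study is dropped would indicate that the result depends heavily on that study.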

Figure 2

Funnel plot showing publication bias for the 58 included studies providing HR and CI. Plotted points are frequently seen away from the ‘0’.

Results

Excluded studies

Of the 3059 articles returned, 2364 were excluded on review of title and 482 on examination of abstract, leaving 214 articles considered relevant. Crosschecking of existing systematic review reference lists revealed no further relevant articles (Vallbohmer and Lenz, 2006; Ong et al, 2010; Chan et al, 2012; Chen et al, 2012, 2013; Peng et al, 2013; Gowryshankar et al, 2014).

Upon careful review of the 214 articles against the REMARK inclusion criteria, 56 did not provide a HR, 43 combined OAC and SCC subtypes for statistical analysis, 25 examined only SCC, 23 used non-IHC methodology, 14 did not examine survival, 7 had inadequate cohort description, 6 examined gastric cancer and 4 repeated use of a cohort. This left only 36 articles that conformed to REMARK inclusion criteria.

Included studies

26 individual research centres contributed to the 36 articles. 20 studies (56%) reported cohorts of patients who underwent surgery only; in 6 studies (17%) the authors reported that some patients had undergone neoadjuvant therapy and in 10 studies (28%) no information was given regarding preoperative treatment. Of the six studies in which some kind of neoadjuvant treatment was reported, this consisted of chemoradiotherapy in five (14%) and chemotherapy in three (8%). The percentage of patients who had undergone neoadjuvant therapy varied from 8 to 100%. The specific neoadjuvant treatment regimes used were reported in only three of the six (50%) studies. Little overlap in methodology was seen, with every centre using different antibodies at different dilutions or different scoring systems. Cohort sizes varied, ranging from 24 to 259 cases. In total, 50 HRs, CIs and relevant variables were extracted from these studies. Extracted data are reported in Table 2, with biomarkers grouped according to the hallmark of cancer with which a functional role for that molecule has been most closely attributed (Figure 3; Hanahan and Weinberg, 2011).

Table 2 Extracted data from biomarker articles
Figure 3

Statistically significant prognostic biomarkers from at least one study in resected oesophageal adenocarcinoma covering all hallmarks of cancer.

19 of the 36 articles examined one or more of the same nine molecules, making them suitable for meta-analysis. Upon pooling studies, six of the nine molecules showed prognostic significance: COX-2, CD3, CD8, p53, EGFR and HER2 in order of HR, with LgR5, VEGF and Ki67 not reaching significance. Forest plots are shown in Figure 4.

Figure 4

Forest plots with associated hazard ratio (HR) and 95% confidence interval. Weights calculated using a random effects model. HR>1 implies worse survival with overexpression; HR<1 implies improved survival (vertical black line indicates HR of 1; red vertical dotted line indicates overall HR). A full colour version of this figure is available at the British Journal of Cancer journal online.

In three of the studies included in the meta-analyses, only multivariate analysis was stated. Where sensitivity analysis was not possible due to lack of appropriate, robust literature, agreement on prognostic value was considered with other studies on OAC.

COX-2

COX-2 is a rate-limiting enzyme in the conversion of arachidonic acid to prostaglandins and has multiple functions in immune evasion, angiogenesis and proliferation. COX-2 is consistently detected with varying expression in OAC (Lagorce et al, 2003). Subsequently, inhibitors of COX-2 have been shown to be protective of progression from Barrett’s oesophagus to OAC, and have shown some promise in improving prognosis when used alongside NAC (Corley et al, 2003; Tuynman et al, 2005).

Three studies, consisting of a total of 382 patients, contributed to quantify the effect of COX-2, which was found to correlate negatively with prognosis (Buskens et al, 2002; Bhandari et al, 2006; Prins et al, 2012). Although consistent overexpression is noted in OAC, differences in cutoff values for staining positivity, and variability in numbers of positive staining cases between studies are seen here (27% and 79% positive) (Buskens et al, 2002; Prins et al, 2012). Within other prognostic studies on OAC, not providing HR, both significant (three studies, n=194) and non-significant (three studies, n=139) results have been reported (Lagorce et al, 2003; France et al, 2004; Kulke et al, 2004; Heeren et al, 2005; Mobius et al, 2005; Tuynman et al, 2008).

CD3

CD3+ cells are mature T lymphocytes and quantification of CD3+ has been commonly used to evaluate immunological response against solid tumours (Dahlin et al, 2011). Two studies identified CD3 as an independent predictor of improved survival in OAC (Rauser et al, 2010; Zingg et al, 2010). Methods of exploration varied, with one study using an automated scoring system across 10 random high-power fields and the other using a central CD3+ lymphocyte count (Rauser et al, 2010; Zingg et al, 2010). However, the studies show good agreement (I²=0.00%) and similar weighting on meta-analysis.

CD8

CD8 is a marker of cytotoxic T cells. CD8+ cells kill cancer cells via release of granzyme and perforin or via Fas ligand presentation (Owen et al, 2013). This is an area of considerable interest with trials in a number of solid organ cancers examining strategies to enhance tumour-cell killing. The discovery of the role of PD-L1 on tumour cells and its interaction with the PD-1 receptor on cytotoxic T cells, leading to immune cell exhaustion, has led to the development of antibodies targeting both the receptor and its ligand (McDermott and Atkins, 2013). Two studies were pooled examining CD8 (cytotoxic, T-cell effector), comprising a total of 203 cases (Zingg et al, 2010; Dutta et al, 2012). Moderate heterogeneity is observed (I²=54.5%). Methodological differences may be the cause of the heterogeneity, with one study observing increasing CD8+ count across three tertiles and the other using a cutoff set for high vs low CD8 expression (Zingg et al, 2010; Dutta et al, 2012).

EGFR

EGFR is a receptor tyrosine kinase, shown to have effects on cancer differentiation, proliferation, invasion and metastasis (Grandis and Sok, 2004). EGFR targeting is used in the treatment of colorectal cancer and non-small cell lung cancer (Mahipal et al, 2014). A total of 642 patients were pooled from two studies (Wang et al, 2007; Ong et al, 2013). When combined, EGFR overexpression was found to have a slightly negative overall prognostic effect. This is in agreement with other studies in OAC that were unsuitable for meta-analysis (Mukaida et al, 1991; Yacoub et al, 1997; Lennerz et al, 2011). This effect on prognosis with overexpressed EGFR has been noted in both colorectal cancer and gastric adenocarcinoma (Rego et al, 2010; Hong et al, 2013).

p53

p53 acts as a hub for multiple intra-cellular surveillance systems, constantly reporting on cellular integrity. When stress is detected, a damaged cell can initiate DNA repair, senescence and/or apoptosis. TP53 mutation can increase protein stability, meaning IHC detection correlates with mutation (Bellini et al, 2012). However, IHC does not account for all mutations, with between 52% and 80% agreement between IHC and PCR with truncating and missense mutations (Bian et al, 2001). TP53 is the most commonly mutated gene in OAC and has recently been found to have a mutational frequency that would distinguish between disease stages and thus identify progression towards malignancy (Weaver et al, 2014).

Eleven studies examining p53 were reviewed in full (Flejou et al, 1994; Casson et al, 1995; Sauter et al, 1995; Moskaluk et al, 1996; Hardwick et al, 1997; Casson et al, 1998, 2003; Jiao et al, 2003; Heeren et al, 2004; Cavazzola et al, 2009; Madani et al, 2010). Only three of these were suitable for inclusion in the systematic review and subsequent meta-analysis, containing a total of 268 patients and showing a pooled effect of worse prognosis with increased expression (Moskaluk et al, 1996; Cavazzola et al, 2009; Madani et al, 2010). Good agreement is seen between the three included studies. However, five other studies not included in the review failed to reach significance, suggesting that the prognostic value of p53 may not be as obvious as the meta-analysed results suggest (Duhaylongsod et al, 1995; Coggi et al, 1997; Hardwick et al, 1997; Langer et al, 2006; Falkenback et al, 2008).

HER2

HER2 exhibits extensive homology with EGFR, frequently dimerising with it or with another member of the EGFR family, HER3 (Wolf-Yadlin et al, 2006). HER2 is overexpressed in a number of cancers and undergoes ligand-independent activation, with consequent downstream signals involved in proliferation and migration (Wolf-Yadlin et al, 2006). HER2 provides a target for the monoclonal antibody trastuzumab, which has proven efficacy in breast and gastric cancer treatment (Hynes and Lane, 2005; Bang et al, 2010). HER2 targeting in OAC is being assessed in the feasibility arm of the MRC STO3 clinical trial (Okines et al, 2013).

Two studies were suitable for pooling, showing an OS benefit with overexpression of HER2 (Nakamura et al, 1994; Yoon et al, 2012; Phillips et al, 2013). Both studies used the ToGA trial protocol to assess HER2 overexpression, including FISH analysis of ErbB2 gene amplification. This effect has not been reproduced by smaller studies, potentially as a result of underpowering (Polkowski et al, 1999; Reichelt et al, 2007; Hu et al, 2011; Thompson et al, 2011). The overall protective effect seen here is in contrast to studies investigating OAC using techniques other than IHC, where a negative prognostic effect is noted, as well as in breast cancer, where a dramatically worse prognosis is seen with overexpression (Andrulis et al, 1998; Chan et al, 2012).

LgR5

The R-spondin receptor Lgr5 is a stem cell marker in multiple organs in mice and humans. Single Lgr5 stem cells derived from the intestine can be cultured to build epithelial structures that retain hallmarks of the in vivo epithelium (Sato and Clevers, 2013). In tumours, Lgr5 expression is believed to define cancer stem cells and may have prognostic effects by promoting invasion and metastasis as well as initiating self-renewal pathways (Reya and Clevers, 2005). Despite the vast majority of cancer deaths being attributable to invasion and metastasis, Lgr5 was the only suitable biomarker for meta-analysis with its main function associated with this hallmark of cancer (Becker et al, 2010; von Rahden et al, 2011).

Lgr5 failed to reach statistical significance as a prognostic marker. This was due to the wide, asymmetric CIs, resulting from underpowering with only 84 cases in total across the two studies (Becker et al, 2010; von Rahden et al, 2011).

VEGF

VEGF is upregulated in response to hypoxia, acting as a key mediator of angiogenesis and affecting vessel permeability, potentially enhancing haematogenous dissemination (Hicklin and Ellis, 2005). Two studies contributed to VEGF meta-analysis, with a total of 181 patients, producing a non-significant effect on survival (Cavazzola et al, 2009; Prins et al, 2012). Again, few studies have examined prognosis and angiogenesis, with contradictory results seen in small cohorts (Couvelard et al, 2000; Saad et al, 2005). With emerging targeted therapies, further work will be required to confirm whether VEGF is a true driver of cancer aggressiveness (Shah et al, 2011).

Ki67

Despite the common use of Ki67 to index cellular proliferation, its biological function in the tumour remains elusive. It seems to co-localise with ribosomal RNA during mitosis suggesting a role in protein synthesis and, more recently, chromatin remodelling (Bullwinkel et al, 2006).

Here, three studies were pooled, comprising a total of 192 patients (Evangelou et al, 2008; Falkenback et al, 2008; Dutta et al, 2012). A non-significant result was observed. Again, this could be due to a combination of asymmetrical wide CIs in two studies, combined with marginal prognostic value in the other. In breast cancer, increased cellular proliferation index has been studied as a negative prognostic marker and in directing use of chemotherapy against rapidly dividing tumours (Martin et al, 2004; de Azambuja et al, 2007; Yerushalmi et al, 2010). However, Ki67 expression is understudied in OAC, and prognostic significance remains inconclusive.

Publication bias

Of the 214 relevant articles, 92 provided HRs and statistical significance; 52 (57%) of these reported non-significant results. This is in contrast to the final 36 articles that met REMARK inclusion criteria, where only six (18%) centred on non-significant results. Asymmetry was noted when all data were viewed on a funnel plot (Figure 2), suggesting positive publication bias.
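Funnel plot asymmetry was judged visually in this review. As a complementary illustration only, the sketch below quantifies asymmetry with an Egger-type regression, a technique that is not part of the methods reported here: each study's standardised effect (log HR divided by its standard error) is regressed on its precision (1 divided by the standard error), and an intercept far from zero suggests small-study effects such as publication bias. The function name and inputs are hypothetical, and no significance test is included.

```python
def egger_intercept(log_hrs, ses):
    """Ordinary least squares of log HR / SE on 1 / SE.

    Returns (intercept, slope). Under a symmetric funnel the intercept
    is close to zero and the slope approximates the common log HR.
    """
    precision = [1.0 / s for s in ses]
    z = [y / s for y, s in zip(log_hrs, ses)]
    n = len(z)
    mean_x = sum(precision) / n
    mean_z = sum(z) / n
    sxx = sum((x - mean_x) ** 2 for x in precision)
    sxz = sum((x - mean_x) * (zi - mean_z) for x, zi in zip(precision, z))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_x
    return intercept, slope
```

The regression needs at least three studies with differing standard errors to be informative, and with few studies the intercept is estimated imprecisely.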

Discussion

Previous meta-analyses of oesophageal cancer examining individual prognostic molecules have combined OAC and SCC in addition to using different investigational techniques for analysis (Vallbohmer and Lenz, 2006; Ong et al, 2010; Chan et al, 2012; Chen et al, 2012, 2013; Peng et al, 2013; Gowryshankar et al, 2014). There is consensus that OAC and SCC should be considered as separate biological entities and current clinical trials in oesophageal cancer reflect this approach. To date, this is the first meta-analysis that has synthesised the literature associated with all IHC markers solely in resected OAC. Using a validated prognostic marker-reporting tool to inform our strict inclusion and exclusion criteria, we identified 36 high-quality articles providing reliable HRs and CIs (McShane et al, 2005). From these articles, nine markers were suitable for meta-analysis and, of these, six markers showed significant correlation with survival. These markers were COX-2, CD3, CD8, HER2, EGFR and p53. Several other molecules have been assessed in good quality studies that met the REMARK inclusion criteria, but do not have a second study available for pooling. Of particular interest, MET, B7-H1, CAIX, ANXA1 and VEGF-C all showed significant, highly prognostic effects in cohorts containing over 100 cases but still require validation and/or elucidation of the underlying biology.

A number of the molecules identified in this review are related to emerging therapies. Four of the nine meta-analysed markers (COX-2 – celecoxib, EGFR – gefitinib, HER2 – trastuzumab and VEGF – bevacizumab) are molecules with targeted therapeutics either already in use or in development, and two are lymphocyte markers representing the presence of effectors of anti-tumour immunity, which can be induced by new therapies (Zhang et al, 2003; Galon et al, 2006; Ekman et al, 2008; Mei et al, 2014; Ward et al, 2014). As well as new therapies, there is an increasing interest in the role the cancer microenvironment has in OAC progression (Courrech Staal et al, 2011). Here, CD3 and CD8 demonstrate the greatest protective prognostic impact, illustrating the importance of the immune response to OAC. However, IHC analysis of other components of the microenvironment has been largely neglected; for example, only two papers comment on the impact of cancer-associated fibroblasts on prognosis (Laerum et al, 2012; Schoppmann et al, 2013a).

The most striking observation of this meta-analysis is the scarcity of high-quality articles, with 66% (69 out of 104) of potentially suitable studies not conforming to REMARK criteria. In similar meta-analyses published on the two cancers with worse prognoses than OAC, 83 suitable articles were pooled for pancreatic cancer prognosis, and in lung cancer, enough data were found to analyse 17 markers studied in four or more papers (Zhu et al, 2006; Jamieson et al, 2011; Peng et al, 2013). This suggests that prognostic marker research in OAC is lacking. In addition, despite the majority of patients now receiving some form of neoadjuvant therapy before resection for OAC (Noble et al, 2013), we found this to be poorly reported in these studies. It was therefore impossible to make any attempt to discriminate between markers prognostic after primary resection or after neoadjuvant therapy. Future reports should include a detailed description of the types of multimodal treatment given to patients and preferably include an analysis based on these treatment types.

A trend was noticed towards more robust methodology when authors used larger data sets. The largest study identified used independent generation and validation data sets to confirm the prognostic significance of the novel markers SIRT2 and TRIM44. The analysis was performed in two different patient cohorts from separate centres and was of high quality (Ong et al, 2013). However, we were unable to include this study in the meta-analysis because the cutoffs used to assess the HRs in the two cohorts were different. SIRT2 and TRIM44 require validation using the same methodology and cutoffs in another cohort. Despite this, the study by Ong et al (2013) describes a sophisticated approach to the development of a biomarker based on genetic analysis carried through to the protein level. Genome sequencing studies such as the UK ICGC project in OAC (Weaver et al, 2014) will deliver more potential markers of prognosis in selected sub-groups and methods such as those described by Ong et al (2013) will be required to translate these findings into meaningful clinical outcomes.

Authors who appeared more than once in the 214 initial articles often adhered to REMARK criteria and provided log-rank or Cox regression hazard calculations. This suggests a gradual uptake of REMARK criteria since its inception in 2005. Another potential reason for the poorer reporting in smaller studies may be more frequent negative results due to inadequate powering. With an overall reluctance towards negative reporting, it is quite possible that these results are left out as redundant data, with the larger data sets having more positive results and a greater likelihood of publication (Kyzas et al, 2007).

Limitations

Meta-analysis is able to enhance power that leads to more robust generalisations within a field. However, there are notorious confounding factors (Altman, 2001).

Here, only one study was prospective in design (Madani et al, 2010). Retrospective analysis is open to reporting and selection bias. Differences in multivariate or univariate analysis, CSS or OS, size of cohorts, cutoffs, primary antibodies and their dilutions, and occasionally radically different numbers of positive staining cases reduce the validity of combined results. Future work will require multi-centre efforts to gather large enough, prospective cohorts to provide robust clarification of truly prognostic markers.

With this meta-analysis we have only included IHC detectable markers of survival. Both IHC and RT-PCR have their own limitations; however, IHC is seen to be the most practical way to assess protein expression in solid cancers, with IHC survival biomarkers well described in other malignancies (Zhu et al, 2006; Jamieson et al, 2011).

In future work, multivariate modelling will give an insight into interaction between different variables in OAC. In this study, univariate analysis was used preferentially, to limit heterogeneity between methods of producing HRs, as a multivariate HR can be altered by use of different prognostic factors or model types in individual models. In fact, it is likely that a combination of markers will be required to give meaningful prognostic information to an individual patient, perhaps covering multiple Hallmarks of Cancer, rather than considering individual biomarkers in isolation. There are existing data from oesophageal cancer biology to support this strategy (Kadri et al, 2010; Peters et al, 2010; Liu et al, 2014).

Conclusion

Current methods have not delivered clinically useful molecular prognostic biomarkers in OAC. We have highlighted the paucity of good-quality robust studies in this field. This may be because little attention has been focused on OAC research compared with other cancers, or perhaps it is an indication of the molecular complexity of the disease that is only just beginning to be appreciated. The development of novel biomarkers in OAC will require understanding of this complexity and in this context IHC alone seems inappropriate. A genome to protein approach would be better suited for the development and subsequent validation of biomarkers. Large collaborative projects with standardised methodology will be required to generate clinically useful biomarkers.