Modest effect of p53, EGFR and HER-2/neu on prognosis in epithelial ovarian cancer: a meta-analysis

Background: P53, EGFR and HER-2/neu are the most frequently studied molecular biological parameters in epithelial ovarian cancer, but their prognostic impact is still equivocal. We performed a meta-analysis to estimate their prognostic significance more precisely. Methods: Published studies that investigated the association between p53, EGFR and HER-2/neu status and survival were identified. Meta-analysis was performed using a DerSimonian–Laird random effects model. Publication bias was investigated using funnel plots, and sources of heterogeneity were identified using meta-regression analysis. Results: A total of 62 studies were included for p53, 15 for EGFR and 20 for HER-2/neu. P53, EGFR and HER-2/neu status had a modest effect on overall survival (pooled HR 1.47, 95% CI 1.33–1.61 for p53; HR 1.65, 95% CI 1.25–2.19 for EGFR; and HR 1.67, 95% CI 1.34–2.08 for HER-2/neu). Meta-regression analysis for p53 showed that FIGO stage distribution influenced study outcome. For EGFR and HER-2/neu, considerable publication bias was present. Conclusions: Although p53, EGFR and HER-2/neu status modestly influence survival, these markers are, by themselves, unlikely to be useful as prognostic markers in clinical practice. Our study highlights the need for well-defined, prospective clinical trials and more complete reporting of results of prognostic factor studies.

Epithelial ovarian cancer is the leading cause of death from gynaecological cancers in the Western world. This high mortality is related to the difficulty of detecting ovarian cancer at an early stage and to the lack of effective therapies for advanced-stage disease (Cannistra, 2004).
Prognostic factors are defined as phenotypes that correlate with the duration of (progression-free) survival (Agarwal and Kaye, 2005). In ovarian cancer, well-known clinicopathological prognostic factors in early-stage disease include differentiation grade and tumour rupture during surgery, whereas in late-stage disease histological type, patient age, performance status and residual tumour after primary surgery are important prognostic factors (Vergote et al, 2001; Winter et al, 2007). Although these parameters do reflect biological features of both tumour and patient, they do not allow adequate prediction of outcome for the individual patient. The discovery of molecular biological prognostic factors should allow more accurate prediction of clinical outcome and may also reveal novel predictive factors and therapeutic targets (Oldenhuis et al, 2008).
The most frequently studied putative molecular biological prognostic factors in ovarian cancer are the tumour suppressor protein p53 and the oncogenes epidermal growth factor receptor (EGFR) and human epidermal growth factor receptor 2 (HER-2/neu). These markers also hold considerable promise as therapeutic targets: agents targeting p53, EGFR and HER-2/neu are currently under investigation in clinical trials (Dinh et al, 2008). However, evidence regarding their prognostic value with respect to survival is still inconclusive. Systematic reviews, including one from our institution, showed that these markers might predict prognosis in ovarian cancer, but also revealed considerable methodological variability between studies (Crijns et al, 2003; Hall et al, 2004). Identifying these methodological weaknesses and sources of heterogeneity is important to improve the quality of future prognostic and predictive factor studies in ovarian cancer and other tumour types.
The aim of this study was to more precisely estimate the prognostic value of these markers and to adjust for methodological variability. We have used statistical methods developed by Parmar et al (1998) to indirectly estimate hazard ratios from Cox regression analyses and P values from log-rank tests, enabling us to incorporate a large number of studies in our meta-analyses. Moreover, we performed an in-depth analysis of study quality, the presence of publication bias and the extent and sources of heterogeneity between published studies.

Search strategy and selection criteria
A MEDLINE, PubMed and EMBASE search was performed for studies investigating the prognostic significance of p53, EGFR and HER-2/neu in ovarian cancer. Studies published between 1990 and 1 January 2009 were examined. The MeSH terms used were 'ovarian neoplasm', 'receptor epidermal growth factor', 'receptor erbB-2' and 'protein p53'. Additional terms used in the title search were marker* or prognost* or survival. The references of all publications and reviews were hand-searched to identify missing relevant publications.
Studies were included in the meta-analysis if they met the following criteria: (1) patients included had chemonaive epithelial ovarian cancer; (2) the endpoint investigated was disease-specific or overall survival; (3) the study reported a hazard ratio (HR) and standard error (s.e.), or data sufficient to estimate the HR and s.e., from univariate survival analysis. Where a single study was reported on multiple occasions, only the report with the largest patient group or the most complete data was included. If a study reported results for more than one method (e.g., immunohistochemistry (IHC) and mutational analysis), for more than one well-described patient group or for multiple antibodies, the results of all analyses were included in the meta-analysis. Thirteen studies published in languages other than English or German were excluded from the meta-analysis (for an overview, see Supplementary Table 1). Reviews, non-original articles and studies on non-epithelial or borderline ovarian tumours were also excluded.
Two researchers (PdG and APGC) independently examined abstracts of articles (n = 614) to decide whether full-text articles should be obtained (Figure 1). Disagreements were resolved by discussing the title and abstract. Full-text articles (n = 216) were examined and excluded if a more detailed examination revealed that they did not meet the inclusion criteria. The sample size of included studies did not differ from that of excluded studies (data not shown). Where applicable, we adhered to the QUOROM criteria for improving the quality of reporting of meta-analyses (Moher et al, 1999).

Data extraction
Data were extracted independently by two investigators (PdG and APGC) by means of a predefined form. Topics in this form were year of publication, country, number of patients, years of patient inclusion, method of case selection (retrospective or prospective cohort of patients), age at time of diagnosis (mean, median, range), distribution of stage, tumour type and differentiation grade, treatment, amount of residual tumour after primary surgery, response to chemotherapy, time of follow-up (median, mean, minimum and maximum), assay method and scoring protocol used, number of marker-positive and marker-negative tumours, numbers of (disease-specific and overall) deaths, and results of univariate survival analyses.

Assessment of study quality and publication bias
Study quality was assessed independently by two investigators (PdG and APGC) by means of a predefined form. As there are no generally accepted standards for measuring study quality, this form was derived from the work of McShane et al (2005) and Hayes et al (1996) (Supplementary Table 2). In summary, the following criteria were investigated: whether (1) the study reported inclusion and exclusion criteria; (2) study data were prospectively or retrospectively gathered; (3) patient and tumour characteristics were sufficiently described; (4) the assay used to measure biomarker expression was sufficiently described; (5) a definition of the study endpoint was provided; (6) the follow-up time of patients in the study was described; (7) the study reported how many patients were lost to follow-up or were not available for statistical analysis. Studies with a total score of 8 were considered to show the highest study quality, whereas a zero score indicated the lowest quality.
Additionally, studies were classified as phase I–III prognostic marker studies according to the classification proposed by Simon and Altman (1994). Early exploratory studies are designated phase I studies; phase II studies investigate the association of a biomarker with patient prognosis and are hypothesis generating in nature; and phase III studies are large confirmatory studies of prestated hypotheses.
Publication and selection bias were investigated through a funnel plot (Egger et al, 1997).
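Funnel plot asymmetry can also be quantified with the regression test described by Egger et al (1997): the standardised effect (log HR divided by its s.e.) is regressed on precision (1/s.e.), and an intercept far from zero indicates asymmetry. The text does not state whether this test was computed here, so the sketch below is an illustration only; the function name is hypothetical, and the P value for the intercept would be read from a t distribution with n − 2 degrees of freedom.

```python
import math


def egger_test(log_hrs: list[float], ses: list[float]) -> tuple[float, float, float, int]:
    """Egger's regression test for funnel plot asymmetry.

    Regresses the standard normal deviate (log HR / s.e.) on precision (1 / s.e.)
    by ordinary least squares; returns (intercept, s.e. of intercept, t, df).
    Requires at least three studies whose s.e. values are not all identical.
    """
    n = len(log_hrs)
    if n < 3 or n != len(ses):
        raise ValueError("need at least three studies with matching s.e. values")
    y = [b / s for b, s in zip(log_hrs, ses)]  # standard normal deviate
    x = [1 / s for s in ses]                   # precision
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("precision must vary between studies")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # residual variance on n - 2 degrees of freedom
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)
    se_intercept = math.sqrt(sigma2 * (1 / n + mean_x ** 2 / sxx))
    return intercept, se_intercept, intercept / se_intercept, n - 2
```

In a symmetric funnel, small imprecise studies scatter evenly around the pooled effect and the fitted line passes close to the origin; missing small null studies pull the intercept away from zero.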

Statistical analysis
Statistical analyses were carried out using SPSS version 12.01 (SPSS, Chicago, IL, USA), Review Manager version 4.2 (The Cochrane Collaboration, the Nordic Cochrane Centre, Copenhagen, Denmark) and MLWIN version 2.0 (Centre for Multilevel Modelling, University of Bristol, Bristol, UK).
The first goal of our meta-analysis was to obtain a log hazard ratio and its standard error for each study according to methods previously described by Parmar et al (1998). If the study reported results of a univariate Cox regression analysis, the log hazard ratio and its standard error were included directly in the meta-analysis. When the study did not report the standard error, it was estimated from the 95% confidence interval (CI) or P value of the univariate Cox regression analysis. If results of univariate Cox regression analyses were not presented in the paper, the log hazard ratio and its standard error were estimated indirectly from P values of the log-rank test. Subsequently, we performed a meta-analysis using the DerSimonian–Laird random effects model (DerSimonian and Laird, 1986), applying the inverse of the variance as a weighting factor. Heterogeneity was investigated using the I² statistic, which takes values from 0 to 100%. An I² value >50% was considered to represent substantial heterogeneity between studies.
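The pooling step can be sketched as a minimal Python version of the DerSimonian–Laird estimator with inverse-variance weights, including Cochran's Q and the I² statistic used to judge heterogeneity (I² > 50% taken as substantial). It follows the textbook formulas and is not a reproduction of Review Manager's output; the function name and the keys of the returned dictionary are hypothetical.

```python
import math
from statistics import NormalDist


def dersimonian_laird(log_hrs: list[float], ses: list[float], level: float = 0.95) -> dict:
    """Random-effects pooling of log hazard ratios (DerSimonian and Laird, 1986)."""
    k = len(log_hrs)
    if k < 2 or k != len(ses):
        raise ValueError("need at least two studies with matching s.e. values")
    w = [1 / s ** 2 for s in ses]                      # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, log_hrs)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_hrs))  # Cochran's Q
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)                      # between-study variance
    i2 = 100 * (q - df) / q if q > df else 0.0         # I² in percent
    w_star = [1 / (s ** 2 + tau2) for s in ses]        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_hrs)) / sum(w_star)
    se_pooled = 1 / math.sqrt(sum(w_star))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return {
        "hr": math.exp(pooled),
        "ci": (math.exp(pooled - z * se_pooled), math.exp(pooled + z * se_pooled)),
        "q": q,
        "tau2": tau2,
        "i2": i2,
        "substantial_heterogeneity": i2 > 50,
    }
```

When τ² is estimated as zero the random-effects weights reduce to the fixed-effect weights; as τ² grows, small and large studies are weighted more evenly and the pooled CI widens.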
Quantitative assessment of sources of heterogeneity was undertaken by meta-regression analysis. The following potential sources of heterogeneity were explored: study quality score, year of publication (below or above the median year of publication), data collection (prospective or retrospective), region (Europe, United States, Asia or other), FIGO stage (<50% or >50% FIGO stage III/IV tumours), tumour type (<50% or >50% serous tumours), differentiation grade (<50% or >50% grade III or undifferentiated tumours), type of tumour tissue (frozen or paraffin-embedded), assay method (IHC or other), primary antibody (monoclonal or polyclonal), cut-off value for positive

Quality assessment and publication bias
The median quality score was 5 for p53 (range 1–8), 5 for EGFR (range 3–7) and 5 for HER-2/neu (range 3–8; Supplementary Tables 3–5). High study quality was related to a high journal impact factor for p53 (P = 0.010), but not for EGFR (P = 0.59) or HER-2/neu (P = 0.65). Funnel plots showed substantial asymmetry for HER-2/neu and EGFR, suggesting the presence of publication and/or selection bias (Figure 2). For p53, no funnel plot asymmetry was found.
Meta-analysis and assessment of heterogeneity
p53
Meta-analysis of 53 studies on the prognostic value of p53 expression showed that aberrant p53 status is associated with poor overall survival (HR obtained from the DerSimonian–Laird random effects model: 1.55 (95% CI 1.40–1.71); Figure 3), although there was heterogeneity between studies (I² = 44.4%). Subgroup analysis revealed a prognostic impact for IHC studies, IHC studies with the DO7 antibody, studies using mutational analysis and studies with a quality score >6. However, considerable heterogeneity remained, indicating that not all sources of heterogeneity could be accounted for (Table 1). When the meta-analysis was restricted to studies reporting results of (subgroup) analyses for serous tumours (Bali et al, 2004; Terauchi et al, 2005; Ueno et al, 2006; Yakirevich et al, 2006; Kobel et al, 2008; Vartiainen et al, 2008), p53 status was also a predictor of poor survival. Unfortunately, the number of studies reporting results for the other histological subtypes was too small to perform a pooled analysis. Meta-regression analysis revealed that the outcome of the analysis was influenced by FIGO stage distribution. When the six analyses reporting results for stage III/IV tumours were subsequently pooled, p53 status was no longer of prognostic value (Table 1).
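Meta-regression on a single dichotomised study-level covariate, such as whether more than 50% of a study's patients had FIGO stage III/IV disease, can be sketched as weighted least squares on the log hazard ratios with random-effects weights 1/(s.e.² + τ²). This simplified sketch assumes τ² has already been estimated (for example by the DerSimonian–Laird method) and handles one covariate only; it does not reproduce the multilevel models fitted in MLWIN, and the function name and the 0/1 coding are hypothetical.

```python
import math


def meta_regression(log_hrs: list[float], ses: list[float],
                    covariate: list[float], tau2: float = 0.0) -> dict:
    """Weighted least-squares meta-regression of log(HR) on one study-level covariate.

    Weights are 1 / (s.e.^2 + tau2); coefficient variances follow from the
    known within-study variances (no rescaling by the residual variance).
    """
    w = [1 / (s ** 2 + tau2) for s in ses]
    sum_w = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, covariate)) / sum_w
    mean_y = sum(wi * yi for wi, yi in zip(w, log_hrs)) / sum_w
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, covariate))
    if sxx == 0:
        raise ValueError("covariate must vary between studies")
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y)
              for wi, xi, yi in zip(w, covariate, log_hrs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    se_slope = 1 / math.sqrt(sxx)
    return {
        "intercept": intercept,          # log(HR) when the covariate is 0
        "slope": slope,                  # change in log(HR) per unit of covariate
        "se_slope": se_slope,
        "z": slope / se_slope,
        "ratio_of_hrs": math.exp(slope), # HR ratio between covariate levels 1 and 0
    }


# Hypothetical coding: covariate = 1 if >50% of a study's patients had FIGO stage III/IV, else 0.
```

A slope clearly different from zero indicates that the prognostic effect estimated by a study depends on its stage mix, which is the kind of finding reported above for p53.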
EGFR
Results of meta-analysis for EGFR showed a significant relationship between overexpression of EGFR and poor patient outcome (HR: 1.65 (95% CI 1.25–2.19); Figure 4). Although significant heterogeneity was present (I² = 74.3%), its sources could not be identified by meta-regression analysis.
Restricting the analysis to studies that used IHC staining for determination of marker expression did not alter results of heterogeneity tests (Table 1). However, further analysis showed that heterogeneity was partly due to results of the study by Psyrri et al (2005). When this study was excluded from the meta-analysis, less heterogeneity was observed.
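The sensitivity analysis described above, re-estimating heterogeneity after removing one study, generalises to a leave-one-out loop. The sketch below recomputes I² (from Cochran's Q with inverse-variance weights) with each study omitted in turn; the function names are hypothetical and any study labels passed in are placeholders, not the data of this meta-analysis.

```python
def i_squared(log_hrs: list[float], ses: list[float]) -> float:
    """I² (percent) from Cochran's Q with inverse-variance weights."""
    w = [1 / s ** 2 for s in ses]
    pooled = sum(wi * yi for wi, yi in zip(w, log_hrs)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, log_hrs))
    df = len(log_hrs) - 1
    return 100 * (q - df) / q if q > df else 0.0


def leave_one_out_i2(labels: list[str], log_hrs: list[float],
                     ses: list[float]) -> dict[str, float]:
    """Map each study label to the I² obtained when that study is excluded."""
    result = {}
    for i, label in enumerate(labels):
        rest_y = log_hrs[:i] + log_hrs[i + 1:]
        rest_se = ses[:i] + ses[i + 1:]
        result[label] = i_squared(rest_y, rest_se)
    return result
```

A study whose removal produces a large drop in I² relative to the full analysis is a candidate explanation for the heterogeneity, although its exclusion still has to be justified on clinical or methodological grounds.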
HER-2/neu
Meta-analysis of univariate analyses on the prognostic value of HER-2/neu showed that overexpression of HER-2/neu is associated with poor overall survival (HR: 1.67 (95% CI 1.34–2.08); Figure 5), but again considerable heterogeneity was present (I² = 59.6%). Of note, none of the studies using immunohistochemical staining followed by FISH for ambiguous samples reported a statistically significant relationship between HER-2/neu expression and survival (Castellvi et al, 2006; Malamou-Mitsi et al, 2007; Tuefferd et al, 2007). The most important factor explaining the heterogeneity between studies was study quality, with low-quality studies reporting more significant results.

DISCUSSION
In this study, we present a pooled estimate of the prognostic value of p53, EGFR and HER-2/neu in epithelial ovarian cancer. Our results show that as single markers, p53, EGFR and HER-2/neu are not likely to be useful as prognostic factors in clinical practice (pooled HR for all included studies: 1.47 (95% CI 1.33–1.61) for p53; 1.65 (95% CI 1.25–2.19) for EGFR; and 1.67 (95% CI 1.34–2.08) for HER-2/neu). Furthermore, our study clearly indicates that adequate conduct and complete reporting are imperative for improving the quality of prognostic factor studies in ovarian cancer.
Although protein expression of p53 and EGFR as assessed by IHC staining has a modest effect on prognosis, neither p53 nor EGFR immunostaining predicts clinical outcome in a manner comparable to well-known clinicopathological prognostic factors such as tumour stage and residual tumour after primary surgery. Our results also show that p53 mutations have prognostic value in epithelial ovarian cancer, although this was of borderline significance. However, this analysis was affected by small sample size and methodological issues, such as the use of different techniques for mutational analyses and the analysis of different exons.
For HER-2/neu and EGFR, the ability to draw reliable conclusions from meta-analysis was limited by considerable publication bias, consistent with under-reporting of small studies with non-significant results. The presented hazard ratios might, therefore, overestimate the true effect size. More importantly, meta-regression analysis demonstrated that poorly designed or poorly reported studies produce higher estimates of the prognostic value of HER-2/neu. A similar finding has previously been reported in a meta-analysis of clinical trials, where incorporating the results of poor-quality randomised controlled trials led to significant exaggeration of treatment efficacy (Moher et al, 1998).
It has long been appreciated that the histological subtypes of ovarian cancer show considerable differences with respect to stage at diagnosis, response to chemotherapy and underlying molecular abnormalities (Bell, 2005). This was recently demonstrated by Kobel et al (2008), who assessed the expression of 21 candidate biomarkers in a large cohort of 500 ovarian carcinomas and subsequently performed subgroup analyses for the different histological subtypes. Their results showed that both the expression and the prognostic value of most biomarkers varied considerably between subtypes. In this study, we assessed the prognostic value of p53 in six studies presenting (subgroup) analyses for p53 in serous tumours. The prognostic value of p53 in serous tumours did not differ substantially from its prognostic value in the entire cohort. Additionally, we performed a subgroup analysis of four studies reporting six analyses on the prognostic value of p53 in stage III/IV tumours. In this group, p53 was not of prognostic value. However, the number of studies that could be analysed was small, and we were not able to perform a pooled analysis for the other histological subtypes. Our results underscore the importance of biomarker analysis in homogeneous subgroups of patients, such as patients with a particular disease stage, tumour type or differentiation grade. To perform these kinds of analyses, international collaboration is critical. Furthermore, the submission of raw, uncategorised study data to public databases would allow for analysis of specific subgroups while maintaining statistical power.
Figure 3 Forest plot showing results of studies on the prognostic value of p53 expression. Hazard ratios and 95% CI (confidence interval) of individual studies for patients with p53 positive tumours. Hazard ratios: squares whose heights are inversely proportional to the standard error of the estimate, and their respective confidence intervals (horizontal lines). Summary hazard ratio: diamond with horizontal limits at the confidence limits and width inversely related to its standard error. Hazard ratios higher than 1 indicate an increased risk of death for patients with a tumour with aberrant p53 status. Abbreviations: MUT = results of mutation analysis; IHC = results of immunohistochemical staining; cyt = results for cytoplasmic immunostaining; nucl = results for nuclear immunostaining; P arm = results for patients treated with cisplatin; PC arm = results for patients treated with cisplatin/cyclophosphamide.
p53, EGFR and HER-2/neu in ovarian cancer P de Graeff et al
Most studies in the meta-analysis used IHC staining to study expression of p53, EGFR and HER-2/neu. Although IHC staining is simple and cost-effective, results are highly dependent on a variety of methodological factors such as storage time and fixation method of paraffin-embedded tissues, choice of primary antibody and IHC staining protocol (Jacquemier et al, 1994; Hall et al, 2004). In this study, differences in IHC staining protocols and in cut-off values for positive protein expression, ranging from >5% to >90% positively stained cells, may have contributed to the observed heterogeneity. Our results, therefore, make a strong case for international consensus on staining and scoring protocols.
As a first step towards quality assessment of prognostic factor studies to be included in meta-analyses, we have developed a quality score. For meta-analyses evaluating results of clinical trials and of diagnostic studies, such criteria are available and widely used either to exclude low-quality studies or to evaluate study quality (Jadad et al, 1996; Whiting et al, 2003). As our quality score was newly developed for this study and has not been extensively validated, we chose not to exclude studies with a low score from statistical analysis beforehand. Based on the results of meta-regression analysis, we do, however, believe that it provides a good estimate of study quality. In future studies, our quality score might serve as a further step towards the development of evidence-based quality assessment tools for meta-analyses of prognostic factor studies. In addition, use of the recently published REMARK guidelines for reporting of prognostic factor studies will aid more complete and transparent reporting (McShane et al, 2005), thereby also increasing the number of high-quality studies that can be included in a meta-analysis.
Figure 4 Forest plot showing results of studies on the prognostic value of EGFR expression. Hazard ratios and 95% confidence intervals for patients with EGFR positive tumours (symbols as in Figure 3).
We also classified all studies as phase I–III prognostic factor studies according to the classification proposed by Simon and Altman (1994). Although several large studies on the prognostic value of p53 and HER-2/neu have been performed, no study met the stringent criteria for a phase III biomarker study. A prespecified hypothesis, a description of eligibility criteria and a sufficiently large number of patients were often lacking. In addition, almost none of the studies were specifically designed to determine the prognostic impact of p53, EGFR or HER-2/neu as single markers. These results underscore the need for well-designed studies with clearly stated hypotheses that examine the relationship between biomarker expression and clinical outcome.
Although this study shows that p53, EGFR and HER-2/neu immunostaining do not have a strong direct relationship with survival, it is likely that their respective pathways do influence patient prognosis. In future studies, several approaches could be taken to elucidate the prognostic value of these pathways. For instance, IHC staining of activated (phosphorylated) receptors and of key regulatory proteins involved in upstream and downstream signalling may be more informative than immunostaining of single markers regardless of their activation status (Wang et al, 2005; de Graeff et al, 2008). Other methods to assess pathway activation status may also be employed to identify prognostic factors. For example, EGFR amplification as determined by FISH has been shown to be independently associated with poor survival in vulvar cancer and in head and neck squamous cell carcinomas (Chung et al, 2006; Growdon et al, 2008). Two recent reports in ovarian cancer also suggest that increased EGFR gene copy number is more strongly related to survival than protein expression (Lassus et al, 2004, 2006). Another attractive approach for the identification of novel prognostic and predictive factors is the identification of genes and pathways by microarray analysis. Traditional prognostic factor studies, including those on p53, HER-2/neu and EGFR, have until now mainly focused on the prognostic value of single genes. Over the past years, it has become apparent that this 'one gene, one outcome' hypothesis is an oversimplification of the multiple genetic and epigenetic mechanisms that account for ovarian cancer survival. Using pathway analysis of large datasets such as microarray data (Bild et al, 2006), alterations in the p53, EGFR and HER-2/neu pathways, rather than in single genes, can be analysed.
Ultimately, the identification of deregulated pathways in a single tumour may lead to a more precise estimation of patient prognosis and might also reveal novel therapeutic targets. However, these studies often need a far more complex design and statistical analysis than single-marker studies. It is, therefore, especially important to address methodological issues when designing and reporting these analyses, and to take possible sources of heterogeneity into account.
There are some limitations to this meta-analysis. Firstly, considerable heterogeneity was observed, especially for EGFR and HER-2/neu. When subgroup analyses were performed for more homogeneous groups of studies (for example, only studies using IHC staining), heterogeneity remained present. This indicates that not all sources of heterogeneity could be accounted for in this meta-analysis and that results should be interpreted with caution. Secondly, we restricted our analysis to published studies written in English or German. Thirteen mostly small studies that met the eligibility criteria according to their abstracts were excluded on language grounds. This may result in publication or language bias, leading to an overestimation of effect sizes (Egger et al, 1997; Pham et al, 2005). Although this was not the case for p53, there was clear evidence of publication bias for EGFR and HER-2/neu. Thirdly, our meta-analysis is based on unadjusted estimates, whereas a more precise estimate could be obtained from multivariate analyses adjusting for clinicopathological variables. However, the multivariate analyses reported in the included studies used various models and different covariates and could, therefore, not be combined into a pooled estimate.
In conclusion, our study shows that although aberrations of p53, EGFR and HER-2/neu have a modest effect on survival in ovarian cancer, they are currently unlikely to influence clinical decision-making.
Figure 5 Forest plot showing results of studies on the prognostic value of HER-2/neu expression. Hazard ratios and 95% CI (confidence intervals) for patients with HER-2/neu positive tumours (symbols as in Figure 3).
Identification of multiple methodological flaws and sources of heterogeneity in currently available prognostic factor studies should help improve the design and reporting of future prognostic and predictive factor studies. In this way, we hope that deregulated molecular biological factors and pathways will be identified that make a difference in clinical decision-making, ultimately resulting in effective, individualised targeted therapy for ovarian cancer patients.
Supplementary Information accompanies the paper on British Journal of Cancer website (http://www.nature.com/bjc)