HE4 and CA125 as a diagnostic test in ovarian cancer: prospective validation of the Risk of Ovarian Malignancy Algorithm

Background: Recently, a Risk of Ovarian Malignancy Algorithm (ROMA) utilising human epididymis secretory protein 4 (HE4) and CA125 successfully classified patients as presenting a high or low risk for epithelial ovarian cancer (EOC). We validated this algorithm in an independent prospective study. Methods: Women with a pelvic mass, who were scheduled to have surgery, were enrolled in a prospective study. Preoperative serum levels of HE4 and CA125 were measured in 389 patients. The performance of each of the markers, as well as that of ROMA, was analysed. Results: When all malignant tumours were included, ROMA (receiver operator characteristic (ROC)-area under curve (AUC)=0.898) and HE4 (ROC-AUC=0.857) did not perform significantly better than CA125 alone (ROC-AUC=0.877). Using a cutoff for ROMA of 12.5% for pre-menopausal patients, the test had a sensitivity of 67.5% and a specificity of 87.9%. With a cutoff of 14.4% for post-menopausal patients, the test had a sensitivity of 90.8% and a specificity of 66.3%. For EOC vs benign disease, the ROC-AUC of ROMA increased to 0.913 and for invasive EOC vs benign disease to 0.957. Conclusion: This independent validation study demonstrated similar performance indices to those recently published. However, in this study, HE4 and ROMA did not increase the detection of malignant disease compared with CA125 alone. Although the initial reports were promising, measurement of HE4 serum levels does not contribute to the diagnosis of ovarian cancer.

The majority of women who undergo surgery for an ovarian cyst or pelvic mass are treated in a community hospital by a gynaecologist or general surgeon. Although this is appropriate for patients who have a benign cyst, patients with a malignancy should be referred to a tertiary care centre with multidisciplinary teams specialised in ovarian cancer treatment. A recent systematic review showed an improved outcome for patients with ovarian cancer when they were referred to, and surgically treated by, gynaecological oncologists (du Bois et al, 2009). Therefore, it is important to triage women with increased risk for ovarian cancer to the appropriate surgeon and centre.
CA125 is the most widely used tumour marker in ovarian cancer (Bast et al, 1983). The sensitivity and specificity of CA125 are far from ideal, as its levels are raised in approximately 80% of all epithelial ovarian cancers (EOC) and in only 50% of stage I EOC (Zurawski et al, 1988). Therefore, CA125 is rarely used as the sole parameter in the prediction of malignancy. Usually, a combination of a patient's medical history, clinical examination results, imaging data and tumour marker profile is used to differentiate malignant ovarian masses from their benign counterparts. Ultrasound has an important role in differentiating between benign and malignant adnexal masses, but experience and proper training are of paramount importance in distinguishing between the two (Van Holsbeke et al, 2009). This highlights a major problem: the centre with the least experience in dealing with malignant disease requires substantial experience in ultrasound to triage patients to a gynaecological oncologist. This explains the tremendous amount of effort that has been expended over the past few decades to find new ovarian cancer biomarkers that could be used together with, or instead of, CA125. In 1999, the human epididymis secretory protein 4 (HE4) gene was found to be overexpressed in ovarian cancer (Schummer et al, 1999). It is a member of the whey acidic protein gene family (Bouchard et al, 2006), and is expressed in normal tissues of the reproductive and respiratory tract (Bingle et al, 2002; Galgano et al, 2006). The first report mentioning HE4 as a potential serum biomarker for ovarian cancer was published in 2003 (Hellstrom et al, 2003). Recently, Moore et al (2008b, 2009) published a series of papers that used a combination of CA125, HE4 and menopausal status to predict the presence of a malignant ovarian tumour. Originally, nine potential biomarkers were evaluated, of which HE4 was the most effective in detecting ovarian cancer.
When CA125 was combined with HE4, the prediction rate was higher, showing a sensitivity for detecting malignant disease of 76.4% at a specificity of 95% (Moore et al, 2008b). Subsequently, Moore et al (2009) performed a multicentre prospective study including 531 women diagnosed with a pelvic mass who underwent surgery. Patients were classified as being at a high or low risk for ovarian cancer with a specificity of 75.0% and a sensitivity of 92.3% for post-menopausal patients, and a specificity and sensitivity of 74.8 and 76.5%, respectively, for pre-menopausal patients.
In this study, we aimed to independently validate HE4 and the combination of HE4 with CA125 using the Risk of Ovarian Malignancy Algorithm (ROMA) for the diagnosis of ovarian cancer.

Patients
From August 2005 to March 2009, 389 patients were included in a prospective study conducted at the University Hospitals Leuven. All patients were diagnosed with a pelvic mass of suspected ovarian origin and were scheduled for surgical intervention. Women with a previous bilateral oophorectomy were not eligible. All patients underwent imaging by pelvic ultrasound to document the presence of an ovarian mass. Clinical information was retrieved from the patients' hospital notes. All patients underwent surgical removal of the ovarian mass, and if a patient was diagnosed with an ovarian cancer, then surgical staging was performed.
Before the collection of biological samples and surgery, all patients were required to give fully informed consent. The protocol was approved by the Local Ethics Committee. The Ethical Committee released the authors from the obligation to obtain an insurance contract because of the character of this study. Patient participation in the study was concluded once the final surgical pathology reports were obtained.

Serum samples
Immediately before surgery, blood samples were obtained. Blood samples were collected in 10 ml clot-activating tubes (BD Vacutainer Serum Tube, ref. 369033; Erembodegem, Belgium). Serum tubes were centrifuged at 800 g for 10 min. Serum was collected, dispensed into multiple cryotubes and frozen at −80 °C. The time between blood sampling and freezing of the serum, and the presence of haemolysis, were noted. The targeted time limit between sampling and freezing was 4 h.

Marker assays
Serum CA125 concentrations were measured using the CanAg CA125 EIA assay (Fujirebio Diagnostics, Göteborg, Sweden) and serum HE4 concentrations were measured using the HE4 EIA assay (Fujirebio Diagnostics). Both assays are solid-phase, non-competitive immunoassays, based on the direct sandwich technique, and were run according to the manufacturer's instructions. Each ELISA was performed manually and in duplicate for calibrators, controls and patient samples. The appropriate controls were within the ranges provided by the manufacturer for all runs. For CA125, the normal upper limit was 35 U ml⁻¹, whereas that for HE4 was 70 pM (as suggested by Moore et al, 2008b) or 150 pM (as suggested in the product insert). We also determined our own ideal cutoff for this study, defined as the cutoff point that provided the highest accuracy (minimal false-negative and false-positive results).
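The search for an accuracy-maximising cutoff can be sketched as follows. This is a minimal illustration and not the procedure run in MedCalc or PASW: the function name, the rule that a value at or above the cutoff counts as a positive test, and the toy marker values are all assumptions introduced here.

```python
from typing import Sequence, Tuple


def best_accuracy_cutoff(
    values: Sequence[float], is_malignant: Sequence[bool]
) -> Tuple[float, float]:
    """Return (cutoff, accuracy) for the cutoff with the highest accuracy.

    Assumption: a marker value >= cutoff is called positive (malignant).
    Only observed values are tried as candidate cutoffs; on ties the
    lowest cutoff is kept.
    """
    if not values or len(values) != len(is_malignant):
        raise ValueError("values and labels must be non-empty and equal in length")
    best_cutoff, best_acc = values[0], -1.0
    for cutoff in sorted(set(values)):
        # A call is correct when (test positive) agrees with (truly malignant).
        correct = sum((v >= cutoff) == m for v, m in zip(values, is_malignant))
        acc = correct / len(values)
        if acc > best_acc:
            best_cutoff, best_acc = cutoff, acc
    return best_cutoff, best_acc


# Hypothetical marker values (not study data), with pathology labels.
toy_values = [20, 35, 40, 60, 80, 150, 300]
toy_labels = [False, False, False, True, False, True, True]
cutoff, accuracy = best_accuracy_cutoff(toy_values, toy_labels)
```

Maximising accuracy weights false negatives and false positives equally, so the resulting cutoff depends on the proportion of malignant cases in the cohort.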

Statistical analysis
ROMA classifies patients as being at a low or at a high risk for malignant disease on the basis of a predicted probability (PP) that is calculated from the serum HE4 and CA125 levels, using separate algorithms for pre- and post-menopausal women. According to the manufacturer's insert, the following thresholds were selected for ROMA:
Pre-menopausal women:
* PP ≥12.5% = high risk of finding EOC
* PP <12.5% = low risk of finding EOC
Post-menopausal women:
* PP ≥14.4% = high risk of finding EOC
* PP <14.4% = low risk of finding EOC
Statistical analysis was performed with MedCalc v11.1.1.0 (MedCalc Software, Mariakerke, Belgium) and with PASW Statistics v17.0 (SPSS, Brussels, Belgium). The mean age of the patients was compared using Student's t-test, and categorical variables were compared with the χ²-test. Tumour marker levels were compared using the Wilcoxon rank-sum test (Mann-Whitney two-sample statistic) or the Kruskal-Wallis rank test (multiple-sample statistic).
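As an illustration of how a ROMA classification is obtained, the sketch below computes a predictive index from the natural logarithms of HE4 and CA125, converts it to a PP with the logistic function and applies the thresholds listed above. The regression coefficients do not appear in this text; the values used are those generally quoted for ROMA from Moore et al (2009) and should be treated as an assumption to be checked against the product insert. The function names are hypothetical.

```python
import math

# ASSUMPTION: coefficients as generally quoted for ROMA (Moore et al, 2009);
# they are not given in this text. Order: (intercept, ln HE4, ln CA125).
ROMA_COEFFICIENTS = {
    "pre": (-12.0, 2.38, 0.0626),
    "post": (-8.09, 1.04, 0.732),
}

# Thresholds from the manufacturer's insert as listed in the text (PP in %).
ROMA_THRESHOLDS = {"pre": 12.5, "post": 14.4}


def roma_pp(he4_pm: float, ca125_u_ml: float, status: str) -> float:
    """Predicted probability (%) for HE4 in pM and CA125 in U/ml.

    status is "pre" or "post" (menopausal status).
    """
    if he4_pm <= 0 or ca125_u_ml <= 0:
        raise ValueError("marker concentrations must be positive")
    intercept, b_he4, b_ca125 = ROMA_COEFFICIENTS[status]
    index = intercept + b_he4 * math.log(he4_pm) + b_ca125 * math.log(ca125_u_ml)
    # Logistic transform of the predictive index, expressed as a percentage.
    return 100.0 / (1.0 + math.exp(-index))


def roma_risk(he4_pm: float, ca125_u_ml: float, status: str) -> str:
    """Return "high" if PP >= threshold for the menopausal group, else "low"."""
    pp = roma_pp(he4_pm, ca125_u_ml, status)
    return "high" if pp >= ROMA_THRESHOLDS[status] else "low"
```

For example, `roma_risk(40, 15, "pre")` evaluates a hypothetical pre-menopausal patient with low levels of both markers.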
Receiver operator characteristic (ROC) curves were constructed, and the area under the curve (ROC-AUC) with a 95% confidence interval was calculated. Sensitivity and specificity were calculated in pre- and post-menopausal women separately and independently of menopausal status. Subgroups were analysed according to the histological subtype of the tumour, stage, grade, use of hormonal drugs, smoking habit, familial history, presence of haemolysis and the time between sampling and freezing. The method described by DeLong et al (1988) was used for the calculation of the difference between two ROC-AUCs. For all statistical comparisons, a P-value of <0.05 was considered statistically significant.
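The quantities named above can be illustrated with a short sketch. It uses the standard equivalence between the ROC-AUC and the Mann-Whitney statistic (the probability that a randomly chosen malignant case has a higher marker value than a randomly chosen benign case, with ties counted as one half). It does not compute confidence intervals or the DeLong comparison, and the function names, the positivity rule (value at or above the cutoff) and the toy values are assumptions.

```python
from typing import Sequence, Tuple


def roc_auc(malignant: Sequence[float], benign: Sequence[float]) -> float:
    """ROC-AUC as the normalised Mann-Whitney statistic (ties count 0.5)."""
    if not malignant or not benign:
        raise ValueError("both groups must contain at least one value")
    wins = 0.0
    for m in malignant:
        for b in benign:
            if m > b:
                wins += 1.0
            elif m == b:
                wins += 0.5
    return wins / (len(malignant) * len(benign))


def sensitivity_specificity(
    malignant: Sequence[float], benign: Sequence[float], cutoff: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when value >= cutoff is called positive."""
    sensitivity = sum(v >= cutoff for v in malignant) / len(malignant)
    specificity = sum(v < cutoff for v in benign) / len(benign)
    return sensitivity, specificity


# Hypothetical marker values, not study data.
toy_malignant = [60, 150, 300]
toy_benign = [20, 35, 40, 80]
auc = roc_auc(toy_malignant, toy_benign)
sens, spec = sensitivity_specificity(toy_malignant, toy_benign, cutoff=60)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a cohort of a few hundred patients; a rank-based formulation would be preferable for larger data sets.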

Patient characteristics
The serum of 389 patients was analysed: 228 (58.6%) patients had benign disease and 161 (41.4%) patients had malignant disease (Table 1). Patients with benign disease were younger and more likely to be pre-menopausal. Patients with malignant disease were more likely to have a family history of breast and ovarian cancer.

Sample characteristics
Overall, 40 samples were not frozen within a time limit of 4 h after sampling, of which 24 (10.5%) were from benign cases and 16 (9.9%) were from malignant cases (P = 0.851). Haemolysis was noted in 38 cases, of which 23 (10.1%) were from benign cases and 15 (9.3%) were from malignant cases (P = 0.801).
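The two comparisons above can be checked with a 2×2 χ²-test computed from the reported counts (228 benign and 161 malignant cases). The sketch assumes a Pearson χ² statistic without continuity correction, which the text does not specify; the function name is hypothetical.

```python
import math
from typing import Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p). With 1 degree of freedom the upper-tail probability
    of the chi-square distribution equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Rows: benign, malignant. Columns: affected samples, unaffected samples.
_, p_delay = chi2_2x2(24, 228 - 24, 16, 161 - 16)       # not frozen within 4 h
_, p_haemolysis = chi2_2x2(23, 228 - 23, 15, 161 - 15)  # haemolysis noted
```

Under this assumption, both P-values round to the reported figures (0.851 and 0.801), which suggests that no continuity correction was applied.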

Tumour characteristics
The most common benign ovarian tumours were cystadenomas (n = 52), cystadenofibromas (n = 26), endometriomas (n = 66) and mature teratomas (n = 29) (Tables 2 and 3). Mixed tumours (n = 16) contain two or more different histological subtypes, making it impossible to categorise these tumours into a specific subtype. The cystadenomas and cystadenofibromas included 47 serous, 26 mucinous and 5 other histological types or mixed cystadenomas/cystadenofibromas. The majority of the malignant tumours were EOC. Most of the EOC were of high grade and were diagnosed at an advanced stage. Other primary non-epithelial ovarian tumours (NEOCs) included two sex cord stromal tumours and two sarcomas. All other malignant tumours of the ovary (n = 26) were metastases from extra-ovarian primary tumours. These tumours were mainly of an endometrial or gastrointestinal origin.

Tumour marker levels
The median CA125, HE4 and ROMA serum levels differed significantly between benign and malignant cases for the whole group, and for the pre- and post-menopausal groups separately (P<0.0001 for all comparisons) (Table 4, Figures 1 and 2). Within the benign group, the most frequent tumours were analysed. Using the Kruskal-Wallis rank test, we found the median tumour marker levels to be statistically different for CA125 (P<0.0001), HE4 (P = 0.0043) and ROMA (P = 0.0006). Post hoc pairwise comparisons with the Wilcoxon rank-sum test showed that CA125 was significantly elevated in endometriosis and ovarian fibromas/thecomas compared with cystadenomas/cystadenofibromas (P<0.0001 and P = 0.0111), functional cysts (P = 0.0160 and P = 0.0281) and mature teratomas (P = 0.0002 and P = 0.0169). For HE4, the only significant comparison found was the pairwise comparison between cystadenomas/cystadenofibromas and endometriosis (P = 0.0002). ROMA was significantly elevated in cystadenomas/cystadenofibromas (P<0.0001) and ovarian fibromas/thecomas (P = 0.0111) when compared with endometriosis, and in cystadenomas/cystadenofibromas when compared with mature teratomas (P = 0.0349).
Within the group of malignant tumours, there was no significant difference between the CA125, HE4 and ROMA levels of EOC and metastatic tumours. There was no significant difference between FIGO stages I and II tumours, nor between FIGO stages III and IV tumours, although the difference between early (FIGO I-II) and advanced stages (FIGO III-IV) was significant for CA125, HE4 and ROMA. There was a significant difference between borderline and invasive disease (grades 1-3) for all markers, but there was no difference among grades 1, 2 and 3 for the different markers.

ROC curves and performance indices for all tumours
The ROC-AUC of CA125 was not significantly different from that of HE4 or ROMA for all malignant diseases (including EOC, NEOC and metastases) compared with benign disease (Table 5, Figure 3). Pairwise comparison of ROC-AUCs showed that only the difference between HE4 and ROMA was significant. For pre-menopausal patients, again, only the pairwise comparison between HE4 and ROMA was significant. In the post-menopausal population, there was a significantly better performance of CA125 vs HE4, and of ROMA vs HE4. Overall, ROMA did not perform significantly better than CA125 alone, either for the whole group of patients or for the pre- or post-menopausal patients separately.
At the ideal cutoff, corresponding to the highest accuracy (minimal false-negative and false-positive results), CA125, HE4 and ROMA resulted in a similar sensitivity and specificity within the different menopausal groups. Sensitivity and specificity using the cutoffs suggested in the product insert are shown in Table 5, together with the sensitivity and specificity of HE4 at a cutoff of 70 pM.

ROC curves for subgroups
When EOC was analysed alone and NEOC and metastatic tumours were excluded, the ROC-AUC of ROMA was higher (Figure 4, Supplementary Table 1). The ROC-AUC was even higher when all borderline tumours were excluded. With regard to histological subtypes, a comparison of the ROC-AUC of (pure) serous with that of non-serous EOC (excluding all mixed serous tumours) showed that the ROC-AUC of ROMA was higher for the serous subtype. In contrast, all markers performed significantly worse when only the mucinous subtype was examined.

DISCUSSION
This study aimed to investigate the performance of the serum tumour markers CA125 and HE4, and of the risk stratification tool ROMA, in a prospective collection of serum samples from patients with an ovarian mass. We found that there was a significant difference between benign and malignant disease with respect to serum CA125, HE4 and ROMA levels. When the ROC-AUCs of the different tumour markers were compared, HE4 and CA125 performed similarly, except for the post-menopausal patients, in whom CA125 performed better. This similar performance of HE4 and CA125 was also noted in other studies (Hellstrom et al, 2003; Scholler et al, 2006; Palmer et al, 2008; Montagnana et al, 2009; Andersen et al, 2010). Combining HE4 and CA125 in the ROMA improved on the performance of HE4 but not on that of CA125, regardless of menopausal status. As CA125 is the current standard for comparison, this means that neither HE4 nor the ROMA improved the diagnosis of ovarian cancer. This is in contrast to the results of Moore et al (2008b), who found that a combination of CA125 and HE4 performed better than CA125 alone. However, Moore et al (2008b) excluded all borderline tumours, NEOC and metastatic tumours to calculate the performance of the tumour markers they tested. We decided not to exclude these tumours in our initial analysis as we wanted to study a patient population that reflected a normal clinical setting. When borderline tumours, NEOC and metastatic cancers were excluded, the ROC-AUC for CA125 was 0.937 in our data vs 0.836 in the study by Moore et al (2008b). In contrast, the ROC-AUCs of HE4 were similar: 0.914 (our data) vs 0.908 (Moore et al, 2008b). In a more recent study, Moore et al (2010) also included borderline tumours in their analysis. Within this study, the examination of benign cases vs all stages of EOC and borderline tumours revealed an ROC-AUC of 0.913.
In the setting of a multicentre prospective trial with central review and monitoring, it seems plausible that a diagnostic test would perform slightly better.
Age influences HE4 and CA125 in opposite directions: whereas CA125 is higher in healthy pre-menopausal patients (Bon et al, 1996; Bonfrer et al, 1997), HE4 tends to be higher in post-menopausal patients (Moore et al, 2008a, 2008b; Andersen et al, 2010). These slightly higher normal values influence the performance of the tumour markers concerned. Although not significant, this can also be seen in our study population: the ROC-AUC of CA125 was higher in the post-menopausal group. Of particular interest, HE4 seems to have a slightly higher ROC-AUC in the pre-menopausal group than in the post-menopausal group. Although this difference is not significant, it causes the ROC curves of CA125 and HE4 to come together in the pre-menopausal group and diverge in the post-menopausal group (Figure 3). In other words, the performance of HE4 is similar to that of CA125 in the pre-menopausal group, but significantly worse in the post-menopausal group. This increased performance of HE4 in the pre-menopausal group is in agreement with previous studies (Moore et al, 2008b; Andersen et al, 2010), and confirms that CA125 and HE4 function independently of each other.
Because ROC curves are not used in clinical practice, we aimed to find the cutoff points for the different tumour markers. The cutoff values corresponding to the highest accuracy (minimal false-negative and false-positive results) for all patients were 62.5 kU l⁻¹ for CA125, 72.2 pM for HE4 and 22.2% for ROMA. The product insert states that 94.4% of the healthy female subjects studied (n = 179) had an HE4 value of 150 pM or below. If we define the reference value as the value that includes 95% of healthy controls, and we use this as a cutoff point to minimise the false-positive rate, we obtain a sensitivity of 50.3% and a specificity of 96.5%. In clinical practice, this means that 3.5% of patients with a benign tumour will be treated as if they had a malignant tumour (overtreatment), and 49.7% of patients with a malignant tumour will be treated as if they had a benign tumour (undertreatment). Therefore, in our study, this cutoff point is not useful for differentiating benign from malignant cysts. Andersen et al (2010) also determined their cutoff at the 95th percentile in a healthy control group. On the basis of this cutoff, they obtained a sensitivity of 77.0% and a specificity of 94.9%. Unfortunately, they did not report their cutoff value. Using the cutoff point of 70 pM, as previously suggested by Moore et al (2008b), we reached a sensitivity of 74.5% and a specificity of 83.3%. This cutoff is comparable to our ideal cutoff point of 72.2 pM, and is thus a reasonable cutoff point for HE4. With regard to ROMA, different cutoff points are used in pre-menopausal and post-menopausal patients. Both cutoff points are determined to provide a specificity level of 75% for the CA125 plus HE4 assay combination. Our ideal cutoff points of 16.6% for the pre-menopausal patients and 35.9% for the post-menopausal patients were somewhat different from those suggested previously.
However, these were not established at 75% specificity, but at the point on the ROC curve at which we had minimal false-negative and false-positive results. Irrespective of whether we analyse only invasive EOC, our ideal cutoff point in the pre-menopausal and post-menopausal category is higher than the suggested cutoff points of 12.5 and 14.4%, respectively. As expected, histological subtypes seem to be important for the performance of the different tumour markers. With regard to benign tumours, it was interesting to see that the fibromas/thecomas group and the endometriomas had the highest levels of CA125, whereas for HE4, the endometriomas had the lowest level. As already mentioned by Huhtinen et al (2009), measuring both CA125 and HE4 together could be of particular interest in differentiating endometriosis from ovarian cancer, as ovarian cancer will cause a raised CA125 and HE4, whereas endometriosis will only cause a raised CA125. This could explain why HE4 performs better in pre-menopausal patients compared with post-menopausal patients, and vice versa for CA125. However, even for the pre-menopausal patients, HE4 and ROMA did not perform better than CA125. All malignant tumours expressed high levels of CA125 and HE4, but the highest levels were noted for the serous subtype. High expression levels of HE4 for the different epithelial subtypes, with the exception of the mucinous subtype, were already noticed in previous studies (Lu et al, 2004; Drapkin et al, 2005; Gilks et al, 2005; Galgano et al, 2006).
Although CA125, HE4 and ROMA are not currently recommended as screening tools, it is interesting to see how well a tumour marker performs in the early stage of disease. A definite trend could be seen from stage I to stage IV disease, and CA125 and HE4 performed significantly worse in early-stage than in late-stage disease. As a consequence, the ROMA also performed worse. With these ROC-AUCs, the chances that HE4 or ROMA will be successful as a screening marker are low, as very high specificities are required when screening for a low-prevalence disease.
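The dependence on prevalence can be made concrete with Bayes' rule for the positive predictive value (PPV). The sketch below uses the sensitivity and specificity reported in this study for HE4 at 150 pM (50.3% and 96.5%), which were measured in a surgical cohort and may not carry over to a screening population. The screening prevalence of 40 per 100 000 is a purely illustrative assumption and not a figure from this study.

```python
def positive_predictive_value(
    sensitivity: float, specificity: float, prevalence: float
) -> float:
    """Probability that a positive test reflects true disease (Bayes' rule)."""
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


# Sensitivity and specificity of HE4 at 150 pM as reported in this study.
SENSITIVITY, SPECIFICITY = 0.503, 0.965

# Prevalence of malignancy in the surgical cohort: 161 of 389 patients.
ppv_cohort = positive_predictive_value(SENSITIVITY, SPECIFICITY, 161 / 389)

# ASSUMPTION: illustrative screening prevalence of 40 per 100 000 women.
ppv_screening = positive_predictive_value(SENSITIVITY, SPECIFICITY, 40 / 100_000)
```

At the cohort prevalence most positive results are true positives, whereas at the assumed screening prevalence fewer than 1 in 100 positive results would correspond to a malignancy.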
In summary, this large independent validation study was able to demonstrate performance indices similar to those recently published in the literature. However, in our study, neither HE4 nor ROMA increased the detection of malignant disease. HE4, or its combination with CA125, could be useful in diagnosing certain benign or malignant subtypes; however, this needs to be explored in more detail.