Introduction

Ovarian cancer is the eighth most common cause of cancer deaths globally1. Epithelial ovarian cancer (EOC) is one of the most common malignancies affecting perimenopausal women2. In India, it is the third leading cause of cancer in females, with a 5-year cumulative estimate of 80,422 cases1.

The lack of specific symptoms and of effective screening and diagnostic techniques makes ovarian cancer challenging to diagnose in the early stages. Consequently, the disease is often diagnosed at a late stage with metastasis, which results in a low survival rate. However, the survival rate increases to 92% when the tumor is diagnosed while still localised3. Therefore, the primary goal of ovarian cancer research is to develop a practical, valid screening test to detect the disease at an early stage. The most widely used screening tests are serum cancer antigen 125 (CA125) estimation and transvaginal sonography4, which are non-specific. CA125 is an antigenic determinant recognised by a monoclonal antibody raised using an ovarian cell line as the immunogen5. Though elevated serum CA125 was initially reported in patients with ovarian cancer, higher levels were also seen during menstruation, early pregnancy, and endometriosis5. It is also less sensitive, as it is raised in only 80% of all epithelial ovarian cancers and only 50% of stage I EOC6. Another biomarker currently available for screening and diagnosing EOC is serum human epididymis secretory protein 4 (HE4), which was approved by the Food and Drug Administration (FDA) in 20087. It is a member of the whey acidic protein family8 and is expressed in healthy tissues of the reproductive and respiratory tracts. It is postulated that HE4 is better than CA125 in diagnosing patients with EOC because of its high specificity. However, HE4 increases with age, smoking and renal diseases9. There are several methods to estimate serum CA125 and serum HE4, such as enzyme-linked immunosorbent assay (ELISA), radioimmunoassay (RIA), chemiluminescence immunoassay (CLIA) and electrochemiluminescence immunoassay (ECLIA). CLIA and ECLIA are more sensitive than the other methods10.

The Risk of Ovarian Malignancy Algorithm (ROMA) is a qualitative algorithm that combines the results of serum HE4, serum CA125 and menopausal status into a single value11. The clinical utility of ROMA in assessing the risk of EOC in patients with a pelvic mass has been evaluated. In 2008, Moore et al. suggested that the combination of CA125 and HE4 had higher sensitivity than either marker alone12. In 2011, Montagnana et al. concluded that ROMA had superior diagnostic performance in estimating the risk of EOC in premenopausal women13. However, the diagnostic performance of ROMA compared with CA125 and HE4 is still controversial, as the individual studies are affected by limited sample sizes and random fluctuations. Thus, our study aimed to analyse the performance of ROMA by conducting a meta-analysis of studies that used high-sensitivity immunoassays such as CLIA and ECLIA as the index test. Restricting the included studies to CLIA/ECLIA reduced the bias that arises from pooling studies with different analytical methods. This would help in elucidating the performance of ROMA in diagnosing EOC in comparison with other serum markers.

Materials and methods

Inclusion and exclusion criteria

We included studies if (1) the values of serum HE4 and serum CA125 were estimated by CLIA or ECLIA; (2) the studies investigated serum HE4 or CA125 or calculated the ROMA for the diagnosis of EOC; (3) blood samples were collected before initiation of anti-tumour therapy; (4) the studies used histopathological diagnosis as the gold standard for assessing EOC; and (5) enough data could be extracted for the fourfold table. We excluded studies if (1) the control group contained healthy individuals and borderline cases; (2) the analytical method used was ELISA; (3) the studies showed prognostic or post-chemotherapy changes in the marker; (4) the abstract or full paper was in any language other than English; or (5) the studies calculated sensitivity at a fixed specificity using the receiver operating characteristic (ROC) curve.

Data extraction

We extracted the following data from the selected studies: author, publication year, country, study design, detection method, number of patients, sensitivity, specificity and cutoff value. For studies reporting more than one cutoff value, we chose the one that offered the best test performance. Similarly, when several detection methods were used in one study, we chose the results of the most sensitive method, CLIA.

Index tests and reference standard

We calculated the ROMA index using the following formulae.

The premenopausal calculation formula of the ROMA index was: \(-12+\left(2.38\times \mathrm{ln}\left(\mathrm{HE}4\right)\right)+\left(0.062\times \mathrm{ln}\left(\mathrm{CA}125\right)\right)\)

The postmenopausal calculation formula of the ROMA index was: \(-8.09+\left(1.04\times \mathrm{ln}\left(\mathrm{HE}4\right)\right)+\left(0.732\times \mathrm{ln}\left(\mathrm{CA}125\right)\right)\)
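The two formulae can be illustrated with a short sketch. It is not the implementation used in any of the included studies. It assumes that the intercepts are negative (−12 and −8.09), as in the commonly cited form of the predictive index, and that the index is converted to a percentage with a logistic transform. The function names and the units (HE4 in pmol/L, CA125 in U/mL) are likewise assumptions made for illustration.

```python
import math

# Assumed coefficients: (intercept, weight of ln(HE4), weight of ln(CA125)).
# The negative intercepts are an assumption taken from the commonly cited
# form of the ROMA predictive index.
COEFFS = {
    "pre": (-12.0, 2.38, 0.062),
    "post": (-8.09, 1.04, 0.732),
}


def predictive_index(he4, ca125, menopausal_status):
    """Linear predictor from HE4 and CA125 (units assumed: pmol/L and U/mL)."""
    b0, b_he4, b_ca125 = COEFFS[menopausal_status]
    return b0 + b_he4 * math.log(he4) + b_ca125 * math.log(ca125)


def roma_percent(he4, ca125, menopausal_status):
    """Logistic transform of the predictive index, expressed as a percentage."""
    pi = predictive_index(he4, ca125, menopausal_status)
    return 100.0 * math.exp(pi) / (1.0 + math.exp(pi))


if __name__ == "__main__":
    # Hypothetical marker values, for illustration only.
    print(round(roma_percent(70.0, 35.0, "pre"), 1))
    print(round(roma_percent(70.0, 35.0, "post"), 1))
```

The resulting percentage is then compared with a cutoff value, which varied between the included studies.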

Since different index tests have different sensitivities, we included only studies with CLIA or ECLIA as the index test to reduce the bias introduced by pooling studies. We considered the result of the histopathological diagnosis as the reference standard. The FIGO criteria were taken as the reference for surgical staging. A PRISMA flow diagram describes the number of studies screened and included in the meta-analysis.

Methodological quality assessment

Using the Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2), two authors (AS and PA) independently assessed the methodological quality of the included studies and extracted data using Review Manager (RevMan) software. Disagreements between the authors were resolved by discussion to reach a uniform conclusion.

Statistical analysis

We developed a database of individual study details in a Microsoft Excel spreadsheet and subjected it to statistical analysis using Meta-Disc software version 1.4, a freely available open-source programme. To avoid zero values in the study results, a value of 0.5 was added, and a two-stage analysis was performed. In the first stage, we calculated summary statistics such as sensitivity, specificity, and positive and negative likelihood ratios (LR+ and LR−) for each study. In the second stage, we estimated overall test accuracy indexes as the weighted average of the summary statistics. Since summary measures (sensitivity and specificity, or LR+ and LR−) are paired and often inter-related, the apt single measure for meta-analysis is the diagnostic odds ratio (DOR). The DOR expresses how much higher the odds of having the disease are for people with a positive test result than for those with a negative test result. For each summary measure, we calculated exact 95% confidence limits. We examined the degree of variability (heterogeneity) among the study results by plotting the summary measures on a forest plot. In a meta-analysis of studies with different sample sizes, the variability due to sampling error is likely to be high. Other sources of variation could be the characteristics of the study subjects, mode of treatment, and design quality. Assessing heterogeneity in a meta-analysis is crucial because the presence or absence of true heterogeneity (between-study variability) affects the choice of statistical model. Therefore, we calculated the inconsistency (I2) statistic as a percentage, along with tau-square, to quantify the degree of heterogeneity. An I2 value higher than 50% was considered to indicate significant heterogeneity. We used a random-effects model for calculations because of the evidence of variation.
Since the included studies might have used, explicitly or implicitly, different thresholds/cutoff values to define positive and negative results, these cutoff values are likely to be an essential source of variation in DOR. We carried out receiver operating characteristic (ROC) analysis to see whether a threshold effect existed. If the points in the ROC plot show a curvilinear pattern, there is evidence of a threshold effect.
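The first-stage calculations for a single study can be sketched as follows. This is a minimal illustration and not the Meta-Disc implementation. The function name is hypothetical, and the rule of adding 0.5 to every cell only when at least one cell is zero is an assumed convention, since the text states only that 0.5 was added.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Sensitivity, specificity, LR+, LR- and DOR from one fourfold table."""
    # Assumed convention: add 0.5 to all four cells if any cell is zero,
    # so that none of the ratios below divides by zero.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (cell + 0.5 for cell in (tp, fp, fn, tn))
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    dor = lr_pos / lr_neg  # algebraically equal to (tp * tn) / (fp * fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": dor,
    }


if __name__ == "__main__":
    # Hypothetical counts: 100 cases and 100 controls.
    for name, value in diagnostic_measures(80, 10, 20, 90).items():
        print(f"{name}: {value:.3f}")
```

Because the DOR is the ratio of LR+ to LR−, it summarises both paired measures in one number, which is why it is used as the single accuracy measure in the second stage.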

Further, we tested the threshold effect using the Spearman correlation coefficient between sensitivity (true positive rate) and false positive rate (FPR). A significant inverse correlation could confirm the existence of the threshold effect. Combining study results in such cases involves fitting a ROC curve rather than pooling sensitivities and specificities or likelihood ratios. When homogeneity was established, likelihood ratios and diagnostic odds ratios could be pooled by the Mantel–Haenszel method (fixed-effects model); in case of heterogeneity, we used the DerSimonian–Laird method (random-effects model) to incorporate variation among studies. The DORs or LRs were averaged with the Mantel–Haenszel method, whereas with the DerSimonian–Laird method we obtained the average of the logarithms of the DORs or LRs. Publication bias was tested by carrying out a regression analysis of ln(DOR) against the inverse of the square root of the effective sample size (1/sqrt(ESS)), with p < 0.05 for the slope coefficient suggesting significant asymmetry. We carried out all these analyses following the guidelines given in an earlier publication14.
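The random-effects pooling of ln(DOR) and the heterogeneity statistics can be sketched as below. This is an illustrative version of the standard DerSimonian–Laird estimator with inverse-variance weights, not the code of the software used in the study. The function name and the input layout are assumptions, and all cells are assumed to be non-zero (after any continuity correction).

```python
import math


def pool_log_dor(tables):
    """DerSimonian-Laird pooling of ln(DOR).

    tables: list of (tp, fp, fn, tn) tuples with non-zero cells.
    """
    y = [math.log((tp * tn) / (fp * fn)) for tp, fp, fn, tn in tables]
    v = [1 / tp + 1 / fp + 1 / fn + 1 / tn for tp, fp, fn, tn in tables]
    w = [1 / vi for vi in v]  # inverse-variance (fixed-effect) weights

    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)

    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    w_re = [1 / (vi + tau2) for vi in v]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "pooled_dor": math.exp(pooled),
        "ci_low": math.exp(pooled - 1.96 * se),
        "ci_high": math.exp(pooled + 1.96 * se),
        "tau2": tau2,
        "i2_percent": i2,
    }


if __name__ == "__main__":
    # Hypothetical fourfold tables, for illustration only.
    print(pool_log_dor([(80, 10, 20, 90), (50, 30, 50, 70), (45, 5, 5, 45)]))
```

When tau-square is zero the random-effects weights reduce to the fixed-effect weights, so the two models give the same pooled estimate for homogeneous studies.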

Meta-regression

When substantial variation was detected in the analysis detailed above, we explored the reasons for it by relating it to covariates such as the age of subjects and cutoff values. To study the sources of variation among the studies, we used the Moses–Shapiro–Littenberg method. We considered ln(DOR) as the dependent variable and variables such as age, cutoff value and threshold effect as covariates. The antilogarithm of each estimated parameter gives the relative DOR (RDOR) of the corresponding covariate, which expresses the change in the diagnostic performance of the test under study for each unit increase in the covariate. We judged the statistical significance of a coefficient on the basis of a p value < 0.05.
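A minimal sketch of such a meta-regression is given below. It assumes the usual Moses–Littenberg parameterisation, D = a + b·S + c·covariate, where D = ln(DOR) and S = logit(TPR) + logit(FPR) represents the threshold, and it assumes an unweighted least-squares fit with a single covariate. The function names are hypothetical, and significance testing of the coefficients is omitted.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def moses_littenberg(studies):
    """studies: list of (sensitivity, specificity, covariate) tuples."""
    rows, d = [], []
    for sens, spec, cov in studies:
        tpr, fpr = sens, 1.0 - spec
        d.append(logit(tpr) - logit(fpr))  # D = ln(DOR)
        rows.append([1.0, logit(tpr) + logit(fpr), cov])  # 1, S, covariate
    # Normal equations (X'X) beta = X'D for unweighted least squares.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xtd = [sum(r[i] * di for r, di in zip(rows, d)) for i in range(3)]
    a, b, c = solve(xtx, xtd)
    return {"intercept": a, "threshold_slope": b,
            "covariate_coef": c, "rdor": math.exp(c)}
```

An RDOR above 1 means that the diagnostic odds ratio increases with each unit increase in the covariate, and an RDOR below 1 means that it decreases.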

Results

Baseline characteristics of the study population

We searched for studies published from January 2011 to August 2020. The PRISMA flow diagram (Fig. 1) shows the stages of the studies screened and included in the analysis; 32 studies15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46 met the inclusion criteria. The numbers of studies included for the HE4, CA125 and ROMA tests were 25 (refs. 16,17,18,19,22,23,24,25,26,27,28,29,30,31,32,34,35,37,38,39,40,41,42,43), 26 (refs. 16,17,18,19,20,22,23,24,25,26,27,28,29,30,31,32,33,34,35,37,38,39,40,41,43) and 22 (refs. 15,16,17,18,20,21,26,27,28,29,31,34,35,36,37,38,39,40,41,42,43,46), respectively. Patients with EOC were taken as cases and women with benign ovarian masses as controls; the numbers of cases/controls were 2233/5682, 2315/5875 and 2281/5068, respectively, for the markers or algorithm (Table 1). The average ages of cases and controls were 54 and 43 years, respectively. The stage-specific distribution showed that about 31% of the EOC cases were stage III, followed by stage IV (~14%). Similarly, about 35% of the EOC cases were of the serous type, followed by endometrioid (~11%). We assessed the quality of each study on six domains and have shown the overall risk of bias in the QUADAS-2 graph (Fig. 2). About 75% of the included studies showed a low risk of bias for selecting the reference standard.

Figure 1 PRISMA flow diagram.

Table 1 Characteristics of patients who underwent different diagnostic tests.

Figure 2 Quality of publications (QUADAS-2) on six domains, prepared using RevMan 5.3.

Comparison of diagnostic measures

Table 2 shows the diagnostic measures calculated for the three markers or algorithm and the pooled estimate of each measure. Comparing the sensitivity measures (Fig. 3A–D), the overall sensitivity for HE4 was 0.73 (95% CI 0.71–0.75), significantly lower than that of CA125 and ROMA. In contrast, the overall specificity for HE4 (0.90; 95% CI 0.89–0.91) was significantly higher than that of the other two markers (Fig. 4A–D). Comparing the diagnostic measures of ROMA between premenopausal and postmenopausal women, we noted that both sensitivity and specificity were more than 0.80 among postmenopausal women and were significantly higher than those among premenopausal women.

Table 2 Comparison of diagnostic measures by three serum markers.

Figure 3 Comparison of sensitivity estimates of HE4 (A), CA125 (B), ROMA for pre-menopause (C) and ROMA for post-menopause (D).

Figure 4 Comparison of specificity estimates of HE4 (A), CA125 (B), ROMA for pre-menopause (C) and ROMA for post-menopause (D).

The likelihood ratio (LR) is another way of looking at the reliability of a test. It has two forms: LR+ (the probability of a person who has the disease testing positive divided by the probability of a person who does not have the disease testing positive) and LR− (the probability of a person who has the disease testing negative divided by the probability of a person who does not have the disease testing negative). Comparison of the pooled LR+ values obtained using a random-effects model (Table 2) revealed that LR+ for the HE4 marker was significantly higher than that of CA125 and ROMA. With HE4, a positive test result is about 11 times more likely in a woman with EOC than in a woman without EOC. Further, LR+ for HE4 exceeded the recommended level (5.0) in the majority of the studies (84%) (Fig. 5A). The pooled LR+ values and their 95% confidence intervals (CI) for HE4, CA125 and ROMA were 10.61 (7.28–15.45), 2.91 (2.47–3.42) and 4.66 (3.41–6.35), respectively (Table 2). Only 11% of the studies with CA125 (Fig. 5B) showed LR+ more than 5.0. With ROMA, about 46% of the studies with premenopausal women (Fig. 5C) and 65% of the studies with postmenopausal women (Fig. 5D) showed LR+ more than 5.0. The pooled LR− values of the three markers or algorithm ranged between 0.18 and 0.27 (Table 2), implying that all three behaved similarly for ruling out the disease, as evident from their overlapping CIs. Figure 6A–D shows the forest plots of LR− values using the random-effects model for each marker.

Figure 5 Comparison of positive likelihood ratio estimates (using random-effects model) of HE4 (A), CA125 (B), ROMA for pre-menopause (C) and ROMA for post-menopause (D).

Figure 6 Comparison of negative likelihood ratio estimates (using random-effects model) of HE4 (A), CA125 (B), ROMA for pre-menopause (C) and ROMA for post-menopause (D).

As accuracy estimates are paired and often interrelated (sensitivity and specificity, or LR+ and LR−), it is necessary to report them together in a single measure. One accuracy measure that combines these paired measures is the diagnostic odds ratio (DOR). Study-wise and pooled DORs with 95% CI for each marker are shown in Fig. 7A–D. Using the random-effects model, the pooled DOR for HE4 was 41.03 (95% CI 27.96–60.21), which was markedly higher than the DOR of CA125 (13.44) and ROMA (27.48). Similarly, the areas under the curve (AUC) from symmetrical ROC (SROC) analysis were 0.91, 0.86 and 0.91 for HE4, CA125 and ROMA, respectively (Table 2).

Figure 7 Comparison of diagnostic odds ratio estimates (using random-effects model) of HE4 (A), CA125 (B), ROMA for pre-menopause (C) and ROMA for post-menopause (D).

Tests of heterogeneity for the markers or algorithm showed that the tau-square or I2 statistics were greater than the allowable limit (50%), suggesting heterogeneity between the studies. Therefore, the diagnostic measures obtained with the random-effects model were retained. The Spearman rank correlation coefficient between sensitivity and FPR was significant only for HE4 (r = 0.49; p = 0.013) (Table 2). Regression analysis of the diagnostic threshold also revealed that the slope was significant (p < 0.05) for HE4 (Table 3). These observations showed that there was no evidence of a strong threshold effect on DOR. Further, the ROC curves (Fig. 8A–D) showed no clear evidence of a curvilinear pattern for any of the markers, confirming that the DOR measures were independent of the threshold effect.
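The rank correlation used in this threshold check can be sketched as follows. The code computes only the Spearman coefficient between sensitivity and FPR, with tied values given average ranks. The p value is omitted, the function names are hypothetical, and the input values in the example are invented for illustration.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_sens_fpr(sensitivities, specificities):
    """Spearman correlation between sensitivity and FPR (1 - specificity)."""
    fpr = [1.0 - s for s in specificities]
    return pearson(average_ranks(sensitivities), average_ranks(fpr))


if __name__ == "__main__":
    # Hypothetical per-study values, for illustration only.
    print(spearman_sens_fpr([0.70, 0.82, 0.75, 0.90], [0.92, 0.85, 0.88, 0.80]))
```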

Table 3 Results of meta-regression analysis to assess causes of heterogeneity.

Figure 8 Comparison of area under the curve (AUC) by symmetric ROC analysis for HE4 (A), CA125 (B), ROMA for pre-menopause (C) and ROMA for post-menopause (D).

Meta-regression analysis to assess the source of heterogeneity

Since there was heterogeneity between the studies for the markers, we carried out a meta-regression analysis to evaluate its causes. Table 3 shows the results of the meta-regression analysis. The regression coefficient for the cutoff value of HE4 was statistically significant, showing the influence of the cutoff value on the noted heterogeneity. However, neither age nor cutoff value was found to be statistically significant for CA125 and ROMA. Both threshold analysis and meta-regression analysis revealed the importance of the cutoff value for HE4.

Publication bias assessment

We assessed publication bias by carrying out a regression analysis of ln(DOR) against the inverse of the square root of the effective sample size; Table 4 shows the results. The slope coefficients were not significant for any of the markers at p < 0.10, the probability level suggestive of significant publication bias. Therefore, there was no evidence of publication bias for any of the markers or algorithm.
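The regression behind this assessment can be sketched as below. The sketch assumes that the effective sample size is ESS = 4·n1·n2/(n1 + n2), where n1 and n2 are the numbers of cases and controls, and that each study is weighted by its ESS, as in the commonly used funnel plot asymmetry test for diagnostic studies. The function names are hypothetical, and the p value of the slope is not computed.

```python
import math


def effective_sample_size(n_cases, n_controls):
    return 4.0 * n_cases * n_controls / (n_cases + n_controls)


def weighted_linear_fit(x, y, w):
    """Weighted least-squares line y = intercept + slope * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def funnel_asymmetry_fit(tables):
    """Regress ln(DOR) on 1/sqrt(ESS); tables hold non-zero (tp, fp, fn, tn)."""
    x, y, w = [], [], []
    for tp, fp, fn, tn in tables:
        ess = effective_sample_size(tp + fn, fp + tn)
        x.append(1.0 / math.sqrt(ess))
        y.append(math.log((tp * tn) / (fp * fn)))
        w.append(ess)  # assumed weighting by effective sample size
    return weighted_linear_fit(x, y, w)


if __name__ == "__main__":
    # Hypothetical fourfold tables, for illustration only.
    intercept, slope = funnel_asymmetry_fit(
        [(80, 10, 20, 90), (50, 30, 50, 70), (40, 5, 10, 45)]
    )
    print(intercept, slope)
```

A slope close to zero indicates that ln(DOR) does not depend on study size, which is the pattern expected in the absence of publication bias.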

Table 4 Statistical tests for publication bias by regression analysis of ln(DOR) against inverse of square root of effective sample size.

Comparison of diagnostic measures between premenopausal and postmenopausal women

The majority of the studies reported ROMA test results separately for premenopausal and postmenopausal women. Analysis by menopausal status showed that diagnostic measures such as sensitivity, specificity, LR+, DOR and SROC (Table 2) were markedly higher for postmenopausal women than for premenopausal women. LR− was lower for postmenopausal women (0.15) than for premenopausal women (0.26). The threshold effect for DOR was not established, and heterogeneity was present in both groups. Cutoff value was not found to be an influencing factor for the observed heterogeneity. Since average age was not available separately, we could not carry out meta-regression to establish whether age was influencing heterogeneity. Further, we did not observe publication bias for either group of women.

Sensitivity analysis

We carried out a sensitivity analysis to confirm the consistency of the results by considering only blinded studies, which were free from biased observations. The numbers of blinded studies were 17 (refs. 16,17,18,23,24,25,26,27,28,29,32,35,36,37,38,39,40,41), 19 (refs. 16,17,18,20,23,24,25,26,27,28,29,32,33,35,36,37,38,39,40,41) and 15 (refs. 15,16,17,18,20,26,27,28,29,34,35,37,38,39,40) for HE4, CA125 and ROMA, respectively. The meta-analysis of the blinded studies using the random-effects model is shown in Table 5. Comparison of Tables 2 and 5 showed that all the diagnostic measures, such as sensitivity, specificity and DOR, obtained for blinded studies were within the 95% confidence limits of the corresponding measures in Table 2.

Table 5 Results of sensitivity analysis based on blinded studies using Random effect model.

Further, heterogeneity was also present among the blinded studies. There was no significant variation in the Spearman rank correlation coefficients and the regression analysis of threshold effects between the combined studies and the blinded studies. We made similar observations in the meta-regression and SROC analyses. These findings implied that all the diagnostic measures were consistent irrespective of blinding.

Discussion

Summary of diagnostic measures

Our study found ROMA to be the best marker for differentiating EOC from benign ovarian masses in postmenopausal women, with high diagnostic accuracy (DOR 44.04, sensitivity 0.88 and AUC 0.94) compared with HE4 (DOR 41.03, sensitivity 0.73 and AUC 0.91) and CA125 (DOR 13.44, specificity 0.84 and AUC 0.86). In premenopausal women, the DOR, specificity and AUC for HE4 were the highest, suggesting that it is a promising predictor of epithelial ovarian cancer in this group; however, its utilisation requires further exploration.

Strengths

In this meta-analysis, we evaluated a substantial number of studies, which adds weight to our results. Pertinent features of our study are the approach used to search the articles and the statistical methods used to analyse the data. Other salient features are that we included only studies that used CLIA/ECLIA for measuring CA125 and HE4, which reduced the bias that arises from different testing strategies, and that we used only patients with EOC as cases and patients with benign ovarian masses as controls.

Limitation

We did not consider unpublished studies or studies in languages other than English. We could not calculate diagnostic measures for HE4 and CA125 separately in the premenopausal and postmenopausal groups because only a few studies were available. Further, we did not assess the diagnostic efficiency of these markers stratified by the stage and type of tumour, again because only a few studies were available.

Interpretation

The first meta-analysis comparing the diagnostic efficiency of HE4 with that of CA125 was published in January 2012 by Yu et al.47. They postulated that HE4 was a better marker than CA125 in terms of sensitivity, specificity, LR+ and LR−. Their study had methodological limitations, such as the inclusion of healthy subjects in the control group and the use of more than one index test. The pooling of such studies can increase the variation among studies. Later, Li et al.48, who did a meta-analysis based on 11 studies in 2012, stated that ROMA had high diagnostic accuracy in distinguishing EOC from benign pelvic masses. They also suggested that ROMA had higher sensitivity than HE4 and was more accurate in the postmenopausal age group than in the premenopausal age group, as seen in our study. In 2014, Wang et al.49 reported similar findings and postulated that HE4 had higher specificity than CA125 and could be of diagnostic importance, especially in the premenopausal group. A few months later, Macedo et al.50 published a meta-analysis and postulated that HE4 is a useful predictor of benign or malignant pelvic masses, with an AUC (0.91) similar to that seen in our study for premenopausal women. Substantiating our findings, a meta-analysis by Liu et al. that included 17 studies also showed HE4 to be a promising biomarker for endometrial cancer, with high specificity, DOR and AUC51. A study by Niveditha et al. proposed that the combination of preoperative low HE4 and biopsy has a higher predictive value than preoperative MRI and intraoperative biopsy combined52. In another systematic review, Degez et al. found that serum HE4 was a crucial parameter in assessing the diagnosis, prognosis and survival of endometrial cancer53. Later, in 2016, Dayyani et al.54 proposed that ROMA had higher diagnostic performance than single-marker assays on the basis of AUC and would thus improve clinical decision making.

Ovarian cancer is the fifth most common cause of cancer deaths among women. It is very important to differentiate malignant from benign ovarian masses at an early stage, as the risk of malignancy increases with age55. CA125 has low sensitivity in the early stages and is also falsely elevated in non-malignant conditions. HE4 has higher specificity than CA125, but it is also increased in smokers and in women who are on oral contraceptives56,57. ROMA, on the other hand, has better diagnostic performance than single-marker assays and hence can be the diagnostic tool for EOC in postmenopausal women. In premenopausal women, HE4 is a better marker for differentiating malignant from benign ovarian masses.

Conclusion

The study infers that ROMA is superior to HE4 and CA125 in distinguishing EOC from benign ovarian tumors in the postmenopausal group. HE4 appears to be superior to CA125 in diagnostic accuracy and prediction of EOC in the premenopausal group.