Introduction

Unresectable hepatocellular carcinoma (uHCC) is generally associated with a poor prognosis1. Historically, treatment options for patients with advanced HCC, particularly those with metastatic and unresectable disease, were limited. However, the treatment landscape for advanced HCC has changed substantially in recent years with the introduction of immunotherapy2,3. Immunotherapy works by activating the patient's own immune system against tumors. By targeting immune checkpoints, it releases the inhibitory signals that prevent immune cells from effectively recognizing and attacking tumor cells4,5. As a result, modern immunotherapy based on immune checkpoint inhibitors (ICIs) has emerged as a promising first-line treatment approach for unresectable HCC, either as monotherapy or in combination with other anticancer agents6.

According to both the Food and Drug Administration (FDA) and the European Medicines Agency (EMA), overall survival (OS) remains the gold standard for demonstrating the clinical benefit of new anticancer therapies7. However, the interpretability of OS can be influenced by several factors, including prolonged postprogression follow-up time, treatment crossover, and subsequent anticancer therapies8. Therefore, surrogate endpoints such as progression-free survival, time to progression, duration of response, and objective response rate are being investigated and used in oncology studies9,10. The use of surrogate endpoints offers several advantages over OS11,12,13. First, it accelerates research: trials reach their endpoints sooner, which shortens both the trials themselves and the time required for approval and for research and development. Second, it provides earlier access to new therapies: patients, especially those in urgent need of medical intervention, can receive potentially life-saving or life-improving treatments sooner. Third, it enhances efficiency and reduces costs. Faster trials allow clinical pharmacology and mechanisms of action to be evaluated quickly, providing valuable insights for both researchers and physicians. This facilitates more efficient screening of potential drug candidates and allows pharmaceutical manufacturers or sponsors to allocate resources more effectively, potentially alleviating the financial burden of drug development.

To date, only 20% of the subsequent clinical trials for 93 cancer drug indications with accelerated FDA approval have shown improved overall survival in patients14. Similarly, a review by EMA and NICE found that out of 52 drugs, 43 (82.7%) lacked overall survival data initially, although 9 drugs (17.3%) later demonstrated improved overall survival15. In the context of hepatocellular carcinoma (HCC), nivolumab received accelerated approval based on data from the CheckMate 040 trial16. This phase I/II clinical study, which was multicenter, prospective, uncontrolled, and open-label, showed that nivolumab treatment in advanced HCC patients was safe, manageable, and improved objective response rates. However, the subsequent clinical trial CheckMate 459, an adaptive-designed, randomized, open-label trial, failed to confirm the survival benefit of nivolumab as a first-line therapy for advanced HCC patients who had not previously received systemic therapy17. As a result, the FDA revoked the indication for nivolumab as a first-line therapy for HCC in 2020. In addition, pembrolizumab obtained accelerated approval based on data from the KEYNOTE-224 trial. This phase II prospective, single-arm, open-label study evaluated the safety and efficacy of pembrolizumab in patients with advanced HCC who had previously been treated with sorafenib18. Currently, the approval status of pembrolizumab remains valid without being withdrawn. However, second-line pembrolizumab showed a large difference in improving overall survival (OS) between Asian and non-Asian populations19,20. Furthermore, multiple ongoing trials aim to deepen our understanding of the molecular pathogenesis of uHCC and explore new therapeutic approaches for this condition21. As a result, the selection of alternative indicators for uHCC becomes particularly important.

The aim of this study was to assess whether the therapeutic effect on surrogate endpoints at the trial level can predict the therapeutic effect on OS. A secondary objective was to explore heterogeneity in trial-level correlations based on specific trial characteristics. To this end, we conducted a systematic literature search and trial-based meta-analysis of experimental ICI therapy in patients with uHCC.

Materials and methods

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement was followed for this systematic review22. The meta-analysis protocol was submitted to PROSPERO (CRD42023433976).

Search for studies

We searched multiple databases, including the Cochrane Central Register of Controlled Trials (CENTRAL, Ovid), MEDLINE (PubMed), and EMBASE (Ovid) (see eTable 1 for the search strategy). We also searched online trial registries, including ClinicalTrials.gov, the European Medicines Agency (EMA), and the WHO International Clinical Trials Registry Platform. In addition, we searched for grey literature in abstracts and posters presented at the American Society of Clinical Oncology (ASCO) annual congress via the System for Information.

The search was performed on 9 June 2023.

Types of study to be included

We considered all phase I, II, and III clinical trials, as well as expanded access programs (external clinical trials), for the treatment of uHCC. Eligible trials and programs had to compare immune checkpoint inhibitors with placebo, no treatment, or systemic/locoregional therapies.

Data extraction

Search results were managed using the literature management software EndNote X9. After removing duplicates, two researchers (LTH and CYZ) independently screened the search results in two rounds. In the first round, titles and abstracts were reviewed to exclude literature that was irrelevant or did not meet the inclusion criteria. In the second round, the full texts of the remaining studies were read to complete the selection. In case of disagreement between the two researchers, a third, more senior researcher (XTL) was consulted to make a final judgement. Excluded studies and the reasons for exclusion were documented.

Data were extracted independently. We extracted details of the study population, interventions, and outcomes using a piloted, standardized data extraction form. This form included the following items:

  (1) Study methods and characteristics: study design, country, target disease, year of publication, funding type, study registration number, inclusion and exclusion criteria, sample size, and population characteristics;

  (2) Severity of illness: Child–Pugh score, Eastern Cooperative Oncology Group Performance Status, Barcelona Clinic Liver Cancer stage, and proportion of participants positive for hepatitis B and hepatitis C virus;

  (3) Experimental and control arms of the trial;

  (4) Outcomes:

    (1) Overall survival (OS), defined as the time from randomization until death from any cause, commonly used in phase III trials.

    (2) Time to progression of the tumor: progression-free survival (PFS), defined as the time from randomization to objective tumor progression or death.

    (3) Tumor response assessments (as recommended by the Response Evaluation Criteria in Solid Tumors (RECIST 1.1)23): proportion of people with objective response (OR), complete response (CR), partial response (PR), progressive disease (PD), stable disease (SD), and disease control (DC).

    (4) Adverse events: proportion of participants with one or more nonserious immune-related adverse events (irAEs) based on the National Cancer Institute Common Terminology Criteria for Adverse Events (CTCAE) and/or the Medical Dictionary for Regulatory Activities.

Statistical analysis

We first provided a narrative description of the included studies. Categorical variables are reported as frequencies and percentages, and continuous variables are reported as medians with interquartile ranges (IQRs) unless indicated otherwise.

In each trial, we abstracted the hazard ratios (HRs) along with their corresponding 95% confidence intervals (CIs) and survival rates at specific time points24. Where HRs were not explicitly reported, we estimated them from the reported median survival times or survival rates at those time points. A frequentist hybrid model for random-effects multivariate meta-analysis was used to evaluate surrogacy between the HRs for OS and each endpoint, and a Bayesian hybrid random-effects multivariate meta-analysis was used for sensitivity analysis. No covariates were included in the hybrid models. A random-effects meta-regression model was used to quantify the association between the natural logarithm of the HR for OS and the log-transformed treatment effect on each endpoint (the data analysis flowchart is shown in eFig. 1). According to the ReSEEM (Systematic Review and Recommendation for Reporting of Surrogate Endpoint Evaluation using Meta-analyses) guidelines25, R2 values ≥ 0.7 represent strong correlations (and thus suggest surrogacy), values from 0.5 to 0.69 represent moderate correlations, and values < 0.5 represent weak correlations. After fitting the meta-regression model, we calculated the surrogate threshold effect (STE), defined as the minimum treatment effect on the surrogate endpoint needed to predict a nonzero treatment effect on OS26. In addition, we planned to investigate the impact of clinical characteristics on the association between OS and surrogate measures through analyses of predefined subgroups. All statistical analyses were performed using R version 4.3.0.
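The trial-level regression described above can be illustrated with a minimal Python sketch. It is not the hybrid multivariate model used in this study (which was fitted in R); it is a simplified weighted least-squares regression of the log HR for OS on the log HR for a surrogate, shown only to make the roles of the intercept, slope, and R2 concrete. The helper names, the exponential-survival and proportional-hazards assumptions behind the HR approximations, the use of sample size as the weight, and all trial values are assumptions of this sketch.

```python
import math


def hr_from_medians(median_exp, median_ctrl):
    """Approximate HR (experimental vs control), assuming exponential survival."""
    return median_ctrl / median_exp


def hr_from_survival_rates(surv_exp, surv_ctrl):
    """Approximate HR from survival proportions at one time point,
    assuming proportional hazards: S_exp(t) = S_ctrl(t) ** HR."""
    return math.log(surv_exp) / math.log(surv_ctrl)


def weighted_regression(x, y, w):
    """Weighted least squares of y on x; returns (intercept, slope, r2)."""
    total_w = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)  # squared weighted correlation
    return intercept, slope, r2


def classify_r2(r2):
    """Strength categories as quoted in the text (>= 0.7, 0.5 to 0.69, < 0.5)."""
    if r2 >= 0.7:
        return "strong"
    if r2 >= 0.5:
        return "moderate"
    return "weak"


# Hypothetical trials: (HR for PFS, HR for OS, sample size). Not real data.
trials = [(0.60, 0.70, 400), (0.80, 0.85, 300), (0.95, 1.00, 500), (0.70, 0.90, 200)]
log_pfs = [math.log(t[0]) for t in trials]
log_os = [math.log(t[1]) for t in trials]
weights = [t[2] for t in trials]

intercept, slope, r2 = weighted_regression(log_pfs, log_os, weights)
print(f"log HR(OS) = {intercept:.3f} + {slope:.3f} * log HR(PFS); "
      f"R2 = {r2:.3f} ({classify_r2(r2)})")
```

A full analysis would also propagate the estimation error of each trial's HRs, which this simplified sketch ignores.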

Results

Characteristics and quality assessment

The systematic review process, as depicted in Fig. 1, enabled us to identify a total of 13 studies from the 14 articles reviewed. The inclusion and exclusion of full-text articles are listed in eTable 2. Among the 13 reported trials, 9 were obtained from full-text reports, for which we conducted a risk of bias assessment. It was found that the overall risk of bias was low, with the main concern being open-label studies (eFig. 2). Notably, the study IMbrave15_update2627 provided additional data during an extended follow-up period, leading to its inclusion in our analysis. Across these 13 studies, we included a total of 20 comparison subgroups, involving a cohort of 4573 patients, as detailed in Table 1.

Figure 1. Flow diagram of the study selection process. N, number of patients.

Table 1 Randomised clinical trials characteristics.

Among the 13 trials included in the analysis, five (38.5%) were phase 3 clinical trials conducted to confirm the efficacy of the treatments. It is interesting to note that in three of these trials (23.1%), overall survival (OS) was defined as the only primary endpoint. Additionally, four of the included studies (30.8%) considered OS as a dual/coprimary endpoint alongside progression-free survival (PFS). Table 1 provides a clear overview of the primary endpoints used in these trials, with PFS being the most commonly used.

Upon examining the characteristics of the patients included in the studies, it was found that the majority of them had Eastern Cooperative Oncology Group performance status classified as 0, Child–Pugh scores categorized as A, and Barcelona Clinic Liver Cancer stage B–C. However, it is crucial to acknowledge the heterogeneity in the etiology of hepatocellular carcinoma (HCC) among these studies. The percentage of patients with hepatitis B virus (HBV) ranged from 24.5 to 94%, while those with hepatitis C virus (HCV) ranged from 1.3 to 31.6%.

Main analysis

Table 2 presents the correlations between OS and potential surrogate outcomes in trials of ICIs in patients with uHCC.

Table 2 Relationship between overall survival (OS) and potential surrogate outcomes.

We conducted a meta-regression analysis to examine the association between the natural logarithm of the HR for OS and that for PFS, adjusting for estimation errors. The resulting equation was: (log HR of OS) = −0.086 + 0.524 (log HR of PFS), indicating a predicted increase of 0.524 in the log HR of OS for each unit increase in the log HR of PFS. The surrogate threshold effect (STE) was 0.165. The association between PFS and OS was weak, with an R2 value of 0.352 (95% CI: 0.000–0.967), indicating that only 35.2% of the variability in the effects on OS could be explained by the observed effects on PFS.
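As a worked illustration of how the fitted line is read, the short Python sketch below plugs a PFS hazard ratio into the reported equation. The function name `predict_os_hr` and the example input of 0.60 are assumptions made for illustration. Given the low R2, any such prediction would carry wide uncertainty, which the sketch does not model.

```python
import math

INTERCEPT = -0.086  # intercept of the reported regression equation
SLOPE = 0.524       # slope of the reported regression equation


def predict_os_hr(pfs_hr):
    """Point prediction of the HR for OS from an HR for PFS (no uncertainty)."""
    log_os_hr = INTERCEPT + SLOPE * math.log(pfs_hr)
    return math.exp(log_os_hr)


# Hypothetical trial with an HR for PFS of 0.60
print(f"Predicted HR for OS: {predict_os_hr(0.60):.3f}")
```

Because the slope is below 1, a given relative reduction in the hazard of progression or death maps to a smaller predicted relative reduction in the hazard of death.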

In addition to examining the relationship between OS and PFS, we explored how treatment effects on OS relate to the tumor response assessment endpoints, including OR, CR, PR, SD, DC, and PD. Our analysis revealed a generally weak association between treatment effects on OS and the majority of these response endpoints, with R2 values ranging from 0.007 to 0.225. For CR, however, we observed a strong correlation between treatment effects on CR and OS. The association between the logarithm of the relative risk (RR) for CR and the logarithm of the HR for OS yielded an R2 value of 0.905 (95% CI: 0.728–1.000). The regression equation for the association between OS and CR was (log HR of OS) = −0.079 − 0.131 (log RR of CR). The analysis also included a regression of the log HR for OS on the log RR for irAEs, which showed consistently weak correlations, with R2 values ranging from 0.026 to 0.351, indicating that at most about 35% of the variability in survival benefit from ICI therapy can be explained by the variability in treatment effects on irAEs.
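The response and adverse event endpoints above enter the regression as log relative risks. A minimal Python sketch of that calculation from arm-level counts follows. The function name `log_relative_risk`, the 0.5 continuity correction applied when an arm has zero events, and the example counts are assumptions of this sketch and are not details reported in this study.

```python
import math


def log_relative_risk(events_exp, n_exp, events_ctrl, n_ctrl):
    """Return (log RR, standard error) for experimental vs control.

    If either arm has zero events, 0.5 is added to every cell of the
    2x2 table (so each arm total grows by 1). This correction is an
    assumption of the sketch.
    """
    a, n1, c, n2 = float(events_exp), float(n_exp), float(events_ctrl), float(n_ctrl)
    if a == 0 or c == 0:
        a, c = a + 0.5, c + 0.5
        n1, n2 = n1 + 1.0, n2 + 1.0
    log_rr = math.log((a / n1) / (c / n2))
    # Large-sample standard error of the log relative risk
    se = math.sqrt(1.0 / a - 1.0 / n1 + 1.0 / c - 1.0 / n2)
    return log_rr, se


# Hypothetical counts: 20/100 responders vs 10/100 responders
log_rr, se = log_relative_risk(20, 100, 10, 100)
print(f"log RR = {log_rr:.3f}, SE = {se:.3f}")
```

Rare outcomes such as CR often produce zero cells in the control arm, so the choice of correction can noticeably affect the log RR and should be reported.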

The Bayesian sensitivity analysis, conducted for all of the aforementioned analyses, yielded similar results.

Subgroup and sensitivity analyses

Table 3 presents the correlations between OS and PFS, OR, CR, PR, SD, DC, PD, or irAEs by stratum. We found a strong correlation between OS and PFS in phase III trials (R2: 0.851, 95% CI: 0.469–1.000). We also observed high estimated correlations between OS and SD and between OS and DC in phase III trials: the R2 was 0.890 (95% CI: 0.602–1.000) for OS and SD and 0.827 (95% CI: 0.391–1.000) for OS and DC.

Table 3 Overview of sensitivity analyses.

In the subgroup analyses by intervention group, we found high estimated correlations between OS and SD, PD, grade 3–4 irAEs, or grade 5 irAEs specifically in the ICI + IA group. The observed R2 values were as follows: 0.713 (95% CI: 0.00–1.00) for OS and SD, 0.816 (95% CI: 0.166–1.00) for OS and PD, 0.766 (95% CI: 0.00–1.00) for OS and grade 3–4 irAEs, and 0.969 (95% CI: 0.00–1.00) for OS and grade 5 irAEs. These point estimates suggest a strong relationship between these endpoints within the ICI + IA group, although most of the confidence intervals span nearly the entire possible range. Owing to the insufficient sample size, we lacked adequate information to analyze the correlations in other intervention groups.

Correlation between survival benefit and risk of mortality or disease progression across time points

We conducted an analysis to examine the correlation between the survival benefit endpoint (HR for OS) and the relative risk (RR) for both PFS and OS at different time points (6 months, 12 months, 18 months, and 24 months) to investigate the influence of time factors (Fig. 2). We found a strong association between the risk of near-term mortality (within one year) and the survival benefit, with R2 values ranging from 0.724 (at 6 months) to 0.868 (at 12 months). However, we observed a weak association between the risk of disease progression and the survival benefit, regardless of whether it was within one year or after one year, with R2 values ranging from 0.020 to 0.202.
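The time-point analysis above relates the HR for OS to a relative risk at each landmark. One way such a landmark RR could be computed from reported survival proportions is sketched below in Python. The definition used here (risk of the event by time t taken as one minus the survival proportion at t), the function name `landmark_relative_risk`, and the example proportions are assumptions of this sketch, since the exact computation is not spelled out in the text.

```python
def landmark_relative_risk(surv_exp, surv_ctrl):
    """RR of the event by a landmark time, experimental vs control.

    surv_exp and surv_ctrl are the proportions still event-free at the
    landmark, so the risk in each arm is 1 minus that proportion.
    """
    return (1.0 - surv_exp) / (1.0 - surv_ctrl)


# Hypothetical OS proportions at each landmark (months): (experimental, control)
os_rates = {6: (0.85, 0.72), 12: (0.67, 0.55), 18: (0.52, 0.40), 24: (0.40, 0.30)}

for month, (s_exp, s_ctrl) in os_rates.items():
    rr = landmark_relative_risk(s_exp, s_ctrl)
    print(f"{month} months: RR of death = {rr:.2f}")
```

The same function applies to PFS by passing progression-free proportions instead of survival proportions.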

Figure 2. Correlation between survival benefit and risk of mortality or disease progression across time points. Each circle represents a trial, and the surface area of the circle is proportional to the number of events observed in the corresponding trial. Straight lines represent weighted regression lines.

Discussion

PFS has historically been the most commonly used surrogate endpoint in phase III clinical trials because it is a composite endpoint that combines progression and death28. We found that, overall, the surrogate relationship between the treatment effects of ICIs on PFS and on OS was weak, which is consistent with existing knowledge in this area29,30,31. However, PFS in all these trials was defined using the traditional RECIST criteria, which were developed before the immunotherapy era. Traditional RECIST criteria have been reported to capture disease progression poorly for immunotherapies, which have atypical response patterns32. That is, the failure of traditional RECIST criteria to define PFS under immunotherapy might be one reason for the smaller benefits in PFS than in OS in trials of PD-1 inhibitors. In addition, our analysis by study design provided some evidence that the strength of the PFS-OS surrogacy in uHCC depends on trial phase, with a stronger correlation in phase III trials. This could be attributed to the characteristics of phase II clinical trials, which often involve fewer patients, shorter follow-up periods, and shorter trial durations, so that the influence of random noise may be more pronounced, resulting in a weaker correlation between PFS and OS33. In contrast, phase III clinical trials have larger sample sizes, longer trial durations, and more diverse patient populations. These large-scale trials allow a more accurate evaluation of treatment effects on PFS and OS and thus typically demonstrate a stronger correlation between the two endpoints.

The number of patients achieving an objective response is the sum of those with CR and PR, and the number achieving disease control is the sum of those with CR, PR, and SD; these are early-phase outcomes available in most trials34. However, the trial-level associations between OS and tumor response end points did not meet the lowest evaluation criteria in the included meta-analyses (R2 < 0.60). There may be two reasons for this. The first is the mechanism of immunotherapy. In the phenomenon known as pseudoprogression, the tumor may show local enlargement or new lesions may appear at the beginning of treatment, with significant therapeutic effects appearing only later35,36. In this case, patients may experience poor disease control for a certain period of time but eventually achieve a good treatment response and extended overall survival. Therefore, the correlation between OS and OR or DC can be affected by pseudoprogression. In fact, low correlations between OS and traditional surrogate endpoints are not uncommon with other treatments such as chemotherapy10,37,38. The second reason may be the importance of liver decompensation as a driver of death in HCC patients. Unlike tumor progression, liver decompensation cannot be directly captured through radiological assessment. Therefore, the potential impact of liver decompensation limits the full understanding of the relationship between radiological endpoints and OS in HCC patients39. In contrast, the definition of CR is clearer and usually requires the patient's disease to disappear for a period of time. A complete response indicates that the treatment has achieved very good control of the tumor8. Therefore, although a complete response is rare, it does not exclude the possibility of clinical cure for patients who achieve it. This perspective is consistent with current clinical reality and does not contradict existing evidence. Such patients may have a higher chance of achieving long-term disease-free survival, which suggests that a complete response reflects a more comprehensive treatment response and tumor control and may be directly related to overall survival.

In the treatment of advanced cancer with ICIs, the association between the occurrence of adverse reactions and OS is generally low. ICIs generally have better safety and tolerability than traditional chemotherapy drugs, and the adverse reactions are caused by the overactivity of the immune system caused by the treatment40. ICIs enhance the immune response by inhibiting inhibitory signals on the immune system and may trigger irAEs, such as immune cells attacking normal tissues due to excessive activation41. In our study, we observed a strong correlation between irAEs and OS specifically in the ICI + AI completion group, suggesting that this association may vary based on the treatment approach. However, due to the wide confidence intervals for the estimated R2 values, further validation is needed for all subgroup analyses, and the small sample size may not capture true associations. Therefore, adverse reactions do not directly reflect the effect of treatment on the tumor.

This meta-analysis has several limitations that need to be considered when interpreting the results. First, the lack of individual patient-level data limits our ability to account for potential confounding factors known to influence OS. Although we intend to investigate this further in future research, the absence of these data is a limitation of the current study. Second, not all included studies reported all secondary endpoints at the time of analysis, which may introduce reporting bias. While efforts were made to estimate HRs from available data, the assumption that HRs can be estimated from median survival times or survival at a specific time point may be overly simplistic and may not capture the complete picture. Third, using HRs as the sole measure to assess the correlation between surrogate endpoints and OS has its limitations. The HR is influenced by the results of both the experimental arm and the control arm, and the efficacy of the control arm can vary across RCTs. Violation of the proportional hazards assumption, which is necessary for using HRs, is often observed in RCTs of advanced HCC42. However, given that this study was a secondary analysis without access to raw data for adjustment or consideration of all potential confounding factors, we chose to use the available HRs for practical reasons. Furthermore, the heterogeneity in the timing of surrogate endpoint assessment across the included trials poses statistical challenges. While the definitions of tumor response assessments and adverse events were generally similar among the original papers, the variability in timing could affect the results and introduce additional uncertainty. Lastly, using ORR and irAEs as surrogate measures for predicting survival may not provide absolute predictive information because of the different measurement scales and statistical properties of these variables. The relationship between ORR, irAEs, and survival might not be a simple linear one and could have a more complex form. Considering these limitations, the findings of this meta-analysis should be interpreted with caution. Further research with more comprehensive data and adjusted analyses is necessary to confirm and expand upon these findings.

In summary, while near-term effects may provide some clues to the antitumor response after treatment, further long-term studies and observations are needed to determine the survival effect of immune checkpoint inhibitors in the treatment of advanced cancer. OS remains a valuable end point, and our finding of a high correlation between early and late OS data reinforces the value of including interim analyses in phase III trials to capture early signals of efficacy and of including futility boundaries for early termination.

Conclusion

This study provides important insights into the therapeutic effect of ICIs in patients with uHCC and the correlation between surrogate endpoints and overall survival (OS). Our findings indicate that PFS is a good surrogate endpoint for OS in phase III clinical trials. Additionally, CR demonstrates a strong correlation with clinical survival benefit. However, it is essential to note that other commonly used early surrogate markers, such as the OR, are not reliable substitutes for predicting clinical survival benefit. These findings emphasize the need to carefully consider and select appropriate surrogate endpoints in uHCC clinical trials, particularly PFS and CR, to ensure accurate evaluation of treatment efficacy and inform decision-making for improved patient outcomes.