Correlation between surrogate endpoints and overall survival in unresectable hepatocellular carcinoma patients treated with immune checkpoint inhibitors: a systematic review and meta-analysis

This study aimed to assess the therapeutic effect of immune checkpoint inhibitors (ICIs) in patients with unresectable hepatocellular carcinoma (uHCC) and to investigate the correlation between surrogate endpoints and overall survival (OS). A systematic literature search included phase I, II, and III clinical trials comparing ICIs with placebo or other therapies for uHCC. Correlations between OS and surrogate endpoints were evaluated using meta-regression analyses and the surrogate threshold effect (STE). The correlation analysis showed a weak association between OS and progression-free survival (PFS), with an R2 value of 0.352 (95% CI: 0.000–0.967). However, complete response (CR) exhibited a strong correlation with OS (R2 = 0.905, 95% CI: 0.728–1.000). Subgroup analyses revealed high correlations between OS and PFS, CR, stable disease (SD), and disease control (DC) in phase III trials (R2: 0.827–0.922). For the ICI plus angiogenesis inhibitor (ICI + AI) group, high estimated correlations were observed between OS and SD, progressive disease (PD), and grade 3–5 immune-related adverse events (irAEs) (R2: 0.713–0.969). Analyses of the correlation between survival benefit and risk of mortality across time points showed a strong association within the first year (R2: 0.724–0.868) but a weak association beyond one year (R2: 0.406–0.499). In ICI trials for uHCC, PFS has limited utility as a surrogate endpoint for OS, while CR exhibits a strong correlation with OS. Subgroup analyses highlight high correlations between OS and PFS, SD, and DC in phase III trials. Notably, the ICI + AI group shows high estimated correlations between OS and SD, PD, and grade 3–5 irAEs. These findings offer valuable insights for interpreting trial outcomes and selecting appropriate endpoints in future clinical studies of ICIs in patients with uHCC.


Materials and methods
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement was used for this systematic review [22]. The meta-analysis protocol was submitted to PROSPERO (CRD42023433976).

Search for studies
We searched for relevant trials in multiple databases, including the Cochrane Central Register of Controlled Trials (CENTRAL, Ovid), MEDLINE (PubMed), and EMBASE (Ovid) (see eTable 1 for the search strategy). We also searched online trial registries, namely ClinicalTrials.gov, the European Medicines Agency (EMA), and the WHO International Clinical Trial Registry Platform. Additionally, we searched for grey literature in abstracts and posters presented at the American Society of Clinical Oncology (ASCO) annual congress via the System for Information.
The search was performed on 9 June 2023.

Types of study to be included
We considered all phase I, II, and III clinical trials, as well as expanded access programs (external clinical trials), that compared immune checkpoint inhibitors with placebo, no treatment, or systemic/locoregional therapies for the treatment of uHCC.

Data extraction
The retrieved records were managed with the literature management software EndNote X9. After removing duplicates, two researchers (LTH and CYZ) independently screened the search results in two rounds. In the first round, titles and abstracts were reviewed to exclude irrelevant literature or literature that did not meet the inclusion criteria. In the second round, the full texts of the remaining studies were read to complete the selection. In case of disagreement between the two researchers, a third, more senior researcher (XTL) was consulted to make the final judgement. Excluded studies and the reasons for their exclusion were documented.

Statistical analysis
We first provided a narrative description of the included studies. Categorical variables are reported as frequencies and percentages, and continuous variables are reported as medians with interquartile ranges (IQRs) unless indicated otherwise.
In each trial, we abstracted the hazard ratios (HRs) along with their corresponding 95% confidence intervals (CIs) and survival rates at specific time points [24]. Where HRs were not explicitly provided, we estimated them from the reported median survival times or survival rates at those time points. A frequentist hybrid model for random-effects multivariate meta-analysis was used to evaluate surrogacy between the HRs for OS and each endpoint, and a Bayesian hybrid random-effects multivariate meta-analysis model was used for sensitivity analysis. No covariates were included in the hybrid models. A random-effects meta-regression model was used to quantify the association between the natural logarithm of the HR for OS and the log-transformed treatment effect on each endpoint (the data analysis flowchart is shown in eFig. 1). According to the ReSEEM (Systematic Review and Recommendation for Reporting of Surrogate Endpoint Evaluation using Meta-analyses) guidelines [25], R2 values ≥ 0.7 represent strong correlations (and thus suggest surrogacy), values from 0.5 to 0.69 represent moderate correlations, and values < 0.5 represent weak correlations. After fitting the meta-regression model, we calculated the surrogate threshold effect (STE), that is, the minimum treatment effect on the surrogate endpoint needed to predict a nonzero effect on OS [26]. In addition, we planned to investigate the impact of clinical characteristics on the association between OS and the surrogate measures through analyses of predefined subgroups. All statistical analyses were performed using R version 4.3.0.
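As an illustration of the HR approximations and R2 categories described above, the following Python sketch shows two textbook shortcuts: one assumes exponentially distributed survival times (for medians) and the other assumes proportional hazards (for survival rates at a fixed time point). The exact estimation procedure used in this analysis is not detailed in the text, so the function names and input values below are hypothetical and serve only to make the idea concrete.

```python
import math

def hr_from_medians(median_experimental, median_control):
    """Approximate HR (experimental vs control), assuming exponential survival.

    Under an exponential model the hazard equals ln(2) / median, so the
    hazard ratio reduces to median_control / median_experimental.
    """
    return median_control / median_experimental

def hr_from_survival_rates(surv_experimental, surv_control):
    """Approximate HR from survival proportions at a single time point.

    Under proportional hazards, S_exp(t) = S_ctrl(t) ** HR, hence
    HR = ln(S_exp) / ln(S_ctrl). Both inputs must lie strictly in (0, 1).
    """
    return math.log(surv_experimental) / math.log(surv_control)

def classify_r2(r2):
    """Map an R2 value to the strength categories quoted in the text.

    Assumption: values between 0.69 and 0.7 are treated as moderate.
    """
    if r2 >= 0.7:
        return "strong"
    if r2 >= 0.5:
        return "moderate"
    return "weak"

# Hypothetical inputs, for illustration only
print(round(hr_from_medians(15.0, 10.0), 3))
print(round(hr_from_survival_rates(0.60, 0.50), 3))
print(classify_r2(0.352), classify_r2(0.905))
```

Both shortcuts discard information about censoring and the shape of the survival curves, which is one reason such estimated HRs should be read as rough approximations.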

Characteristics and quality assessment
The systematic review process (Fig. 1) identified 13 studies from the 14 articles reviewed. The included and excluded full-text articles are listed in eTable 2. Among the 13 trials, 9 were available as full-text reports, for which we conducted a risk of bias assessment. The overall risk of bias was low, with the main concern being the open-label design of some studies (eFig. 2). Notably, the study IMbrave15_update26 [27] provided additional data from an extended follow-up period and was therefore included in our analysis. Across these 13 studies, we included 20 comparison subgroups involving 4573 patients (Table 1).
Among the 13 trials included in the analysis, five (38.5%) were phase III clinical trials conducted to confirm treatment efficacy. In three of the 13 trials (23.1%), overall survival (OS) was the sole primary endpoint, and in four (30.8%), OS was a dual/coprimary endpoint alongside progression-free survival (PFS). Table 1 summarizes the primary endpoints used in these trials, with PFS being the most commonly used.
Most patients included in the studies had an Eastern Cooperative Oncology Group performance status of 0, Child-Pugh class A liver function, and Barcelona Clinic Liver Cancer stage B-C disease. However, the etiology of hepatocellular carcinoma (HCC) was heterogeneous across studies: the percentage of patients with hepatitis B virus (HBV) ranged from 24.5 to 94%, while the percentage with hepatitis C virus (HCV) ranged from 1.3 to 31.6%.

Main analysis
Table 2 presents the correlations between OS and potential surrogate outcomes in trials of ICIs in patients with uHCC.
We conducted a meta-regression analysis to examine the association between the natural logarithms of the HRs for OS and PFS, adjusting for estimation errors. The resulting equation was log HR(OS) = −0.086 + 0.524 × log HR(PFS), indicating a predicted increase of 0.524 in the log HR for OS for each one-unit increase in the log HR for PFS. The surrogate threshold effect (STE) was 0.165. The association between PFS and OS was weak, with an R2 value of 0.352 (95% CI: 0.000-0.967), indicating that only 35.2% of the variability in the treatment effects on OS could be explained by the observed effects on PFS.
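To make the slope interpretation concrete, the sketch below applies the fitted equation reported above to a hypothetical PFS hazard ratio. The intercept and slope come from the text; the function name and the input HR of 0.60 are illustrative assumptions, and the point prediction ignores the considerable uncertainty reflected in the wide CI for R2.

```python
import math

INTERCEPT = -0.086  # reported intercept of the meta-regression
SLOPE = 0.524       # reported slope on the log HR for PFS

def predict_hr_os(hr_pfs):
    """Predicted HR for OS from an observed HR for PFS (point estimate only)."""
    log_hr_os = INTERCEPT + SLOPE * math.log(hr_pfs)
    return math.exp(log_hr_os)

# Hypothetical trial with an HR for PFS of 0.60
print(round(predict_hr_os(0.60), 3))  # 0.702
```

Because the slope is below 1, a given relative reduction in the hazard of progression translates into a smaller predicted reduction in the hazard of death on the log scale.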
In addition to examining the relationship between OS and PFS, we explored how treatment effects on OS relate to tumor response endpoints, including OR, CR, PR, SD, DC, and PD. The association between treatment effects on OS and most of these response endpoints was weak, with R2 values ranging from 0.007 to 0.225. For CR, however, we observed a strong correlation with OS: the association between the logarithm of the relative risk (RR) for CR and the logarithm of the HR for OS yielded an R2 value of 0.905 (95% CI: 0.728-1.000). The regression equation for the association between OS and CR was log HR(OS) = −0.079 − 0.131 × log RR(CR). The analysis also included a regression of the log HR for OS on the log RR for irAEs, which showed consistently weak correlations, with R2 values ranging from 0.026 to 0.351, indicating that at most about 35% of the variability in survival benefit from ICI therapy can be explained by the variability in treatment effects on irAEs.
Furthermore, the Bayesian sensitivity analysis, conducted for all of the aforementioned analyses, yielded similar results.

Subgroup and sensitivity analyses
Table 3 presents the correlations between OS and PFS, OR, CR, PR, SD, DC, PD, or irAEs by stratum. We found a strong correlation between OS and PFS in phase III trials (R2: 0.851, 95% CI: 0.469-1.000). We also observed high estimated correlations between OS and both SD and DC in phase III trials: the R2 was 0.890 (95% CI: 0.602-1.000) for OS and SD and 0.827 (95% CI: 0.391-1.000) for OS and DC, both indicating strong correlations.
In the subgroup analyses by intervention group, we found high estimated correlations between OS and SD, PD, grade 3-4 irAEs, or grade 5 irAEs specifically in the ICI + AI group. The R2 values were 0.713 (95% CI: 0.00-1.00) for OS and SD, 0.816 (95% CI: 0.166-1.00) for OS and PD, 0.766 (95% CI: 0.00-1.00) for OS and grade 3-4 irAEs, and 0.969 (95% CI: 0.00-1.00) for OS and grade 5 irAEs. These point estimates suggest a strong relationship between these endpoints within the ICI + AI group, although most of the confidence intervals span the entire range from 0 to 1. Because of the insufficient sample size, we lacked adequate information to analyze the correlations in other intervention groups.
Correlation between survival benefit and risk of mortality or disease progression across time points
We examined the correlation between the survival benefit endpoint (HR for OS) and the relative risk (RR) for both PFS and OS at different time points (6, 12, 18, and 24 months) to investigate the influence of time (Fig. 2). We found a strong association between the risk of near-term mortality (within one year) and the survival benefit, with R2 values ranging from 0.724 (at 6 months) to 0.868 (at 12 months). However, we observed a weak association between the risk of disease progression and the survival benefit, both within and beyond one year, with R2 values ranging from 0.020 to 0.202.
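The time-point analysis can be pictured as a weighted linear regression of the log HR for OS on the log RR at a given time point. The sketch below is a simplified illustration only: it weights each trial by its number of events, as the caption of Fig. 2 suggests, and it does not adjust for estimation error in both effect estimates as the hybrid models described in the methods do. The function name and the four trials are hypothetical.

```python
import math

def weighted_linear_fit(x, y, w):
    """Weighted least-squares fit of y = a + b * x; returns (a, b, r2)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)  # squared weighted correlation
    return intercept, slope, r2

# Hypothetical trials: (RR of death at 12 months, HR for OS, number of events)
trials = [(0.70, 0.66, 180), (0.85, 0.80, 240), (0.95, 0.92, 150), (1.05, 0.98, 90)]
x = [math.log(rr) for rr, _, _ in trials]
y = [math.log(hr) for _, hr, _ in trials]
w = [events for _, _, events in trials]

a, b, r2 = weighted_linear_fit(x, y, w)
print(f"log HR(OS) = {a:.3f} + {b:.3f} * log RR; weighted R2 = {r2:.3f}")
```

Repeating such a fit at each time point (6, 12, 18, and 24 months) would give one R2 per time point, which is the quantity compared in the paragraph above.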

Discussion
PFS has historically been the most commonly used surrogate endpoint in phase III clinical trials because it is defined as a composite endpoint that combines progression and death [28]. We found that, overall, the surrogate relationship between the treatment effects of ICIs on PFS and on OS was weak, which supports existing knowledge in this area [29-31]. However, PFS in all these trials was defined using the traditional RECIST criteria, which were developed before the era of immunotherapy. It has been reported that traditional RECIST criteria fail to properly capture disease progression under immunotherapies, which have atypical response patterns [32]. That is, the failure of traditional RECIST criteria to adequately define PFS under immunotherapy might be one reason for the smaller benefits in PFS than in OS seen in trials of PD-1 inhibitors. In addition, when we stratified by study design, we found some evidence that the PFS-OS surrogacy relationship in uHCC was stronger in phase III trials. This could be attributed to the characteristics of phase II clinical trials, which often involve fewer patients, shorter follow-up periods, and shorter trial durations. The influence of random noise may therefore be more pronounced, resulting in a weaker correlation between PFS and OS [33]. Phase III clinical trials, by contrast, have larger sample sizes, longer trial durations, and more diverse patient populations. These large-scale trials allow a more accurate evaluation of treatment effects on PFS and OS and thus typically demonstrate a stronger correlation between the two endpoints.
The number of patients with an objective response (OR) is the sum of those with CR and PR, and the number with disease control (DC) is the sum of those with CR, PR and SD; these are early outcomes available in most trials [34]. However, the trial-level associations between OS and tumor response endpoints did not meet the lowest evaluation criteria in the included meta-analyses (R2 < 0.60). There are two possible reasons for this. The first is the mechanism of immunotherapy. In a phenomenon known as pseudoprogression, the tumor may show local enlargement or new lesions may appear at the beginning of treatment, yet significant therapeutic effects can follow [35,36]. In this case, patients may experience poor disease control for a certain period of time but eventually achieve a good treatment response and extended overall survival. The correlation between OS and OR or DC can therefore be affected by pseudoprogression. In fact, low correlations between OS and traditional surrogate endpoints are not uncommon with other treatments such as chemotherapy [10,37,38]. The second reason may be the importance of liver decompensation as a driver of death in patients with HCC. Unlike PFS, liver decompensation cannot be directly captured through radiological assessment. The potential impact of liver decompensation therefore limits a full understanding of the relationship between radiological endpoints and OS in patients with HCC [39]. In contrast, the definition of CR is clearer and usually requires the patient's disease to disappear for a period of time. A complete response indicates that the treatment has controlled the tumor very effectively [8]. Therefore, although complete remission is rare, clinical cure remains possible for patients who achieve it. This perspective is consistent with current clinical reality and does not contradict existing evidence. Such patients may have a higher chance of achieving long-term disease-free survival, signifying that complete remission reflects a more comprehensive treatment response and tumor control and may be directly related to overall survival.
In the treatment of advanced cancer with ICIs, the association between the occurrence of adverse reactions and OS is generally low. ICIs generally have better safety and tolerability than traditional chemotherapy drugs, and their adverse reactions are caused by treatment-induced overactivity of the immune system [40]. ICIs enhance the immune response by blocking inhibitory signals on the immune system and may trigger irAEs, in which excessively activated immune cells attack normal tissues [41]. In our study, we observed a strong correlation between irAEs and OS specifically in the ICI + AI combination group, suggesting that this association may vary with the treatment approach. However, because of the wide confidence intervals for the estimated R2 values, further validation is needed for all subgroup analyses, and the small sample size may not capture true associations. Therefore, adverse reactions do not directly reflect the effect of treatment on the tumor.
This meta-analysis has several limitations that need to be considered when interpreting the results. Firstly, the lack of individual patient-level data limits our ability to account for potential confounding factors known to influence OS. Although we intend to investigate this further in future research, the absence of these data is a limitation of the current study. Secondly, not all studies included in the analysis reported all secondary endpoints at the time of analysis, which may introduce publication bias. While efforts were made to estimate HRs from available data, the assumption that HRs can be estimated from median survival times or survival at a specific time point may be overly simplified and may not capture the complete picture. Thirdly, we acknowledge that using HRs as the sole measure to assess the correlation between surrogate endpoints and OS has its limitations. The HR is influenced by the results of both the experimental arm and the control arm, and the efficacy of the control arm can vary across RCTs. Violation of the proportional hazards assumption, which is necessary for using HRs, is often observed in RCTs of advanced HCC [42]. However, given that this study was a secondary analysis without access to raw data for adjustment or consideration of all potential confounding factors, we chose to use the available HRs for practical reasons. Furthermore, the heterogeneity in the timing of surrogate endpoint assessment across the included trials poses statistical challenges. While the definitions of tumor response assessments and adverse events were generally similar among the original papers, the variability in timing could affect the results and introduce additional uncertainty. Lastly, using ORR and irAEs as surrogate measures for predicting survival may not provide absolute predictive information because of the different measurement scales and statistical properties of these variables. The relationship between ORR, irAEs, and survival might not be simply linear and could take a more complex form. Considering these limitations, the findings of this meta-analysis should be interpreted with caution. Further research with more comprehensive data and adjusted analyses is necessary to confirm and expand upon these findings.
In summary, while near-term effects may provide some clues to the antitumor response after treatment, further long-term studies and observations are needed to determine the survival effect of immune checkpoint inhibitors in the treatment of advanced cancer. OS remains a promising endpoint, and our finding of a high correlation between early and late OS data reinforces the value of including interim analyses in phase III trials to capture early signals of efficacy and of including futility boundaries for early termination.

Figure 1. Flow diagram of study selection process. N number of patients.

Figure 2. Correlation between survival benefit and risk of mortality or disease progression across time points. Each circle represents a trial, and the surface area of the circle is proportional to the number of events observed in the corresponding trial. Straight lines represent weighted regression lines.

Table 1. Characteristics of the randomised clinical trials. OS overall survival, PFS progression-free survival, ORR objective response rate, DLT dose-limiting toxicity, ICI immune checkpoint inhibitor, AI angiogenesis inhibitor, TKI tyrosine kinase inhibitor, IMiD immunomodulatory drugs, BSC best supportive care.

Table 2. Relationship between overall survival (OS) and potential surrogate outcomes. PFS progression-free survival, OR objective response, CR complete response, PR partial response, SD stable disease, DC disease control, PD progressive disease, irAE immune-related adverse event.

Table 3. Overview of sensitivity analyses. NE not evaluable, NAN not a number, ICI immune checkpoint inhibitor, AI antiangiogenic agents. (a) The median sample size of the included comparison arms was 315, and we performed sensitivity analysis 1 only for studies with larger sample sizes.