Prognostic value of baseline metabolic tumour volume in advanced-stage Hodgkin’s lymphoma

Our aim was to evaluate the prognostic value of initial total metabolic tumour volume (TMTV) in a population of patients with advanced-stage Hodgkin’s lymphoma (HL). We retrospectively included 179 patients with stage IIB-III-IV HL who received BEACOPP or ABVD as first-line treatment. The initial TMTV was determined for each patient using a semi-automatic method. We analysed its prognostic value in terms of 5-year progression-free survival (PFS), overall survival (OS), and positron emission tomography (PET) response after two courses of chemotherapy. Across all treatments, with a threshold of 217 cm³, TMTV was predictive of 5-year PFS and of PET response after two courses of chemotherapy. In multivariable analysis including TMTV, IPI score, and the first treatment received, TMTV remained a baseline prognostic factor for 5-year PFS. In the subgroup of patients treated with BEACOPP, with a threshold of 331 cm³, TMTV was predictive of PET response but not of 5-year PFS (p = 0.087). The combined analysis of TMTV and PET response identified a subgroup of patients (low TMTV and complete response on PET) with a very low risk of recurrence. Baseline TMTV appears to be a useful independent prognostic factor for predicting relapse in advanced-stage HL in the ABVD subgroup, with a trend towards separation of the survival curves in the BEACOPP subgroup.

PET acquisition and interpretation. PET/CT scans were acquired on three different PET/CT systems: Biograph 16 (Siemens Healthcare, Erlangen, Germany), Discovery 710 (GE Healthcare, Chicago, Illinois, United States), and Biograph mCT (Siemens Healthcare, Erlangen, Germany). All subsequent PET/CT scans for treatment evaluation were performed on the same PET/CT device as the baseline scan.
Patients fasted for at least 6 h before the 18F-FDG injection. The injection was administered only if the blood glucose level was < 1.8 g/L. The injected 18F-FDG activity ranged from 3.5 MBq/kg to 4.5 MBq/kg, with a maximum activity of 450 MBq. Scans were acquired approximately 60 min after the injection. CT scans were acquired from the orbits to the mid-thigh in most cases, with whole-body acquisition in the others, at 120 kV and 100-150 mAs (depending on the patient's weight). OSEM reconstruction was performed with routine parameters (two iterations and 24 subsets). No contrast medium was injected.
The response on PET after two courses of chemotherapy (PET2) was evaluated using the Deauville score and the modified Deauville score (2011 AHL criteria, taking into account 140% of liver background) [4].

Segmentation.
We analysed TMTV using the Beth Israel plugin for FIJI (ImageJ), shareware from the Beth Israel Deaconess Medical Center, Division of Nuclear Medicine and Molecular Imaging [8].
Each hypermetabolic focus suspected of being a lymphomatous localisation was segmented on fused PET/CT images with a threshold of 41% of SUVmax. Segmentation was first performed automatically by the software. It was then verified manually, with any missed foci added and the automatically segmented foci corrected where necessary.
Segmentation of the hypermetabolic lymph nodes, spleen, bone, and other anatomical foci was performed independently, and the TMTV and total lesion glycolysis (TLG) were recorded for each of them.
The TMTV was obtained by summing the metabolic volumes of all nodal and extranodal lesions. Bone marrow was included in the volume measurement only if focal uptake was observed. The spleen was considered involved if there was focal uptake or diffuse uptake higher than 150% of the liver background, as recommended [9].
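The thresholding and summation described above can be sketched as follows. This is a minimal illustration rather than the plugin's implementation: it assumes each lesion has already been delineated as a list of voxel SUVs, that all voxels share one volume, and that TLG is taken as the usual product of metabolic volume and mean SUV. The function names and the sample values are hypothetical.

```python
from statistics import fmean


def lesion_mtv_tlg(suv_voxels, voxel_volume_cm3, fraction=0.41):
    """Return (MTV in cm3, TLG) for one lesion given the SUVs of its voxels.

    Voxels at or above `fraction` x SUVmax of the lesion are kept.
    """
    threshold = fraction * max(suv_voxels)
    kept = [v for v in suv_voxels if v >= threshold]
    mtv = len(kept) * voxel_volume_cm3
    tlg = mtv * fmean(kept)  # TLG = MTV x SUVmean of the kept voxels
    return mtv, tlg


def total_mtv_tlg(lesions, voxel_volume_cm3):
    """Sum MTV and TLG over all nodal and extranodal lesions."""
    pairs = [lesion_mtv_tlg(lesion, voxel_volume_cm3) for lesion in lesions]
    return sum(m for m, _ in pairs), sum(t for _, t in pairs)


# Hypothetical example: two lesions, 0.5 cm3 voxels.
lesions = [[10.0, 5.0, 4.0, 2.0], [6.0, 3.0, 1.0]]
tmtv, tlg = total_mtv_tlg(lesions, voxel_volume_cm3=0.5)
```

Because the threshold is relative to each lesion's own SUVmax, a faint lesion and an intense one are both segmented; this is also why the method is sensitive to how foci are separated before thresholding.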
Three nuclear physicians (SB, ET, and PP) performed the segmentations, with each patient's foci segmented by two of the physicians. Of the two values of TMTV and TLG obtained for each patient, the reference value retained was the one determined by the more experienced observer.

Statistical analysis.
Statistical analysis was performed using the R software, version 4.0.4 [10]. Continuous data were compared using independent-samples t-tests. Agreement between two observers was evaluated using the intraclass correlation coefficient (ICC) to measure the consistency of the MTV and TLG evaluations. The 95% confidence intervals of the ICC were estimated using 10,000 bootstrap replications with the adjusted bootstrap percentile method [11,12]. The median follow-up was calculated using the reverse Kaplan-Meier method [13]. PFS was measured from the date of diagnosis to progression (first clinical suspicion of recurrence, or diagnosis of recurrence on computed tomography (CT) or PET) or death, and overall survival (OS) from the date of diagnosis to death. The statistical analysis was performed at 5 years; hence, the data were censored at this time. Receiver operating characteristic (ROC) curves were used to identify the optimal cut-off values for predicting PFS at 5 years for each segmentation method. Survival probabilities were calculated using the Kaplan-Meier method. Log-rank tests and multivariate analyses were performed using Cox models. Statistical significance was set at a two-tailed p value of < 0.05. For secondary analyses, a Hochberg correction was applied to control the family-wise type I error rate at 5% [14].

Ethical approval.
This study was performed in accordance with the Declaration of Helsinki and local laws, and the protocol was approved by the Institutional Review Board of Henri Becquerel Centre (n°2102 B).

Inter-observer correlation.
The ICC between observer 1 (SB) and observer 2 (ET) (59 patients) was 0.92 (0.73-0.98) for TMTV and 0.92 (0.77-0.98) for TLG. The ICC between observer 1 (SB) and observer 3 (PP) (59 patients) was 0.93 (0.67-0.98) for TMTV and 0.97 (0.77-1.00) for TLG.

Baseline PET parameters for the whole population.
Median TMTV was 251.06 cm³ (range 125.58-392.37). The ROC curve analysis of the prognostic performance of TMTV for 5-year PFS showed an AUC of 0.57. Using the Youden index, the best TMTV cut-off value for 5-year PFS was 217 cm³, with a sensitivity of 67% and a specificity of 50% (Fig. 1). A TMTV ≥ 217 cm³ was associated with a significantly shorter PFS (p = 0.027), with a hazard ratio (HR) of 1.91 (1.07-3.42). The 98 patients with a high TMTV had a significantly worse outcome, with a 5-year PFS of 64% vs. 77% for patients with a lower TMTV. Using the same cut-off value, a high TMTV was not significantly associated with a shorter OS (p = 0.15) (Fig. 2).
Median TLG was 1389.24 (range 595.13-2507.36). The ROC curve analysis of the prognostic performance of TLG for 5-year PFS showed an AUC of 0.57. Using the Youden index, the best TLG cut-off value for 5-year PFS was 949, with a sensitivity of 73% and a specificity of 46% (Fig. 1). A TLG ≥ 949 was associated with a significantly shorter PFS (p = 0.015), with an HR of 2.11 (1.14-3.91). The 106 patients with a high TLG had a significantly worse outcome, with a 5-year PFS of 64% vs. 79% for patients with a lower TLG. Using the same cut-off value, the p value for the association between TLG and OS is also given (Fig. 2), but it should be interpreted with caution because the proportional hazards assumption was not met.
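The Youden index used for these cut-offs is J = sensitivity + specificity - 1; for the reported TMTV cut-off this gives 0.67 + 0.50 - 1 = 0.17. The sketch below shows one way such a cut-off could be searched for. It assumes a simplified setting in which each patient has a binary status at 5 years (1 = progression or death) and ignores censoring; the function name and the sample data are hypothetical and do not come from the study.

```python
def youden_cutoff(values, events):
    """Return (cutoff, sensitivity, specificity) maximising Youden's J.

    A patient is classified as high risk when value >= cutoff.
    Ties in J are resolved in favour of the lowest cutoff.
    """
    n_pos = sum(1 for e in events if e)
    n_neg = len(events) - n_pos
    best = None
    for c in sorted(set(values)):
        tp = sum(1 for v, e in zip(values, events) if e and v >= c)
        tn = sum(1 for v, e in zip(values, events) if not e and v < c)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, c, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical TMTV values (cm3) and 5-year event status.
tmtv = [100, 150, 200, 250, 300, 400]
event = [0, 0, 1, 0, 1, 1]
cutoff, sens, spec = youden_cutoff(tmtv, event)
```

A cut-off chosen this way is optimised on the same sample it is then evaluated on, which is one reason the subgroup-specific thresholds differ.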
Subgroup analysis. When the sample was split by treatment, the optimal TMTV cut-off in the ABVD subgroup remained the same as in the overall sample (217 cm³), with a sensitivity of 65% and a specificity of 58%.
In the BEACOPP subgroup, the optimal TMTV cut-off was 331 cm³, with a sensitivity of 75% and a specificity of 59%. TMTV was not predictive of 5-year PFS, although a trend towards separation of the survival curves was observed (p = 0.087), with an HR of 3.68 (0.74-18.3). Patients with a high TMTV had a 5-year PFS of 79%, compared with 94% in patients with a low TMTV (Fig. 3).
Patients in the BEACOPP subgroup had a significantly higher 5-year PFS than those in the ABVD subgroup (p = 0.0017), with an HR of 0.32 (0.15-0.68). The 115 patients in the ABVD subgroup had a significantly worse outcome than patients in the BEACOPP subgroup (5-year PFS of 62% vs. 87%).
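The 5-year PFS percentages quoted here are Kaplan-Meier estimates. As an illustration of how such an estimate is built, the following sketch implements the product-limit estimator in plain Python. It is not the R code used for the analysis; the function names and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times: follow-up durations; events: 1 = event observed, 0 = censored.
    """
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        n_at_t = sum(1 for ti in times if ti == t)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e)
        if d:
            surv *= 1 - d / at_risk  # product-limit step
            curve.append((t, surv))
        at_risk -= n_at_t  # events and censored cases both leave the risk set
    return curve


def survival_at(curve, t):
    """Step-function lookup: survival probability at time t."""
    s = 1.0
    for time, surv in curve:
        if time <= t:
            s = surv
    return s


# Hypothetical follow-up in years for six patients.
times = [1, 2, 2, 3, 4, 5]
events = [1, 0, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
pfs_5y = survival_at(curve, 5)
```

Calling the same function with the event indicator flipped (`[1 - e for e in events]`) gives the reverse Kaplan-Meier curve, from which the median follow-up is read as the time at which that curve crosses 0.5.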

PET2 response.
Among the 110 patients who underwent PET2, the modified Deauville score was predictive of 5-year PFS (p = 0.048), with an HR of 2.34 (0.98-5.58). The 24 patients with a positive PET2 had a worse outcome, of borderline significance, with a 5-year PFS of 67% vs. 82% for patients with a negative PET2.
Among the 110 patients who underwent PET2, the mean TMTV was significantly lower in the negative PET2 subgroup than in the positive PET2 subgroup (312.42 cm³ vs. 508.31 cm³, p = 0.01). Among the 57 patients with a high TMTV, 17 (29.8%) had a positive PET2 (Fig. 4).
Similar results were found with the Deauville score.
Subgroup analysis. Among the 62 patients treated with BEACOPP, the mean TMTV differed significantly between the negative PET2 subgroup (346.27 cm³) and the positive PET2 subgroup (675.01 cm³) (p = 0.005), using either the modified Deauville score (Fig. 4) or the Deauville score. In contrast, no difference in mean TMTV was observed in the subgroup treated with ABVD (n = 48; p = 0.96) (Fig. 4).

Multivariate analysis.
In multivariate analysis using the Cox model and combining TMTV with known initial prognostic factors (IPI score and first treatment received), a TMTV < 217 cm³ and a first treatment with BEACOPP were associated with a significantly longer PFS (HR 0.43, p = 0.02 and HR 0.29, p = 0.003, respectively), whereas none of these parameters was associated with a significantly longer OS (Table 2).
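For readers less familiar with the Cox model, the hazard ratios above can be read through the following form. The covariate coding is an assumption made for illustration (the text does not state it): x_TMTV = 1 when TMTV < 217 cm³, x_BEACOPP = 1 when the first treatment was BEACOPP, and x_IPI stands for whatever coding of the IPI score was used.

```latex
h(t \mid x) = h_0(t)\,
  \exp\bigl(\beta_1 x_{\mathrm{TMTV}} + \beta_2 x_{\mathrm{IPI}} + \beta_3 x_{\mathrm{BEACOPP}}\bigr),
\qquad
\mathrm{HR}_j = e^{\beta_j}

% With the reported hazard ratios for PFS:
\beta_1 = \ln 0.43 \approx -0.84,
\qquad
\beta_3 = \ln 0.29 \approx -1.24
```

Under this coding, an HR below 1 means a lower instantaneous risk of progression or death at any time t, with the other covariates held fixed.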

Combined analysis.
Combining TMTV and the PET2 modified Deauville score allowed us to identify two risk categories. Patients with a TMTV < 217 cm³ and a negative PET2 had a significantly higher 5-year PFS than those with a TMTV ≥ 217 cm³ or a positive PET2 (p = 0.0037). The 44 patients with a low TMTV and a negative PET2 had a significantly better outcome, with a 5-year PFS of 91% vs. 70% for patients with a higher TMTV or a positive PET2 (Fig. 5).
Subgroup analysis. In the ABVD subgroup, patients with a TMTV < 217 cm³ and a negative PET2 had a significantly higher 5-year PFS than those with a TMTV ≥ 217 cm³ or a positive PET2 (89% vs. 54%, respectively, p = 0.0038). Similar results were found in the BEACOPP subgroup, in which patients with a TMTV < 331 cm³ and a negative PET2 had a significantly higher 5-year PFS than those with a TMTV ≥ 331 cm³ or a positive PET2 (97% vs. 77%, respectively, p = 0.021) (Fig. 5).
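The composite rule used in this combined analysis can be written as a small function. This is only a sketch of the rule as stated in the text: the cut-offs are those reported above, while the function name, the "low"/"high" labels, and the fallback to the whole-population cut-off are illustrative choices, not a validated clinical tool.

```python
# Cut-offs (cm3) reported for each first-line regimen.
CUTOFF_CM3 = {"ABVD": 217.0, "BEACOPP": 331.0}
WHOLE_POPULATION_CUTOFF_CM3 = 217.0


def risk_group(tmtv_cm3, pet2_positive, treatment=None):
    """Return "low" for low TMTV and negative PET2, otherwise "high".

    Falls back to the whole-population cut-off when no regimen is given.
    """
    cutoff = CUTOFF_CM3.get(treatment, WHOLE_POPULATION_CUTOFF_CM3)
    if tmtv_cm3 < cutoff and not pet2_positive:
        return "low"
    return "high"


# The same baseline volume falls in different groups depending on the regimen.
group_abvd = risk_group(250.0, pet2_positive=False, treatment="ABVD")
group_beacopp = risk_group(250.0, pet2_positive=False, treatment="BEACOPP")
```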

Discussion
To our knowledge, this is the first study to demonstrate the independent prognostic value of baseline TMTV in advanced-stage HL. A high TMTV identifies patients at high risk of HL recurrence in the whole population. The results of the multivariable analysis including TMTV, IPI score, and the first treatment received showed that TMTV remains a baseline prognostic factor for 5-year PFS, in contrast with the IPI score. However, in subgroup analysis, TMTV remained predictive of 5-year PFS only in patients treated with ABVD. In the population of patients treated with BEACOPP, the optimal cut-off was higher (331 cm³) than in the ABVD subgroup (217 cm³). This is most likely because BEACOPP-treated patients were significantly younger (median age 33 years versus 41 years in the ABVD group) and had a significantly higher TMTV (median 307 cm³ versus 217 cm³ in the ABVD group). In this subgroup, TMTV was predictive of response after two courses of chemotherapy.
These results are in accordance with those of Mettler et al. [7], who also showed that TMTV was predictive of PET2 response but not of PFS in a prospective study of 310 patients treated with BEACOPP. As suggested by the authors, this could be explained by a higher complete remission rate with escalated therapy than with anthracycline-based treatment. Furthermore, the high efficacy of BEACOPP after a positive PET2 could mask the prognostic value of TMTV that is observed in patients receiving ABVD. In our study, only eight patients who were initially treated with BEACOPP relapsed. Of these eight patients, six had a TMTV ≥ 331 cm³ (mean, 679 cm³) and five had a positive PET2. Therefore, the low number of events probably limited the results obtained.
The prognostic value of TMTV is now well established in diffuse large B-cell lymphoma [16][17][18], peripheral T-cell lymphoma [19], and early-stage HL [6], but remains poorly studied in advanced HL. In 2019, Cottereau et al. showed that, in a population of 258 patients with early-stage HL from the standard arm of the H10 trial, an initial TMTV of < 147 cm³ was predictive of better 5-year PFS and OS [6].
The TMTV values we found were consistent with those reported in other studies on similar subjects. In a preliminary analysis presented at the American Society of Clinical Oncology (ASCO), using the same method of segmentation, Casasnovas et al. [21] found an optimal TMTV cut-off of 350 cm³ (vs. 331 cm³ in our study) in a population of patients with advanced-stage HL treated with BEACOPP. Kanoun et al. [20] found a median TMTV of 117 cm³ (vs. 217 cm³ in our study) in a population of patients with early or advanced HL treated with ABVD. This lower value is probably related to the inclusion of early-stage patients. However, the predictive cut-off for PFS in that study was very close to ours: 225 cm³ vs. 217 cm³.
We chose the segmentation method using the 41% SUVmax threshold to determine the TMTVs, as recommended [22,23]. However, this method does not seem to be the most reproducible [24], and methods with fixed thresholds may be preferred. In our study, the reproducibility of segmentation remained excellent, with an ICC between 0.91 and 0.93, depending on the observers, for the measurement of TMTV. This excellent reproducibility can be explained by the fact that a semi-automatic method was used.
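As a reminder of what the ICC measures here, the sketch below computes a two-way consistency ICC for single measurements, ICC(C,1) = (MS_rows - MS_error) / (MS_rows + (k - 1) MS_error), together with a bootstrap confidence interval. Two points are assumptions: the text does not specify which ICC form was used, and the interval below is a plain percentile bootstrap rather than the adjusted bootstrap percentile used in the study. The function names and the sample data are hypothetical.

```python
import random
from statistics import fmean


def icc_consistency(ratings):
    """ICC(C,1) for a list of per-patient rows, one value per observer."""
    n, k = len(ratings), len(ratings[0])
    grand = fmean(v for row in ratings for v in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ms_rows = ss_rows / (n - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)


def bootstrap_ci(ratings, n_boot=2000, alpha=0.05, seed=0):
    """Plain percentile bootstrap interval, resampling patients with replacement."""
    rng = random.Random(seed)
    stats = sorted(
        icc_consistency(rng.choices(ratings, k=len(ratings)))
        for _ in range(n_boot)
    )
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Hypothetical TMTV pairs (cm3) from two observers for 20 patients.
pairs = [[100 + 20 * i, 100 + 20 * i + (5 if i % 2 else -5)] for i in range(20)]
icc = icc_consistency(pairs)
ci_low, ci_high = bootstrap_ci(pairs, n_boot=500)
```

The consistency form ignores a constant offset between observers, so two readers who differ by a fixed volume on every patient would still score an ICC of 1.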
Although the prognostic value of TMTV has been demonstrated in several types of lymphoma, TMTV is rarely used in clinical practice. Its measurement is time-consuming because each lesion must be segmented individually. To solve this problem, several automatic segmentation methods have been developed in recent years. Among them, those using convolutional neural networks seem the most promising. Currently, they are less reliable than human segmentation, but they could, in the near future, allow TMTV to be estimated reliably and systematically [25].
The impact of this new prognostic factor has to be evaluated in patients treated with new drug combinations that include agents such as brentuximab vedotin or a checkpoint inhibitor, and assessed in prospective clinical trials testing the relevance of adapting therapy to TMTV and PET2 response.
www.nature.com/scientificreports/
Our study has some limitations, including its retrospective nature, the relatively small number of patients included, and the heterogeneity of chemotherapy protocols. In addition, the analysis of response after two courses of treatment was limited by the fact that PET2 was not performed for all patients. However, the study highlights the prognostic value of TMTV in advanced HL, which could enable the definition of new groups of patients according to their risk of recurrence. In particular, composite criteria combining PET2 response with initial TMTV could be relevant and might allow treatment de-escalation for the group of patients with a very good prognosis.

Conclusion
Baseline TMTV appears to be a useful independent prognostic factor for predicting relapse in advanced-stage HL in the ABVD subgroup, with a trend towards separation of the survival curves in the BEACOPP subgroup, and could be used to improve risk stratification. However, its use in everyday practice is limited by the multiplicity of segmentation methods and its time-consuming nature. Further prospective investigations are needed to evaluate the benefits of including baseline TMTV as a factor in determining the treatment regimen.