Introduction

Follicular lymphoma (FL) is an indolent CD20+ non-Hodgkin lymphoma (NHL) and the most common low-grade B-cell lymphoma. It accounts for 25% of all NHL in Western countries1 and occurs predominantly at an advanced age2.

FL is an incurable indolent disease with frequent relapses, but median survival exceeds 10 years3. The clinical course is highly heterogeneous, with a wide variety of presentations, histological appearances, clinical behaviors and responses to therapy.

Treatment varies with stage and clinical presentation, so accurate staging is crucial for appropriate management. Whether the disease is localized or advanced, numerous frontline options can be proposed: watchful waiting, single-agent rituximab (R) or external radiation therapy for low tumor burden4, or immuno-chemotherapy for high tumor burden according to the Groupe d’Etude des Lymphomes Folliculaires (GELF) criteria5. Since the introduction of anti-CD20 monoclonal antibodies, primarily rituximab, combined with chemotherapy and then given as maintenance therapy for two years, the overall survival of FL patients has improved remarkably6.

Despite these improvements in long-term disease control, about 20% of patients with FL ultimately experience treatment failure and progression of disease within 24 months (POD24) from the time of diagnosis, with a 5-year overall survival (OS) rate of only 50%7. POD24 is the main prognostic factor affecting the outcome of FL patients8,9. Furthermore, histological transformation of FL (usually to diffuse large B-cell lymphoma, DLBCL) occurs in 5–10% of patients, with a 2% increased risk per year after diagnosis10, and results in a poor prognosis11. Early identification, at diagnosis, of patients who will relapse or progress early is therefore needed to guide therapeutic strategy, as at present no biomarker or tool exists to anticipate POD24+ patients at diagnosis.

Despite its indolent biology, FL is FDG-avid in more than 95% of cases, regardless of tumor grade12,13,14,15,16. Recent guidelines recommend 18F-FDG positron emission tomography/computed tomography (PET/CT) as a standard of care in FL for diagnosis, baseline staging, suspected transformation, selection of the most suitable re-biopsy site at relapse, and response assessment (gold standard)17,18,19,20,21.

Indeed, PET/CT is more sensitive and specific than standard computed tomography (CT) or MRI22,23. It detects disease sites better, particularly extra-nodal disease, which alters stage assignment and changes management through up-staging or down-staging in around 20% of FL patients24,25. Furthermore, CT cannot distinguish viable tumor from fibrosis in post-therapy residual masses26,27,28,29.

Numerous studies have suggested that, at baseline 18F-FDG PET/CT evaluation in FL, higher values of the quantitative maximum standardized uptake value (SUVmax)30 and total metabolic tumor volume (TMTV)31,32,33,34, and of the qualitative Deauville index, are associated with inferior survival. The prognostic value of TMTV obtained from baseline 18F-FDG PET/CT has recently been reported in patients with various subtypes of lymphoma35,36,37,38,39,40,41,42.

With the development of radiomics, quantitative parameters more complex than TMTV, which evaluate the tumor burden more precisely, can be extracted from 18F-FDG PET/CT data and analyzed, such as the tumor’s massiveness or fragmentation, dispersion and activity43,44. There is increasing evidence that these quantitative parameters may also have predictive value in FL45,46,47, but no definitive consensus has yet been reached.

FL patients with a high risk of treatment failure or early-relapse cannot be easily identified by the classic prognostic clinical indicators such as the FLIPI-148 or FLIPI-249. Thus, there is a need for new reliable prognostic biomarkers to better select high-risk patient categories that can benefit from personalized, risk-adapted, treatment strategies, shortly after diagnosis.

The aim of the present study was to measure 18F-FDG PET/CT-derived quantitative parameters in newly diagnosed patients with high tumor burden FL and investigate their potential role, alone or in combination, as predictive factors for POD24 at baseline imaging.

Methods

Patients

We carried out a retrospective, observational, single-center cohort study called “LYMFOTEP” in the Nuclear Medicine department and the Hematology department of Henri Becquerel Cancer Centre, Rouen, France. The research protocol was approved by the Institutional Review Board of Henri Becquerel Centre (no. 2005B). Patients were informed about the use of anonymized data for research and their right to oppose this use. Written informed consent was waived because of the retrospective nature of the study. The study respected the ethical principles of the 2008 Helsinki Declaration.

The inclusion criteria were as follows: (1) age 18 years or older; (2) histologically confirmed diagnosis of follicular lymphoma; (3) grade 1–3a in accordance with the World Health Organization (WHO) classification, diagnosed between 2006 and 2020; (4) first-line immuno-chemotherapy including anti-CD20 monoclonal antibodies for high tumor burden according to GELF criteria; (5) a baseline 18F-FDG PET/CT examination performed before initiation of therapy; and (6) a patient non-opposition statement.

Exclusion criteria were as follows: (1) patients with other malignant tumors, (2) known histological transformation at baseline, (3) aggressive B-cell lymphoma, (4) FL grade 3b according to the WHO classification and (5) no available 18F-FDG PET/CT at baseline.

Clinical data

Patients’ baseline disease characteristics (FLIPI, POD24) and survival data were obtained from internal medical records. The clinical data collected for all patients were: gender, age at disease onset, disease characteristics, LDH, hemoglobin, ECOG score and treatment regimen.

PET acquisition and interpretation

All patients underwent 18F-FDG PET/CT with acquisitions performed according to the Society of Nuclear Medicine and Molecular Imaging (SNMMI) and the European Association of Nuclear Medicine (EANM) guidelines50. Patients were instructed to fast for at least six hours before 18F-FDG injection. The radiopharmaceutical (18F-FDG) was supplied by CURIUM® or PETNET® and manufactured in accordance with Good Manufacturing Practices and the European Pharmacopoeia. Injection was performed only if the blood glucose level was below 1.8 g/L. The intravenously injected 18F-FDG activity was around 2.5–4 MBq/kg, depending on the PET/CT device used: Biograph 16 (Siemens Medical Solutions, Knoxville, TN, USA), Biograph 40 (Siemens Medical Solutions, Knoxville, TN, USA), Discovery 710 (GE Healthcare, Milwaukee, WI, USA) or Biograph Vision-600 (Siemens Medical Solutions, Knoxville, TN, USA), with a maximum activity of 450 MBq, and was injected after 30 min of rest. Scans were acquired approximately 60 min (± 5 min) after injection. CT scans for attenuation correction and anatomic localization were acquired in helical mode from the mid-thigh to the base of the skull in most cases, and as whole-body acquisitions in the others, with 100 to 120 kV and 100–150 mAs (based on the patient’s weight). No contrast medium was injected. Images were reconstructed with validated, commercially available iterative algorithms (Ordered-Subset Expectation Maximization). The PET systems were normalized daily and the calibration coefficient was validated if the day-to-day variation remained below 0.3%. The global quantification, from the dose calibrator to the imaging system, was measured internally on a quarterly basis and double-checked by the EARL quality assurance program.

18F-FDG PET/CT data were anonymized and collected in DICOM format. All data were then retrospectively reviewed and entered into an eCRF. Quantitative PET parameters were measured and extracted by a trained nuclear physician, blinded to clinical outcome and patient characteristics. Data were analyzed using the PET/CT viewer plug-in for FIJI (ImageJ), a freeware from the Beth Israel Deaconess Medical Centre, Division of Nuclear Medicine and Molecular Imaging51,52. A threshold of 41% of the maximum standardized uptake value (SUVmax) was applied53. Segmentation was first performed automatically by the software, then checked visually to confirm that only pathological lesions were included, and manually adapted if needed. Lesion sites were determined by visual assessment, with 18F-FDG PET/CT images scaled to a fixed SUV display and color table. Each hypermetabolic focus suspected of malignant disease localization was segmented on fused PET/CT images. Segmentations of the hypermetabolic lymph nodes, spleen, bone and other pathological foci were saved separately.
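The 41% SUVmax rule can be illustrated with a minimal sketch. It is not the FIJI plug-in's implementation: the function names, the flat list of voxel SUVs and the voxel volume are hypothetical, and the sketch assumes that a voxel is kept when its SUV is at least 41% of the hottest voxel in the same candidate region.

```python
from typing import List, Sequence


def threshold_lesion(suv_values: Sequence[float], fraction: float = 0.41) -> List[bool]:
    """Per-voxel mask keeping voxels at or above fraction * SUVmax of the region."""
    if not suv_values:
        return []
    cutoff = fraction * max(suv_values)
    return [v >= cutoff for v in suv_values]


def metabolic_volume_cm3(mask: Sequence[bool], voxel_volume_mm3: float) -> float:
    """Volume of the retained voxels, converted from mm^3 to cm^3."""
    return sum(mask) * voxel_volume_mm3 / 1000.0


# Hypothetical SUVs of the voxels inside one candidate region (SUVmax = 10).
lesion = [10.0, 8.2, 4.5, 4.0, 3.9, 1.2]
mask = threshold_lesion(lesion)  # keeps the voxels with SUV >= 4.1
volume = metabolic_volume_cm3(mask, voxel_volume_mm3=64.0)
```

Under these assumptions, summing such per-lesion volumes over all segmented lesions would give a TMTV-like quantity.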

Lesions considered pathological were identified visually as areas of increased uptake outside areas of physiological uptake (e.g. brain, heart, urinary system). For bone marrow and spleen involvement, only focal uptake was included. However, in case of diffuse and intense spleen uptake, the whole spleen was included if its SUV was greater than 150% of the liver background54.

Radiomic parameters

A total of twelve quantitative 3-D PET/CT-derived parameters were then extracted with the software Oncometer3D (v1.0)43 [https://www.researchgate.net/publication/378659728_Oncometer3D_10]. An exhaustive description and graphical representation of these PET parameters are available in Supplemental Data.

Statistical analysis

Continuous variables are reported as mean ± SD with minimal and maximal values. Categorical variables are expressed as numbers and percentages.

The relationship between the different PET metrics was characterized by Spearman’s rank correlation coefficient. A correlogram was drawn with the corrplot R function; it displays the correlation matrix between the twelve PET/CT parameters together with the significance of each correlation coefficient.
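For readers unfamiliar with the statistic, the following is a minimal standard-library sketch of Spearman's rank correlation, computed as the Pearson correlation of average ranks. It is an illustration only, not the R code used in the study, and the function names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined for a constant variable")
    return cov / (vx * vy) ** 0.5
```

Because only ranks are used, any monotonic relationship between two parameters yields ρ = ±1, whether or not it is linear.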

POD24 (progression of disease within 24 months), a binary variable, was defined as disease progression within 24 months after first-line immuno-chemotherapy, while non-POD24 was defined as the absence of progression within 24 months of first-line therapy. Receiver operating characteristic (ROC) analysis for POD24 was used to determine the optimal cut-off value for each feature by maximizing the product of sensitivity and specificity. Sensitivity and specificity were calculated at that cut-off. The area under the curve (AUC) was also calculated.
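The cut-off criterion described above can be sketched as follows. This is an illustration, not the study's statistical code: the function name and data are hypothetical, and the sketch assumes that higher values indicate higher risk, so a patient is classified as "high" when the value is at or above the cut-off.

```python
from typing import Sequence, Tuple


def best_cutoff(values: Sequence[float], events: Sequence[int]) -> Tuple[float, float, float]:
    """Return (cutoff, sensitivity, specificity) maximizing sensitivity * specificity.

    events: 1 = POD24, 0 = no POD24. A value >= cutoff is classified as high.
    """
    n_pos = sum(events)
    n_neg = len(events) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")
    best = (float("nan"), 0.0, 0.0)
    best_product = -1.0
    # Every observed value is tried as a candidate cut-off.
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, e in zip(values, events) if e == 1 and v >= cutoff)
        tn = sum(1 for v, e in zip(values, events) if e == 0 and v < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        if sens * spec > best_product:
            best_product = sens * spec
            best = (cutoff, sens, spec)
    return best


# Hypothetical feature values with their POD24 status.
values = [150, 300, 450, 900, 1300, 2000, 2500]
events = [0, 0, 0, 1, 0, 1, 1]
cutoff, sens, spec = best_cutoff(values, events)
```

Maximizing the product, rather than the sum as in Youden's index, penalizes cut-offs that achieve a high sensitivity at the cost of a very low specificity, or the reverse.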

Progression-Free Survival (PFS), a continuous variable, was defined as the time from treatment initiation to disease progression, relapse or death from any cause, up to 24 months.

Overall Survival (OS) was defined as the time from treatment initiation to the date of death by any cause.

Survival curves were obtained with the Kaplan–Meier method. Each quantitative PET/CT variable was dichotomized according to the cut-off established by ROC analysis. For each binary variable, survival curves were compared between categories with the log-rank test. A Cox proportional hazards model was used to evaluate the relationship between study variables and survival. Statistical significance was set at a two-tailed p-value < 0.05.
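As a reminder of how the Kaplan–Meier estimate handles censored follow-up, here is a minimal product-limit sketch. It is illustrative only; the function name and the follow-up data are hypothetical, and patients censored at an event time are counted as still at risk at that time (the usual convention).

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return [(event_time, survival_probability)] at each distinct event time.

    events: 1 = progression or death observed, 0 = censored.
    """
    survival = 1.0
    curve = []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        failed = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        # Product-limit step: multiply by the conditional survival at time t.
        survival *= 1.0 - failed / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up in months, censored at 24 months.
times = [6, 10, 10, 18, 24, 24]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
```

In this toy data set the estimated survival falls to 5/6 at 6 months, 2/3 at 10 months and 4/9 at 18 months; the censored patients contribute to the risk sets without producing a step.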

Results

Patients’ characteristics and outcomes

One hundred and twenty-six patients from the “LYMFOTEP” study with previously untreated FL, high tumor burden according to GELF criteria and an available baseline 18F-FDG PET/CT were included in the study (see flowchart in Fig. 1). The clinical characteristics of the population are described in Table 1. Patients (65 males and 61 females) had a median age of 61 years (range 35–88) and the vast majority (86.5%) received immuno-chemotherapy with the R-CHOP regimen. The median follow-up was 120 months.

Figure 1 Flow chart of patient selection for the study.

Table 1 Baseline characteristics of follicular lymphoma patients (n = 126).

Mean values of the twelve baseline 18F-FDG PET/CT tumor features are reported in Table 2. Patients were separated into two groups according to their POD24 status: a POD24- group (POD24 = 0) of 98 patients (77.8%) and a POD24+ group (POD24 = 1) of 28 patients (22.2%). Distributions differed significantly between POD24+ and POD24- patients for TMTV (p < 0.001), TLG (p < 0.001), TVSR (p = 0.009), TMTS (p < 0.001), TumBB (p = 0.038), medEDGE (p = 0.003), medPCD (p < 0.001) and itErosion (p = 0.003).

Table 2 Description of the twelve parameters extracted from baseline PET/CT.

The 10-year OS was 78.4% for the whole study population.

18F-FDG PET/CT metrics and correlations

As shown in the correlogram (Fig. 2), four clusters of highly correlated parameters could be identified among the twelve PET parameters analyzed:

  • Activity (SUVmax; SUVmean),

  • Tumor burden (TMTV; TMTS),

  • Massiveness/fragmentation (TVSR; medPCD; medEDGE; itErosion),

  • Dispersion (Dmax; TumBB; nROI)

Figure 2 Correlogram between the twelve PET/CT parameters with numeric values. SUVmax, maximum standardized uptake value; TMTV, total metabolic tumour volume; TLG, total lesion glycolysis; Dmax, largest distance between two lesions; SUVmean, mean standardized uptake value; TVSR, tumour volume surface ratio; TMTS, total metabolic tumour surface; TumBB, tumour bounding box; nROI, number of regions of interest; medEDGE, median edge distance; medPCD, median distance between the centroid of the tumour and its periphery; itErosion, iterative erosion. Grey arrows for p-values > 0.05 if the correlation coefficient is different from 0.

Interestingly, TMTS was highly correlated with TMTV (ρ = 0.91) and therefore had little added prognostic value over TMTV. By contrast, TVSR (the ratio of TMTV to TMTS) was only moderately correlated with TMTV (ρ = 0.45) and could therefore add prognostic value. In addition, TLG was the only parameter significantly correlated with all others, including parameters belonging to different clusters; in particular, it was highly correlated with TMTV (ρ = 0.89).

ROC curve analysis

The ROC curve analysis for POD24 is shown in Table 3, which presents the optimal cut-off, AUC, performance parameters and distribution of patients below and above the cut-off. TMTV had the highest area under the curve (AUC = 0.734), followed by medPCD (AUC = 0.733) and TLG (AUC = 0.715). TVSR, medEDGE, itErosion, TMTS and TumBB also had AUCs significantly different from 0.5.

Table 3 Diagnostic performances of the 12 PET/CT-derived parameters for POD24 using a ROC analysis.

Kaplan–Meier survival analysis

A Kaplan–Meier survival analysis was performed according to the cut-off values of the ROC curves for POD24. Burden parameters (TMTV, TMTS and TLG) and fragmentation parameters (TVSR, medPCD, medEDGE and itErosion) had statistically significant log-rank tests (all p-values < 0.001). Kaplan–Meier curves for TMTV (p < 0.001), TVSR (p = 0.0019) and medPCD (p < 0.001) are shown in Fig. 3.

Figure 3 Kaplan–Meier survival analysis for PFS at 24 months according to the TMTV (A), TVSR (B) and medPCD (C).

Cox univariate analysis

Cox univariate analyses are presented in Table 4 and in Tables 1 and 3 of the Supp. Data. In univariate analysis, neither FLIPI score nor male sex was significantly associated with PFS censored at 24 months (FLIPI-High: p = 0.3; FLIPI-Intermediate: p = 0.9; male sex: p = 0.2) or with uncensored PFS.

Table 4 Univariate Cox analysis for PFS at 24 months.

In contrast, high burden parameters (TMTV, TLG, TMTS) and high fragmentation parameters (TVSR, medEDGE, medPCD and itErosion) were significantly associated with less favorable PFS and OS. For POD24, the highest hazard ratio (HR) was observed for TLG (HR = 4.628; 95% CI 2.13–10.07; p < 0.001), followed by medPCD (HR = 4.507; 95% CI 2.01–10.10; p < 0.001) and TMTV (HR = 4.341; 95% CI 2.12–8.88; p < 0.001).

Combination of parameters: TMTV, TVSR and medPCD

Because some of the parameters that were significant in univariate analysis were highly correlated, a combination of parameters from different clusters appeared more appropriate than a multivariate analysis including all significant parameters. A combination of three parameters (TMTV, TVSR and medPCD) was therefore evaluated.

Patients were divided into four sub-groups according to the thresholds obtained in ROC analyses: “0” for no high parameter among the three, “1” for only one high parameter, “2” for two high parameters and “3” for all three high parameters.
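The grouping rule amounts to counting how many of the three parameters reach their cut-off, as in the minimal sketch below. The function name is hypothetical, and treating a value equal to the cut-off as "high" is an assumption. The TMTV cut-off of 1195 cm3 is the POD24 value reported in this study; the TVSR and medPCD cut-offs are placeholders chosen for illustration only and are not the study's values.

```python
from typing import Dict


def combined_score(patient: Dict[str, float], cutoffs: Dict[str, float]) -> int:
    """Count the parameters at or above their cut-off (0 to 3)."""
    return sum(1 for name, cutoff in cutoffs.items() if patient[name] >= cutoff)


# TMTV cut-off (cm3) as reported for POD24 in this study.
# TVSR and medPCD cut-offs (mm) are HYPOTHETICAL placeholders.
cutoffs = {"TMTV": 1195.0, "TVSR": 5.5, "medPCD": 35.0}

# Hypothetical patient with a high TMTV and medPCD but a low TVSR.
patient = {"TMTV": 1440.8, "TVSR": 4.6, "medPCD": 49.7}
score = combined_score(patient, cutoffs)
```

With these placeholder cut-offs, the example patient has two high parameters and would fall in sub-group “2”.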

The distribution of patients according to this classification and POD24 status is presented in Table 5. Groups “0” and “1” contained 67 of the 98 POD24- patients, while groups “2” and “3” contained 21 of the 28 POD24+ patients, giving a specificity of 68% and a sensitivity of 75% for this categorization in determining POD24 status.
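The reported sensitivity and specificity follow directly from these counts when a score of 2 or 3 is treated as a positive test for POD24. The short check below uses only the numbers given above; the remaining two cells of the 2 × 2 table are obtained by subtraction (28 − 21 = 7 and 98 − 67 = 31), and the function name is hypothetical.

```python
from typing import Tuple


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# POD24+ patients: 21 in groups 2-3 (true positives), 7 in groups 0-1 (false negatives).
# POD24- patients: 67 in groups 0-1 (true negatives), 31 in groups 2-3 (false positives).
sens, spec = sensitivity_specificity(tp=21, fn=7, tn=67, fp=31)
```

This gives a sensitivity of 21/28 = 0.75 and a specificity of 67/98 ≈ 0.68, matching the values quoted in the text.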

Table 5 Description of combined score evaluating TMTV, TVSR and medPCD according to POD24 status.

Kaplan–Meier survival curves are available in Fig. 4 and in Figs. 2 and 3 of the Supp. Data; they differ significantly according to the number of high parameters. The lowest probability of survival was observed in the group of patients with three high parameters.

Figure 4 Kaplan–Meier survival analysis for PFS at 24 months according to the combination score (TMTV + TVSR + medPCD).

Consistently, in Cox analyses (Table 6 and Tables 2 and 4 in Supp. Data), patients with 3 high level parameters had a significantly worse PFS at 24 months (HR = 12.562; 95% CI 3.57–44.20; p < 0.001) than patients with 2 high level parameters (HR = 6.75; 95% CI 1.83–24.95; p = 0.004) or patients with only 1 high level parameter (HR = 5.36; 95% CI 1.34–21.44; p = 0.02), showing the synergistic effect of the combination of these 3 PET parameters.

Table 6 Cox analysis for combined score for PFS at 24 months (TMTV + TVSR + medPCD).

Examples of the four sub-groups of the newly established scoring system using maximal intensity projection (MIP) on PET images are shown in Fig. 5.

Figure 5 Maximal intensity projection (MIP) images of grade 1–3a follicular lymphoma patients in group 0 (A: TMTV 202.7 cm3; TVSR 4.3 mm; medPCD 18.1 mm), group 1 (B: TMTV 211.5 cm3; TVSR 5.0 mm; medPCD 29 mm), group 2 (C: TMTV 1440.8 cm3; TVSR 4.6 mm; medPCD 49.7 mm) and group 3 (D: TMTV 2823.1 cm3; TVSR 5.8 mm; medPCD 39.5 mm), according to the combination score (TMTV; TVSR; medPCD).

Discussion

The prognostic value of PET/CT-derived parameters has been investigated in various lymphoma entities, in addition to the standard qualitative visual analysis (Deauville five-point scale). The current study evaluates the association and the prognostic impact of different PET/CT biomarkers such as TMTV, TVSR and medPCD from baseline 18F-FDG PET/CT in a cohort of more than a hundred FL patients with high tumor burden according to GELF criteria, mainly treated with R-CHOP immuno-chemotherapy. Our study confirms the strong and significant prognostic value of tumor features and shows that they help, notably when combined, to identify early those FL patients at high risk of progression of disease within 24 months after first-line treatment.

The primary strength of our study is the novel finding that 18F-FDG PET/CT parameters such as TVSR and medPCD have a high prognostic value for POD24 in FL patients. We therefore established a prognostic stratification model based on three features (TMTV, TVSR and medPCD) that defines four risk groups. The results indicate that PET/CT-derived features might be helpful in the prognostic evaluation and treatment personalization of FL patients. These three PET parameters represent complementary and distinct aspects of the hematological malignancy, which may explain their additional prognostic power.

Considering the parameters separately, TMTV has already been linked, as in our observations, to an unfavorable prognosis in high tumor burden FL, regardless of the segmentation method used, and also in many, if not all, types of lymphoma31,34. The optimal TMTV cut-off found for PFS in our study was 533.5 cm3 (AUC = 0.63), similar to the 510 cm3 (AUC = 0.7) found for PFS by Meignan et al.31 and Cottereau et al.33, who also used a 41% SUVmax threshold (median TMTV was 606 cm3 in our study vs 297 cm3). However, the optimal TMTV cut-off for POD24 in our results was 1195 cm3. This may be due to the high total tumor burden, according to GELF criteria, of the patients included in our study (50% of these high tumor burden patients had a FLIPI score of 3–5) as well as to differences in treatments.

TVSR is the ratio of two parameters, TMTV and TMTS, and represents tumor fragmentation. To illustrate this parameter, Fig. 6 gives the 2D representation of two patients with almost identical TMTV (1353 cm3 and 1346 cm3) but different TVSR (5.1 mm and 9.2 mm respectively). The first patient, who had a more fragmented tumor, survived more than ten years after the beginning of treatment (OS: 129.45 months), while the other patient survived less than four years (OS: 45.34 months). In our study, a high TVSR value, suggesting a massive tumor, was associated with a significantly worse prognosis. Our findings are consistent with the results previously published in DLBCL patients and reinforce the prognostic impact of the combination of TMTV and TVSR44.

Figure 6 (Left) Example MIP image of patient with high TMTV (1353 cm3) and low TVSR (5.1 mm), in favor of a fragmented tumor. (Right) Example MIP image of patient with both high TMTV (1346 cm3) and high TVSR (9.2 mm), in favor of a massive tumor.

Complementary results were observed with medPCD, the median distance between the centroid of the tumor and its periphery, which describes the tumor’s massiveness: massive tumors had an unfavorable prognosis. TMTV and medPCD may be linked to a worse response to treatment due to lower tissue penetration of anti-cancer drugs. Therefore, more intensive chemo-immunotherapy might be considered for FL patients with high baseline TVSR and medPCD.

High baseline TLG has recently been shown to be a strong prognostic factor in FL45,46,47. We also highlighted its prognostic value. However, we observed that this parameter was too highly correlated with all of the PET parameters analyzed and insufficiently discriminating, probably because it lies at the crossroads of all the parameters studied. In our opinion, it could be more informative to combine three different and relatively uncorrelated PET parameters exploring different aspects of the multi-site tumor rather than to rely on one parameter. However, because of its "central" character, this parameter may have an interesting alternative value, especially to describe the disease in a more generic way than TMTV.

In addition, the dissemination feature Dmax was not significantly linked with survival in our study (p = 0.05), contrary to data reported in other lymphomas such as DLBCL37 or Hodgkin lymphoma55. Only a trend was found, suggesting that this promising parameter may not be associated with survival in this particular lymphoma entity.

Regarding the segmentation method used in this study, the SUV41% method, recommended by the European Association of Nuclear Medicine (EANM) for delineating lesions in lymphoma studies, uses a threshold for volume contouring: voxels included in the lesion volume have an SUV of at least 41% of the hottest voxel in that lesion56. The main weakness of this method is the risk of underestimating true lesion volumes if the FDG uptake is very heterogeneous. Furthermore, it can overestimate the TMTV of lesions with a low SUVmax value. Different thresholds can be applied depending on the type of tumor, the radiotracer used and the specific clinical context. For instance, an SUV threshold of 4 is sometimes used as a fixed cut-off to differentiate between benign and malignant lesions57, as it is generally above the range for benign conditions. However, this higher threshold may be less sensitive in detecting all metabolically active lesions, especially in diseases like follicular lymphoma where lesions can have variable, and sometimes low, metabolic activity. Thus, further studies are still needed to determine whether the reference method should change for follicular lymphoma.

However, in our cohort of more than a hundred patients, only two had an SUVmax < 6, and SUVmax did not differ significantly between FL patients with high and low TMTV. Until now, measurements in routine practice have been limited by the multiplicity of segmentation methods and by their time-consuming nature. Modern software computes volumes in a few seconds and leaves the physician only the task of excluding non-pathological regions erroneously selected by the software. As a result, TMTV, TVSR and medPCD measurements could now become feasible in clinical routine, especially with the help of fully automated or even machine-learning platforms58.

It should be noted that most of the PET/CT-derived parameters analyzed in this study are geometrical parameters, describing shape. Contrary to the majority of radiomic textural features, these parameters are robust and less sensitive to differences in PET/CT devices or even reconstruction algorithms59,60,61. Therefore, harmonization of data extracted from different PET/CT scanners is unnecessary given the nature of the parameters explored. Furthermore, the PET/CT parameters examined in this study are easily understandable from a biological point of view.

From a mathematical perspective, if lymphoma tumors were perfectly spherical, TVSR and medPCD would be perfectly correlated because both depend only on the radius. The limited correlation observed between these two parameters (ρ = 0.74) arises precisely because lymphoma lesions cannot be considered as spheres.
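The radius dependence can be made explicit with a short worked example. It considers a single spherical lesion of radius r, with volume V and surface S, and assumes that medPCD is measured from the centroid to points on the surface, so that every such distance equals r.

```latex
\[
\mathrm{TVSR} = \frac{V}{S}
             = \frac{\tfrac{4}{3}\pi r^{3}}{4\pi r^{2}}
             = \frac{r}{3},
\qquad
\mathrm{medPCD} = r
\quad\Longrightarrow\quad
\mathrm{medPCD} = 3\,\mathrm{TVSR}.
\]
```

Under these assumptions the two parameters are proportional and therefore perfectly rank-correlated across spheres of different radii; any departure from ρ = 1 reflects non-spherical or multifocal tumor shapes.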

Regarding survival, we chose not to study OS given the known prolonged OS in FL patients and the small number of events (Supplemental Data; Fig. 1). For this disease, POD24 is considered a surrogate marker for OS in clinical trials62,63. The combination of TMTV, TVSR and medPCD at baseline may help physicians to anticipate POD24+ status and to propose alternative, risk-adapted treatment strategies in this high-risk population with unmet medical need, in order to improve patients’ outcomes.

Finally, our study has some limitations: its single-center retrospective nature, the lack of a validation cohort, potential selection bias, and the fact that results may not be extrapolated to patients with low tumor burden according to GELF criteria. Consequently, large-scale prospective multi-center studies are needed to confirm our conclusions.

Conclusion

In conclusion, TMTV, representing the total tumor burden, TVSR, describing tumor fragmentation, and medPCD, illustrating tumor massiveness, measured on baseline 18F-FDG PET/CT, are strong prognostic factors in FL patients who require treatment.

The combination of TMTV, TVSR and medPCD is promising and has a synergistic effect. A prognostic scoring system consisting of these three PET-derived parameters could be useful to improve risk stratification at baseline imaging and help to identify a group of high-risk patients who may benefit from more personalized treatment strategies.