Introduction

The intensity of the conditioning regimen given before allogeneic hematopoietic cell transplantation (allo-HCT) varies substantially, determines acute regimen-related toxicity, and impacts transplant outcomes. For the last two decades, the myeloablative conditioning (MAC) versus reduced intensity conditioning (RIC) classification has been the global standard for indicating transplant conditioning intensity and has proved a reliable approach for clinical decisions and registry analyses [1, 2]. Because intensity represents a continuum, and because novel drugs and new conditioning regimens are now used, some of which are not readily amenable to the RIC/MAC nomenclature [3,4,5,6,7], we recently developed a tool that provides finer stratification, better discriminating ability, and a more standardized assessment of the intensity of the preparative regimen [8]. Briefly, we assigned intensity weight scores to frequently used components of the conditioning regimen, used their sum to generate the transplant conditioning intensity (TCI) score, and built a discrete 3-category TCI index, which was tested on a discovery cohort of 8255 patients with acute myeloid leukemia (AML) allografted between 2005 and 2017. TCI group assignment (low, intermediate, high) was the most important determinant of day (d) 100 and d180 early non-relapse mortality (NRM) and was very effective in predicting 2-year NRM and relapse (REL), independently of other established prognostic factors. The internal validity of the TCI model was assessed using a bootstrapping technique; however, a formal validation in a separate and more contemporary patient population was lacking, hence the current validation study.
Using data reported to the European Society for Blood and Marrow Transplantation (EBMT) registry, we included transplant recipients who met inclusion criteria from the discovery study but were allografted in a more recent period (2018 to June 2021). We assigned them to a TCI category (low, intermediate, high) according to the calculated TCI score ([1,2], [2.5–3.5], [4–6], respectively), as previously described [8], and examined the validity of the TCI category in predicting early NRM, 2-year NRM and REL.

Materials and methods

Study design and data collection

This is a retrospective, multicenter, registry-based analysis. Data were provided by the EBMT registry, to which >600 transplant centers annually submit anonymized data on all their consecutive HCTs according to specific guidelines and audited quality measures, following patient informed consent and according to the local regulations applicable at the time of transplantation. The Acute Leukemia Working Party (ALWP) of the EBMT approved the study in accordance with the guidelines of the Declaration of Helsinki. We included patients with AML between 55 and 75 years of age who had received an allogeneic HCT in first complete remission between January 2018 and June 2021. Other inclusion criteria were availability of detailed conditioning information, time from diagnosis to HCT < 18 months, and use of peripheral blood stem cell (PBSC) or bone marrow (BM) grafts from a matched sibling or HLA-matched unrelated donor. Cases with a missing HCT-comorbidity index (HCT-CI) score were excluded (n = 464). The TCI score was calculated for every patient by adding the intensity weights of each component given on any day before the graft infusion, as shown in Supplementary Table 1 and as previously described [8]. Assignment to the low, intermediate, or high TCI category was performed according to a TCI score of [1,2], [2.5–3.5] and [4–6], respectively, as previously described. For example, a regimen consisting of busulphan 12.8 mg/kg iv (3 points) and fludarabine 120 mg/m2 (0.5 points) has a TCI score of 3.5 and is classified as an intermediate TCI regimen, whereas when the same dose of busulphan is combined with cyclophosphamide 120 mg/kg, as in the classical BuCy protocol, the TCI score is 4 (high TCI regimen). Data sharing is available through the ALWP office (myriam.labopin@upmc.fr).
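The score calculation and category assignment described above can be illustrated with a minimal Python sketch. It is not the registry implementation: the dictionary keys and function names are hypothetical, only the three components mentioned in the worked example are included, and the cyclophosphamide weight of 1 point is inferred from the stated BuCy total of 4 rather than quoted directly. The complete weight table is the one in Supplementary Table 1.

```python
# Illustrative weights only; the full table is in Supplementary Table 1.
# Keys are hypothetical labels chosen for this sketch.
EXAMPLE_WEIGHTS = {
    "busulphan_12.8_mg_kg_iv": 3.0,      # stated in the text
    "fludarabine_120_mg_m2": 0.5,        # stated in the text
    "cyclophosphamide_120_mg_kg": 1.0,   # inferred: BuCy total 4 minus busulphan 3
}


def tci_score(components, weights=EXAMPLE_WEIGHTS):
    """Sum the intensity weights of all components given before graft infusion."""
    return sum(weights[name] for name in components)


def tci_category(score):
    """Map a TCI score to the category ranges given in the text."""
    if 1 <= score <= 2:
        return "low"
    if 2.5 <= score <= 3.5:
        return "intermediate"
    if 4 <= score <= 6:
        return "high"
    raise ValueError(f"score {score} is outside the defined TCI ranges")


bu_flu = tci_score(["busulphan_12.8_mg_kg_iv", "fludarabine_120_mg_m2"])
bu_cy = tci_score(["busulphan_12.8_mg_kg_iv", "cyclophosphamide_120_mg_kg"])
print(bu_flu, tci_category(bu_flu))
print(bu_cy, tci_category(bu_cy))
```

Raising an error for scores outside the three published ranges is a design choice of this sketch, so that an unexpected weight combination is not silently assigned to a category.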

Endpoints and statistical analysis

The primary endpoint for estimating the impact of TCI was early NRM measured at d100 and d180 from the time of stem cell infusion. Secondary endpoints included NRM and REL incidence at 2 years. NRM was defined as death without evidence of REL. Relapse incidence and NRM were calculated using cumulative incidence curves in a competing risk setting. Overall survival (OS), defined as time to death from any cause, and leukemia-free survival (LFS), defined as time alive without evidence of REL, were also reported and were calculated from the time of transplant using the Kaplan–Meier estimate. Univariate analyses for NRM and REL were performed using Gray’s test. Univariate comparisons between TCI groups were performed using the Chi-squared or Fisher’s exact test for categorical variables and the Kruskal–Wallis test for continuous variables. Multivariate analysis was performed using a Cox proportional-hazards model, which included variables differing significantly between the groups, factors known to be associated with outcomes, and a center frailty effect to account for the heterogeneity across centers, as previously reported [9]. The results were expressed as hazard ratios (HR) with 95% confidence intervals (CI). All tests were two-sided with the type 1 error rate fixed at 0.05. Statistical analyses were performed with SPSS 27.0 (SPSS Inc., Chicago, IL, USA) and R 4.1.1 (R Development Core Team, Vienna, Austria, URL: https://www.R-project.org/).
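To make the competing risk setting concrete, the following is a minimal pure-Python sketch of a nonparametric cumulative incidence estimate, in which NRM and REL compete with each other. It is an illustration only and not the code used for this study, which was run in SPSS and R; the function name, the event coding (0 = censored, 1 = REL, 2 = NRM) and the toy data are assumptions made for this example.

```python
def cumulative_incidence(times, events, cause, horizon):
    """Cumulative incidence of `cause` up to `horizon` with competing events.

    events: 0 = censored, any other integer = an event type
    (hypothetical coding for this sketch: 1 = REL, 2 = NRM).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    event_free = 1.0   # Kaplan-Meier probability of no event of any type so far
    cif = 0.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        n_here = d_any = d_cause = 0
        # Collect all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_here += 1
            if data[i][1] != 0:
                d_any += 1
            if data[i][1] == cause:
                d_cause += 1
            i += 1
        # Probability of reaching t event-free, then failing from `cause` at t.
        cif += event_free * d_cause / at_risk
        event_free *= 1 - d_any / at_risk
        at_risk -= n_here
    return cif


# Toy data: follow-up in days and event codes for six hypothetical patients.
days = [30, 60, 90, 120, 150, 200]
codes = [2, 1, 0, 2, 1, 0]
print(cumulative_incidence(days, codes, cause=2, horizon=100))  # NRM by d100
print(cumulative_incidence(days, codes, cause=1, horizon=200))  # REL by d200
```

Unlike one minus a Kaplan–Meier estimate that censors the competing event, this estimate weights each cause-specific event by the probability of still being free of any event, so the incidences of NRM and REL cannot sum to more than 1.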

Results

Characteristics of the validation cohort

The validation cohort comprised 4060 adult patients with AML who were transplanted in first complete remission in the most recent period (median year 2019, range 2018–2021). In contrast to the discovery cohort, which included patients between 45 and 65 years of age (median 55.6 years), patients in this validation dataset were one decade older (median 63.4 years, range 55–75). In total, 48 different conditioning regimens were used (Supplementary Table 2). Baseline characteristics are shown by TCI group in Table 1. In this validation cohort, 1934 (48%) and 1948 (48%) patients were assigned to the low and intermediate TCI groups, respectively, while a high TCI was less prevalent (n = 178, 4% of patients). As expected, there was an inverse relationship between age and TCI, with a median age of 65 years (interquartile range [IQR], 61.3–68.4), 62 years (IQR, 58.8–65.9) and 59 years (IQR, 56.8–63.3) for the low, intermediate, and high TCI groups, respectively (p < 0.0001). About 18–33% of patients among TCI groups had a low (≤80%) Karnofsky Performance Score (KPS) and/or high (≥3) HCT-CI, with patients in the low TCI category more likely to have a KPS ≤ 80% and an HCT-CI ≥ 3 (p < 0.0001). Except for the more frequent use of matched sibling donors in the high TCI cohort (p < 0.0001), other characteristics were distributed equally between the 3 TCI groups. The most frequently used immunosuppressive drug combinations for graft versus host disease (GvHD) prophylaxis were cyclosporine/mycophenolate mofetil (34.8%, 31.3% and 31.6%) and cyclosporine/methotrexate (34.9%, 32.5% and 39%), whereas post-transplant cyclophosphamide (PTCY) was used in 8.8%, 11.3% and 13.9% of the low, intermediate, and high TCI groups, respectively (Table 1). The median follow-up of survivors was 22.3 months (IQR, 20.8–23.2).
The outcomes for the entire population were as follows: cumulative incidence of d100 NRM was 6.2% (95% CI 5.5–7), of d180 NRM was 10.2% (95% CI 9.3–11.2), of 2-year NRM was 19.2% (95% CI 17.8–20.5), of REL was 25.7% (95% CI 24.2–27.3), of acute graft-versus-host disease (GVHD) grades II-IV was 22.1% (95% CI 20.8–23.4), of acute GVHD grades III-IV was 7.6% (95% CI 6.8–8.5), of chronic GVHD was 31.7% (95% CI 30.1–33.4) and of extensive chronic GVHD was 14.2% (95% CI 13–15.5). The estimates of LFS and OS at 2 years were 55.1% (95% CI 53.3–56.9) and 62.2% (95% CI 60.4–63.9), respectively. The incidence of graft failure was low and did not differ between TCI groups (p = 0.34; results not shown). Causes of death are given in Supplementary Table 3, with the original disease being the main cause in each TCI category.

Table 1 Population baseline characteristics of validation cohort.

Validation of TCI for NRM

The risk of NRM in the validation group followed the same pattern as in the discovery cohort, with a monotonic increase in NRM rate from lower to higher TCI (Fig. 1). In the unadjusted comparison, the TCI provided a highly significant risk stratification for d100, d180 and 2-year NRM, with the cumulative incidences being 4.5% (95% CI, 3.7–5.5), 8.2% (95% CI, 7–9.6) and 16.5% (95% CI, 14.7–18.5) in the low TCI group, rising to 7.3% (95% CI, 6.2–8.5), 11.6% (95% CI, 10.1–13.1) and 21.4% (95% CI, 19.4–23.5) in the intermediate TCI group, and further increasing to 12.4% (95% CI, 8.1–17.8), 17% (95% CI, 11.8–23.1) and 23.5% (95% CI, 17.2–30.5) in the high TCI group, respectively (p < 0.0001 for all comparisons) (Table 2). In a multivariable model including baseline characteristics known to impact NRM, such as age, KPS, and HCT-CI score (complete case analysis, n = 3791), TCI group assignment was strongly and independently associated with NRM (Table 3). Relative to the low TCI group, the HRs for d100, d180 and 2-year NRM in the intermediate TCI group were 1.95 (95% CI 1.42–2.69, p < 0.0001), 1.62 (95% CI 1.26–2.08, p < 0.0001) and 1.44 (95% CI 1.20–1.74, p < 0.0001), and in the high TCI group were 4.00 (95% CI 2.2–7.28, p < 0.0001), 2.86 (95% CI 1.76–4.64, p < 0.0001) and 1.87 (95% CI 1.25–2.80, p = 0.003), respectively. In a pairwise comparison between the high and intermediate TCI groups, high TCI was associated with an increased risk of early NRM (d100 NRM: HR 2.05; 95% CI 1.17–3.57, p = 0.012; d180 NRM: HR 1.76; 95% CI 1.12–2.78, p = 0.015) but not of 2-year NRM (p = 0.19). Besides TCI category, other independent prognostic factors for NRM were increasing age, HCT-CI score ≥3, KPS score ≤80%, unrelated donor (early NRM) and female-to-male transplantation (2-year NRM) (Table 3).
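As a plausibility check on the pairwise comparison, note that in a Cox model with TCI entered as a categorical covariate, the HR between two non-reference groups equals the ratio of their HRs against the common reference group. The short Python sketch below applies this to the rounded HRs quoted above; the dictionary and function names are hypothetical, and small discrepancies from the reported pairwise values are expected because the inputs are rounded to two decimals.

```python
# Adjusted HRs versus the low TCI group, as quoted in the text (Table 3).
HR_VS_LOW = {
    "d100": {"intermediate": 1.95, "high": 4.00},
    "d180": {"intermediate": 1.62, "high": 2.86},
}


def implied_high_vs_intermediate(endpoint):
    """Ratio of the two HRs that share the low TCI group as reference."""
    hr = HR_VS_LOW[endpoint]
    return hr["high"] / hr["intermediate"]


for endpoint in ("d100", "d180"):
    print(endpoint, round(implied_high_vs_intermediate(endpoint), 3))
```

The implied ratios are close to the reported pairwise HRs of 2.05 (d100) and 1.76 (d180), which is consistent with a single model underlying both sets of estimates.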

Fig. 1: NRM by TCI category.

Non-relapse mortality (NRM) for entire validation cohort stratified by Transplant Conditioning Intensity (TCI) category (low, intermediate, high).

Table 2 Univariate analysis for early (d100 and d180) NRM, NRM, and REL according to TCI category.
Table 3 Multivariable analysis for early NRM, NRM, REL.

Validation of TCI for REL

In univariate analysis, the REL rate was significantly higher in the low TCI group (29.7%, 95% CI 27.4–32.1) than in the intermediate (21.9%, 95% CI 19.8–24.0) and the high (25%, 95% CI 17.9–32.6) TCI groups (p < 0.0001) (Fig. 2). In the multivariable complete case analysis described above, TCI group was an independent predictor of REL (Table 3). When compared with the low TCI group, the REL risk was significantly decreased in the intermediate TCI group (HR 0.66; 95% CI 0.57–0.78, p < 0.0001); however, we observed only a non-significant trend towards reduced REL risk in the recipients of high TCI regimens (HR 0.79; 95% CI 0.55–1.13, p = 0.20). REL was significantly influenced by adverse cytogenetics and the use of a bone marrow graft (Table 3). There were no significant associations between TCI group and LFS or OS (data not shown), except a borderline better OS for high versus low TCI (HR 1.35; 95% CI 1.01–1.81, p = 0.043).

Fig. 2: REL by TCI category.

Relapse (REL) for entire validation cohort stratified by Transplant Conditioning Intensity (TCI) category (low, intermediate, high).

Discussion

To validate the original TCI, we used a cohort of more than four thousand patients transplanted in the most recent period (January 2018 to June 2021). Because allo-HCT has recently been increasingly administered to older patients, especially those aged ≥65 years, we included in this more contemporary study patients who were one decade older (55–75 years of age) than those in the discovery study (45–65 years) [10,11,12]. The chosen timeframe of the 3 most recent years is particularly useful since it includes the currently used conditioning regimens [13, 14]. In line with real-life data demonstrating a notable decrease in high dose MAC transplants over the last few years, only 4% of patients in our validation cohort were classified as high TCI, versus 21% of patients who fell into this category in the original study [15]. Taken together, this is a fully independent population and temporal validation study, reflecting present-day transplantation practice.

The TCI performed very well in this validation cohort. It stratified patients into 3 levels for early NRM, with a near doubling of the HR for d100 and d180 NRM from one TCI group to the next. TCI grouping also provided very strong stratification ability and independent prognostic information for 2-year NRM. The discriminative ability of TCI for NRM applies regardless of other established factors such as age, performance status (KPS), organ impairment (HCT-CI), donor type, and graft source. Of note, TCI proved to be the most important determinant of early NRM, suggesting not only that TCI stratifies conditioning intensity very efficiently but also that the intensity of the preparative regimen is the main driver of early NRM. Taken together, TCI stratified the 48 different conditioning regimens used in this cohort finely, based on their impact on transplant-related death, which emphasizes once again the utility of the TCI index.

Compared to low TCI regimens, the use of a regimen with an intermediate TCI score was strongly associated with decreased REL, reflecting another inherently linked effect of the intensity of the preparative regimen. We found only a trend towards reduced REL risk between the low and high TCI groups (HR 0.79; 95% CI 0.55–1.13, p = 0.20), which runs somewhat contrary to the common assumption that dose intensification may reduce relapse [16, 17]. The most plausible explanation for this finding is that the small number of recipients in the high TCI group (n = 178) undermined the statistical power to detect a significant effect. Moreover, in contrast to the monotonic increase of NRM from lower to higher TCI, we found neither a significant difference nor a trend towards a reduced REL risk in the direct comparison of the high versus intermediate TCI groups. Though this could again be attributed to the small sample size of the high TCI group and the low statistical power to detect differences, another explanation is that the intermediate TCI group captured the so-called “reduced toxicity conditioning” regimens that were specifically designed to minimize NRM without affecting REL [18]. Notably, as in the original dataset, the intermediate TCI category included RIC (56.4%) and MAC (43.6%) regimens in nearly equal proportions. Thus, we confirm once again that although TCI was built upon the scaffolds of the MAC/RIC definitions, it represents a distinct and novel classification scheme which accounts for regimens that were not readily amenable to the RIC/MAC approach.

Transplantation is a multifactorial process, and it is a challenge to predict allogeneic HCT outcomes [18]. To account for the heterogeneity of patient and disease-specific factors, different prognostic scores for NRM (e.g. HCT-CI) or relapse risk (e.g. Disease Risk Index) have been established and constantly refined [19,20,21,22]. Likewise, the TCI validated here reflects the heterogeneity of the preparative regimens and is meant to capture their broad spectrum in a more standardized and more precise manner and to be used for risk stratification. TCI provides valuable prognostic information for HCT outcomes but is not meant to be used for suggesting a conditioning regimen for any group of patients. Not surprisingly, the strongest prognostic information of TCI was for NRM and to a lesser extent for relapse, whereas there was no association of TCI grouping with LFS and OS. This reflects the opposing effects of conditioning intensity on NRM and relapse and the strong likelihood of selection bias in the choice of conditioning in a retrospective study like ours. The current TCI does not account for PTCY given for GvHD prevention (used in 10.2% of patients, Table 1), which is associated with toxicities such as delayed engraftment, cardiac events, and hemorrhagic cystitis [23]. Future studies may refine and update the TCI by adding PTCY and/or other conditioning components (e.g., antisera, novel drugs) to the prototype model presented here.

In summary, our study confirms in contemporarily treated patients that TCI reflects both the morbidity related to the preparative regimen and its anti-leukemic efficacy, highly satisfactorily and independently of other established prognostic factors. Though the generalizability of the model must still be proven across different diseases and disease stages (other than AML in first complete remission), ages (e.g., younger adults), and donors (e.g. mismatched), the TCI index has all the features to support clinicians in their everyday clinical practice and to be instrumental in correlative analyses and comparative studies. We anticipate that TCI will be used as a well-defined, easily calculated and reproducible tool to define and measure the intensity of the preparative regimen before allo-HCT.