Introduction

In recent decades, the number of allogeneic hematopoietic cell transplantations (alloHCT) has risen [1], and owing to reduced intensity conditioning protocols, alloHCT can now be offered to older patients and those with more comorbidities [2, 3]. Pre-transplant assessments make it possible to estimate an individual risk-benefit ratio, as all-cause mortality still amounts to 46% [3]. In this context, individual risk scores that predict overall survival (OS) and non-relapse mortality (NRM) are calculated prior to alloHCT. Commonly used scores [4, 5] are the European Blood and Bone Marrow Transplantation (EBMT) score [6], the Hematopoietic Cell Transplantation-Comorbidity-Index (HCT-CI) [7], the revised Disease Risk Index (rDRI) [8], and the revised Pre-transplant Assessment of Mortality score (rPAM) [9]. As none of these scores covers all important variables, the concurrent use of several scores is recommended to capture different aspects of risk prediction. However, according to an analysis by Shouval et al. [5], the predictive reliability of those scores ranges from moderate to random. To specify the individual risk-benefit ratio and provide an additional basis for medical decisions, e.g., the intensity of conditioning therapy, patients’ organ functions are usually determined by medical tests. The body mass index (BMI) is also calculated. Although muscle weakness and restrictions of the cardiorespiratory system are common therapy-related side effects [10,11,12,13] that may cause mortality [14, 15], the level of physical performance has so far not been studied in this context. Stress tests could therefore help to estimate the interaction of different organs’ functions and allow a more accurate assessment of patients’ physical performance. A cardiopulmonary exercise test (CPET) covers the interaction of the pulmonary, cardiovascular, circulatory, and muscular systems, thereby providing a comprehensive evaluation of a patient’s physical performance and organ function under stress.
Peak oxygen uptake (VO2peak), a main CPET outcome, is known to predict mortality in healthy individuals [16] and in breast or colon cancer survivors [17, 18]. There are initial indications that pre-alloHCT VO2peak could also provide information for predicting OS and NRM [19, 20]. However, these results have not yet been confirmed or compared to existing risk calculation methods. CPET outcomes are also influenced by patients’ body composition parameters [21, 22]; the phase angle in particular also predicts clinical outcomes [23, 24]. However, the validity of these individual risk prediction methods (the calculation and use of risk scores, and the examination of physical functioning and body composition parameters) has not been concurrently verified within one sample.

The aim of the present analysis was therefore to examine the explanatory power of pre-alloHCT VO2peak and body composition to predict OS and NRM 2 years after transplantation. We also compared this risk prediction to that of commonly used risk scores. We hypothesized that assessing VO2peak and body composition would improve the present risk prediction methods and thus enable a more accurate post-alloHCT prognosis.

Materials and methods

Study design and patients

This survival analysis relied on the baseline data from a randomized controlled trial [25]. That trial was designed to investigate the effects of whole body vibration training on patients’ physical performance during alloHCT. Examinations were performed before conditioning therapy, at hospital discharge, and 180 days after alloHCT. Within a 17-month period, patients scheduled for alloHCT were consecutively recruited at the Department of Medicine I, University Medical Center Freiburg, Germany, on the day of hospital admission. Included patients underwent study assessments, including CPET and body composition analysis, as part of the baseline procedure prior to their first administration of conditioning therapy. More information about the study design and the inclusion and exclusion criteria can be found elsewhere [25]. For the OS and NRM analyses, patients were followed for 2 years after alloHCT. Only patients with hematological malignancies were included in the survival analysis.

Cardiopulmonary exercise test

VO2peak was examined during CPET. Patients performed an incremental CPET on an electronically braked cycle ergometer (ergoline ergoselect 1200, Ergoline GmbH, Bitz, Germany) in a recumbent (40°) position under continuous monitoring of their electrocardiogram, heart rate, and blood pressure. Gas exchange and ventilation were recorded continuously via breath-by-breath gas analysis (MetaLyzer 3B-R3, Cortex Biophysik GmbH, Leipzig, Germany). Patients exercised until exhaustion, starting at 20 W with a 10-W increase every minute. Exhaustion was defined as a respiratory exchange ratio (RER) ≥ 1.1 or a heart rate ≥ 85% of the age-predicted maximum [26, 27]. The RER describes the ratio of carbon dioxide produced to oxygen consumed. It is an indicator of metabolic fuel and therefore reflects patients’ metabolic effort [21]. VO2peak describes the maximum oxygen uptake during CPET and is influenced by pulmonary, cardiovascular, hematopoietic, neuropsychological, and muscular functioning and their interaction [21, 28]. We also documented the maximum power output reached in watts (Pmax) and the rating of perceived exertion (RPE). The RPE reflects patients’ subjectively rated effort on a scale from 6 to 20, recorded immediately upon discontinuing CPET [29].
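The ramp protocol and the exhaustion criterion described above can be illustrated with a short Python sketch. It is not the software used in the study. The function names are hypothetical, the 220 minus age rule for the age-predicted maximum heart rate is an assumption (the text cites [26, 27] without stating a formula), and the timing of the first 10-W step is likewise assumed.

```python
def workload_watts(elapsed_minutes: float) -> int:
    """Ramp protocol from the text: start at 20 W, add 10 W every minute.

    Assumption: the first 10-W step is applied once the first full
    minute has been completed.
    """
    return 20 + 10 * int(elapsed_minutes)


def age_predicted_max_hr(age_years: float) -> float:
    """Assumed formula (220 - age); the paper does not state which one it used."""
    return 220.0 - age_years


def reached_exhaustion(rer: float, peak_hr: float, age_years: float) -> bool:
    """Exhaustion as defined in the text: RER >= 1.1 OR
    heart rate >= 85% of the age-predicted maximum."""
    hr_threshold = 0.85 * age_predicted_max_hr(age_years)
    return rer >= 1.1 or peak_hr >= hr_threshold


if __name__ == "__main__":
    # Hypothetical 60-year-old patient, not taken from the study data.
    print(workload_watts(10.5))
    print(reached_exhaustion(rer=1.05, peak_hr=140, age_years=60))
```

Because the two criteria are combined with a logical OR, a patient who does not reach an RER of 1.1 can still be classified as exhausted on the basis of heart rate alone.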

Body composition

BMI was calculated as body weight (kg) divided by squared height (m²). Bioelectrical impedance analysis was used to determine body fat mass (FM) (%), lean body mass (LBM) (%), body cell mass (BCM) (%), and phase angle (°) (BIA, Nutriguard-S, Data Input, Pöcking, Germany). This measurement can reveal cell membrane function, and thus evaluates body composition in more detail [30]. We applied sex-, age-, and BMI-specific references [31] to calculate the standardized phase angle according to [23]: \(\frac{\text{phase angle} - \text{reference value}}{\text{standard deviation of reference values}}\). To ensure better validity, all measurements were taken in the morning after at least 20 min rest, in lying position and before breakfast [32].
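As a worked illustration of the two formulas in this section, the following Python sketch computes the BMI and the standardized phase angle. The function names and all numeric inputs are hypothetical; in particular, the reference value and its standard deviation would have to be looked up in the sex-, age-, and BMI-specific reference tables [31], which are not reproduced here.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by squared height (m^2)."""
    return weight_kg / height_m ** 2


def standardized_phase_angle(phase_angle_deg: float,
                             reference_deg: float,
                             reference_sd_deg: float) -> float:
    """(measured phase angle - reference value) / SD of the reference values.

    The reference inputs are assumed to come from the stratum matching the
    patient's sex, age, and BMI.
    """
    return (phase_angle_deg - reference_deg) / reference_sd_deg


if __name__ == "__main__":
    # Hypothetical patient and reference values, for illustration only.
    print(round(bmi(81.0, 1.80), 1))
    print(standardized_phase_angle(5.0, 6.0, 0.5))
```

A negative standardized value indicates a phase angle below the reference value for comparable individuals, expressed in units of the reference standard deviation.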

Risk scores and clinical parameters

We calculated the EBMT score and HCT-CI prospectively before alloHCT, whereas the rPAM score and the rDRI were calculated retrospectively for this analysis. Information on the contents of the individual scores, their range and the classification of the risk groups can be found in Table 1. For our analysis we have combined the low and intermediate risk group of the rDRI. Information on OS and NRM was collected during medical follow-up examinations. Medical characteristics and history were retrospectively extracted from medical reports.

Table 1 Pre-alloHCT risk scores.

Quality of life

We used the subscales “global quality of life (QoL)” and “physical function” of the EORTC QLQ-C30 questionnaire to supplement the description of patients’ physical functioning and for the survival analysis.

Statistical analyses

Patients’ characteristics and physical conditioning were described using absolute and relative frequencies or median (range), respectively. Since all patient data were observed over exactly 24 months, we evaluated differences between 2-year-survivors (2yrS) and 2-year-decedents (2yrD) via nonparametric tests. Relapse was considered a competing risk. Patients still alive after two years of observation were censored at this time. The Kaplan–Meier method was used to estimate the probability of survival. Cox’s method was used for univariable and multivariable analyses of OS. Analyses of NRM were performed with the Fine & Gray model. Results are presented as hazard ratios (HR) for Cox’s model and sub-distribution HR (SHR) for the Fine & Gray model, together with two-sided 95% confidence intervals (CI). For continuous prognostic variables, the HR/SHR describes the risk increase for a one-unit (or the unit specified in the table) increase in the prognostic variable. The variable selection for multivariable models was based on theoretical considerations, including univariable results and possible correlations between prognostic factors, to reduce multicollinearity. Gender, remission, conditioning, the EBMT score, rDRI, and HCT-CI were examined as categorical variables. The Karnofsky Performance Scale (KPS) was dichotomized as below 90% versus 90% or above. All other parameters were treated as continuous variables. The proportional hazards assumption was tested using graphic methods. Statistical analyses were conducted using IBM SPSS Version 25 (SPSS Inc., Chicago, Illinois, USA) and SAS 9.4 (SAS Institute Inc., Cary, NC, USA) software.
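To make the Kaplan–Meier step concrete, here is a minimal pure-Python sketch of the product-limit estimator with right-censoring at 24 months. It is not the SPSS or SAS code used for the analysis, the function name is hypothetical, and the toy follow-up data are invented for illustration rather than taken from the study.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate of the survival function.

    times  : follow-up time per patient (e.g. months)
    events : 1 = death observed, 0 = censored (e.g. alive at 24 months)
    Returns (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Patients censored at time t still count as at risk at t.
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Invented toy cohort: two deaths, three patients censored.
    months = [3, 5, 5, 24, 24]
    died = [1, 1, 0, 0, 0]
    for t, s in kaplan_meier(months, died):
        print(f"month {t}: S(t) = {s:.2f}")
```

Censored patients contribute to the number at risk up to their censoring time but never lower the curve, which is why the estimated survival stays flat after the last observed death.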

Results

We enrolled a sample of 71 patients in this study [25]. Data on 59 patients were used for survival analysis: four patients had to be excluded due to a nonmalignant disease, two died before alloHCT, medical data were missing for one patient, and five patients were unable to perform CPET (four patients due to inappropriate blood values, one due to a respiratory infection). All patients received peripheral blood stem cells from an unrelated donor. Patients’ characteristics are presented in Table 2.

Table 2 Patient characteristics.

Clinical outcomes

During the observation period (24 months), 16 of 59 patients died. The median survival time of deceased patients was 3.3 months (0.6–14 months). Forty-three patients were still alive and thus censored after 2 years. Median OS (50% of patients died) was not reached (Fig. 1). Univariable and multivariable analyses showed that VO2peak was not associated with either OS (Table 3) or NRM (Table 4). The same held for three of the four risk scores, age, KPS, BMI, phase angle, and total energy consumption. However, the rDRI, FM, LBM, and BCM revealed associations with OS and NRM in univariable analyses (rDRI: high risk HR 0.184, 95% CI 0.062–0.544, p = 0.002; very high risk HR 0.078, 95% CI 0.015–0.395, p = 0.002, SHR 0.076, 95% CI 0.009–0.641, p = 0.018; FM: HR 0.974, 95% CI 0.954–0.995, p = 0.014; SHR 0.974, 95% CI 0.965–0.983, p ≤ 0.001; LBM: HR 0.977, 95% CI 0.956–0.997, p = 0.026; SHR 0.976, 95% CI 0.967–0.985, p ≤ 0.001; BCM: HR 0.975, 95% CI 0.955–0.996, p = 0.021, SHR 0.975, 95% CI 0.966–0.984, p ≤ 0.001). Remission was only associated with OS (HR 0.315, 95% CI 0.101–0.979, p = 0.049). Multivariable models confirmed the significant association between rDRI and OS (very high risk HR 18.007, 95% CI 3.878–83.602, p ≤ 0.001) and NRM (high risk SHR 0.088, 95% CI 0.027–0.281, p ≤ 0.001; very high risk SHR 0.018, 95% CI 0.003–0.098, p ≤ 0.001). Patients in the very high risk group had an 18 times higher mortality risk than those in the low/intermediate risk group. Male sex was also associated with a more than sixfold increase in overall mortality (HR 6.668, 95% CI 1.440–30.882, p = 0.015).
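Because the hazard ratios for continuous variables are reported per one-unit increase, their practical size is easier to judge after rescaling. The Python sketch below raises a per-unit HR to the power of the chosen increment, which follows from the log-linear form of the Cox model; it assumes that this log-linear relationship holds over the whole increment. The helper name is hypothetical, and the inputs are the univariable FM estimates for OS quoted above.

```python
def rescale_hazard_ratio(hr_per_unit: float, units: float) -> float:
    """HR for an increase of `units`, given the HR per one-unit increase.

    In a Cox model, log(HR) is linear in the covariate, so the HR for a
    k-unit increase equals the per-unit HR raised to the power k.
    """
    return hr_per_unit ** units


# Univariable FM result for OS as reported in the text (per 1% fat mass).
fm_hr, fm_ci_low, fm_ci_high = 0.974, 0.954, 0.995

# Rescale to a 10-percentage-point increase in fat mass.
hr_10 = rescale_hazard_ratio(fm_hr, 10)
ci_10 = (rescale_hazard_ratio(fm_ci_low, 10),
         rescale_hazard_ratio(fm_ci_high, 10))

if __name__ == "__main__":
    print(f"HR per 10% FM: {hr_10:.3f} "
          f"(95% CI {ci_10[0]:.3f} to {ci_10[1]:.3f})")
```

The confidence limits are rescaled in the same way as the point estimate, so an interval that excludes 1 per unit still excludes 1 after rescaling.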

Fig. 1: Kaplan–Meier curve presenting overall survival of 59 analyzed patients. Additionally, patients at risk are presented in two-month intervals.

Table 3 Cox regression analyses—association of pre-transplant variables with overall survival.
Table 4 Fine & Gray model analyses—association of pre-transplant variables with NRM.

Physical conditioning

Most of our patients (88%) performed CPET until exhaustion, defined as RER ≥ 1.1 or a heart rate ≥ 85% of the age-predicted maximum. We detected no difference between 2yrS and 2yrD in perceived or measured exhaustion; in the median, patients reported the level of effort as “very hard” and attained 89% of the maximum predicted heart rate. The VO2peak and maximum power output reached were also comparable. In the median, patients reached 120 W (60–280 W) and a VO2peak relative to body weight of 20 ml/min/kg (12.1–44.8 ml/min/kg). Differences in body composition were only noted in the amount of FM, with 2yrS showing a greater FM (2yrS median 27% (9–42%); 2yrD 24% (9–32%), p = 0.026). Energy consumption per week and KPS were also comparable (energy consumption/week: 2yrS 9126 kcal, range 2771–27881 kcal; 2yrD 8060 kcal, range 1965–19488 kcal, p = 0.343; KPS: 2yrS and 2yrD 90%, range 70–100%, p = 0.375). Detailed descriptions of physical conditioning are found in Table 5.

Table 5 Physical conditioning.

Risk scores

Risk assessments relying on the EBMT score and the rDRI revealed most patients (61% and 54%) carried an intermediate risk. In turn, using the HCT-CI resulted in the same number of patients having a low, intermediate or high risk. There were no differences between 2yrS and 2yrD in their allocation to different risk groups (EBMT score p = 0.768; HCT-CI p = 0.550, rDRI p = 0.139) or the median rPAM score (p = 0.192).

Discussion

The aim of the present analysis was to examine the explanatory power of pre-alloHCT VO2peak to predict OS and NRM two years after transplantation and to compare this risk prediction to that of commonly used risk scores. We hypothesized that assessing VO2peak would improve the present risk prediction methods and thus enable a more precise post-alloHCT prognosis. Contrary to our assumptions, no association of VO2peak with OS or NRM was found. Only male gender and the rDRI were associated with OS and NRM two years after transplantation. In univariable analyses, body composition showed an association with OS and NRM, but this could not be confirmed in multivariable models.

Our results stand in contrast to Wood et al. [20] and Kelsey et al. [19], who showed that OS and NRM are partly predictable via the pre-alloHCT VO2peak. However, comparing their patient groups to ours reveals notable differences in the numbers of deceased patients. While our mortality rate in this study was only 27% 24 months after transplantation, Wood et al. [20] reported 37% after 100 days (~3 months) and Kelsey et al. [19] 52% after 25 months. We speculate that our study population was in better health and more physically fit. Compared to Wood et al. [20], our patients suffered from fewer comorbidities (HCT-CI ≥ 3 of 37% vs. 50%) and demonstrated a higher physical performance level, i.e., greater values in maximum power reached during CPET (120 W vs. 47.5 W). However, compared to healthy individuals’ reference values, our study population’s VO2peak was below the 5th percentile [27, 33], indicating that our patients were not especially fit. Furthermore, Kelsey et al. [19] reported even higher VO2peak values and maximum workload during CPET. Nevertheless, CPET-protocol differences can lead to diverging results [34, 35], which can hamper the comparability of patients’ physical capacity.

The association of rDRI with OS that we found reflects the assumption that disease-specific determinants provide the best prognostic information [36, 37]. Accordingly, the risk of NRM predicted by the rDRI is lower. In general, pre-transplant disease features are more likely to predict relapse mortality, while patient-specific characteristics possess greater validity for transplant-related mortality [4]. A high rDRI risk group means a high tumor burden, so the patient’s organism is already weakened before alloHCT. This increases the mortality risk [38], which is also indicated by the association of remission status with OS that we found in univariable analysis. Shouval et al. [5] described only random to moderate reliability of pre-transplant risk scores. This is in line with our findings regarding the remaining scores. The cohorts used for the development of the scores significantly influence their explanatory power. This leads to a better prediction by the rPAM score and HCT-CI for patients undergoing myeloablative conditioning therapy than for patients receiving reduced intensity conditioning protocols [5]. Since 75% of our sample received reduced intensity conditioning, this could be a reason for the missing association between those risk scores and OS as well as NRM.

We cannot explain why male patients had a higher mortality risk. So far, it has only been shown that male patients are more affected by complications when they receive female donor cells [39]. However, this donor-recipient constellation existed in only four cases.

In univariable analyses, higher lean body mass, body cell mass, and fat mass were associated with a reduced mortality risk 2 years after transplantation. Since these associations could not be confirmed in multivariable analyses, variables other than body composition seem to be decisive for survival in our sample. Nevertheless, in other investigations the pre-transplant loss of body weight was associated with a higher NRM and shorter OS [40]. Furthermore, a low proportion of fat mass in conjunction with low muscle mass before intense medical treatment can raise the mortality risk for cancer patients [41, 42]. Thus, having an appropriate body composition and a standard BMI is essential to resisting the treatment-induced loss of body mass and to withstanding drug toxicity during alloHCT [23, 24]. We assume that not adjusting BMI for gender and age and categorizing values for analysis could have led to the missing association [23]. In addition, infusions may have influenced the measurement of body composition via bioelectrical impedance analysis [43].

For a comprehensive explanation of our results, aspects of statistical analysis must also be taken into account. We provided a suitable sample size (N = 59) and observed 16 events (deaths). As the statistical power crucially depends on the number of events, only large prognostic-factor effects can reach statistical significance. Thus, the low number of events is a potential weakness in our analysis. However, as comparable investigations [19, 20] did not report the quality of their statistical analysis, it is difficult to classify both their and our results.

The present survival analysis is limited by the fact that our data were analyzed retrospectively and retrieved from a randomized controlled trial that enforced stringent exclusion criteria. That randomized controlled trial was not designed to predict clinical outcomes. Furthermore, collecting data retrospectively can lead to failures to capture relevant data that might not have been recorded.

To the best of our knowledge, we are the first to investigate the predictive power of physical conditioning and commonly used risk scores in relation to survival time within one sample. We could only confirm that the rDRI and male gender predict survival after alloHCT. However, unlike physical performance, these variables cannot be positively influenced and do not reflect all individual preconditions of each patient. Considering the growing numbers of increasingly elderly and frail alloHCT recipients, our results underline the need for further research to identify an all-encompassing and meaningful examination method for predicting patients’ pre-alloHCT risk.