Introduction

Plerixafor (Mozobil®) in combination with granulocyte-colony stimulating factor (G-CSF) is approved by the European Medicines Agency to enhance the mobilization of hematopoietic stem cells (HSCs) to the peripheral blood for collection and subsequent autologous transplantation in patients with lymphoma and multiple myeloma (MM) who are recognized as poor mobilizers of HSCs. Plerixafor is a selective, reversible inhibitor of the C-X-C motif chemokine receptor 4 (CXCR4) and has a unique mechanism of action compared with other HSC mobilizing agents [1, 2]. The interaction between C-X-C motif chemokine 12 and CXCR4 is an integral part of the retention of HSCs in the bone marrow, and inhibition of this interaction by plerixafor temporarily mobilizes HSCs from the bone marrow to the peripheral blood [3, 4].

There is a theoretical risk of tumor cell mobilization with any stem cell mobilization method for hematopoietic stem cell transplantation (HSCT). Therefore, the European Union mandated analysis of the European Society for Blood and Marrow Transplantation (EBMT) data registry to evaluate the long-term outcomes of patients with MM who had received plerixafor.

The long-term clinical outcomes collected in this postapproval analysis of the EBMT registry included an evaluation of progression free survival (PFS), overall survival (OS), and the cumulative incidence of relapse (CIR) in patients with MM who had undergone mobilization, collection, and transplantation of autologous blood progenitors. The analysis evaluated patients who received plerixafor for stem cell mobilization and HSCT and compared their outcomes with those of patients who had received other mobilization regimens.

Methods

Study design

This was an international, multicenter, noninterventional registry study with patient follow-up of 3.5–7.5 years to evaluate the long-term outcomes of MM patients who received plerixafor for stem cell mobilization and who completed their first autologous HSCT between 2008 and 2012 (ClinicalTrials.gov number NCT01362972). The analysis included a prospectively defined cohort of MM patients with data reported retrospectively to the EBMT. Patients from Austria, Belgium, Bulgaria, Czech Republic, Finland, France, Germany, Greece, Hungary, Ireland, Israel, Italy, Netherlands, Poland, Romania, Spain, Sweden, Switzerland, and the United Kingdom were included in the study. Eligibility included all patients ≥18 years of age from the EBMT registry with a diagnosis of MM who were to receive an autologous HSCT and were transplanted. This was a noninferiority study. The noninferiority margin was assigned as a 30% increase in the hazard of a PFS or OS event, corresponding to a hazard ratio (HR) upper limit of 1.3. No lower limit was set. Summary curves for CIR were planned. Due to the observational nature of the study, no formal statistical hypothesis testing with adequate power and type I error control was planned.
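The noninferiority rule described above can be illustrated with a minimal sketch. It assumes a Wald-type 95% CI computed on the log-HR scale and reads the rule as "the upper CI limit must not exceed 1.3"; the function names and the input values are hypothetical and are not study results.

```python
import math

NI_MARGIN = 1.3  # prespecified upper limit for the hazard ratio (from the text)


def hr_confidence_interval(log_hr, se_log_hr, z=1.959964):
    """Return (HR, lower, upper) for a Wald-type 95% CI built on the log scale.

    Assumption: the CI is symmetric around the log hazard ratio.
    """
    hr = math.exp(log_hr)
    lower = math.exp(log_hr - z * se_log_hr)
    upper = math.exp(log_hr + z * se_log_hr)
    return hr, lower, upper


def is_noninferior(upper_limit, margin=NI_MARGIN):
    """Noninferiority is met only if the CI upper limit does not exceed the margin."""
    return upper_limit <= margin


# Hypothetical inputs for illustration only (not taken from the study)
hr, lower, upper = hr_confidence_interval(log_hr=math.log(1.10), se_log_hr=0.20)
print(f"HR {hr:.2f}, 95% CI {lower:.2f} to {upper:.2f}, "
      f"noninferior: {is_noninferior(upper)}")
```

With a small sample the standard error is large, so the upper limit can exceed the margin even when the point estimate is close to 1; this is the situation a wide CI produces.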

The study was conducted in accordance with the Declaration of Helsinki and the International Conference on Harmonization Guidelines for Good Clinical Practice. For all sites, approval of the protocol was obtained from the governmental authorities and Institutional Review Boards.

Poor mobilizers

Predicted poor mobilizers were patients who had received prior irradiation to marrow-bearing areas or who had high exposure to marrow-damaging chemotherapy. Proven poor mobilizers were patients who, in previous mobilization attempts, had failed to mobilize sufficient CD34+ cells to the peripheral blood to proceed to apheresis or transplantation, or who, in the current mobilization, had failed to achieve a sufficient rise in peripheral blood CD34+ at the predicted time of peak mobilization.

Data collection

The data were entered, managed, and maintained in a central database with internet access. Data were retrieved from variables identified on the EBMT Medical A and Medical B forms and Medical C form for poor mobilization data.

Outcomes

The primary efficacy outcomes were OS, PFS, and CIR. Key secondary efficacy outcomes were hematological recovery (time to absolute neutrophil counts of ≥0.5 × 10⁹/l and platelet reconstitution of ≥50 × 10⁹/l). The key safety outcome was transplant complications occurring from the day of transplantation until 100 days post transplant.

The following mobilization regimens were compared:

  1. G-CSF plus plerixafor (G-CSF + P) compared with G-CSF alone.
  2. G-CSF + P compared with G-CSF plus chemotherapy (G-CSF + C).
  3. G-CSF plus plerixafor plus chemotherapy (G-CSF + P + C) compared with G-CSF + C.

Graft failure was defined as no engraftment (neutrophils never reached ≥0.5 × 10⁹ cells/l) or graft loss (neutrophils reached ≥0.5 × 10⁹ cells/l but subsequently decreased to a lower level of cells until additional engraftment treatment was given).

Statistical analyses

In nonrandomized clinical studies, differences in baseline characteristics between treatment groups may influence outcomes, leading to bias [5]. The propensity score, defined as the individual probability of receiving a treatment based on the baseline characteristics of the patient, is intended to reduce bias when assessing outcomes between two treatments [5].

A propensity score method was used to identify study comparison groups that were balanced with respect to baseline characteristics, including demographics, MM disease type, disease characteristics and staging, bone marrow involvement, prior treatment characteristics, and disease status [5]. The baseline variables and patient demographics used for propensity score matching are shown in Table 1. Only patients who were identified as a “Proven or Predicted Poor Mobilizer” were included in the analysis.

Table 1 Patient demographics used for propensity score matching in the matched comparison groups

A single imputation approach was implemented to create complete data sets for analyses. Propensity scores were then estimated using logistic regression models. Matches for plerixafor patients were identified from the nonplerixafor groups based on the estimated propensity scores. Matching was performed without replacement. Model success was judged by whether balance was achieved between the matched plerixafor and control samples. In the original design, the plan was to identify two nonplerixafor patients for each plerixafor patient. However, this was not possible because, among patients who were predicted or proven poor mobilizers, there were many more plerixafor patients than nonplerixafor patients. In particular, in the group of patients who did not receive chemotherapy, two plerixafor patients were matched with one nonplerixafor patient.
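Matching without replacement on estimated propensity scores can be sketched as follows. The text does not state the matching algorithm, so greedy nearest-neighbour 1:1 matching is an assumption, as is the optional `caliper` parameter; the scores are taken as given (the logistic regression step is not shown) and the patient identifiers and values are made up.

```python
def match_without_replacement(treated, controls, caliper=None):
    """Greedy nearest-neighbour 1:1 matching on the propensity score.

    treated, controls: dicts mapping patient id -> estimated propensity score.
    caliper: optional maximum allowed score distance (hypothetical parameter).
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)  # copy so the caller's dict is left untouched
    pairs = []
    # Process treated patients in score order so the result is reproducible
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break  # no controls left to match
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if caliper is not None and abs(available[c_id] - t_score) > caliper:
            continue  # closest remaining control is too far away; leave unmatched
        pairs.append((t_id, c_id))
        del available[c_id]  # without replacement: each control is used once
    return pairs


# Made-up propensity scores for illustration only
treated = {"P1": 0.62, "P2": 0.35, "P3": 0.80}
controls = {"C1": 0.30, "C2": 0.60, "C3": 0.65, "C4": 0.10}
pairs = match_without_replacement(treated, controls, caliper=0.10)
print(pairs)
```

Because each control can be used only once, the number of matched pairs is capped by the smaller group, which is why a planned 1:2 ratio becomes infeasible when treated patients outnumber the available controls.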

Following the propensity score analysis, the outcomes for each mobilization treatment group were analyzed in the matched, comparable groups. A Cox proportional hazards model with covariates was used for OS and PFS. The HR for the effect of treatment and its 95% confidence interval (CI) were calculated. Survival curves were developed for each treatment group using nonparametric Kaplan–Meier estimates [6]. A competing risk model was developed for CIR; death without prior progression/relapse was treated as a competing event. The cumulative incidence and its 95% CI at each year post transplantation were estimated.
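The two nonparametric estimators named above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the study's code: the cumulative incidence uses the standard Aalen–Johansen form (the text only says "a competing risk model"), confidence intervals and the Cox model are omitted, and the follow-up data are made up.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate. events: 1 = event, 0 = censored.

    Returns [(time, survival)] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, d, n_t = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            d += data[i][1]
            n_t += 1
            i += 1
        if d:
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= n_t  # events and censored patients leave the risk set
    return curve


def cumulative_incidence(times, causes, cause=1):
    """Cumulative incidence of `cause` in the presence of competing events.

    causes: 0 = censored, 1 = relapse/progression, 2 = death without relapse.
    Returns [(time, cumulative incidence)] at each distinct event time.
    """
    data = sorted(zip(times, causes))
    at_risk, surv, cif, curve, i = len(data), 1.0, 0.0, [], 0
    while i < len(data):
        t, d_cause, d_any, n_t = data[i][0], 0, 0, 0
        while i < len(data) and data[i][0] == t:
            d_any += (data[i][1] != 0)
            d_cause += (data[i][1] == cause)
            n_t += 1
            i += 1
        if d_any:
            cif += surv * d_cause / at_risk   # event-free just before t, then cause
            surv *= 1.0 - d_any / at_risk     # any event removes event-free status
            curve.append((t, cif))
        at_risk -= n_t
    return curve


# Made-up follow-up data (years) for illustration only
times = [1, 2, 3, 4, 5]
causes = [1, 2, 0, 1, 0]
pfs = kaplan_meier(times, [int(c != 0) for c in causes])
cir = cumulative_incidence(times, causes, cause=1)
death = cumulative_incidence(times, causes, cause=2)
print(pfs[-1], cir[-1], death[-1])
```

Treating death without relapse as a competing event, rather than censoring it, keeps the cause-specific incidences and the event-free survival summing to 1 at every time point.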

Due to the observational nature of the study, the sample size was not based on power calculations. Instead, it was estimated using the following assumptions: that 85% of transplanted MM patients would receive G-CSF + C and 15% would receive G-CSF alone; that 10% of transplanted patients with each regimen would be treated with plerixafor; and that 70% of plerixafor patients would be matched at a ratio of 1:2 plerixafor to comparator. It was estimated that 4600 patients would be included in the study over a 5-year period, including 100 patients in the G-CSF alone group, 540 patients in the G-CSF + C group, 50 patients in the G-CSF + P group, and 270 patients in the G-CSF + P + C group. The predicted number of events for the outcome analysis was 101 for the G-CSF + P compared with G-CSF alone, 101 for the G-CSF + P compared with G-CSF + C, and 546 for the G-CSF + P + C compared with the G-CSF + C group.
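The arithmetic behind these planning figures can be retraced with a short sketch. The percentages and the total of 4600 patients are taken from the text; the function and variable names are hypothetical, and the reported group sizes appear to be rounded versions of the values computed here.

```python
TOTAL_PATIENTS = 4600                      # planned enrolment over 5 years
REGIMEN_SHARE = {"G-CSF + C": 0.85, "G-CSF alone": 0.15}
PLERIXAFOR_FRACTION = 0.10                 # treated with plerixafor, per regimen
MATCHED_FRACTION = 0.70                    # plerixafor patients expected to match
COMPARATORS_PER_PLERIXAFOR = 2             # planned 1:2 matching ratio


def planned_group_sizes(total=TOTAL_PATIENTS):
    """Return, per base regimen, the expected matched plerixafor and comparator counts."""
    sizes = {}
    for regimen, share in REGIMEN_SHARE.items():
        transplanted = total * share
        plerixafor = transplanted * PLERIXAFOR_FRACTION
        matched = plerixafor * MATCHED_FRACTION
        sizes[regimen] = {
            "plerixafor_matched": matched,
            "comparators": matched * COMPARATORS_PER_PLERIXAFOR,
        }
    return sizes


sizes = planned_group_sizes()
for regimen, counts in sizes.items():
    print(regimen, {k: round(v, 1) for k, v in counts.items()})
```

For the chemotherapy regimens this gives about 274 matched plerixafor patients and 547 comparators (reported as 270 and 540); for the G-CSF-only regimens it gives about 48 and 97 (reported as 50 and 100).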

Results

Participants and demographics

Overall, 3582 MM patients were screened and, of these, 3566 patients met the study eligibility criteria. These included 141 patients treated with G-CSF + P, 119 patients treated with G-CSF + P + C, 585 patients treated with G-CSF alone, and 2721 patients treated with G-CSF + C (Fig. 1).

Fig. 1 Patient eligibility and treatment

Baseline demographic and disease history data used in propensity scoring are summarized in Table 1. The groups were well matched on age and sex and were comparable for Durie and Salmon disease staging, with the majority of patients in each group assessed as Stage III (IIIA or IIIB). At mobilization, the proportion of patients with bone marrow involvement ranged from 12.9 to 46.2% across the groups (Table 1).

The propensity scoring of poor mobilizers identified matched groups for the comparative analysis (Table 2). After propensity scoring, 77 versus 41 patients were matched in the G-CSF + P versus G-CSF alone cohort, 129 versus 129 in the G-CSF + P versus G-CSF + C cohort, and 117 versus 117 in the G-CSF + P + C versus G-CSF + C cohort. The three groups treated with plerixafor had greater proportions of patients, prior to administration of plerixafor, who failed to mobilize sufficient CD34+ cells at the predicted peak mobilization time compared with the comparison groups (Table 2).

Table 2 Mobilization characteristics for the matched comparison groups

Primary endpoints

Progression free survival

The estimated 3-year PFS was 0.27 [95% CI: 0.17, 0.38] for the G-CSF + P group versus 0.43 [95% CI: 0.27, 0.58] for G-CSF alone (comparison 1, Table 3); 0.27 [95% CI: 0.19, 0.36] for the G-CSF + P group versus 0.32 [95% CI: 0.24, 0.41] for the G-CSF + C group (comparison 2); and 0.29 [95% CI: 0.21, 0.38] for the G-CSF + P + C group versus 0.34 [95% CI: 0.25, 0.43] for the G-CSF + C group (comparison 3). Due to the small sample size, the 95% CIs of the HRs for PFS and OS were wide, and their upper limits exceeded the prespecified boundary of 1.3; the plerixafor-containing groups therefore did not fulfill the criteria for noninferiority compared with the comparator groups. Kaplan–Meier survival curves showed that PFS in the plerixafor groups was generally lower (Fig. 2).

Table 3 Primary outcomes: progression free survival, overall survival and cumulative incidence of relapse for each of the comparator groups

Fig. 2 Progression (event) free survival for each of the comparison groups, G-CSF plus plerixafor versus G-CSF alone (comparison 1); G-CSF plus plerixafor versus G-CSF plus chemotherapy (comparison 2); G-CSF plus plerixafor plus chemotherapy versus G-CSF plus chemotherapy (comparison 3)

Overall survival

The results for OS are shown in Table 3. Kaplan–Meier survival curves showed that OS in the plerixafor groups was generally lower as time progressed (Fig. 3). As the upper limit of the 95% CI of the HR was >1.3, the predetermined boundary, noninferiority of plerixafor was not demonstrated for any of the comparison groups.

Fig. 3 Overall (event-free) survival for each of the comparison groups, G-CSF plus plerixafor versus G-CSF alone (comparison 1); G-CSF plus plerixafor versus G-CSF plus chemotherapy (comparison 2); G-CSF plus plerixafor plus chemotherapy versus G-CSF plus chemotherapy (comparison 3)

Cumulative incidence of relapse

A competing risk model was used to determine the CIR, which appeared slightly higher in the plerixafor groups than in the comparator groups (Fig. 4).

Fig. 4 Cumulative incidence of relapse with death without progression/relapse as a competing risk for each of the comparison groups, G-CSF plus plerixafor versus G-CSF alone (comparison 1); G-CSF plus plerixafor versus G-CSF plus chemotherapy (comparison 2); G-CSF plus plerixafor plus chemotherapy versus G-CSF plus chemotherapy (comparison 3)

Secondary outcomes

Post transplantation

Adverse events occurring in more than one patient in any treatment group up to 100 days post first transplantation are shown in Table 4. Infections and infestations were the most common system organ class complication in all plerixafor and comparator groups (Table 4).

Table 4 Adverse events occurring in more than one patient in any treatment group up to 100 days post first transplantation

Engraftment was reported for ≥95% of patients in the groups in each of the three paired comparisons. Collectively, the median time to achieve a neutrophil count of ≥0.5 × 10⁹/l was 12 days, and the median time to achieve a platelet count of ≥20 × 10⁹/l was 13 days.

Discussion

This was a postapproval study in the European Union to monitor for recurrence or progression of myeloma as a surrogate marker of tumor cell contamination of autologous peripheral blood stem cell harvests when using plerixafor in stem cell mobilization regimens. Due to the observational nature and the small sample size, no firm conclusions could be drawn from this study. The cohorts treated with plerixafor showed a trend towards shorter PFS and OS times and a higher CIR, whereas safety outcomes were similar to those of their respective comparators.

In line with the licensed therapeutic indication for the use of plerixafor in patients with MM, all patients in the primary analysis had to be poor mobilizers, either as predicted poor mobilizers through exposure to high-dose chemotherapy, or proven poor mobilizers based on their mobilization history. Despite propensity scoring, there were more proven poor mobilizers in the plerixafor cohorts, which may have influenced the outcomes due to an imbalance between comparison groups. A further difference between the plerixafor and comparison cohorts was that the median CD34+ cell counts in the plerixafor groups during the current study mobilization were lower than in the comparison cohorts. These differences suggest that the groups may not have been balanced for disease prognosis, which may be important, as it has been reported that in poor mobilizers (defined as patients with a collection yield of <4 × 10⁶ CD34+ cells/kg), the time to disease progression, PFS, and OS are all significantly shorter compared with successful mobilizers [7]. In our study, the shorter PFS and OS and the higher CIR for those who received plerixafor compared with the comparison cohorts may, in part, be due to poorer mobilizers in the plerixafor groups.

Even with the introduction of novel agents, including proteasome inhibitors and immunomodulatory drugs, into first-line MM therapy, autologous transplantation remains a cornerstone of treatment for transplant-eligible patients [8]. Estimates of the proportion of patients failing to mobilize adequate numbers of stem cells for successful transplantation vary considerably [9,10,11]. Furthermore, it is now recognized that a second autologous HSCT has a role to play in the management of patients having a response to a first autologous HSCT of >1 year [12,13,14]. Second HSCT is now recommended for some patients by the National Institute for Health and Care Excellence [15] and the International Myeloma Working Group [13] guidelines, and it is probable that a substantial proportion of plerixafor-mobilized patients will have obtained sufficient stem cells to facilitate this approach.

The possible effect of the infusion of tumor cells on long-term outcomes should be interpreted with caution, as unfavorable outcomes may simply reflect more aggressive disease as well as factors inherent to tumor cell mobilization. It is important to note that some of the patients in our study would probably not have proceeded to autologous transplantation without plerixafor treatment [16,17,18]. Results from a 5-year, long-term, phase III follow-up study (not restricted to poor mobilizers) showed that the use of G-CSF + P did not have a negative effect on PFS and OS in patients with MM, with more than half of all patients with MM still alive 5 years following transplantation [19]. Furthermore, a major concern in the mobilization of stem cells for transplantation is the potential risk of tumor cell mobilization. In this respect, a recent study reported that MM tumor cells were not detected in the apheresis products of patients who received G-CSF + P or of those who received G-CSF alone [20]. Collectively, the findings from these two studies further support the safety of G-CSF + P for mobilizing CD34+ cells for transplantation in poor mobilizers. It is therefore more likely that relapse occurs due to clonal evolution of tumor cells within the patient at the time of transplantation.

Propensity scoring is increasingly used in nonrandomized clinical studies to assess small treatment effects when imbalances in baseline covariates between treatment cohorts, such as disease state and the number of previous failed mobilizations, may introduce bias [5]. In this respect, propensity scoring matched patients for disease and response status, but was unable to match the risk level of the patients, because the genetic risk evaluation based on fluorescent in situ hybridization cytogenetics was not reported, and this may have affected the results of our study. Genetic risk evaluation is an important omission, because it is now well established that genetic aberrations are a major prognostic factor for disease progression in patients with MM [21,22,23]. Patients with high-risk genetic abnormalities benefit less from HSCT, and there is a substantial impact on the survival of these patients [21, 22]. Specifically, point mutations that affect transcriptional regulation of genes have been shown to have a negative impact on event-free survival and OS in MM patients [21, 22]. Therefore, it is important to ensure that treatment groups are genetically balanced for those at low risk and at high risk of disease progression. In addition, propensity scores cannot adjust for unreported differences between groups, which is both a limitation and a source of potential bias in the use of propensity scores [24].

In our study, propensity scoring led to modestly sized comparison groups but maintained an adequate balance for certain key disease characteristics collected in the database. The cohort with the worst results (G-CSF + P versus G-CSF alone) was the group with the smallest sample size, a factor that introduced large variability. The sample size resulting from propensity score matching could be one of the limitations for the demonstration of noninferiority in the current study. Some important variables, such as the amount of prior chemotherapy, cytogenetic risk, or the use of consolidation and maintenance treatments, could not be incorporated in the model due to the lack, or limited availability, of the data. These gaps in the collected data could have had a substantial impact on outcomes in our study.

There are a number of other limitations to our study that mandate caution when interpreting the results. The proportion of patients in the primary analysis who were proven poor mobilizers was numerically greater in the plerixafor cohorts (98.3–100%) compared with the comparator cohorts (72.1–85.4%). There were higher numbers of patients in the plerixafor cohorts who failed to mobilize sufficient CD34+ cells at the predicted peak mobilization times. In line with reimbursement of medicine costs in many European countries, clinicians may have selectively given plerixafor to patients who were the poorest mobilizers at the highest risk of poor outcomes, and this may have contributed to the trend for worse outcomes in the plerixafor cohorts [25,26,27,28,29]. A further consideration is that propensity score matching led to modest numbers of patients in the comparison groups, which may have had an effect on the outcomes of the study. There may have been selection bias, as only patients with successful mobilization were included in the study. The observational nature of the study may also have introduced bias; although every effort was made to limit it, potential biases cannot be removed completely from observational data.

Conclusions

The findings from this study should be interpreted with caution due to its observational nature, the small sample size of the comparison groups, and the wide 95% CI observed in the HRs. The absence of information on genetic risk and maintenance treatment are further limitations of the study. The cohorts treated with plerixafor had a trend toward numerically shorter PFS and OS times and higher CIR, with similar safety outcomes compared with their respective comparators. Poor mobilization is associated with more aggressive disease and hence poor mobilizers are potentially predisposed to worse outcomes, as may be indicated by the lower baseline CD34+ cell counts at predicted peak of mobilization in the plerixafor cohort [30]. G-CSF + P remains an additional option for the mobilization of HSCs in poor mobilizers with MM with no substantial differences in PFS, OS, and CIR in comparison with other regimens.