Introduction

Acute lymphoblastic leukemia (ALL) is the most common malignancy in children and adolescents, with ~85% of cases being B-cell precursor ALL (B-ALL). Over the past few decades, the overall survival rate in children with newly diagnosed ALL has improved dramatically from ~10% in the 1960s to almost 90% today [1, 2]. Despite this remarkable improvement, ~2% of patients are refractory to induction chemotherapy [3], and an additional 10–15% of ALL patients still experience a relapse [4]. Although a second complete remission (CR) can subsequently be achieved in most patients [5,6,7,8], ~55% of those patients will relapse again [6, 9]. Those children are generally managed with intensive chemotherapy, with or without novel agents, to induce a third remission, followed by hematopoietic stem cell transplant (HSCT) if indicated [10, 11]. Despite the improvement in outcome for newly diagnosed patients, the reported event-free survival (EFS) of patients with first relapse of ALL has not changed significantly for more than 20 years and remains poor at ~35–50% [5,6,7,8,9, 12, 13]. The outcome for patients who fail initial induction therapy (primary induction failure), for those who do not respond to salvage therapy, and for those who are multiply relapsed is even worse. Therefore, new strategies are needed to improve the outcome of these patients.

The Therapeutic Advances in Childhood Leukemia consortium (TACL) was established in 2004 to develop innovative therapies through phase I/II clinical trials in children with incurable leukemia and lymphoma. Previously, TACL conducted a retrospective study to evaluate the remission rates and outcomes for children with refractory or multiply relapsed (R/R) ALL treated at eight TACL institutions in the United States (US) from 1995 to 2004 [5]. A CR rate of approximately 40% was identified in children who experienced a second or subsequent relapse. Several other studies reported similar outcomes in children with multiply R/R ALL [10, 11]. The data provided reference information for clinicians and families to make treatment decisions and served as a benchmark for the evaluation of new agents and regimens [5]. However, these studies may not reflect current practice, as treatment patterns and supportive care measures have changed over the past 10 years.

To provide current and precise estimates of outcome in children with multiply R/R B-ALL, we performed a more comprehensive follow-up study using pooled retrospective data collected from 24 TACL institutions in the US, Canada, and Australia. The primary objective of the study was to estimate the CR rate in pediatric patients with multiply R/R or primary induction failure B-ALL treated according to the institutional standard of care at participating centers. The secondary objectives were to estimate the EFS probabilities in this patient population, and to investigate patient and disease characteristics that are associated with CR and EFS.

Subjects and methods

Patients

The TACL T2014–004 study included patients ≤21 years with R/R B-ALL who experienced a qualifying treatment failure at a TACL institution between 2005 and 2013. Qualifying treatment failures were salvage treatment for primary induction failure, ≥2 occasions of relapsed disease, or failure to achieve remission after one or more salvage treatment attempts.

Patients meeting the eligibility criteria for this study were identified at each participating TACL institution. To ensure a complete census of eligible patients, potentially eligible patients were identified through tumor registries, medical records, hospital billing records, and internally maintained patient databases. Patient demographic information and clinical data related to the initial diagnosis and subsequent treatment failures were abstracted from the medical record. Collected data included disease characteristics, chemotherapy regimen, disease response, and survival until the date of death or end of follow up at least through 31 December 2014. Data were entered in the TACL DataLabs Clinical Data Management System and reviewed centrally. This study was approved by the institutional review board of each participating institution.

Definitions

A salvage treatment attempt was defined as a chemotherapy treatment plan initiated because of a relapsed or refractory leukemia. A curative attempt was defined as a treatment plan with the goal of achieving a CR. To determine whether a chemotherapy plan was curative or palliative, therapy regimens were evaluated and classified by two independent reviewers. A third reviewer participated when the two reviewers did not agree. Palliative attempts were excluded for all analyses that used response as a dependent variable. The outcomes of these palliative attempts were classified as “not evaluable” in analyses using prior treatment response as an independent variable predictive of subsequent response.

Response was evaluated using the complete blood count, bone marrow (BM), and extramedullary disease evaluations collected at the end of each treatment. Patients were considered to have achieved a CR if there were ≤5% blasts in the BM and no evidence of extramedullary disease. Relapse referred to leukemia recurrence in the BM, central nervous system (CNS), or other extramedullary sites following a CR. CNS leukemia was defined as CNS3 disease (≥5/µl WBCs and positive for blasts, or clinical signs of CNS leukemia). Medullary relapse was defined as >5% blasts in the BM. Isolated extramedullary relapse was defined as ≤5% blasts in the BM and evidence of disease in CNS, testicular, or other extramedullary sites. Combined relapse was defined as >5% blasts in the BM and evidence of extramedullary leukemia. Refractory disease was defined as failure to achieve CR after one course of curative chemotherapy. Primary induction failure was defined as failure to achieve CR after one course of induction chemotherapy for de novo ALL. Patients who had peripheral blasts ≥25% by morphology without BM assessments were designated as having relapsed or refractory disease. Induction death was defined as death within 30 days from the initiation of systemic salvage chemotherapy.
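The marrow and extramedullary thresholds above can be expressed as a small decision function. The following is only an illustrative sketch, not the study's actual data-processing code: the function name, argument names, and the "not evaluable" fallback for patients with neither a marrow assessment nor ≥25% peripheral blasts are assumptions introduced here.

```python
from typing import Optional


def classify_disease_status(bm_blast_pct: Optional[float],
                            extramedullary: bool,
                            peripheral_blast_pct: Optional[float] = None) -> str:
    """Classify disease status using the blast thresholds defined in the text.

    All names are hypothetical; only the cut-offs come from the definitions.
    """
    if bm_blast_pct is None:
        # No marrow assessment: >=25% peripheral blasts counts as R/R disease.
        if peripheral_blast_pct is not None and peripheral_blast_pct >= 25:
            return "relapsed or refractory (peripheral blasts)"
        # Assumption: anything else without a marrow result is not evaluable.
        return "not evaluable"
    marrow_involved = bm_blast_pct > 5  # medullary disease is >5% blasts
    if marrow_involved and extramedullary:
        return "combined relapse"
    if marrow_involved:
        return "medullary relapse"
    if extramedullary:
        return "isolated extramedullary relapse"
    return "complete remission"  # <=5% blasts and no extramedullary disease


print(classify_disease_status(30.0, False))  # medullary relapse
print(classify_disease_status(2.0, True))    # isolated extramedullary relapse
```

Note that a marrow with exactly 5% blasts and no extramedullary disease counts as CR under these definitions, because the CR criterion is ≤5% and the relapse criterion is >5%.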

The duration of prior remission in patients achieving CR was defined as the time between relapse date and the date of the previous CR. EFS was measured from the time of remission confirmation, to the date of relapse or death from any cause, or was censored at the earliest of the date of last follow up or 31 December 2014.
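As a hedged illustration of the EFS definition, the sketch below derives a follow-up time and an event indicator for one remission. The function and argument names are hypothetical, and two handling rules are assumptions not stated in the text: an event dated after the censoring date is treated as censored, and a missing last follow-up date defaults to the administrative cut-off.

```python
from datetime import date
from typing import Optional, Tuple

STUDY_CUTOFF = date(2014, 12, 31)  # administrative censoring date from the text


def efs_observation(cr_date: date,
                    relapse_date: Optional[date] = None,
                    death_date: Optional[date] = None,
                    last_follow_up: Optional[date] = None,
                    cutoff: date = STUDY_CUTOFF) -> Tuple[int, bool]:
    """Return (days from CR, event flag) for one remission.

    event flag is True for relapse or death from any cause, False if censored.
    """
    # Censor at the earliest of last follow-up or the study cut-off.
    censor_date = cutoff if last_follow_up is None else min(last_follow_up, cutoff)
    event_dates = [d for d in (relapse_date, death_date) if d is not None]
    if event_dates and min(event_dates) <= censor_date:
        return (min(event_dates) - cr_date).days, True
    return (censor_date - cr_date).days, False


# Relapse 181 days after remission confirmation
print(efs_observation(date(2013, 1, 1), relapse_date=date(2013, 7, 1)))
# No event; censored at last follow-up
print(efs_observation(date(2014, 1, 1), last_follow_up=date(2014, 6, 30)))
```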

Statistical methods

Response and EFS analyses used salvage attempt as the unit of analysis. Attempts were excluded from these analyses if data were insufficient to determine outcome, if the treatment was judged as palliative, or if the attempt was prior to the patient being first seen at the TACL institution. The latter exclusion was to eliminate selection bias of treatment attempts that were more likely to be successful, as patients with unsuccessful treatment attempts (e.g., resulting in death) would be less likely to present at a TACL institution.

Univariable and multivariable logistic regression was used to analyze reinduction failure rates at the first and later salvage attempts. Independent variables included salvage treatment attempt number, prior remission duration, National Cancer Institute (NCI) risk criteria [14] at time of initial diagnosis, extramedullary and BM status at the start of the treatment attempt, and cytogenetics at diagnosis.

Cox regression analysis was used to examine the influence of these independent variables on EFS following CR. These analyses were restricted to salvage treatment attempts where patients achieved remission at their second or later salvage attempt. The analysis of CR rate and EFS used salvage attempts rather than patients as the primary analytic unit, so that each patient contributed data on one or more attempts. As in our previous publication [5] the corresponding logistic and Cox regression analyses, accounting for this inter-patient correlation, gave equivalent results to analyses that ignored this correlation. Results from the latter analytic method are reported. The administration of HSCT after salvage was included as a time-dependent covariate in the Cox regression analysis.

All p-values are two-sided, and estimates of relative risk and relative failure rate are presented with 95% confidence intervals. Statistical computation was performed using Stata 11 (StataCorp. 2009. Stata Statistical Software: Release 11. College Station, TX: StataCorp LP).
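For readers unfamiliar with how a regression coefficient becomes a relative estimate with a 95% confidence interval, the sketch below shows the usual Wald-type calculation on the log scale. It assumes Wald intervals, which the text does not specify, and the coefficient and standard error in the usage example are hypothetical values, not results from this study.

```python
import math
from typing import Tuple

Z_95 = 1.959964  # two-sided 95% normal quantile


def wald_interval(beta: float, se: float, z: float = Z_95) -> Tuple[float, float, float]:
    """Exponentiate a log-scale coefficient and its Wald confidence limits.

    beta is a log odds ratio (logistic model) or log hazard ratio (Cox model).
    Returns (estimate, lower limit, upper limit) on the ratio scale.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient and standard error, for illustration only
estimate, lower, upper = wald_interval(0.6, 0.25)
print(f"ratio {estimate:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that includes 1 corresponds to a two-sided p-value above 0.05 for the same Wald test.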

Results

Analysis cohort

A total of 366 unique patients from 24 TACL institutions with a total of 940 salvage attempts were enrolled in this study. The analytic set included 578 first and greater salvage treatment attempts among 325 patients (Fig. 1). Reasons for exclusion included: no evidence of systemic treatment (n = 20 attempts); attempts that were administered prior to the patient’s treatment at a TACL institution (n = 108 attempts); attempts that were determined as palliative (n = 122 attempts); and attempts that were not evaluable for response due to incomplete or missing data (n = 112 attempts). Those attempts were not included in the analytic set, although their information could be used as independent variables in analysis. The clinical characteristics of the patients at initial diagnosis are summarized in Table 1.
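The attempt counts in this paragraph can be reconciled with a few lines of arithmetic. This is only a consistency check using the numbers reported above; the dictionary keys are shorthand labels introduced here.

```python
# Counts taken from the text; keys are shorthand labels for the exclusion reasons.
total_attempts = 940
excluded_attempts = {
    "no evidence of systemic treatment": 20,
    "administered before treatment at a TACL institution": 108,
    "judged palliative": 122,
    "not evaluable (incomplete or missing data)": 112,
}

total_excluded = sum(excluded_attempts.values())
analytic_attempts = total_attempts - total_excluded

print(total_excluded)     # 362
print(analytic_attempts)  # 578, matching the reported analytic set
```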

Fig. 1 Consort diagram

Table 1 Characteristics at initial diagnosis of patients with ALL who received at least one salvage attempt (n = 325)

The majority of salvage attempts were due to BM relapse (458/578, 79.2%), while 13.5% were due to isolated extramedullary disease (Supplementary Table S1). BM status was unclear for the remaining 7.3% of the salvage attempts for extramedullary disease (Supplementary Table S1). Due to the complexity of the treatment, the salvage attempts were grouped into chemotherapy only, chemotherapy with a novel agent, and chemotherapy with HSCT (Supplementary Table S2). Fifty-eight salvage attempts (10%) included a novel agent. HSCT was included in 31% of salvage attempts.

Response to salvage treatment attempts

Since the majority of multiply R/R disease occurred in the BM, we focused our analysis of the response rate on the treatment of two or more BM (isolated and combined) relapses. This comprised 267 unique patients with 458 salvage treatment attempts (Fig. 1). Table 2 summarizes the number of salvage attempts resulting in CR by whether or not a previous remission was achieved and the length of the previous remission at the specified salvage treatment attempt. The overall CR rate was 69 ± 3.6% after the first salvage treatment attempt, 51 ± 3.9% after the second salvage attempt, and <40% after the third and subsequent attempts (Table 2). Among the 25 patients with primary induction failure, 13 patients (52%) achieved CR after the first salvage treatment attempt (Table 2). There were 16 induction deaths among the 458 curative salvage attempts (3.5%).
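The percentages in this paragraph follow directly from the reported counts. The sketch below recomputes two of them and attaches a binomial standard error; treating the "±" values in the text as binomial standard errors is an assumption made here, since the text does not say how they were derived.

```python
import math
from typing import Tuple


def proportion_with_se(successes: int, total: int) -> Tuple[float, float]:
    """Return a proportion and its binomial standard error sqrt(p(1-p)/n)."""
    p = successes / total
    se = math.sqrt(p * (1 - p) / total)
    return p, se


# CR after first salvage attempt among primary induction failures (13 of 25)
p_cr, se_cr = proportion_with_se(13, 25)
print(f"{100 * p_cr:.0f}% (SE {100 * se_cr:.1f}%)")

# Induction deaths among curative salvage attempts (16 of 458)
p_death, _ = proportion_with_se(16, 458)
print(f"{100 * p_death:.1f}%")
```

With only 25 patients in the primary induction failure group, the standard error is about 10 percentage points, so the 52% estimate is considerably less precise than the rates for the larger attempt groups.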

Table 2 Achievement of CR after treatment of bone marrow disease at reporting TACL institutions (n = 267 unique patients with 458 salvage attempts)

The results of the logistic regression for reinduction failure occurring at the first and later salvage attempts are displayed in Table 3. Salvage attempt number, duration of previous remission, and NCI risk category at diagnosis were all significant predictors in both univariable and multivariable analyses (Table 3). In the multivariable model, increasing salvage attempt number was associated with increased risk of reinduction failure (trend p = 0.0001). Duration of prior remission was inversely correlated with risk of reinduction failure (trend p = 0.0028). Patients with a high or unknown NCI risk category at initial diagnosis or infant ALL also had a higher risk of reinduction failure than patients classified as NCI standard risk (p = 0.0322). Neither extramedullary involvement nor BM status (M2 vs. M3) at start of therapy was associated with reinduction failure in univariable or multivariable analyses. Unfavorable cytogenetics at diagnosis was associated with a higher risk of reinduction failure than favorable and other cytogenetics in univariable analysis (p = 0.0429), but not in multivariable analysis (p = 0.7991).

Table 3 Summary of logistic regression for reinduction failure for medullary disease at reporting TACL institutions (267 unique patients with 458 salvage attempts)

Comparing the unadjusted CR rate among patients who received ≥2 salvage attempts between our study and the previous TACL study, we identified an improved CR rate for patients who received fourth through eighth attempts, in whom 31% of patients achieved CR (Table 4) vs. 12% in the prior study (p = 0.014) [5].

Table 4 Comparison of unadjusted CR rates of patients with medullary relapsed/refractory ALL between two sequential TACL studies

2 year EFS for patients achieving CR

Next, we focused on factors that impacted 2 year EFS among patients with at least two salvage attempts who subsequently achieved a CR. A total of 286 patients were identified who met these criteria. Among patients undergoing ≥2 salvage attempts, 125 attempts (in 108 unique patients) resulted in CR (Fig. 1). Survival analysis was completed on this cohort of patients. Kaplan–Meier survival curves by treatment attempts are provided in Fig. 2.
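For reference, the Kaplan–Meier product-limit estimate behind curves of this kind can be sketched in a few lines. This is a minimal pure-Python illustration, not the Stata procedure used in the study, and the follow-up times in the usage example are made-up toy values rather than study data.

```python
from typing import List, Tuple


def kaplan_meier(observations: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Product-limit estimate of event-free survival.

    observations: (time, event) pairs; event is True for relapse or death,
    False for a censored observation.
    Returns (time, survival probability) at each distinct event time.
    """
    survival = 1.0
    at_risk = len(observations)
    curve = []
    for t in sorted({time for time, _ in observations}):
        events = sum(1 for time, ev in observations if time == t and ev)
        leaving = sum(1 for time, _ in observations if time == t)
        if events:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # events and censored cases both leave the risk set
    return curve


# Toy data (months, event flag), for illustration only
toy = [(6, True), (12, False), (18, True), (24, False)]
print(kaplan_meier(toy))  # [(6, 0.75), (18, 0.375)]
```

Censored observations do not lower the curve directly, but they shrink the risk set, so later events produce larger drops.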

Fig. 2 Estimated 2 year event-free survival for patients who achieved complete remission after ≥2nd salvage attempt. CR complete remission, EFS event-free survival

To investigate the prognostic factors that can impact survival after achieving CR, univariable and multivariable Cox regression analyses were performed and are presented in Table 5. Prior number of salvage attempts was the only significant predictor in the univariable analysis of EFS time among patients who achieved CR after a second or greater salvage attempt (p = 0.0323, trend p = 0.0906). However, in the multivariable analysis, the effect of the salvage attempt number was attenuated. Duration of previous remission, NCI risk category, extramedullary involvement, HSCT post-remission, BM status, and cytogenetics at start of therapy all failed to reach statistical significance in both univariable and multivariable analysis. In addition, in the multivariable analysis, patients whose previous remission lasted 18–36 months or at least 36 months showed a trend toward a lower risk of relapse/death than patients whose prior remission lasted less than 18 months.

Table 5 Cox proportional hazards model of event-free survival from start of remission for patients who achieved CR after ≥2 salvage attempts (n = 108 patients with 125 attempts)

Although minimal residual disease (MRD) data were available for some salvage attempts (40 attempts), most of the patients did not have MRD data available. Therefore, MRD was excluded from the analysis.

Discussion

This is the second retrospective pooled data analysis from TACL that evaluates the outcome of pediatric patients with multiply R/R B-ALL treated during a contemporary period. Efforts were made to include a complete census of eligible patients from each participating center in order to minimize patient selection bias. Twenty-four TACL institutions participated in the study, representing major pediatric hematology/oncology centers across the US, Canada, and Australia. The inclusion of 325 patients provided us with the opportunity to undertake a robust analysis and evaluate factors that influenced the remission rate and survival.

Comparison of the current study with the previous TACL study is not straightforward, since the majority of our patients had ≥2 relapses, whereas the previous study also included patients with first relapsed disease [5]. Therefore, not surprisingly, the CR rate among our cohort after a 2nd treatment attempt is biased, as it excludes patients who had a single treatment failure, achieved remission, and remained in remission, and it is slightly lower than the reported CR rate of 81–94% in the literature [5,6,7,8]. We observed a trend of improved response rate for patients who had two or more salvage attempts in our study when compared to the results reported by Ko et al. and other published studies [5, 10, 11]. More importantly, we identified a significantly higher CR rate among patients who received fourth through eighth salvage attempts in the current study when compared to the previous TACL study. Considering the improvement of the treatment of de novo ALL in recent time periods, one might assume that remission would be harder to achieve in these multiply salvaged patients. However, our finding is consistent with the previous report from the Children’s Oncology Group suggesting that post-relapse survival is independent of initial treatment intensity in children with first relapsed ALL [9]. This observation is also consistent with the findings of several genomic studies that indicate that clones responsible for relapse are often present at diagnosis or mutated to a resistant phenotype through intrinsic genomic instability rather than treatment exposures [15, 16]. We speculate that this apparent improvement in CR rate could be related to intensification of salvage chemotherapy, introduction of novel agents in this patient population in recent time periods, and better supportive care. However, the difference could also be attributed to differences in non-treatment-related features of patients in these two groups.

As previously published, the number of prior salvage attempts and duration of previous remission are prognostic factors for subsequent CR in children with multiply relapsed ALL [5, 10, 11]. We found no compelling association of reinduction failure with either extramedullary involvement or BM status (M2 vs. M3) at start of therapy in univariable or multivariable analyses. Our study identified NCI risk category at diagnosis as a significant independent prognostic factor for remission induction. This observation was consistent with a non-significant trend observed in the previous TACL publication [5]. However, after CR was achieved, it had no impact on the survival of these patients. Although several studies have suggested that cytogenetics at diagnosis was an independent prognostic factor in children with ALL in first relapse or primary induction failure [3, 13], the impact of cytogenetics in children with multiply relapsed ALL was unknown. In our study, unfavorable cytogenetics was associated with higher risk of induction failure only in univariable analysis. There appears to be a trend of lower disease progression among patients with unfavorable cytogenetics when CR was achieved after ≥2 salvage attempts in multivariable analysis in our small cohort. Given the wide 95% confidence interval, further study is warranted. Taken together, these data highlight the importance of understanding the biology of relapsed ALL to identify targets for novel therapies that can result in more sustained CR.

Few studies have evaluated survival in patients who achieved CR3 and beyond. Previous studies reported a 23–31% EFS in patients achieving CR3 [4, 11, 12]. In our analysis, we found an improved 2 year EFS for patients who achieved a CR3 (41% ± 5.6%). With small patient numbers, we found no compelling advantage for HSCT in patients who achieved CR after ≥2 salvage attempts. Furthermore, a 2 year EFS of 27 ± 13% was seen in patients who had ≥4 salvage attempts, a slight improvement from the previous TACL study, although the number of patients is very small. Of note, a total of 61 salvage attempts were administered to 16 patients as their fourth through eighth attempts. Among the 61 salvage attempts, only five (8%) were CD19 chimeric antigen receptor (CAR) T cell therapy. Therefore, the improved CR rate among this group could not be solely explained by the highly effective CD19 CAR T cell immunotherapy. However, it is possible that this treatment resulted in deeper and more sustained CR and influenced the EFS in these 16 patients who experienced five or more prior treatment failures.

Overall, compared to previous studies, our data demonstrated that in the contemporary era, more effective reinduction therapy resulted in a trend of higher CR rates, more sustained remissions, and improved survival. Despite these improvements, the majority of patients in our cohort died from their disease. Therefore, new approaches are still needed to improve outcomes. The results from our data provide important reference background information for evaluating CR rates in future early phase clinical trial designs for B-ALL, especially with respect to the composition of patient characteristics in those new agent trials, which typically included multiply relapsed/refractory patients. However, it will always be important to consider the limitations of historical data when using them as reference in clinical trials because retrospective clinical data are not equivalent to clinical trial data. Other limitations of our analysis included the lack of data regarding organ function, performance status, other co-morbidities, and enrollment in investigational studies. Treatment-related adverse events and therapy modifications due to toxicity were not collected. Only a few patients had MRD data available; therefore, the impact of MRD on the outcome is unknown. Out of the 325 patients, only a small number of patients received CD19 or CD22 directed immunotherapy (blinatumomab, n = 7; CD19 CAR T cell, n = 11; inotuzumab, n = 1) [17,18,19]. Therefore, our analysis reflects the treatment outcome prior to the CD19 directed immunotherapy era. It will be interesting to see whether the introduction of new promising agents such as blinatumomab, inotuzumab, and CD19 CAR T cell therapy will change the long-term outcome in these patients.

In conclusion, this is the largest retrospective study to date of the outcome of children with multiply R/R B-ALL receiving contemporary treatment across North America and Australia, and it demonstrated a trend of improvement in CR rate and survival compared to the previous TACL study. The pooled data provide important background information on the outcome of children with multiply R/R B-ALL, which can be valuable in planning clinical trials assessing new drugs and biologic agents.