Introduction

Chronic myeloid leukemia (CML) originates from a hematopoietic stem cell affected by a reciprocal translocation between chromosomes 9 and 22, which results in the formation of the BCR-ABL1 fusion gene in the short derivative of chromosome 22 (Ph chromosome)1. In this rearrangement, latent ABL1 tyrosine kinase is constitutively activated by its oligomerization mediated by the N-terminal region of BCR and causes an unregulated expansion of myeloid cells, especially mature granulocytes2. CML is generally diagnosed during the stable chronic phase (CP). Since BCR-ABL1-positive cells (called CML cells in the following) are quite dependent on BCR-ABL1 for their growth and survival, ABL1-specific tyrosine kinase inhibitors (TKIs) have clinically demonstrated substantial reduction of CML cells and restoration of normal hematopoiesis in the vast majority of CP-CML patients3,4,5,6,7.

At present, frontline TKIs for newly diagnosed CP-CML include imatinib3, dasatinib4, nilotinib5,6,7, and bosutinib8, among which the latter three are the second generation TKIs with more potent activity than the former one. The actual therapeutic response to TKIs is estimated by quantitative polymerase chain reaction analysis of BCR-ABL1 transcripts in peripheral blood and is represented by the ratio of BCR-ABL1 mRNA to ABL1 mRNA on the international scale (IS)9,10,11. The consensus optimal response to TKIs in CP-CML is major molecular response (MMR, IS ≤ 0.1%) within 1 year of therapy12, whereas a significant number of CP-CML patients achieve superior or deep molecular response (DMR; MR4.5, IS ≤ 0.0032%)13,14,15, especially on frontline nilotinib or dasatinib4,5,6,7. Furthermore, a recent series of clinical trials on TKI discontinuation revealed that nearly a half of CP-CML patients maintained treatment-free remission (TFR) following >2 years of DMR on TKIs16. Thus, to ultimately reach TFR, the practical target of CML treatment is to achieve DMR as early as possible after frontline TKI therapy, and the switch to another TKI may be a therapeutic option when DMR is not likely to be obtained. Thus, an algorithm to enable the early prediction of DMR achievement on TKI therapy would be quite valuable.

In this paper, we propose a mathematical framework that can predict the efficacy of nilotinib in each CML patient only from short time-series data. We constructed a simple classification method that can estimate whether a CML patient reaches DMR or not within 24 months after nilotinib administration. The key to success is to focus on the dynamical behavior of CML cells.

Results

Current scoring systems and guidelines are not suitable for DMR classification

The dataset analyzed in this study was obtained in a phase II clinical trial named the N-road study17 that aimed to examine the safety and efficacy of nilotinib for newly-diagnosed CML patients. Complete blood counts (CBCs) and IS were measured in a single certified laboratory center every 3 months for 2 years. In this study, we analyzed the data from the eligible 32 patients. Hereafter, among the eligible patients, we defined 18 patients who have reached DMR by 24 months as DMR patients and remaining 14 patients as non-DMR patients (Supplementary Figs. 12). See Supplementary Materials for the detailed information.

Recently, K. Sasaki et al. reported optimal IS values that predict the highest probability of reaching a sustained DMR at specific time points between 3 and 12 months18. However, to the best of our knowledge, there are neither scoring systems nor algorithms for early prediction of DMR achievement in CML patients under TKI therapy. Therefore, we examined whether the current scoring systems and guidelines in CML clinical practice could be adapted to predict DMR. The European LeukemiaNet (ELN) guideline in 201312 recommends the EUTOS prognostic score for prediction of complete cytogenetic response (CCyR) and progression-free survival (PFS) on frontline imatinib. In the N-road study dataset17, the EUTOS score did not precisely discriminate DMR from non-DMR patients (Fig. 1a). We tested another prognostic score, the EUTOS long-term survival (ELTS) score, which is newly adopted in ELN guideline 202015. The ELTS score was developed for prediction of long-term survival considering leukemia-related death. As well as the EUTOS score, the ELTS score also did not distinguish between DMR and non-DMR patients in the N-road study (Fig. 1b). In addition, the definition of response to frontline TKIs employed in the ELN12,15 required at least 12-month observation to discriminate the two populations in the same cohort (Fig. 1c–e). Then, we sought for key features to construct a better method of early DMR prediction.

Fig. 1: Several criteria of the current scoring systems and guidelines are not suitable for DMR classification.
figure 1

Each dot indicates each patient value within the N-road dataset. The top and bottom bars of the boxes indicate the maximum and minimum values, respectively. The top, center, and bottom lines are the upper quartiles, medians, and lower quartiles, respectively. Patients are divided into two groups: a DMR patient set and a non-DMR patient set. a The EUTOS score did not predict non-DMR patients. The patient is considered “High risk” if the EUTOS score of a CML patient is >87 and “Low risk” otherwise. b The ELTS score did not also predict non-DMR patients. A CML patient is considered as “High risk” if the patient’s ELTS score is >2.2185, “Low risk” if the score is 1.5680 or lower, and “Intermediate risk” otherwise. c–e According to the ELN guideline, we plotted patient IS and separated them into three groups: “Optimal”, “Warning”, and “Failure” for the data at (c) 3, (d) 6, and (e) 12 months after the initiation of TKI administration. In c, d many non-DMR patients were predicted as “Optimal”, indicating the failure of these predictions.

Key features for DMR prediction

Potential key features should be found in the dynamics of CML cells in CML patients on TKIs because mathematical studies on CML19,20,21,22,23,24,25 explained the characteristic dynamics of CML cells in CML patients on TKIs and successfully predicted relapse. Particularly, M. Horn et al. suggested that relapse was predictable based on residual leukemic stem cells and reduction slopes of IS23. However, for the purpose of DMR prediction from short-term measurements, most models are too complex to obtain suitable parameter values for each patient. Numerous approaches have been proposed to overcome these data-shortage problems such as high-dimensional data in general26 and prostate cancer in specific27,28. In this research, we overcome this problem by focusing on two important key features on the dynamics of CML cells in CML patients and by proposing an adequately-simple mathematical model, which can reproduce the two key features from a small dataset.

Two key features for early DMR prediction in CML patients are the decreasing rate and the convergence value of IS time-series data (Fig. 2). Both early and late dynamics of IS can be clearly represented by these two features (Fig. 2a, b). Additionally, the distributions of these two features in DMR and non-DMR patients seem to be distinguishable (Fig. 2c, d). Thus, we can discriminate DMR patients from non-DMR patients, if we accurately estimate these two features from short time series of IS values. Hereafter, we discuss a mathematical way to estimate these two features.

Fig. 2: Key features for DMR prediction: the decreasing rate and convergence value of IS, and the reduction rate and convergence value of CML cells.
figure 2

a, b The time series of IS are plotted in (a) for a non-DMR patient (patient # 19) and in (b) for a DMR patient (patient # 9). The decreasing rate and convergence value of IS are two key features for DMR prediction. A detailed information of the parameter estimation can be found in Methods. c The distributions of the IS decreasing rate for non-DMR and that for DMR patients were different. d The distributions of the IS convergence value for non-DMR and that for DMR patients were distinct. The definitions of the bars of the boxes, lines, and dots used in the boxplots (c, d) are the same as those in Fig. 1. e, f We show the estimated dynamics of normal WBCs and CML cells counts in (e) for a non-DMR patient (patient # 19) and in (f) for a DMR patient (patient # 9). These figures indicate that the reduction rate and convergence value of CML cells play a key role in the distinction between DMR and non-DMR patients.

Our proposed method showed a good performance for distinguishing between DMR and non-DMR patients

First, we propose a mathematical model, which describes the time evolution of the white blood cell (WBC) count in CML patients (Eq. 1 in Methods). Our model includes two variables: the normal WBC (namely BCR-ABL1-negative) count and CML cell (namely BCR-ABL1-positive) count. Furthermore, each variable has two parameters: the recovery rate and the convergence value for the normal WBC count, and the reduction rate and the convergence value for the CML cell count. Since both the cell counts are mathematically calculated by the measurement of IS and the total WBC count in our model (Eqs. 23 in Methods), we can estimate the variables and parameters (Fig. 2e, f). The first feature, the decreasing rate of IS, corresponds to the reduction rate of the CML cells. The second feature, the convergence value of IS, corresponds to the convergence value of the CML cells. Our mathematical model sufficiently approximates the time series of IS for various CML patients (Fig. 3a and Supplementary Fig. 2). These findings indicate the validity of our proposed model.

Fig. 3: DMR and non-DMR patients were separated on the plane of the reduction rate and convergence value of CML cells.
figure 3

a The estimated time series of CML dynamics (red dashed lines) sufficiently approximated the measured data for IS [%] and for WBC counts [number / μl] (black solid lines). The patients shown in the colored boxes were chosen as examples from the patients within the same color regions shown in (c). b The distribution of the convergence value and recovery rate [1/month] of normal WBCs did not show a correlation in terms of nilotinib response. The number indicated on each dot is patient #. c The distribution of the convergence value and reduction rate of CML cells was apparently classified into several subsets: a set including most DMR patients (the blue region), non-DMR patients (the orange region), and outlier patients (other color regions). d By choosing optimal threshold values, the distribution of DMR patients shown in (c) was perfectly separated from that of non-DMR patients. Notably, this result was obtained by using all patient data points.

Second, we discuss how we deal with estimated parameter values to classify the TKI response (Fig. 3b–d). DMR and non-DMR patients were clearly separated on a parameter plane of the CML cell reduction rate and convergence value (Fig. 3c). To obtain appropriate threshold values of the two patients groups, we determined horizontal and vertical lines on the parameter plane by using all the eligible patient data points (see “DMR prediction criteria” in Methods). With the threshold values, we positively discriminated between DMR and non-DMR patients in the analyzed N-road dataset with 100 % accuracy (Fig. 3d). On the other hand, another set of parameters did not clearly separate patients (Fig. 3b). We note that, in Fig. 3, all the time points of all patients data were used to estimate our model parameters. However, the parameters can be estimated based only on three time points for each because of our model’s simplicity. To classify patients into DMR or non-DMR based on the estimated parameters, the threshold values that separate DMR and non-DMR patients in the parameter plane still need to be determined.

Third, we discuss an approach to determine the threshold values of the CML cell reduction rate and convergence value in a CML patient by utilizing information from other patients. We employed the leave-one-out cross-validation. The dataset with 32 patients is divided into two groups, one is the training set composed of 31 patients, and the other is the validation dataset composed of one patient. By applying the aforementioned framework to data from the training dataset, we obtained estimated parameter values of each patient and optimized the threshold values under a certain criterion. For the optimization, we employed three different objective functions, i.e., maximizing accuracy, maximizing sensitivity, and maximizing specificity. An optimal threshold value is given by maximizing a corresponding objective function (see also “DMR prediction criteria” in Methods). Based on the optimized threshold values, we predicted a patient of the test dataset whether the patient will reach DMR or non-DMR from short-term measurements. For the test patient, we set the correctness score as 1 or 0 if the DMR prediction successes or fails, respectively. After the cross-validation, i.e., the 32 times of the prediction test, we obtain the classification performance as an average of the correctness scores among 32 patients (see also “Performance of DMR prediction” in Methods). In Fig. 4, we summarize the performance of DMR predictions. Our proposed approach, which uses data only at 3 initial time points (at 0, 3, and 6 months), reached the performance of the ELN definition at 12 months of therapy. Furthermore, our method enhanced specificity and F1 scores compared to the EUTOS scoring system and ELN guidelines. As for the ELTS scoring system, it estimated most patients as “Intermediate risk” or “High risk”. By assuming that these classifications correspond to non-DMR, we consider it difficult for the ELTS scoring system to detect non-DMR patients (see also Supplementary Fig. 6 for the confusion matrices of the ELTS score-based prediction). In Supplementary Figs. 78, we analyzed the EUTOS score-based and ELTS score-based prediction results based on the patient’s characteristics using the parameters for these scores (see also “EUTOS score-based and ELTS score-based DMR prediction” of Supplementary Materials). We note that the difference of our three methods relies on the way to obtain the threshold values, i.e., maximizing accuracy, maximizing sensitivity, and maximizing specificity (see also “DMR prediction criteria” in Methods). The classification performances of our three optimization methods balanced the four scores: accuracy, sensitivity, specificity, and F1 scores. In addition, the performance scores of our three methods were equal to or higher than those of most current scoring systems and guidelines when we adopted MR4.5 for DMR definition (Fig. 4). The exceptions are the cases of specificity of the ELTS score and sensitivity of the ELN guidelines that achieved 100 % performance. These seemingly good scores come from the biased prediction of the score/guidelines (see Supplementary Fig. 6). The similar results were also obtained for MR4.0 and complete molecular response (CMR) (Supplementary Fig. 5) (see Methods for the definitions of MR4.0 and CMR). Thus, our algorithm has high potential to predict non-DMR patients on frontline nilotinib, and our method should improve the current CML treatment.

Fig. 4: Our prediction method achieved favorable performance in comparison with the criteria of the current scoring systems and guidelines.
figure 4

We summarize the performance for accuracy (blue bars), sensitivity (green bars), specificity (orange bars), and F1 scores (purple bars) of the EUTOS score, the ELTS score, the ELN guideline for the cases of 3, 6, 12 months, and our proposed method (presented with the three rightmost labels). In the current study, “positive” and “negative” indicate DMR and non-DMR patients, respectively. In the cases of the EUTOS scores, the ELTS scores, and the ELN guidelines, we decided to estimate a target patient as “positive”, if the patient was classified as “Low risk” or “Optimal”. Under this estimation, we obtained accuracy, sensitivity (the true positive rate), and specificity (the true negative rate) for each case. In our setting, the prediction of non-DMR patients is highly important, because the administrated TKI should be promptly changed for them. Thus, our method was superior to the EUTOS score and the ELN guideline because our method showed higher specificity than them. The ELTS score cannot distinguish between non-DMR and DMR patients. Notably, each of our three results was optimized in terms of accuracy, sensitivity, and specificity through the training period. See also “DMR prediction criteria” and “Performance of DMR prediction” in Methods.

Discussion

Finally, we discuss whether the presented mathematical model is applicable to other frontline TKIs including imatinib and dasatinib. The BCR-ABL1 transcript dynamics under imatinib treatment21,23 was similar to the IS dynamics under nilotinib treatment analyzed in our study. Although further investigations are needed, we expect that our model can be applied to imatinib treatment. TKIs achieve rapid complete hematological response, including normal WBC counts, in a vast majority of patients with CML. However, among ever approved TKIs, only dasatinib reveals a peculiar property of rapid mobilization of non-leukemic lymphocytes, especially NK cells and CD8 + CD57 + large granular lymphocytes, in peripheral blood peaking 1–2 h after an oral intake29,30. This unique phenomenon frequently results in transient elevation of WBC counts in blood testing of CML patients even with stable molecular response. Similar effects on blood cell dynamics are not observed with any other TKI. Accordingly, the present mathematical model is likely to be applicable to TKIs except for dasatinib.

In conclusion, it is suggested that our mathematical method has an ability to predict the possibility to achieve DMR in a retrospective cohort of CML patients treated with frontline nilotinib. This outcome is due to the suitable degree of simplicity of our mathematical model: it is simple enough to operate with short-term time series but also complex enough to distinguish between DMR and non-DMR patients. The ultimate goal of CML treatment is likely to be TKI-free maintenance of DMR, and the achievement of durable DMR is a prerequisite for this goal31. Furthermore, timely prediction of nilotinib failure to achieve DMR will enable the consideration of other therapeutic approaches32, which will allow sustained DMR leading to TFR. Therefore, the present study contributes to the progress of clinical treatment that is supported by quantitative measurement of disease dynamics and mathematical analysis.

Methods

Definition of deep molecular response (DMR)

In this study, we employed DMR as the criterion for effective drug response in CML patients. There are multiple definitions of DMR such as MR4.0 (IS ≤ 0.01%) and MR4.5 (IS ≤ 0.0032%)9,12,13,14,15. If a patient’s IS becomes lower than one of the above threshold values at a certain time point, it is considered that the patient reaches the DMR corresponding to the employed threshold value. There is a stricter criterion designated as CMR32. A patient is considered to reach CMR, if his/her expression of the BCR-ABL1 gene is not measurable at two consecutive time points. In this paper, we designated patients as CMR who reached MR4.5 at both 21 and 24 months. Because lower IS implies a more favorable situation for CML patients, CMR is considered to be a better criterion. However, the measurement of low BCR-ABL1 expression is technically challenging, i.e., non-negligible noise/error is often contained in such measurements. Therefore, we employed MR4.5 for this analysis (Figs. 14). In Supplementary Figs. 35, for MR4.0, MR4.5, and CMR, we compared the numerical results corresponding to Figs. 1, 3b–d, and 4.

Here, we note a more detailed definition of DMR patients. We designated patients whose IS decreased below 0.0032% at a certain time point as DMR patients. However, in a subset of these DMR patients, their IS fluctuated and/or again increased above 0.0032%. In this study, we treated these patients as DMR patients. In further investigation, this point may be reconsidered, if more detailed clinical information becomes available. One of the reasons for IS reoccurrence is the acquisition of drug resistance. However, we cannot distinguish patients who acquired drug resistance based on the available information, because observations over a longer time span are needed for that purpose.

Detailed information of the N-road trial

We used the dataset from the N-road trial17, which was a phase II, single-arm, open-label, multi-center clinical study for newly diagnosed CP-CML patients. This study was organized based on previous trials on nilotinib5,6,7. The CP-CML diagnosis had to be made by chromosome banding or fluorescent in situ hybridization (FISH) or reverse transcription polymerase chain reaction. The N-road aimed to investigate the safety and efficacy of nilotinib in newly diagnosed CP-CML patients based on the early achievement of CMR. The primary endpoint was the CMR (defined as MR4.5 in the N-road study) achievement rate within 24 months. This trial included 53 patients and was conducted from August 2012 until July 2017. Nilotinib 300 mg twice daily was administered to patients for 24 months. If patients did not reach the criteria for optimal response defined by the N-road study referring to the ELN 2013 definition12 at any time point, the administration of nilotinib up to 400 mg twice daily as second-line therapy was allowed.

Notably, our mathematical framework requires only IS and WBC counts. Other laboratory findings were also obtained but were not used in this paper. In addition, in this study, we analyzed the data from the eligible 32 patients; the remaining 21 patients were ineligible for analysis due to missing data of one or more CBC and/or IS measurements. Furthermore, in this paper, moving to the second-line therapy or not was not thematized.

This trial was approved by the institutional review boards of the participating institutions and was conducted in accordance with the ethical guidelines for medical and health research involving human subjects. This study also adhered to the ethical principles of the Declaration of Helsinki. All patients provided written informed consent to participate in the study. This N-road study is registered in University hospital Medical Information Network (UMIN) system, and its ID is UMIN000008565 17.

Details of the EUTOS score, the ELTS score, and the criteria of the current guideline

Here, we introduce details of the EUTOS score19, the ELTS score15, and the ELN guidelines from 201312 and 202015 to be related to our study. Both the EUTOS score19 and the ELTS score15 are the prognostic scores to estimate the survival risk at baseline (i.e., before the TKI administration). The EUTOS score19 was proposed as a score to predict CCyR and subsequently PFS. The EUTOS score is calculated by the spleen size below costal margin and the basophil ratio as follows:

$$\begin{array}{l}{{{\mathrm{EUTOS}}}}\;{{{\mathrm{score}}}}: = 4\, \times \, {{{\mathrm{spleen}}}}\;{{{\mathrm{size}}}}\;{{{\mathrm{below}}}}\;{{{\mathrm{costal}}}}\;{{{\mathrm{margin}}}}\left( {{{{\mathrm{cm}}}}} \right)\\ \qquad\qquad \qquad\quad\,\,+ \,7 \times {{{\mathrm{basophil}}}}\;{{{\mathrm{ratio}}}}\;({{{\mathrm{\% }}}}).\end{array}$$

If the EUTOS score of a patient is >87, the patient is designated as being in a “High risk” situation. The ELTS score was recently adopted in the ELN guideline 202015 and proposed for long-term survival considering leukemia-related death. At baseline, the ELTS score is calculated by the age in completed years, the spleen size below costal margin, the percentage of blasts in peripheral blood, and the platelet count as follows:

$$\begin{array}{l}{{{\mathrm{ELTS}}}}\;{{{\mathrm{score}}}}: = 0.0025 \times \left( {\frac{{{{{\mathrm{age}}}}\;{{{\mathrm{in}}}}\;{{{\mathrm{completed}}}}\;{{{\mathrm{years}}}}}}{{10}}} \right)^3 \, + \, 0.0615\\ \qquad\qquad\qquad\,\, \times\, {{{\mathrm{spleen}}}}\;{{{\mathrm{size}}}}\;{{{\mathrm{below}}}}\;{{{\mathrm{costal}}}}\;{{{\mathrm{margin}}}}\;({{{\mathrm{cm}}}}) \\ \qquad\qquad\qquad\,\,+\, 0.1052 \, \times \, {{{\mathrm{blasts}}}}\;{{{\mathrm{in}}}}\;{{{\mathrm{peripheral}}}}\;{{{\mathrm{blood}}}}\;({{{\mathrm{\% }}}}) \\ \qquad\qquad\qquad\,\,+\, 0.4104 \, \times \, \left( {\frac{{{{{\mathrm{platelet}}}}\;{{{\mathrm{count}}}}}}{{1000}}} \right)^{ - 0.5}.\end{array}$$

A CML patient is considered as “High risk” if the patient’s ELTS score is >2.2185, “Low risk” if the score is <1.5680, and “Intermediate risk” otherwise.

In the ELN guideline 201312, the condition of a patient is classified into three stages according to the relationship among IS, chromosome banding analysis (CBA) from bone marrow cells, FISH of blood interphase cell nuclei, and the duration since the beginning of TKI administration: The three stages are “Failure”, “Warning”, and “Optimal” starting from the worst. At the 3-month time point, the threshold value between “Warning” and “Optimal” is set at 10% of IS. At the 6-month time point, the threshold value between “Failure” and “Warning” is set at 10%, and that between “Warning” and “Optimal” is at 1%. At the 12-month time point, these threshold values are set at 1% and 0.1%, respectively. The above information is illustrated in Fig. 1.

We should note that the ELN guideline 202015 changed the criteria of the risk stage. However, the changes did not affect the results obtained from our dataset. Here, we explain the change related to our analysis and why it did not affect our result. One of the changes related to our analysis is the “Failure” stage criteria at the 3-month time point. The criteria of the guideline in 201312 do not explicitly refer to IS values, but those in 202015 refer to IS values. According to the guideline in 2020, a patient is confirmed as the “Failure” stage if the IS value satisfies the following both conditions: the IS value is >10% at the 3 months; the same IS level is confirmed again before the 6 months. The N-road dataset has IS values every 3 months. Therefore, no patient satisfied the second condition. Thus, the changes in the ELN guideline 2020 did not affect our results.

We tested the performance of the EUTOS score, the ELTS score, and the ELN guideline for DMR prediction in our dataset as shown in Fig. 4. It should be noted that the EUTOS score, the ELTS score, and the guideline were not proposed for DMR prediction. In the cases of the EUTOS score and the guideline with 3 months and 6 months (labeled as “ELN guideline as of 3 months” and “ELN guideline as of 6 months”), the results indicated high “sensitivity” and low “specificity”; therefore, the prediction of non-DMR patients based on them is very difficult. On the contrary, for the ELN guideline as of 12 months, specificity was higher at 57%. Thus, for accurate prediction of non-DMR patients based on the guideline the 12 months duration is necessary. The ELTS score has higher “specificity” than the ELN guideline as of 12 months. However, the “sensitivity” and “accuracy” are significantly low. This result indicates that the ELTS score is not suitable for the prediction of DMR patients (see Supplementary Figs. 3 and 5 for the performance of the EUTOS score, the ELTS score, and the guideline in the cases of MR4.0, MR4.5, and CMR). The performance of the current guidelines, excluding the EUTOS score and the ELTS score, was tested in 32 patients. Since spleen size data, which are required to calculate the EUTOS and the ELTS scores, were not available for two patients, the EUTOS and ELTS scores were evaluated in the remaining 30 patients.

Dynamical model of CML cells

Here, we introduce the details of our dynamical model. We modeled the dynamics within peripheral blood of CML patients. Administration of an effective TKI results in the reduction of CML cells (BCR-ABL1-positive WBC) and the increase of normal WBCs (BCR-ABL1-negative WBCs) in peripheral blood. In this study, we assumed that the number of CML cells and that of normal WBCs exponentially change with time. We also assumed that these numbers asymptotically converge to constant values. Then, the time series of these numbers can be described as a couple of linear differential equations as follows:

$$\begin{array}{*{20}{c}} {\frac{{dx_i(t)}}{{dt}} = a_ix_i^ \ast - a_ix_i\left( t \right),} \\ {\frac{{dy_i(t)}}{{dt}} = b_iy_i^ \ast - b_iy_i\left( t \right),} \end{array}$$
(1)

where \(x_i\left( t \right)\) and \(y_i\left( t \right)\) represent the number of normal WBCs and that of CML cells for the ith patient at time t, respectively. Their convergence values are denoted by \(x_i^ \ast\) and \(y_i^ \ast\). The coefficients \(a_i\) and \(b_i\) represent the recovery rate of normal WBCs and the reduction rate of CML cells for the ith patient, respectively. Examples of the dynamics generated by the model are shown in Fig. 2e and f.

Relationship between measured values and CML dynamics

We explain the relationship between our model and measurements from CML patients. We can convert two measurement variables, the WBC count and IS, into two state variables in our model, \(x_i\left( t \right)\) and \(y_i\left( t \right)\), via the following procedure. The WBC count clearly agrees with the sum of \(x_i\left( t \right)\) and \(y_i\left( t \right)\). Then, the WBC count for the ith patient at time t, denoted by \({{{\mathrm{WBC}}}}_i\left( t \right)\), is described as follows:

$${{{\mathrm{WBC}}}}_i\left( t \right) = x_i\left( t \right) + y_i\left( t \right).$$
(2)

The definition of IS is the ratio of the BCR-ABL1 gene (mRNA) expression to the ABL1 gene (mRNA) expression: The BCR-ABL1 gene is expressed in CML cells, whereas the ABL1 gene is expressed in both CML cells and normal WBCs. Here, we assume that the total mRNA expression level is proportional to the number of cells. We denote the ratio of the total ABL1 mRNA expression level in normal WBCs to that in CML cells by c. Thus, IS for the ith patient at time t, denoted by \({{{\mathrm{IS}}}}_i\left( t \right)\) [%], is described as follows:

$${{{\mathrm{IS}}}}_i\left( t \right) = \max \left( {\frac{{y_i\left( t \right)}}{{cx_i\left( t \right) + y_i\left( t \right)}} \times k_{{{{\mathrm{IS}}}}} \times 100,\;0.0032} \right),$$
(3)

where \(k_{{{{\mathrm{IS}}}}}\) is the conversion factor of IS, and our dataset of the N-road adopts \(k_{{{{\mathrm{IS}}}}} = 1.2\). Because the detection limit of IS measurements in this dataset is 0.0032%, we set the lower bound as 0.0032.

These Eqs. (2) and (3) can transform the WBC count \({{{\mathrm{WBC}}}}_i\left( t \right)\) and IS \({{{\mathrm{IS}}}}_i\left( t \right)\) into \(x_i\left( t \right)\) and \(y_i\left( t \right)\), respectively. However, the constant ratio c is generally unknown. Thus, we estimated the ratio c and the parameters of the CML dynamical model \(a_i,b_i,x_i^ \ast ,\) and \(y_i^ \ast\) from the measurements. See Methods for the detailed methods for the parameter estimation.

IS is an international standard value that is converted using a laboratory-specific conversion factor (CF), which is calculated by comparing the actual measured values of patients’ samples between each laboratory and the international reference laboratory11. Bio Medical Laboratories (BML), which performed IS testing in the N-road study, obtained CF = 1.2 from the Institute of Medical and Veterinary Science (IMVS) in Adelaide, Australia, which is the reference laboratory for international standards. We adopted CF = 1.2 for the value of \(k_{\mathrm{IS}}\) in the Eq. (3).

Fitting IS time-series data with piecewise functions

In this part, we explain the method to estimate IS dynamics parameters shown in Fig. 2c, d. To obtain the decreasing rate and convergence value of IS, we defined a piecewise function with two domains as follows:

$$f_i\left( t \right) = \left\{ {\begin{array}{*{20}{c}} {\log \;{{{\mathrm{IS}}}}_i\left( 0 \right) + \frac{{\left( {\log \;{{{\mathrm{IS}}}}_i^ \ast - \log \;{{{\mathrm{IS}}}}_i\left( 0 \right)} \right)}}{{\tau _i}} \times t,} \\ {\log \;{{{\mathrm{IS}}}}_i^ \ast ,} \end{array}\begin{array}{*{20}{c}} {{{{\mathrm{if}}}}\;0 \le t \,<\, \tau _i,} \\ {{{{\mathrm{if}}}}\;t \ge \tau _i,} \end{array}} \right.$$

where \(10^{{{{\mathrm{IS}}}}_i^ \ast }\) is the IS convergence value, \(\left( {{{{\mathrm{log}}}}\;{{{\mathrm{IS}}}}_i^ \ast - {{{\mathrm{log}}}}\;{{{\mathrm{IS}}}}_i\left( 0 \right)} \right)/\tau _i\) is the IS decreasing rate of the ith patient, and \(\tau _i\) [month] is a boundary between the two domains. Then we solve the following optimization problem:

$$\mathop {{{{{\mathrm{minimize}}}}}}\limits_{{{{\mathrm{IS}}}}_i^ \ast ,\tau _i} \mathop {\sum }\limits_{t = 0,3, \ldots ,24} \left( {f_i\left( t \right) - \log \widehat {{{{\mathrm{IS}}}}}_i\left( t \right)} \right)^2.$$

We used the Adam optimizer of the TensorFlow machine learning library to solve the problem.

Parameter estimation of the CML dynamical model

Here, we introduce the optimization method to estimate the parameters of Eqs, (1), (2), and (3). The measurement value of the WBC count and that of IS for the ith patient at the time t are denoted by \(\widehat {{{{\mathrm{WBC}}}}}_i\left( t \right)\) and \(\widehat {{{{\mathrm{IS}}}}}_i\left( t \right)\), respectively. Then, the optimization problem is described as follows:

$$\mathop {{{{{\mathrm{minimize}}}}}}\limits_{x_i^ \ast ,y_{i,\;{{{\mathrm{min}}}}}^ \ast ,a_i,b_i,c} \mathop {\sum }\limits_{i \in P_ \ast } \mathop {\sum }\limits_{t = 0,3, \ldots ,24} \left( {\ln {{{\mathrm{IS}}}}_i\left( t \right) - \ln \widehat {{{{\mathrm{IS}}}}}_i\left( t \right)} \right)^2 + \left( {\ln {{{\mathrm{WBC}}}}_i\left( t \right) - \ln \widehat {{{{\mathrm{WBC}}}}}_i\left( t \right)} \right)^2$$

\({{{\mathrm{subject}}}}\;{{{\mathrm{to}}}}\;x_{{{{\mathrm{min}}}}}^ \ast \le x_i^ \ast \le x_{{{{\mathrm{max}}}}}^ \ast ,y_{i,\;{{{\mathrm{min}}}}}^ \ast \le y_i^ \ast ,a_i,b_i,c \,> \,0.\)

As for the fitting IS data with the piecewise function, we used the Adam optimizer to solve this optimization problem. We employed the method of Lagrange multipliers, and the constraint is

$$\begin{array}{l}\mathop {\sum }\limits_{i \in P_ \ast } \left\{ R\left( {x_{{{{\mathrm{min}}}}}^ \ast - x_i^ \ast } \right) + R\left( { - x_{{{{\mathrm{max}}}}}^ \ast + x_i^ \ast } \right) \,+ \,R\left( {y_{i,{{{\mathrm{min}}}}}^ \ast - y_i^ \ast } \right) + R\left( { - a_i} \right) + R\left( { - b_i} \right) \right\} = 0,\end{array}$$

where \(R\left( x \right): = \max (x,0)\) is the ramp function. In our setting, the Lagrange multiplier was set to 100. We set the lower bound \(x_{{{{\mathrm{min}}}}}^ \ast = 3100\,[{{{\mathrm{count}}}}/{{{\mathrm{\mu l}}}}]\) and the upper bound \(x_{{{{\mathrm{max}}}}}^ \ast = 9800\,[{{{\mathrm{count}}}}/{{{\mathrm{\mu l}}}}]\) in the constraint condition to maintain the convergence values of normal WBCs within the normal levels of healthy humans. Furthermore, we set the lower bound \(y_{i,{{{\mathrm{min}}}}}^ \ast\) in the constraint condition to keep \({{{\mathrm{IS}}}}_i\left( t \right)\) beyond the detection limit of IS, i.e., \(y_{i,{{{\mathrm{min}}}}}^ \ast\) is obtained by solving \(\frac{{y_{i,{{{\mathrm{min}}}}}^ \ast }}{{cx_i^ \ast + y_{i,{{{\mathrm{min}}}}}^ \ast }} \times k_{{{{\mathrm{IS}}}}} \times 100 = 0.0032.\) A set of patient indices is denoted by \(P_ \ast\). In Fig. 3, the parameters are estimated by utilizing all patient data, i.e., \(P_ \ast\) contains all patients \((i = 1, \ldots ,32)\). In Fig. 4, we checked how our proposed method outperformed the existing guidelines in terms of DMR patient prediction using short time series. For this sake, we did leave-one-out cross-validation, i.e., \(P_ \ast\) contained 31 patients used for the training dataset, and the remaining one patient was the prediction-target patient. We repeated this data splitting 32 times employing subsequently all 32 patients as the prediction target patient.

Parameter estimation for the prediction-target patient

Here, we introduce an approach to estimate parameters of the prediction-target patient only from three data points. As we have discussed in Fig. 3b–d, the classification is performed on the plane of the CML cell convergence value \(y_i^ \ast\) and its reduction rate \(b_i\). Thus, we introduce a method to estimate the two parameters from three data points, which are specifically denoted by \(\hat y_i^ \ast\) and \(\hat b_i\). Notably, the gene expression ratio parameter c is already estimated in the previous training step. Hereafter, we assume that all ISs of the three data points are greater than the detection limit. Otherwise, clearly according to the definition of DMR, the patient must be a DMR patient. Under this assumption, from Eqs. (2) and (3), the estimated number of CML cells at time t, denoted by \(\hat y_i\left( t \right)\), is obtained as follows:

$$\hat y_i\left( t \right) = \frac{{c \times \widehat {{{{\mathrm{IS}}}}}_i(t)}}{{100 \times k_{{{{\mathrm{IS}}}}} + \left( {c - 1} \right) \times \widehat {{{{\mathrm{IS}}}}}_i\left( t \right)}}\widehat {{{{\mathrm{WBC}}}}}_i\left( t \right).$$

Then, the estimated convergence value \(\hat y_i^ \ast\) and the reduction rate \(\hat b_i\) are obtained as follows:

$$\hat y_i^ \ast = \frac{{\hat y_i\left( 0 \right)\hat y_i\left( {2{\varDelta} {{{t}}}} \right) - \hat y_i\left( {{\varDelta} {{{t}}}} \right)^2}}{{\hat y_i\left( 0 \right) - 2\hat y_i\left( {{\varDelta} t} \right) + \hat y_i\left( {2{\varDelta} t} \right)}},$$
$$\hat b_i = \frac{1}{{{\varDelta} {{{t}}}}}\ln \left( {\frac{{\hat y_i\left( 0 \right) - \hat y_i\left( {{\varDelta} {{{t}}}} \right)}}{{\hat y_i\left( {{\varDelta} {{{t}}}} \right) - \hat y_i\left( {2{\varDelta} {{{t}}}} \right)}}} \right),$$

where \({\varDelta} t\) represents the interval of measurement points. In our dataset, the N-road, the data are measured every 3 months, i.e., \({\varDelta} {{{t}}} = 3\;[{{{\mathrm{month}}}}]\), meaning that we utilized the data \(\{{{\widehat{{{{\mathrm{WBC}}}}}_i(t),\widehat{{{{\mathrm{IS}}}}}_i(t)}}\}\) for \(t = 0,3,\;{{{\mathrm{and}}}}\;6\;[{{{\mathrm{months}}}}]\). Based on the estimated parameters \(\hat y_i^ \ast\) and \(\hat b_i\), our model predicts DMR or non-DMR for the target patient. Note that, according to our DMR definition, we regard a patient as DMR, if the patient’s IS value satisfies the DMR definition (IS ≤ 0.0032% for MR4.5) for \(t = 0,\;3,\;{{{\mathrm{or}}}}\;6\;[{{{\mathrm{months}}}}]\).

DMR prediction criteria

We introduce an approach to determine the criteria for DMR prediction. As presented in Fig. 3d, DMR patients should show large enough \(b_i\) and small enough \(y_i^ \ast\). To estimate the parameters \((\hat y_i^ \ast ,\hat b_i)\), we define a parameter region \(\{ (\hat y_i^ \ast ,\hat b_i)|\hat y_i^ \ast \le Y\;{{{\mathrm{and}}}}\;\hat b_i \ge B\}\) for the criteria. If the estimated parameters \((\hat y_i^ \ast ,\hat b_i)\) are included in this region, the patient is predicted to be a DMR patient. These threshold values Y and B are determined by using the training data, i.e., patients whose ID are included in a set \(P_ \ast\). We explored the optimal Y and B that maximize the accuracy, sensitivity, or specificity. We used the exploration intervals 0.1 and 0.05 for Y and B, respectively. These indices are the common concept in the classification context as follows. The accuracy is the ratio of patients who are successfully predicted. The sensitivity, the true positive rate, is the ratio of DMR patients who are successfully predicted. The specificity, the true negative rate, is the ratio of non-DMR patients who are successfully predicted. We note that “positive” (“negative”) indicates “prediction as DMR” (“prediction as non-DMR”). In this sense, “true positive” (“false positive”) indicates that the prediction as DMR is “correct” (“incorrect”).

As described above, if a patient satisfies the DMR definition within its initial three time points, the patient is considered to be a DMR patient. In Fig. 4 and Supplementary Fig. 5, the numbers of such patients were 15 for MR4.0, 10 for MR4.5, and 2 for CMR. All these patients were estimated to be “Optimal” in the case of the ELN guidelines12,15. For MR4.0 and MR4.5, only one patient of them was estimated to be “High risk” in the case of the EUTOS score, and the others were estimated to be “Low risk”. For CMR, all the patients were estimated to be “Low risk” in the case of the EUTOS score. In the case of the ELTS score, only one of the patients for MR4.0 and MR4.5 was estimated to be “Low risk” and the others were estimated to be “High risk” or “Intermediate risk”, and all of the patients for CMR were estimated to be “Intermediate risk”.

Here, for each accuracy, sensitivity, and specificity, we remark how to choose the optimal threshold \((Y,B)\). In the case of maximizing accuracy, we obtained the multiple possible sets of optimized \((Y,B)\) that maximize accuracy. Then, from the possible sets, we selected the set \((Y,B)\) closest to the average values of the possible sets. In the case of maximizing sensitivity, we obtained the multiple possible sets \(\left( {Y,B} \right)\) which achieve the largest sensitivity. However, under this condition of maximizing sensitivity, we can freely magnify the parameter region corresponding to DMR. To keep the area of the region reasonable, from the above possible sets, we again selected parameter sets that achieve the maximal true negative rate. Finally, we selected the largest Y and the smallest B among the sets that had passed the both selections as the optimal set. In the case of maximizing specificity, similar to the case of sensitivity, we obtained the multiple possible sets \(\left( {Y,B} \right)\) which achieve the largest specificity. Then, from the possible sets, we again selected parameter sets that achieve the maximal true positive rate. Finally, we selected the smallest Y and the largest B among them as the optimal set.

Performance of DMR prediction

Through the above framework, by maximizing the accuracy, sensitivity, or specificity, we obtained three prediction results as shown in Fig. 4 (for MR4.5) and Supplementary Fig. 5 (for MR4.0, MR4.5, and CMR). We used accuracy, sensitivity, specificity, and F1 score to evaluate the classification performance of the prediction results. F1 score is defined as \({{{\mathrm{F}}}}1\;{{{\mathrm{score}}}}: = \frac{{2\;{{{\mathrm{TP}}}}}}{{2\;{{{\mathrm{TP}}}} + {{{\mathrm{FP}}}} + {{{\mathrm{FN}}}}}}\), where TP, FP, and FN indicate the number of true positive, false positive, and false negative cases, respectively.

We obtained the classification performances as follows. As we employed the leave-one-out cross-validation, the dataset with 32 patients is divided into two groups, one is the training set composed of 31 patients, and the other is the validation dataset composed of one patient. For the test patient, we set the correctness score as 1 or 0 if the DMR prediction successes or fails, respectively. After the cross-validation, i.e., the 32 times of prediction test, we obtain the classification performance as an average of the correctness scores among 32 patients.

Our results with specificity maximization provided good performance especially in terms of specificity. Thus, this method can successfully predict non-DMR patients from short-term measurements. In future studies analyzing a larger dataset, we plan to employ machine learning methods for the optimization of our framework. The prediction of characteristic features of CML patients may also represent a future research avenue. In Fig. 3c, one can find that some patients were outliers, i.e., they were separated from most DMR and most non-DMR patients. Further clinical analysis on these outliers may provide new characteristic features of CML patients. As an example of related works, M. Horn et al. showed that patient classification on parameter space predicted relapse23. Thus, prediction of relapse may be another possible application of our mathematical framework.