State-transition modeling of blood transcriptome predicts disease evolution and treatment response in chronic myeloid leukemia

Frankhouser, David E.; Rockne, Russell C.; Uechi, Lisa; Zhao, Dandan; Branciamore, Sergio; O’Meally, Denis; Irizarry, Jihyun; Ghoda, Lucy; Ali, Haris; Trent, Jeffery M.; Forman, Stephen; Fu, Yu-Hsuan; Kuo, Ya-Huei; Zhang, Bin; Marcucci, Guido

doi:10.1038/s41375-024-02142-9

Download PDF

Article
Open access
Published: 02 February 2024

CHRONIC MYELOGENOUS LEUKEMIA

State-transition modeling of blood transcriptome predicts disease evolution and treatment response in chronic myeloid leukemia

Leukemia volume 38, pages 769–780 (2024)Cite this article

1269 Accesses
16 Altmetric
Metrics details

Subjects

Abstract

Chronic myeloid leukemia (CML) is initiated and maintained by BCR::ABL which is clinically targeted using tyrosine kinase inhibitors (TKIs). TKIs can induce long-term remission but are also not curative. Thus, CML is an ideal system to test our hypothesis that transcriptome-based state-transition models accurately predict cancer evolution and treatment response. We collected time-sequential blood samples from tetracycline-off (Tet-Off) BCR::ABL-inducible transgenic mice and wild-type controls. From the transcriptome, we constructed a CML state-space and a three-well leukemogenic potential landscape. The potential’s stable critical points defined observable disease states. Early states were characterized by anti-CML genes opposing leukemia; late states were characterized by pro-CML genes. Genes with expression patterns shaped similarly to the potential landscape were identified as drivers of disease transition. Re-introduction of tetracycline to silence the BCR::ABL gene returned diseased mice transcriptomes to a near healthy state, without reaching it, suggesting parts of the transition are irreversible. TKI only reverted the transcriptome to an intermediate disease state, without approaching a state of health; disease relapse occurred soon after treatment. Using only the earliest time-point as initial conditions, our state-transition models accurately predicted both disease progression and treatment response, supporting this as a potentially valuable approach to time clinical intervention, before phenotypic changes become detectable.

Transcriptome free energy can serve as a dynamic patient-specific biomarker in acute myeloid leukemia

Article Open access 25 March 2024

Early dynamics of chronic myeloid leukemia on nilotinib predicts deep molecular response

Article Open access 13 October 2022

A lineage tree-based hidden Markov model quantifies cellular heterogeneity and plasticity

Article Open access 17 November 2022

Introduction

Chronic myeloid leukemia (CML) is characterized at the cytogenetic level by t(9;22), the Philadelphia chromosome that creates BCR::ABL [1]. This fusion gene encodes a ligand-free activated mutant tyrosine kinase that transforms normal hematopoietic stem cells into leukemia stem cells (LSCs), primitive leukemic cells capable of initiating and maintaining the disease. While the advent of tyrosine kinase inhibitors (TKIs) has revolutionized the treatment of CML by inducing long-term remissions and largely preventing disease evolution from a chronic phase (CP) into blast crisis (BC), these drugs often fail to fully eradicate LSCs [2, 3]. Lack of treatment compliance, drug intolerance or acquired BCR::ABL mutations or additional genomic “hits”, therefore may result in disease relapse or progression to BC [4,5,6]. Thus, while a subset of patients may eventually discontinue TKIs, most of them remain on lifetime treatment [7, 8].

Quantification of BCR::ABL fusion transcript levels is diagnostic and effective at assessing ongoing treatment response but is less useful for predicting timely disease evolution at the earliest time points. In recent years, the transcriptome has emerged as a promising tool for assessing treatment response and predicting outcome in cancer and leukemia [9, 10]. Gene expression profiling using high-throughput technologies has enabled comprehensive analysis of the transcriptome in CML patients and identified gene signatures that reveal molecular mechanisms of transformation and treatment response [11, 12]. Recently, Giustacchini et al. reported on a distinct molecular signature of LSCs that selectively persisted after TKI treatment [13]. Ross et al. investigated the transcriptomic landscape of CML patients at diagnosis and during TKI therapy and identified differentially expressed genes and pathways associated with cell proliferation, apoptosis, and drug resistance, thereby predicting treatment response and long-term outcomes [14]. Radich et al. explored transcriptomic changes in CML patients who discontinued TKI therapy and found that specific gene expression patterns, particularly involving immune-related genes, could distinguish patients who successfully maintained treatment-free remission from those who relapsed [15, 16]. However, these studies are mainly population-based, and are therefore difficult to apply to predict disease evolution and treatment outcome for individual patients. To our knowledge, no recent studies have examined how transcriptomic changes detected at the earliest time points of disease or at the start of treatment can accurately predict outcome in individual CML mouse or human, even before phenotypic changes had occurred.

Here, we applied state-transition theory to model the temporal dynamics of the CML transcriptome and predict CML evolution and treatment response [9, 10]. CML is an ideal model to test this hypothesis as the disease is initiated and initially maintained solely by BCR::ABL, which is targeted in the clinic by TKIs, a treatment effective in inducing disease remission, but often not curative. We showed that the transcriptome of peripheral blood mononuclear cells (PBMC) from a transgenic CML mouse can be modeled as a particle undergoing state-transition in a BCR::ABL-transformed potential landscape. Using this model, we accurately predicted the trajectories of disease evolution, treatment response, and relapse of individual mice from the earliest time-point. Furthermore, the state-transition framework allowed for identification of the patterns of gene expression that drove leukemogenesis, thereby connecting the transcriptome state-transition to molecular mechanisms of disease growth and treatment resistance.

Results

Peripheral blood transcriptome state-transition during CML development

To investigate the molecular mechanisms of CML initiation and evolution, we collected PBMC samples from the CP CML mice at sequential time points after BCR::ABL induction by tetracycline discontinuation (Tet-off; Fig. 1A). Samples were also collected from control mice that continued to receive tetracycline (Tet-on) in order to maintain BCR::ABL suppression. Bulk RNA-sequencing (RNA-seq) was performed on each PBMC sample. To study the transcriptomic changes that characterized disease evolution, we applied a state-transition approach that mapped the time-series transcriptomes of each mouse into a CML state-space. The resulting trajectories of each mouse in the state-space were modeled using a stochastic equation of motion to determine whether the disease evolution of each mouse could be predicted based on its initial condition.

**Fig. 1: Time-series peripheral blood transcriptome encodes the CML state-space.**

To start, the CML state-space was built using Principal Component Analysis (PCA). Briefly, we performed singular value decomposition [9, 10] (SVD; see methods) on all expressed genes (n = 39,927) measured with RNA-seq of the sequentially collected samples of both control and CML mice. By including all the time-sequential samples collected from the CML and control mice, we were able to capture the continuum of transcriptional states that both spanned the health to terminal states and also included intermediate disease states. Among all PCs, PC2 (hereafter called CML state-space) provided the greatest separation between samples collected from the healthy control mice and those collected from CML moribund mice. We obtained transcriptome trajectories of each individual mouse by plotting the time sequential samples in the CML state-space (Fig. 1B, left panel).

Supporting the ability of the CML state-space to identify distinct disease states, we demonstrated that among all the PCs, PC2 had the best correlation with BCR::ABL expression levels. Nevertheless, the correlation was relatively weak (R² = 0.48; Fig. 1B, right panel; Table S1), suggesting that BCR::ABL was not the only driver of transcriptome state-transition. In fact, immediately after tetracycline discontinuation (Tet-off), we observed that the CML mice began to move through the CML state-space, from the health state toward disease. Despite BCR::ABL expression not yet being detectable, it likely influenced disease state changes even at those lowest, initial levels (Fig. 1C).

CML transcriptome state-transition model predicts disease evolution

As the CML state-space contained the continuum of the transcriptomic states from health to disease, we used the state-space to build a state-transition model that predicted the disease trajectories of individual mice. These predictions were made by setting initial conditions for each mouse using only the state-space location from the first collected time point. The theoretical framework of this model postulated that changes in the transcriptome could be modeled as a particle undergoing Brownian motion in a potential energy landscape using the Langevin equation [9, 10]. Thus, the solution to the Fokker-Planck (FP) equation corresponding to the equation of motion was used to predict disease state-transition in the CML state-space.

To understand the system dynamics, we hypothesized that the CML potential energy landscape could be modeled using a three-node network that resulted in a tri-stable system (Fig. 2A) [17]. In this system, BCR::ABL was the input signal that interacted with the PBMC cell population. The changing PBMCs and the consequent transcriptome variation created a double negative feedback loop representing how leukemogenesis dysregulated the transcriptome and how the dysregulated transcriptome affected the composition (normal vs leukemic) of the cell population. The transcriptome, represented by the location of the sample in the CML state-space, was modeled as self-activating progressive dysregulation of gene networks during leukemia progression [18]. The potential landscape resulting from this system ranges from a single well centered at the health state (BCR::ABL off) to three wells representing the most energetically favorable, CML steady states (BCR::ABL maximum; Fig. 2B). The equations describing the dynamics of the transcriptome in the CML potential resulting from this three-node network are described in the Methods.

**Fig. 2: State-transition model accurately predicts time-to-disease.**

To test if the theoretically-derived leukemogenic potential reliably recapitulated the experimental observations, we considered the empirical distribution of the CML mouse samples in the state-space and derived an experimental CML potential (Fig. 2C; see Methods). The tri-stable CML potential was characterized by five critical points identified from the kernel density estimator of the empirical distribution (denoted ${c}_{1}-{c}_{5}$). We labeled the stable (minima) critical points as Early state (Es) at ${c}_{1}$, Transition state (Ts) at ${c}_{3}$, and Late state (Ls) at ${c}_{5}$ (Fig. 2C). The unstable (maxima) critical points were labeled as Early-Transition state (T-Es) at ${c}_{2}$ and Transition-Late state (T-Ls) at ${c}_{4}$; these unstable points represent the boundaries between stable states. We also used the sample density of the control mice to determine the location of a Health state (Hs) labeled ${c}_{h}$ as the single stationary state and critical point for the control mice. The samples from the control mice remained confined to a region of the CML state-space corresponding to the Health state (Hs) ${c}_{h}$. Importantly, the three wells of the experimentally derived potential directly mapped into those of the theoretically predicted potential (Fig. 2B, C).

To predict time-to-disease [i.e., the amount of time that elapsed before the mouse crossed the boundary of the Ls (${c}_{5}$)], we modeled the location and trajectory of each mouse’s transcriptome in the state-space using the CML potential and the Langevin equation of motion. We were able to generate predicted transcriptome trajectories that matched the observed ones using only the data from samples collected at the first time point (Fig. 2D). The Fokker-Planck (FP) solution of the Langevin equation, when solved forward in time, accurately predicted time-to-disease (Fig. 2E). As all the CML mice died after passing Ls (${c}_{5}$), we also compared the time-to-disease (defined as the first crossing of the Ls (${c}_{5}$) critical point) between the observed and the predicted trajectories. The predicted and observed time-to-disease curves overlapped and were not statistically different (p = 0.8, Log-rank; Fig. 2F), and were highly concordant (C-index = 0.75) demonstrating accuracy of the state-transition model’s prediction. Of note, one CML mouse never developed disease after Tet-off and remained healthy (Fig. 2D). Our model predicted that this mouse would not cross Ls (${c}_{5}$), thereby accurately reproducing the experimental observation.

State-transition critical points of BCR::ABL potential landscape defined distinct disease states

In order to identify the transcriptomic drivers of CML state-transition, we reasoned that samples from different mice must be compared at equivalent disease states. Since mice developed CML at different times, we could not simply rely on grouping transcriptome samples by their collection time point. Thus, we utilized the samples’ positions in the state-space to group them into distinct disease states. We then compared gene expression at each disease critical point disease [i.e., Es (${c}_{1}$), Ts (${c}_{3}$) and Ls (${c}_{5}$)] with that of control samples at health critical point [i.e., Hs (${c}_{h}$); Fig. 3A; Fig. S3A; Table S2].

**Fig. 3: Differential expression analysis performed on CML state-space disease states.**

A total of 78 differentially expressed genes (DEGs) were identified at Es (${c}_{1}$) as compared to Hs (${c}_{h}$). A large proportion of DEGs (55 out of 78 total DEGs) observed at the Es (${c}_{1}$) state were unique to this state and were not identified in the other disease states (Fig. 3B). Among the most upregulated genes, were Ins1 and Ins2 that encode insulin, in addition to Fam177a1 and several SnRNAs (Table S2). Fam177a1 encodes a protein localized to the Golgi complex and endoplasmic reticulum but of unknown function, and when mutated, is associated with developmental and neurological disorders [19]. Of note, BCR::ABL was not among the upregulated DEGs at the Es (${c}_{1}$). Among the most downregulated DEGs, there were several genes encoding for collagen and elastin. Compared to Hs (${c}_{h}$), the transition state Ts (${c}_{3}$) presented with 366 DEGs (Fig. 3B; Table S2). In addition to BCR::ABL, among the most upregulated DEGs, was Prrt4, encoding for a protein predicted to be an integral component of the cell membrane, and the lncRNA Gm6093, predicted to play a role in cell differentiation [20, 21]. Among the most downregulated DEGs, we observed collagen and elastin encoding genes, in addition to Il33. Compared to Hs (${c}_{h}$), Ls (${c}_{5}$) presented with 1860 DEGs (Fig. 3B; Table S2). In addition to BCR::ABL, among the most upregulated DEGs, there were Pgk1-rs7, involved in glycolysis, and Prrt4. Among the most downregulated DEG, there were several pseudogenes and other genes coding proteins with unknown function. Of note, 94% of DEGs identified at Ts (${c}_{3}$) were also DEGs and expressed at higher levels at Ls (${c}_{5}$) (Fig. 3B).

Using gene set enrichment analysis (GSEA), we identified Hallmark gene sets enriched at distinct disease states (Fig. 3C). Compared with Hs (${c}_{h}$), we observed that all significantly enriched gene sets were downregulated at Es (${c}_{1}$). These included Epithelial Mesenchymal Transition (EMT), Apical Junction, and Myogenesis, that have been previously associated with malignant transformation [22,23,24]. In contrast, compared with Hs (${c}_{h}$), all significantly enriched gene sets were upregulated in the Ts (${c}_{3}$) and Ls (${c}_{5}$) (Fig. 3C). Among the most upregulated gene sets, we found those related to metabolism, angiogenesis, and inflammation. Of note, while all enriched gene sets at the Ts (${c}_{3}$) and Ls (${c}_{5}$) were upregulated with respect to Hs (${c}_{h}$), we observed that these genes sets had higher expression at Ls (${c}_{5}$) compared with Es (${c}_{1}$), with the exception of Pancreas Beta Cells which was downregulated (Fig. S3B).

Eigengene analysis quantifies DEG contribution to CML development

To quantify the contribution of each DEG to CML development, we used the CML state-space loading value of each gene which we defined as CML eigengenes [25]. Located in the second column of the matrix ${V}^{* }$ from the SVD procedure that was used to construct the CML state-space (Table S2), the eigengene value described the direction and the magnitude of the effect that each gene had on the sample’s location in the state-space. Thus, by combining the eigengenes with the expression changes at each disease critical point with respect to Hs (${c}_{h}$), we quantified the impact of DEGs on disease onset and evolution and categorized each gene as either a pro- or anti-CML eigengene. A pro-CML eigengene had either a positive loading value with downregulated expression or a negative loading value with upregulated expression; an anti-CML eigengene had either a positive loading value with upregulated expression or a negative loading value with downregulated expression (Fig. 3D; S4A). The eigengenes were then visualized in a 2D space constructed by plotting the CML contribution vs the PC1 loading value (V₁*; Fig. S4A). Of note, since PC1 was not directly associated with BCR::ABL expression or CML development, its selection was made simply to map the CML contribution; any other PC could have been used instead.

The net contribution to CML of the Es (${c}_{1}$), Ts (${c}_{3}$) or Ls (${c}_{5}$) DEGs was calculated as the vector mean of the vectors representing individual DEGs (Fig. 3D; Fig. S4B). The DEGs at Es (${c}_{1}$) had a net anti-CML contribution, while those at Ts (${c}_{3}$) and Ls (${c}_{5}$) had a net pro-CML contribution. We also calculated the CML contribution of the significantly enriched gene sets as a mean vector of all the eigengenes in a given enriched gene set. All the Es (${c}_{1}$) enriched gene sets had a net anti-CML contribution (n = 13), whereas all the Ts (${c}_{3}$) and Ls (${c}_{5}$) enriched gene sets had a net pro-CML contribution (n = 17 and 26 respectively; Fig. 3E; Fig. S4C).

To assess whether the eigengene-based contributions corresponded to phenotypic differences between disease states, we compared myeloid counts at Es (${c}_{1}$), Ts (${c}_{3}$), and Ls (${c}_{5}$) states. We observed no myeloid cell growth at Es (${c}_{1}$); this was followed by an initial expansion at Ts (${c}_{3}$) and further growth at Ls (${c}_{5}$) (Fig. S3D). The growth of the myeloid population between Ts (${c}_{3}$), and Ls (${c}_{5}$) was associated with an expansion of the DEGs at Ls (${c}_{5}$) and an increase in enrichment in many of the shared processes (Fig. 3B; Fig. S3B). GSEA indicated these genes to be members of metabolism- and inflammation-related gene sets. In addition, 405 downregulated DEGs were unique to the Ls (${c}_{5}$) state (Fig. S3C; Table S2). However, over half of the downregulated DEGs were unnamed genes and failed to be significantly enriched in any pathway (265 of 405). Nevertheless, the eigengene analysis showed that all but one of the downregulated Ls (${c}_{5}$) DEGs had a pro-CML contribution.

Gene expression dynamics identify drivers of CML state-transition

While we used the state-space critical points as a classifier of distinct disease states, the continuous state-space coordinates could also be used to align transcriptomes by their progress toward CML. By doing so, we created a pseudotime ordering of the disease, which allowed us to identify the patterns of gene expression involved in CML transformation. To this end, we used a correlation-based analysis to identify subsets of genes with similar patterns of expression during disease evolution (hereafter called gene “modules”; Fig. S5). At Es (${c}_{1}$), we found no gene modules, meaning that the DEGs identified at Es (${c}_{1}$) were not co-regulated at subsequent states (Fig. 4A, left; Fig. S5A). At Ts (${c}_{3}$), we identified an “increasing module”, meaning that these DEGs were co-regulated and continued to increase during disease progression (Fig. 4A, middle; Fig. S5B). At Ls (${c}_{5}$), we identified two gene “modules”, one increasing and one decreasing (Fig. 4A, right; Fig. S5C).

**Fig. 4: CML state-space aligns samples and identifies transition driver genes.**

Different from the increasing gene module observed at Ts (${c}_{3}$), the increasing gene module at Ls (${c}_{5}$) had non-linear dynamics, as it presented an inflection point near T-Ls (${c}_{4}$) (Fig. 4A). The inflection represented the point where the rapid increase of DEG expression started to slow down and became constant. The “decreasing” Ls (${c}_{5}$) gene module also had a non-linear expression curve, with an inflection occurring at approximately T-Ls (${c}_{4}$) and continued through Ls (${c}_{5}$) (Fig. 4A). It is important to emphasize that the expression dynamics of the gene modules were not derived from any critical point information; rather were derived independently, using only the mean expression of the DEGs and the CML state-space location of each transcriptome sample. Thus, it was striking to us that the inflections of the gene modules occurred almost exactly at T-Ls (${c}_{4}$), an unstable critical point between Ts (${c}_{3}$), and Ls (${c}_{5}$). The observation of co-regulation at the unstable critical points led us to hypothesize that these types of coordinated expression changes could break the equilibrium of a stable critical point and drive the disease transition from that stable state of the disease to the next one.

To identify the driver genes of disease transition between two otherwise stable states, we correlated the expression of individual DEGs with the shape of the CML potential (Fig. 4B, C; Fig. S6A). For this analysis, we included all DEGs (n = 3304) resulting from the comparisons of the stable disease states with the health state (Hs). We identified 194 DEGs at the T-Es (${c}_{2}$) state and 497 at the T-Ls (${c}_{4}$) state whose expression followed the CML potential (Fig. S6B). All but one of the T-Es (${c}_{2}$) genes and all T-Ls (${c}_{4}$) genes had a pro-CML contribution, which supports the hypothesis that these genes drove the CML state-transition. Interestingly, only 23 DEGs were identified as being involved in both T-Es (${c}_{2}$) and T-Ls (${c}_{4}$) transitions, suggesting that unique and distinct processes likely drove disease evolution from across the distinct disease states (Fig. S6B).

To investigate which cellular processes were involved in the transition between two stable states [i.e., Es (${c}_{1}$) to Ts (${c}_{3}$) and Ts (${c}_{3}$) to Ls (${c}_{5}$)], we constructed protein-protein interaction networks and performed pathway enrichment analysis (Fig. S6C). Both T-Es (${c}_{2}$) and T-Ls (${c}_{4}$) were significantly enriched for protein interactions (p < 0.0001; 31 and 232 interacting proteins respectively). The T-Es (${c}_{2}$) transition point was characterized by inflammation and immune related processes and involved multiple collagen genes (Fig. 4B; Table S3). T-Ls (${c}_{4}$) transition point was characterized by metabolism-related processes in addition to inflammation- and immune-related processes also enriched in T-Es (${c}_{2}$) (Fig. 4C, S6C; Table S3). All enriched gene sets for both T-Es (${c}_{2}$) and T-Ls (${c}_{4}$) had a pro-CML contribution, supporting their role in driving disease evolution.

State-space transition predicts treatment response

To investigate how treatment impacted CML state-transition, we performed time-series experiments using two approaches (Figs. 1A, 5A, S7A). The first approach made use of the CML mouse model’s ability to induce and eventually suppress BCR::ABL expression upon tetracycline discontinuation (Tet-off) and subsequent re-administration (Tet-on); hence we referred to this treatment as Tet-off-Tet-on (TOTO). The TOTO approach represented a hypothetical best scenario where a treatment is able to completely suppress BCR::ABL expression (Fig. S7A). The second approach consisted of treatment with nilotinib, a TKI used in clinical practice (hereafter referred to as “TKI”). Because TOTO suppressed BCR::ABL expression, TKI was assumed to be less effective than TOTO since it inhibited BCR::ABL at the protein level. Both Tet-on and TKI treatments were initiated six weeks after Tet-off. In the TOTO group, Tet-on was continued for 12-weeks. In the TKI cohort, TKI was administered for 4 weeks, and the mice were followed for the subsequent 12 weeks, for a total of 18-weeks or until they succumbed to disease, whichever occurred first.

**Fig. 5: CML treatment effects determined using CML state-space disease states.**

To determine the effect of each therapy on CML state-transition, the transcriptome dynamics of mice receiving one of the two treatments were compared with those of the untreated diseased and healthy mice at each of the disease [Es (${c}_{1}$), Ts (${c}_{3}$), Ls (${c}_{5}$)] or health Hs (${c}_{h}$) states respectively (Fig. 5B, S7B; Table S4). In order to assess post-treatment CML disease states, we used the eigengenes to project the TOTO and TKI samples into the CML state-space (see Methods). Supported by an observed relationship between the number of DEGs and the mean location of sample groups in the state-space, we used both the number of DEGs and the state-space to assess transcriptomic similarities or differences (Fig. S7C). We defined treatment response as movement of the transcriptome in the state-space away from the CML LS (${c}_{5}$) state and toward Hs (${c}_{h}$), with a complete response being a return to health [Hs (${c}_{h}$)].

When disease states of TOTO and TKI treated mice were compared with those of controls, we observed significantly different results. In the TOTO cohort, we observed a rapid movement from LS (${c}_{5}$) toward Hs (${c}_{h}$) in the state-space soon after tetracycline was re-administered. Post-treatment TOTO samples were tightly distributed around the ${c}_{2}$ critical point, which represented a near-health, newly created stable state in the treatment transformed potential landscape (Fig. 5A). When treated mice at ${c}_{2}$ were compared to the controls, few DEGs were identified (Fig. 5A, B). The observed DEGs and enriched gene sets that were detected by comparing TOTO mice to healthy controls were similar to those observed as time-related effect in the control mice alone and therefore, were possibly associated with aging (Fig. S7D). However, after subtracting out the time-related DEGs, 38 DEGs remained and had a small pro-CML contribution (Fig. 5C), indicating that once BCR::ABL transformation had occurred, the transcriptome maintained some leukemic fingerprints even if BCR::ABL expression is completely suppressed.

Different from TOTO, TKI treatment moved the mouse transcriptomes more slowly from the initial pre-treatment Ls (${c}_{5}$) state. These transcriptomes reverted only to Ts (${c}_{3}$) and none of these mice achieved the response of the TOTO mice (Fig. 5A). When compared to controls, the TKI treated mice had 1269 DEGs (Fig. 5B), and a net pro-CML contribution with BCR::ABL the top pro-CML DEG (Fig. 5D). Using GSEA to compare TKI treatment with healthy controls, gene sets that were upregulated during CML remained upregulated during TKI treatment (Fig. 5E) and had an overall pro-CML contribution (Fig. S7F). Of note, BCR::ABL was the top ranked pro-CML DEG.

Modeling the effects of TOTO and TKI on the CML potential

We next tested whether treatment could alter the potential landscape and, in turn, whether the state-transition model could predict treatment response at the earliest time point after treatment initiation. We started by identifying which parameters should be perturbed to recapitulate the state-transition observed with each of the treatments. To model TOTO treatment, we set the BCR::ABL signal to zero ($S=0$) and introduced a treatment response parameter ${\gamma }_{{TOTO}}$ which modulated the action of the signal on the transcriptome with ${\gamma }_{{TOT}0}=1.1$ (Fig. 6A, left). This resulted in a potential with a new steady state occurring near the T-Es (${c}_{2}$) state, the exact location in the state-space where the TOTO mice mapped post-treatment (Fig. 6A, right). We observed that both a decrease in BCR::ABL expression and an increase in the transcriptome sensitivity to BCR::ABL were necessary to derive a post-treatment potential that matched the observed TOTO trajectories. Once BCR::ABL signal was set to zero, the modulation of the single parameter ${g}_{{ab}}^{* }$ was sufficient to transform a pre-treatment tri-stable CML potential (${V}_{{CML}}\left(x\right)$) into a system with only one stable state corresponding to ${c}_{2}$ following TOTO treatment (${V}_{{TOTO}}(x)$). Using the TOTO potential (${V}_{{TOTO}}(x)$), we then produced simulated TOTO trajectories that matched those observed (Fig. 6B; Fig. S8).

**Fig. 6: State-transition model predicts CML treatment dynamics.**

To model TKI treatment, we reduced the BCR::ABL signal (BCR::ABL > 0) and introduced a treatment response parameter ${\gamma }_{{TKI}}$ (Fig. 6C left). In contrast to TOTO, the post TKI potential was produced only when both the BCR::ABL signal was reduced ($S \, < \, {S}_{\max }$) and the response rate between the PBMCs and the BCR::ABL signal was increased, with ${\gamma }_{{TKI}}=1.4$. The change in this single parameter was sufficient to transform the tri-stable CML potential (${V}_{{CML}}(x)$) into a bi-stable potential with new stable states at ${c}_{1}$ and ${c}_{3}$ following TKI treatment (${V}_{{TKI}}(x)$). Importantly, the TKI model predicted that the only way TKI treated mice could return to Hs (${c}_{h}$) was if the BCR::ABL level approached zero (Fig. 6C right). When we used this potential to simulate TKI treatment, we again observed an accurate prediction of the observed transcriptome trajectories (Fig. 6D; Fig. S8).

Discussion

We have previously reported that state-transition models can be used to predict time-sequential transcriptome dynamics and disease evolution in biological systems such as murine acute myeloid leukemia (AML) [9, 10, 18]. Constructing a state-space to model biological transitions can capture changes produced by genetic, epigenetic, or microenvironmental perturbations on the transcriptome, which simultaneously deregulate multiple biological processes in order to generate a phenotype such as the leukemic transformation observed here [26,27,28,29]. We have previously shown that micro-RNA and protein abundance (proteomic) data can also be used to create a state-space and characterize state-transitions [10, 30]. Using a transgenic CP CML mouse model that recapitulated the human disease, we report here the application of state-transition theory to interpret the temporal transcriptomic changes involved both in CML development and in treatment (where CML is “reversed”) to accurately predict outcome and treatment response.

We demonstrate the feasibility of predicting state-transition dynamics and time to disease using the time-sequential transcriptomic data collected from a murine model of CML. Our results show that the movement of the blood transcriptome in the state-space could be modeled as that of a particle undergoing Brownian motion in a leukemogenic potential defined by stable and unstable critical points. We hypothesized that BCR::ABL creates a leukemogenic potential that arises due to the action of BCR::ABL on both the PBMCs and the transcriptome. We also derived a CML potential landscape based on the position of the critical points derived from the observed distribution of the CML mouse transcriptomes in the state-space. The theoretically and experimentally derived landscape potentials were qualitatively identical and could be used to predict the trajectory of disease development over time. We showed that at the earliest time points, the CML state-space was more sensitive at detecting the disease state transition than the BCR::ABL expression itself. Furthermore, we showed that our state-transition model predicted transcriptome trajectories that matched well the observed ones utilizing only the transcriptomic state derived from data collected at the first time point after Tet-off BCR::ABL induction.

The biological features that characterized the distinct disease states were inferred from the analysis of DEGs obtained by comparing gene expression at each of the CML critical points with the control health state. Because the SVD decomposed the gene expression into linear combinations of orthogonal basis vectors, we were able to assess the contribution of individual genes, DEGs, or entire Hallmark gene sets to CML initiation and growth. Although the SVD can produce artifacts when applied to time-series data, we observed no oscillations or artifacts; moreover, we observed a strong correlation between state-space trajectories and BCR::ABL expression as an immunophenotypic marker of disease, supporting our interpretation of the principal component as a transcriptomic state-space [31]. We observed that the Es (${c}_{1}$) state was largely characterized by anti-CML DEGs that resisted CML development whereas the Ts (${c}_{3}$) and Ls (${c}_{5}$) disease states were characterized by pro-CML DEGs associated with the movement of state-transition toward the late disease state. In other words, the anti-CML contributions present only at Es (${c}_{1}$), were suggestive that at this early state of the disease, the transcriptome changes opposed BCR::ABL-driven disease development, whereas the pro-CML contribution as observed at Ts (${c}_{3}$) and subsequently at Ls (${c}_{5}$) suggested that transcriptome changes at these points overwhelmed any initial anti-leukemic force and contributed to disease evolution.

Interestingly, while we noted that genes involved in inflammation and angiogenesis were enriched at both Ts (${c}_{3}$) and Ls (${c}_{5}$), we also observed upregulation of genes involved both in glycolysis and oxidative metabolisms at Ls (${c}_{5}$), suggesting an increase in the bioenergetic demand in the state of highest growth [5, 32, 33]. Studying the changes in the expression of individual DEGs, we also discovered that groups of co-regulated genes (i.e., gene modules) that increased or decreased concurrently. Some of these gene modules displayed non-linear expression trends with inflection points that mirrored the shape of the CML potential at the unstable critical points T-Es (${c}_{2}$) and T-Ls (${c}_{4}$). Therefore, we hypothesized that they were likely the major drivers of transition between two stable disease states. Among others, metabolic processes were involved in the transition between Ts (${c}_{3}$) and Ls (${c}_{5}$), suggesting that the increased metabolic demand was one of the possible causes of the transition to the final state of the disease, when maximum leukemic growth was observed.

Importantly, we also applied the state-transition model to predict response to therapy. To this end, first we treated CML mice in Ls (${c}_{5}$) with TOTO, which resulted in complete suppression of BCR::ABL. We showed that TOTO brought the system rapidly back close to Es (${c}_{1}$), without reaching it. In fact, TOTO-treated mice remained at a new stable critical point TOTO (${c}_{2}$), with a small number of pro-CML DEGs. Thus, it is likely that once the system is altered, some fingerprints of BCR::ABL transformation remain, despite complete suppression of the fusion gene. Importantly, using only the initial time point and treatment start date, our potential model accurately predicted treatment response and post-treatment dynamics.

In contrast, TKI treatment returned the transcriptome only to Ts (${c}_{3}$). Once TKI treatment was completed, the transcriptome rapidly returned to Ls (${c}_{5}$) and mice had disease relapse. This mimics a possible trend observed in those patients that are either not compliant to the treatment and need to stop due to intolerable side-effects or have developed TKI resistant disease that evolves to a blast phase. While the model accurately predicted TKI trajectories, it also predicted that with administration of higher dose or more effective TKIs, the system may potentially return to the healthy state, Hs (${c}_{h}$). However, the optimal dosage of TKI compatible with the return to health may not be attainable in vivo [34]. Although the treatment response parameters between TOTO and TKI may not be directly compared, it is interesting that the treatment response effect for TOTO (${\gamma }_{{TOTO}}$) was higher than that for TKI (${\gamma }_{{TKI}}$) suggesting that the suppression of BCR::ABL gene with tetracycline had a greater positive impact on the transcriptome than BCR::ABL protein inhibition by TKI.

In summary, we show here that state-transition dynamical models can predict disease evolution and treatment response in a murine model that recapitulates human disease. It is possible therefore that this approach could be applied to other -omic data types and translated into the clinic as an assay to predict disease evolution and treatment response at the earliest time points, even before phenotypic changes will occur.

Methods

Mouse model

Inducible transgenic SCLtTA/BCR::ABL mice of B6 background were maintained on tetracycline (tet) water at 0.5 g/L. Tet withdrawal (tet-off) results in expression of BCR::ABL and generation of a CML-like disease in these mice with a median survival of approximately 10 weeks after induction of BCR::ABL, which recapitulates human CP CML. Six to eight week old male and female SCLtTA/BCR::ABL mice were randomly divided into 4 groups: control (Tet on, n = 3 mice), CML (Tet off, n = 6 mice), Tet Off-Tet On (TOTO; Tet off 6 weeks then Tet on for 12 weeks, n = 4 mice), TKI [Tet off 6 weeks then TKI (Nilotinib, 50 mg/kg, oral gavage, daily) treatment 4 weeks, n = 7 mice]. Blood was collected from these mice weekly, i.e., before BCR::ABL induction (t = 0) and weekly after induction up to 18 weeks (t = 1 to 18) or when the mouse developed leukemia and became moribund, whichever event occurred first.

RNA-seq sequencing, data processing and analysis

For each mouse, peripheral blood samples were accrued for all timepoints before for RNA extraction. Total RNA was extracted from whole blood using the AllPrep DNA/RNA Mini Kit (Qiagen, Hilden, Germany); quality and quantity were estimated using a BioAnalyser (Agilent, Santa Clara, CA). Samples with a RIN > 4.0 were included. Sequencing libraries were constructed using the KapaHyper with RiboErase (Kapa Biosystems, Wilmington, MA), loaded on to a cBot system for cluster generation, and sequenced on a Novaseq 6000 (Illumina) with paired end 100-bp for mRNA-seq to a nominal depth of 40 million reads. To mitigate batch effects, samples were assigned to flow cells such that treatments were approximately evenly represented across runs and samples from the same mouse shared a run. Image processing and base calling were conducted using Illumina’s Real-Time Analysis pipeline.

Raw sequencing reads were processed with the nf-core RNASeq pipeline version 3.7 [35]. Briefly, trimmed reads were mapped using STAR to the GRCm39 reference (Genbank accession GCA_000001635.9) amended with the human BCR::ABL1 fusion gene sequence (Genbank accession EF158045.1) using GENCODE annotation Release m28 [36]. Each library was subjected to extensive quality control, including estimation of library complexity, gene body coverage, and duplication rates, among other metrics detailed in the pipeline repository [35]. Transcript abundance was estimated across genomic features using Salmon and merged into a matrix of pseudocounts per million transcripts (TPM) per gene for each sample [37]. BCR::ABL1 transcript abundance was measured by counting reads that spanned the fusion boundary. Surrogate variable analysis was used to check for confounding experimental effects [38]. None was apparent (data not shown); however, for all comparisons, sex was used as a covariate in differential expression analyses using the default setting of DESeq2 and counts generated by tximport using the Salmon abundance estimation [39]. The RNA-seq dataset is submitted to the Gene Expression Omnibus and assigned accession number GSE244990.

State-space construction

To construct the CML state-space, we performed singular value decomposition (SVD) on all expressed genes in the transcriptome (n = 39,927), which we defined as all genes expressed in at least one sample. By including all time points of the time-series peripheral blood samples in both the control and the BCR::ABL expressing CML mice, we both captured the full transcriptional system and spanned all intermediate disease states (Fig. 1B). The SVD is a dimensionality reduction technique that produces low rank approximations of the input data, called principal components (PCs), that capture sources of variation in the original data set (Fig. S1B). Among the principal components, we found an axis on the PC1 vs PC2 plane to be the component with the best separation between the control and CML endpoint samples. To align this axis with the y-axis, we found the rotation angle that made the linear fit line of all control samples parallel to the x-axis. After applying a 37.8° rotation to the PC1 vs PC2 plane, we identified the rotated PC2 to be the principal component that best encoded CML development. We determined this based on two criteria: 1) it had the largest separation between the control samples and the endpoint samples for the CML mice; and 2) it demonstrated the best correlation with both BCR::ABL expression (abundance and log-normalized) and the myeloid population in the CML mice (Fig. S1A; Table S1). Therefore, we defined PC2 as the CML state-space. When we plotted PC2 against time for all samples, the CML state-space provided disease trajectories for each mouse (Fig. 1B). Despite being low, the correlation with BCR::ABL was by far the best for the CML state-space compared to all other PCs (R² = 0.48; Table S1). Although PCA can be misinterpreted when applied to time-series data with temporal correlations, we are confident in our application and interpretation here because of the correlation with BCR::ABL expression, phenotypic disease manifestation, and the absence of oscillations or other artifacts in the PCs [31, 40]. This is due to the orthogonality of the control (tet on) and CML (tet off) mice transcriptome endpoints and dynamics, which enable an orthogonal decomposition of the data and identification of the leukemia signal from the background/noise.

Using the expression of the same genes used when performing SVD to construct the state-space, the projection of the TOTO and TKI treatment cohorts was performed as follows: the treatment expression values ${X}_{T}$, were multiplied by the state-space loading values, $V$, and the inverse of the eigenvalues, ${\Sigma }^{-1}$ (Fig. 3B). The second column of the resulting treatment cohort PC matrix, ${U}_{T}$, represented the coordinates of the treatment samples in the CML state-space.

Mathematical model

The three compartment mechanistic model used to generate the CML, TOTO, and TKI potentials is adapted from Dey and Barik 2021 [17]. Using the double negative feedback loop B (DNFL-B) circuit framework, BCR::ABL concentration is modeled as the signal (S) acting on PBMC cells (A) and the corresponding observable variation in the transcriptome (B), as

$$\frac{{dA}}{{dt}}={G}_{A}\left(S,B\right)-{k}_{A}A$$

$$\frac{{dB}}{{dt}}={G}_{B}\left(A,B\right)-{k}_{B}B$$

where ${G}_{A}(S,B)$ and ${G}_{B}(A,B)$ are rate functions relating the concentration of the signal to the cell and transcriptional states as

$$\begin{array}{l}{G}_{A}\left(S,B\right)={g}_{A,S-B-}{H}^{-}\left(S\right){H}^{-}\left(B\right)+{g}_{A,S+B-}{H}^{+}\left(S\right){H}^{-}\left(B\right)\\ \qquad\qquad\quad\;+\,{g}_{A,S-B+}{H}^{-}\left(S\right){H}^{+}\left(B\right)+{g}_{A,S+B+}{H}^{+}\left(S\right){H}^{+}(B)\end{array}$$

and similarly for ${G}_{B}(A,B)$, see Dey and Barik 2021. The functions ${H}^{-}$ and ${H}^{+}$ are switching Hill functions given by

$${H}^{-}\left(x\right)=\frac{1}{1+{\left(\frac{x}{{x}_{0}}\right)}^{n}}$$

and ${H}^{+}\left(x\right)=1-{H}^{-}(x)$. The parameters are as stated in Dey and Barik Table S2 for the DNFL-B circuit model.

The effective potential $V\left(x\right)$ for the system is given by integrating the effective force of the signal (S) on the cells (A) and observable transcriptome (B) as

$$F\left(B,S\right)={G}_{B}\left({G}_{A}^{{\prime} }\left(S,B\right),B\right)-{k}_{B}B$$

Where ${G}_{A}^{{\prime} }$ is the steady state rate of A $\frac{{dA}}{{dt}}=0$, so that

$$V\left(B,S\right)=-{\int }_{\!\!\!0}^{{B}}\left[{G}_{B}\left({G}_{A}^{{\prime} }\left(S,x\right),x\right)-{k}_{B}x\right]{dx}$$

is a function of the state variable $B$, which in our case is the state-space coordinate corresponding to PC2, which parametrically depends on the signal, or BCR::ABL expression level $S$.

Mean squared displacement analysis

Mean squared displacement (MSD) was calculated for each mouse from their CML state-space trajectories vs time (Fig. S2A). The MSDs of each experimental group was fit to a linear model and the slope of the line was used to estimate the diffusion coefficient ($\beta$) used in the state-transition model (Fig. S2B).

The MSD at a time $t$ is given as ${MSD}=\langle {|x(t)-{x}_{m}|}^{2}\rangle$, where $x\left(t\right)$ is the position of trajectory from the simulation by a treatment model and ${x}_{m}$ is the CML state-space trajectory of the mouse model. The MSD was calculated for each mouse for each time point $t$, and the average MSD for all time points was evaluated by changing the coefficient of treatment force $\gamma$ and post-treatment diffusion coefficients $\beta$. The MSDs were examined within the range of $\gamma$ for $0.1\le \gamma \le 2.0$ and $\beta$ for $0.001\le \beta \le 0.1$, and the combination of values of $\gamma$ and $\beta$ that has the minimum MSD was selected. The search ranges of parameter, $\beta$, was empirically estimated from the average slope of linear regression to CML and control mouse data. The upper bound of $\gamma$ was determined by the convergence of trajectories of the simulation.

DEG expression dynamics

Expression dynamics were investigated by performing hierarchical clustering on the correlation matrixes produced using all CML mouse time points for each of the unique Early, Transition, and Late DEGs. Genes with similar expression dynamics (gene modules) were determined by selecting cutting the dendrogram to obtain clusters and then selecting for clusters that had a median correlation coefficient greater than 0.25 (Fig. S5A–C). For each gene module, the expression dynamic plots were created by calculating the mean of the log-normalized mean-centered expression for each CML sample and then plotting that value vs the CML state-space coordinate of the sample (Fig. 4A).

Transition point driver genes

To identify which genes had expression dynamics that were most similar to the inflection points of the CML potential, we used the following approach (Fig. S6A). First, we included all DEGs obtained by comparing CML disease state vs both healthy (${c}_{h}$) and other CML disease states in the analysis. Second, we defined each transition point by selected the CML samples that were located near the T-Es (${c}_{2}$) (CML state-space coordinates = [${c}_{1},{c}_{3}$]) or the T-Ls (${c}_{4}$) (CML state-space coordinates = [${c}_{3},{c}_{5}$]). Finally, we performed both a correlation analysis and linear regression separately for the T-Es (${c}_{2}$) and the T-Ls (${c}_{4}$) transition points between the log-normalized mean-centered expression and the value of the potential at the corresponding CML state-space of each sample. The driver genes for each transition point were those genes with a correlation coefficient greater than 0.5 and a linear regression significance of p < 0.05 (Fig. S6B). To identify the processes that were involved at each transition point, we used STRINGdb protein-protein interaction database to extend the driver genes to all high-confident (interaction score > 900) interaction partners (Fig. 4B, C) [41]. Using the drivers plus their interaction partners, EnrichR gene set enrichment analysis was performed on the Hallmark gene sets to identify the significantly enriched transition point processes (Fig. 4B, C) [42, 43]. Since the T-Ls network was quite large and included a number of enriched gene sets, we identified the comprising subnetworks and performed EnrichR on each subnetwork to better resolve the involved processes (Fig. S6C)

TOTO and TKI treatment simulations

We integrated treatment into the state-transition treatment model by modeling the treatment as a force that affects both the potential ($\mathop{F}\limits^{ \rightharpoonup }$) and internal energy of the particle, or the diffusion rate ($\vec{R}$). The equation of motion for the transcriptome particle is then given by the Langevin equation as

$$d{X}_{t}=\left(-\nabla {U}_{p}+\mathop{F}\limits^{ \rightharpoonup }\right){dt}+\sqrt{2\mathop{R}\limits^{ \rightharpoonup }}d{B}_{t},$$

where the position of the particle in the CML potential is denoted by ${X}_{t}$, and ${B}_{t}$ is a Brownian stochastic process that is uncorrelated in time $\left\langle {B}_{i},{B}_{j}\right\rangle ={\delta }_{i,j}$.

The corresponding probability distribution $p\left(x,t\right)$ at time $t$ is given by a Fokker-Plank (FP) equation including the treatment force $\mathop{F}\limits^{ \rightharpoonup }$ and diffusion vector $\mathop{R}\limits^{ \rightharpoonup }$ as follows:

$$\frac{\partial }{\partial t}p=\frac{\partial }{\partial x}\left(-\nabla {U}_{p}+\mathop{F}\limits^{ \rightharpoonup }\right)p+\mathop{R}\limits^{ \rightharpoonup }\frac{{\partial }^{2}}{\partial {x}^{2}}p.$$

We described the Tet-off CML potential as $\nabla {U}_{p}=a\left(x-{c}_{1}\right)\left(x-{c}_{2}\right)\left(x-{c}_{3}\right)\left(x-{c}_{4}\right)\left(x-{c}_{5}\right)$, with critical points ${c}_{1}$, ${c}_{2}$, ${c}_{3}$, ${c}_{4}$ and ${c}_{5}$ as identified in the state-space. The state-transition potential ${U}_{p}$ was mapped to the potential ${V}_{{CML}}(x)$ using the critical points in the state-space as follows.

For TOTO treatment, the CML and normal state-transition potentials approximated those generated by the network model as

$$\nabla {U}_{p-{TOTO}-{CML}}=a\left(x-{c}_{1}\right)\left(x-{c}_{2}\right)\left(x-{c}_{5}\right)\approx {V}_{{TOTO}}(x),$$

$$\nabla {U}_{p-{TOTO}-{normal}}=a\left(x-{c}_{2}\right)\approx V(x).$$

The transformation from CML to normal was therefore given as

$$\nabla {U}_{p-{TOTO}} = \nabla {U}_{p-{TOTO}-{CML}}-\nabla {U}_{p-{TOTO}-{normal}}\\ = a(x-{c}_{2})\left({x}^{2}+{c}_{1}\left({c}_{5}-x\right)-{c}_{5}x-1\right).$$

The TOTO treatment started by reintroducing Tet at $t$ = 6 until the study endpoint, therefore, the treatment force, ${\vec{F}}_{{TOTO}}$, and diffusion vector ${\vec{R}}_{{TOTO}}$ were given as follows:

$${\vec{F}}_{{TOTO}}={\gamma }_{{TOTO}}H\left(t-{t}_{{Tet}-{on}}\right)\nabla {U}_{p-{TOTO}},$$

$${\vec{R}}_{{TOTO}}={\beta }^{-1}+\left({\beta }_{{Tet}-{on}}^{-1}-{\beta }^{-1}\right)H\left(t-{t}_{{Tet}-{on}}\right).$$

The parameter ${\gamma }_{{TOTO}}$ represents the treatment strength, $H$ is the Heaviside step function and ${t}_{{Tet}-{on}}$ is the timing when Tet treatment started.

For TKI, the CML and normal potentials were approximated using critical points and midpoints between ${c}_{1}$ and ${c}_{5}$ based on the observations from the network model given as follows:

$$\nabla {U}_{p-{TKI}-{CML}}=a\left(x-{c}_{1}\right)\left(x-{c}_{2}\right)\left(x-{c}_{5}\right)\approx {V}_{{TKI}}(x),$$

$$\nabla {U}_{p-{TKI}-{normal}}=a\left(x-{c}_{1}\right)\left(x-{c}_{2}\right)\left(x-{c}_{3}\right)\approx V(x).$$

The transformation from CML to normal for TKI treatment was therefore given as

$$\nabla {U}_{p-{TKI}}=\nabla {U}_{p-{TKI}-{CML}}-\nabla {U}_{p-{TKI}-{normal}}=a({c}_{3}-{c}_{5})(x-{c}_{1})(x-{c}_{2})$$

The TKI treatment was introduced from $t$ = 6–9, incorporating the nilotinib half-life, $\lambda$ [44]. The treatment force for TKI, ${\vec{F}}_{{TKI}}$, and ${\vec{R}}_{{TKI}}$, were defined as the following,

$${\vec{F}}_{{TKI}}={\gamma }_{{TKI}}\,\exp \left\{-\lambda \right.\left(t-{t}_{{TKI}-{off}}\right)\,\left.H\left(t-{t}_{{TKI}-{off}}\right)\right\}H\left(t-{t}_{{TKI}-{on}}\right)\nabla {U}_{p-{TKI},}$$

$${\vec{R}}_{{TKI}}= \, {\beta }_{{TKI}}^{-1}+\left({\beta }_{{TKI}-{on}}^{-1}-{\beta }_{{TKI}}^{-1}\right)(H\left(t-{t}_{{TKI}-{on}}\right)-H\left(t-{t}_{{TKI}-{off}}\right))\\ +\left({\beta }_{{TKI}-{off}}^{-1}-{\beta }_{{TKI}}^{-1}\right)H\left(t-{t}_{{TKI}-{off}}\right).$$

In the model, the transition from a normal potential to a CML potential was approximated using three critical points using ${c}_{1}$, ${c}_{2}$ and ${c}_{5}$ as $\nabla {U}_{p-{CML}}={a}\left(x-{c}_{1}\right)\left(x-{c}_{2}\right)\left(x-{c}_{5}\right)$, and $\nabla {U}_{p-{normal}}={a}\left(x-{c}_{1}\right)\left(x-{c}_{5}\right)$.

A FP equation was solved to generate the probability density in the state-space for each treatment, TOTO, TKI, and CML using the state-transition potentials. The FP equation for the TOTO system is given by

$$\frac{\partial }{\partial t}p=\frac{\partial }{\partial x}\left(-\nabla {U}_{p-{TOTO}-{CML}}+{\vec{F}}_{{TOTO}}\right)p+{\vec{R}}_{{TOTO}}\frac{{\partial }^{2}}{\partial {x}^{2}}p$$

For TKI, the FP equation is given by

$$\frac{\partial }{\partial t}p=\frac{\partial }{\partial x}\left(-\nabla {U}_{p-{TKI}-{CML}}+{\vec{F}}_{{TKI}}\right)p+{\vec{R}}_{{TKI}}\frac{{\partial }^{2}}{\partial {x}^{2}}p$$

and

$$\frac{\partial }{\partial t}p=-\frac{\partial }{\partial x}\nabla {U}_{p-{CML}}p+{\beta }_{{CML}}^{-1}\frac{{\partial }^{2}}{\partial {x}^{2}}p$$

for the CML system. The probability density and trajectories at a given time were quantified by solving each FP solution with the initial condition given by the data at the initial time point Table S5.

Time to disease prediction

To compute the probability of disease progression, we integrated the solution of the Fokker-Planck equation beyond the critical point ${c}_{5}$ over time as

$${{{{{\rm{P}}}}}}[{c}_{5}|t]=1-{\int }_{{c}_{5}}^{\infty }{\int }_{0}^{t}p(x,\tau )dxd\tau$$

where $p\left(x,t\right)$ is the solution to $\frac{\partial }{\partial t}p=\frac{\partial }{\partial x}(-\nabla {U}_{p}+\mathop{F}\limits^{ \rightharpoonup })p+\mathop{R}\limits^{ \rightharpoonup }\frac{{\partial }^{2}}{\partial {x}^{2}}{p}$ with initial conditions given by the PC2 values at time t = 0 of the CML (tet-off) samples. The continuous curve given by ${{\mbox{P}}}[{c}_{5}{|t}]$ was then evaluated at the sample observation timepoints, corresponding to 1-week intervals. The observed time to disease was given by the timepoint at which the CML state-space trajectory crossed the ${c}_{5}$ critical point and was computed for each mouse. The predicted and observed time to disease curves were compared using the log-rank test (p = 0.88). As an additional statistical test, we applied the concordance index (C-index = 0.747) to show that the predicted time to state-transition was concordant with the observed [45, 46].

Early vs late control analysis

An analysis comparing the early (first five time points) vs late (last five time points) control samples was performed to determine how the samples changed over time since they were observed to move downward in the CML state-space (Fig. S7D; top). The total CML contribution of the 118 DEGs identified showed a small pro-CML contribution in the late controls (Fig. S7E; bottom left). The significantly enriched Hallmark gene sets from GSEA were very similar to those identified when the TOTO post-Rx were compared to all Hs samples (Fig. S7E; bottom right). Because of the small observed pro-CML effects observed in this analysis that could be due to aging processes in the mice over the course of the experiment, the TOTO post-Rx DEGs were controlled for aging by removing the DEGs observed from the early vs late control analysis.

Code availability

All code used to analyze and model the data and to generate figures and tables are available at: https://github.com/cohmathonc/CML_mRNA_state-transition.

References

Cortes J, Pavlovsky C, Saußele S. Chronic myeloid leukaemia. Lancet (Lond, Engl). 2021;398:1914–26.
Article CAS Google Scholar
Goldman JM, Melo J V. Targeting the BCR-ABL tyrosine kinase in chronic myeloid leukemia. N. Engl J Med. 2001;344:1084–6.
Article CAS PubMed Google Scholar
Druker BJ, Tamura S, Buchdunger E, Ohno S, Segal GM, Fanning S, et al. Effects of a selective inhibitor of the Abl tyrosine kinase on the growth of Bcr–Abl positive cells. Nat Med. 1996;2:561–6.
Article CAS PubMed Google Scholar
Zhang B, Zhao D, Chen F, Frankhouser D, Wang H, Pathak K V, et al. Acquired miR-142 deficit in leukemic stem cells suffices to drive chronic myeloid leukemia into blast crisis. Nat Commun. 2023;14:1–21.
Google Scholar
Kuntz EM, Baquero P, Michie AM, Dunn K, Tardito S, Holyoake TL, et al. Targeting mitochondrial oxidative phosphorylation eradicates therapy-resistant chronic myeloid leukemia stem cells. Nat Med. 2017;23:1234–40.
Article CAS PubMed PubMed Central Google Scholar
Mojtahedi H, Yazdanpanah N, Rezaei N. Chronic myeloid leukemia stem cells: targeting therapeutic implications. Stem Cell Res Ther. 2021;12:1–27.
Article Google Scholar
Mahon FX, Réa D, Guilhot J, Guilhot F, Huguet F, Nicolini F, et al. Discontinuation of imatinib in patients with chronic myeloid leukaemia who have maintained complete molecular remission for at least 2 years: the prospective, multicentre Stop Imatinib (STIM) trial. Lancet Oncol. 2010;11:1029–35.
Article CAS PubMed Google Scholar
Ross DM, Branford S, Seymour JF, Schwarer AP, Arthur C, Yeung DT, et al. Safety and efficacy of imatinib cessation for CML patients with stable undetectable minimal residual disease: results from the TWISTER study. Blood. 2013;122:515–22.
Article CAS PubMed Google Scholar
Rockne RC, Branciamore S, Qi J, Frankhouser DE, O’Meally D, Hua WK, et al. State-transition analysis of time-sequential gene expression identifies critical points that predict development of acute myeloid leukemia. Cancer Res. 2020;80:3157–69. https://doi.org/10.1158/0008-5472.CAN-20-0354.
Article CAS PubMed PubMed Central Google Scholar
Frankhouser DE, O’Meally D, Branciamore S, Uechi L, Zhang L, Chen YC, et al. Dynamic patterns of microRNA expression during acute myeloid leukemia state-transition. Sci Adv. 2022;8:1664.
Article Google Scholar
Janowski M, Ulańczyk Z, Łuczkowska K, Sobuś A, Rogińska D, Pius-Sadowska E, et al. Molecular changes in chronic myeloid leukemia during tyrosine kinase inhibitors treatment. focus on immunological pathways. Onco Targets Ther. 2022;15:1123.
Article CAS PubMed PubMed Central Google Scholar
Iezza M, Cortesi S, Ottaviani E, Mancini M, Venturi C, Monaldi C, et al. Prognosis in chronic myeloid leukemia: baseline factors, dynamic risk assessment and novel insights. Cells. 2023;12:1703.
Article CAS PubMed PubMed Central Google Scholar
Giustacchini A, Thongjuea S, Barkas N, Woll PS, Povinelli BJ, Booth CAG, et al. Single-cell transcriptomics uncovers distinct molecular signatures of stem cells in chronic myeloid leukemia. Nat Med 2017;23:692–702.
Article CAS PubMed Google Scholar
Ross DM, Pagani IS, Irani YD, Clarson J, Leclercq T, Dang P, et al. Lenalidomide maintenance treatment after imatinib discontinuation: results of a phase 1 clinical trial in chronic myeloid leukaemia. Br J Haematol. 2019;186:e56–e60.
Article PubMed Google Scholar
Radich JP, Dai H, Mao M, Oehler V, Schelter J, Druker B, et al. Gene expression changes associated with progression and response in chronic myeloid leukemia. Proc Natl Acad Sci USA. 2006;103:2794–9.
Article CAS PubMed PubMed Central Google Scholar
Radich JP, Hochhaus A, Masszi T, Hellmann A, Stentoft J, Casares MTG, et al. Treatment-free remission following frontline nilotinib in patients with chronic phase chronic myeloid leukemia: 5-year update of the ENESTfreedom trial. Leukemia .2021;35:1344–55.
Article CAS PubMed PubMed Central Google Scholar
Dey A, Barik D. Potential landscapes, bifurcations, and robustness of tristable networks. ACS Synth Biol. 2021;10:391–401.
Article CAS PubMed Google Scholar
Hari K, Harlapur P, Saxena A, Haldar K, Girish A, Malpani T, et al. Low dimensionality of phenotypic space as an emergent property of coordinated teams in biological regulatory networks. bioRxiv 2023.02.03.526930 https://doi.org/10.1101/2023.02.03.526930 (2023).
Alazami AM, Patel N, Shamseldin HE, Anazi S, Al-Dosari MS, Alzahrani F, et al. Accelerating novel candidate gene discovery in neurogenetic disorders via whole-exome sequencing of prescreened multiplex consanguineous families. Cell Rep. 2015;10:148–61.
Article CAS PubMed Google Scholar
Shen R, Soeder RA, Ophardt HD, Giangrasso PJ, Cook LB & Author C. Identification of long non-coding RNAs expressed during early adipogenesis. OnLine J Biol Sci Orig Res Pap. https://doi.org/10.3844/ojbsci.2019.245.259 (2019).
Zhou JD, Zhang TJ, Xu ZJ, Deng ZQ, Gu Y, Ma JC, et al. Genome-wide methylation sequencing identifies progression-related epigenetic drivers in myelodysplastic syndromes. Cell Death Dis. 2020;11:997.
Article CAS PubMed PubMed Central Google Scholar
Oshi M, Yan L, Kunisaki C, Endo I & Takabe K. Association of enhanced epithelial-mesenchymal transition signature with tumor microenvironment, angiogenesis, and survival in gastric cancer. 41, e16002 https://doi.org/10.1200/JCO.2023.41.16_suppl.e16002 (2023).
Takahashi H, Oshi M, Yan L, Endo I, Takabe K. Gastric cancer with enhanced apical junction pathway has increased metastatic potential and worse clinical outcomes. Am J Cancer Res 2022;12:2146.
CAS PubMed PubMed Central Google Scholar
Greaves D, Calle Y. Epithelial mesenchymal transition (EMT) and associated invasive adhesions in solid and haematological tumours. Cells. 2022;11:649.
Article CAS PubMed PubMed Central Google Scholar
Alter O, Brown PO, Botstein D. Singular value decomposition for genome-Wide expression data processing and modeling. Proc Natl Acad Sci USA. 2000;97:10101–6.
Article CAS PubMed PubMed Central Google Scholar
Li CM, Klevecz RR. A rapid genome-scale response of the transcriptional oscillator to perturbation reveals a period-doubling path to phenotypic change. Proc Natl Acad Sci USA 2006;103:16254–9.
Article CAS PubMed PubMed Central Google Scholar
Ponnapalli SP, Miron P, Miskimen KLS, Waite KA, Sosonkina N, Coppens SE, et al. AI/ML-Derived whole-genome predictor prospectively and clinically predicts survival and response to treatment in brain cancer. 117–8 https://doi.org/10.1145/3624062.3624078 (2023).
Tran I, Vargas A, Wilkins R, Pizzillo I, Tokoro K, Afterman D, et al. Abstract 6689: Whole genome cell-free tumor DNA mutational signatures from blood for early detection of recurrence of low stage lung adenocarcinoma. Cancer Res. 2023;83:6689.
Article Google Scholar
Patel VN, Gokulrangan G, Chowdhury SA, Chen Y, Sloan AE, Koyutürk M, et al. Network signatures of survival in glioblastoma multiforme. PLOS Comput Biol. 2013;9:e1003237.
Article CAS PubMed PubMed Central Google Scholar
Vu L, Garcia-Mansfield K, Pompeiano A, An J, David-Dirgo V, Sharma R, et al. Proteomics and mathematical modeling of longitudinal CSF differentiates fast versus slow ALS progression. Ann Clin Transl Neurol. 2023;10:2025–42.
Article PubMed PubMed Central Google Scholar
Shinn M. Phantom oscillations in principal component analysis. Proc Natl Acad Sci. 2023;120:e2311420120.
Article CAS PubMed Google Scholar
Qiu S, Sheth V, Yan C, Liu J, Chacko BK, Li H, et al. Metabolic adaptation to tyrosine kinase inhibition in leukemia stem cells. Blood. 2023;142:574–88.
Article CAS PubMed Google Scholar
Zhang B, Zhao D, Chen F, Frankhouser D, Wang H, Pathak KV, et al. Acquired miR-142 deficit in leukemic stem cells suffices to drive chronic myeloid leukemia into blast crisis. Nat Commun 2023;14:5325.
Article CAS PubMed PubMed Central Google Scholar
Wang Z, Jiang L, Yan H, Xu Z, Luo P. Adverse events associated with nilotinib in chronic myeloid leukemia: mechanisms and management strategies. Expert Rev Clin Pharmacol. 2021;14:445–56.
Article CAS PubMed Google Scholar
Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, et al. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020;38:276–8. https://doi.org/10.1038/s41587-020-0439-x.
Article CAS PubMed Google Scholar
Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: Ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21.
Article CAS PubMed Google Scholar
Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods. 2017;14:417–9.
Article CAS PubMed PubMed Central Google Scholar
Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The SVA package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28:882–3.
Article CAS PubMed PubMed Central Google Scholar
Soneson C, Love MI, Robinson MD. Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Research. 2015;4:1521.
Article PubMed Google Scholar
Elhaik E. Principal Component Analyses (PCA)-based findings in population genetic studies are highly biased and must be reevaluated. Sci Rep. 2022;12:1–35.
Article Google Scholar
Szklarczyk D, Gable AL, Nastou KC, Lyon D, Kirsch R, Pyysalo S, et al. The STRING database in 2021: customizable protein–protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 2021;49:D605–D612.
Article CAS PubMed Google Scholar
Kuleshov M V, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 2016;44:W90.
Article CAS PubMed PubMed Central Google Scholar
Xie Z, Bailey A, Kuleshov M V, Clarke DJB, Evangelista JE, Jenkins SL, et al. Gene set knowledge discovery with enrichr. Curr Protoc. 2021;1:e90.
Article CAS PubMed PubMed Central Google Scholar
Pushpam D, Bakhshi S. Pharmacology of tyrosine kinase inhibitors in chronic myeloid leukemia; a clinician’s perspective. DARU, J Pharm Sci. 2020;28:371–85.
Article CAS Google Scholar
Ponnapalli SP, Bradley MW, Devine K, Bowen J, Coppens SE, Leraas KM, et al. Retrospective clinical trial experimentally validates glioblastoma genome-wide pattern of DNA copy-number alterations predictor of survival. APL Bioeng. 2020;4:026106.
Article CAS PubMed PubMed Central Google Scholar
Gittleman H, Sloan AE, Barnholtz-Sloan JS. An independently validated survival nomogram for lower-grade glioma. Neuro Oncol. 2020;22:665–74.
Article PubMed Google Scholar

Download references

Acknowledgements

Research reported in this publication was supported in part by Robert and Lynda Altman Family Foundation and included work performed in the integrative genomics core and biostatistics and mathematical oncology Shared Resource supported by the National Cancer Institute of the National Institutes of Health under grant number P30CA033572 and grant U01CA250046. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Funding

Open access funding provided by SCELC, Statewide California Electronic Library Consortium.

Author information

These authors contributed equally: David E. Frankhouser, Russell C. Rockne.
These authors jointly supervised this work: Bin Zhang, Guido Marcucci.

Authors and Affiliations

Department of Computational and Quantitative Medicine, Beckman Research Institute, City of Hope National Medical Center, Duarte, CAL, 91010, USA
David E. Frankhouser, Russell C. Rockne, Lisa Uechi & Sergio Branciamore
Department of Hematologic Malignancies Translational Science, Beckman Research Institute and Division of Leukemia, City of Hope National Medical Center, Duarte, CAL, 91010, USA
Dandan Zhao, Jihyun Irizarry, Lucy Ghoda, Haris Ali, Stephen Forman, Yu-Hsuan Fu, Ya-Huei Kuo, Bin Zhang & Guido Marcucci
Department of Diabetes and & Cancer Discovery Science, Beckman Research Institute, City of Hope National Medical Center, Duarte, CAL, 91010, USA
Denis O’Meally
Translational Genomics Institute, Phoenix, AZ, 85004, USA
Jeffery M. Trent

Authors

David E. Frankhouser
View author publications
You can also search for this author in PubMed Google Scholar
Russell C. Rockne
View author publications
You can also search for this author in PubMed Google Scholar
Lisa Uechi
View author publications
You can also search for this author in PubMed Google Scholar
Dandan Zhao
View author publications
You can also search for this author in PubMed Google Scholar
Sergio Branciamore
View author publications
You can also search for this author in PubMed Google Scholar
Denis O’Meally
View author publications
You can also search for this author in PubMed Google Scholar
Jihyun Irizarry
View author publications
You can also search for this author in PubMed Google Scholar
Lucy Ghoda
View author publications
You can also search for this author in PubMed Google Scholar
Haris Ali
View author publications
You can also search for this author in PubMed Google Scholar
Jeffery M. Trent
View author publications
You can also search for this author in PubMed Google Scholar
Stephen Forman
View author publications
You can also search for this author in PubMed Google Scholar
Yu-Hsuan Fu
View author publications
You can also search for this author in PubMed Google Scholar
Ya-Huei Kuo
View author publications
You can also search for this author in PubMed Google Scholar
Bin Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Guido Marcucci
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

GM, BZ, RCR, and Y-HK were involved in conceptualization of research. DZ and BZ carried out experiments. DEF, RCR, LU, SB, and DO performed computational analysis and interpretation. DEF, RCR, and GM wrote the manuscript. DEF and RCR share first authorship. BZ and GM share last authorship. All authors contributed to and approved the final version.

Corresponding authors

Correspondence to David E. Frankhouser, Russell C. Rockne, Bin Zhang or Guido Marcucci.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental Figures

Table S1

Table S2

Table S3

Table S4

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Frankhouser, D.E., Rockne, R.C., Uechi, L. et al. State-transition modeling of blood transcriptome predicts disease evolution and treatment response in chronic myeloid leukemia. Leukemia 38, 769–780 (2024). https://doi.org/10.1038/s41375-024-02142-9

Download citation

Received: 20 October 2023
Revised: 22 December 2023
Accepted: 05 January 2024
Published: 02 February 2024
Issue Date: April 2024
DOI: https://doi.org/10.1038/s41375-024-02142-9