Combined Population Dynamics and Entropy Modelling Supports Patient Stratification in Chronic Myeloid Leukemia

Modelling the parameters of multistep carcinogenesis is key for a better understanding of cancer progression, biomarker identification and the design of individualized therapies. Using chronic myeloid leukemia (CML) as a paradigm for hierarchical disease evolution, we show that combining population dynamic modelling with genomic analysis of CML patient biopsies enables patient stratification at unprecedented resolution. Linking CD34+ similarity as a disease progression marker to patient-derived gene expression entropy separated established CML progression stages and uncovered additional heterogeneity within disease stages. Importantly, our patient-data-informed model enables quantitative approximation of individual patients' disease history within chronic phase (CP) and significantly separates "early" from "late" CP. Our findings provide a novel rationale for personalized and genome-informed disease progression risk assessment that is independent of, and complementary to, conventional measures of CML disease burden and prognosis.


INVENTORY OF SUPPLEMENTARY INFORMATION

Supplementary Figures and Figure Legends S1 through S11
Figure S1. Separation of population-based effects using stem and progenitor cell data from primary CML patients.

Supplementary Tables S1 through S2
Table S1. GO enrichment analysis of genes differentially up-regulated between early and late CP CML.

show differentiation between disease stages (CP and BC), in contrast to subpopulation-specific data. Each asterisk represents a single patient, with disease stage indicated by colour as follows: CP (green), AP cyto (purple), AP (blue), and BC (red). Six purified normal CD34+ cell samples were included in the analysis as a reference for similarity with immature blasts in BP (turquoise).

Correlation of patient CD34+ similarity scores of CD34 expression in mixed patient samples compared to purified CD34+ cell populations (x-axis), with the CD34 ratio inferred from the model (R² = 0.934, p = 3e-25).

Entropies were normalized to the interval [0, 1] to highlight the similarity of observed and simulated entropy minima with respect to disease time.

a.-f. Significance of differential expression of each gene (blue dots) between CML disease stages is plotted as log10 p value. A cut-off of p < 0.05 (FDR-adjusted p value) is indicated (red line). T1 = "early", T2 = "late" chronic phase (CP), AP = advanced phase, BC = blast crisis.

The number of active HSCs is not increased in CML and is estimated to remain constant at $N_0 \approx 400$, while patients do have increased counts of myeloid progenitors. CML is diagnosed when bone marrow output exceeds $10^{12}$ cells per day as a consequence of a reduced differentiation probability $\varepsilon$. According to the model, average clonal dynamics are approximated so that the number of cells in each compartment $i \geq 1$ changes according to the differential equation $\dot{N}_i = -d_i \times N_i + b_{i-1} \times N_{i-1}$, where $d_i = (2\varepsilon - 1) \times r_i$ represents the rate at which cells are leaving compartment $i$, and $b_{i-1} = 2 \times \varepsilon \times r_{i-1}$ indicates the rate at which cells originating in compartment $i-1$ enter compartment $i$.

Supplementary Tables
In the event of a cancer mutation in the HSC compartment, healthy HSCs are reduced to $N_0 - 1$, so that for normal cells $\dot{N}_i = -d_i \times N_i + b_{i-1} \times N_{i-1}$ at $\varepsilon_0 \approx 0.85$, whereas for CML cells ($N_i^{CML}$) we consider $\varepsilon_{CML} < \varepsilon_0$, indicating a lower probability of differentiation in CML. Starting with a single CML HSC, disease expansion, assessed by the growth of the BCR-ABL/BCR ratio, takes almost 6 years until it becomes clinically evident at $\varepsilon_{CML} = 0.72$ (bone marrow output > $10^{12}$ cells).
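The compartment equation can be made concrete with a small numerical sketch. Writing N_i for the cell count in compartment i, ε for the differentiation probability and r_i for the replication rate of compartment i (symbol names assumed here), the code below integrates dN_i/dt = -(2ε - 1) r_i N_i + 2 ε r_{i-1} N_{i-1} with forward Euler and compares the result with the closed-form steady state. The number of compartments, the base rate `R0` and the rate ratio `GAMMA` are illustrative placeholders and are not taken from the text; only N_0 ≈ 400 and the two ε values (0.85 and 0.72) come from the description above. Competition between normal and CML clones is not modelled.

```python
def compartment_rates(eps, r):
    """Return (d, b): exit rates d_i = (2*eps - 1)*r_i and entry rates b_i = 2*eps*r_i."""
    d = [(2.0 * eps - 1.0) * ri for ri in r]
    b = [2.0 * eps * ri for ri in r]
    return d, b


def steady_state(n0, eps, r):
    """Closed-form steady state: N_i = b_{i-1} * N_{i-1} / d_i, with N_0 held fixed."""
    d, b = compartment_rates(eps, r)
    n = [float(n0)]
    for i in range(1, len(r)):
        n.append(b[i - 1] * n[i - 1] / d[i])
    return n


def simulate(n0, eps, r, days, dt=1.0):
    """Forward-Euler integration of dN_i/dt = -d_i*N_i + b_{i-1}*N_{i-1} for i >= 1."""
    d, b = compartment_rates(eps, r)
    n = [float(n0)] + [0.0] * (len(r) - 1)  # downstream compartments start empty
    for _ in range(int(days / dt)):
        new = n[:]
        for i in range(1, len(r)):
            new[i] = n[i] + dt * (-d[i] * n[i] + b[i - 1] * n[i - 1])
        n = new
    return n


# Illustrative placeholders (assumptions, not values from the text).
R0 = 1.0 / 365.0   # replication rate of compartment 0, per day
GAMMA = 1.26       # ratio r_{i+1} / r_i between adjacent compartments
r = [R0 * GAMMA ** i for i in range(5)]

normal = steady_state(400, 0.85, r)  # eps_0 ~ 0.85, as in the text
cml = steady_state(400, 0.72, r)     # eps_CML = 0.72, as in the text
```

Under these assumptions a lower ε raises the per-compartment amplification factor 2ε / ((2ε - 1) · GAMMA), which is one way to see why reduced differentiation probability increases downstream cell counts while the HSC pool stays fixed.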

Bootstrapping analyses
Supplementary Figure S5: The robustness of the p value for the correlation between simulated and clinically observed gene expression entropy was tested over 1,000 iterations of random sub-sampling. In each iteration, 10% of data points were removed without replacement and the p value of the correlation was recalculated on the remaining 90% of data points. The resulting p value frequencies across the 1,000 iterations are indicated.
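The sub-sampling procedure can be sketched as follows. This is a minimal illustration, not the analysis code used for the figure: the function names are hypothetical, and the p value is approximated with the Fisher z-transform and a normal distribution so that only the Python standard library is needed. The text does not state which test produced the correlation p value, so that choice is an assumption.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_corr_p(x, y):
    """Two-sided p value for r != 0 via the Fisher z approximation (assumed test)."""
    r = pearson_r(x, y)
    r = max(min(r, 1.0 - 1e-12), -1.0 + 1e-12)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(len(x) - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


def subsample_p_values(x, y, keep=0.9, iterations=1000, seed=0):
    """Recompute the correlation p value on random subsets drawn without replacement."""
    rng = random.Random(seed)
    n = len(x)
    k = int(round(keep * n))
    p_values = []
    for _ in range(iterations):
        idx = rng.sample(range(n), k)  # sampling without replacement
        p_values.append(approx_corr_p([x[i] for i in idx], [y[i] for i in idx]))
    return p_values
```

A histogram of the returned list corresponds to the p value frequencies described above; if most iterations stay below the chosen significance threshold, the correlation does not depend on a small subset of data points.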
Supplementary Figures S9 and S10: The fractions (%) of genes differentially expressed between CML disease stages (see Supplementary Figure S8) were identified at a cut-off of p < 0.05 (FDR-adjusted p value). Significance of differential expression was assessed by two-sided t-test as described in the methods section accompanying this manuscript (MATLAB statistical toolbox). To assess the robustness of the fractions of differentially expressed genes identified between the CML disease stages using this approach, random sub-sampling considering 90% of the patient cohort was performed over 1,000 iterations.
Frequencies of the fractions of differentially expressed genes identified across all iterations at p < 0.05 (FDR-adjusted p value) are shown, and the mean and associated standard deviation (ST.DEV) were determined. The same methodology was applied to simulate the effect of changes in patient cohort size on the mean fraction of genes differentially expressed between CML chronic phase (T1-CP) and blast crisis (BC). To this end, random sub-sampling over 1,000 iterations was performed across patient cohorts reduced in size by discrete increments of 10%, each time considering the remaining 90% of patients to determine the fraction (%) of differentially expressed genes. Corresponding mean fractions of differentially expressed genes and associated ST.DEV were determined accordingly.
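The quantity tracked in each iteration, the fraction of genes passing the FDR-adjusted cut-off, can be illustrated with a short sketch. The adjustment method is assumed here to be Benjamini-Hochberg; the text only says "FDR-adjusted", and the MATLAB routine actually used may apply a different procedure. The per-gene p values are taken as given, since computing the t-tests themselves is outside this example, and all function names are hypothetical.

```python
from statistics import mean, stdev


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p values (assumed FDR method), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def fraction_significant(p_values, cutoff=0.05):
    """Percentage of genes whose adjusted p value is below the cut-off."""
    adjusted = bh_adjust(p_values)
    return 100.0 * sum(a < cutoff for a in adjusted) / len(adjusted)


def summarize(fractions):
    """Mean and sample standard deviation of per-iteration fractions."""
    return mean(fractions), stdev(fractions)
```

In a full analysis, `fraction_significant` would be called once per sub-sampling iteration on the p values recomputed for that subset of patients, and `summarize` would then yield the mean and ST.DEV reported for each cohort size.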