Myelodysplastic neoplasms dissected into indolent, leukaemic and unfavourable subtypes by computational clustering of haematopoietic stem and progenitor cells

Myelodysplastic neoplasms (MDS) encompass haematological malignancies, which are characterised by dysplasia, ineffective haematopoiesis and the risk of progression towards acute myeloid leukaemia (AML). Myelodysplastic neoplasms are notorious for their heterogeneity: clinical outcomes range from a near-normal life expectancy to leukaemic transformation or premature death due to cytopenia. The Molecular International Prognostic Scoring System made progress in the dissection of MDS by clinical outcomes. To contribute to the risk stratification of MDS by immunophenotypic profiles, this study performed computational clustering of flow cytometry data of CD34+ cells in 67 MDS, 67 AML patients and 49 controls. Our data revealed heterogeneity also within the MDS-derived CD34+ compartment. In MDS, maintenance of lymphoid progenitors and megakaryocytic-erythroid progenitors predicted favourable outcomes, whereas expansion of granulocyte-monocyte progenitors increased the risk of leukaemic transformation. The proliferation of haematopoietic stem cells and common myeloid progenitors with downregulated CD44 expression, suggestive of impaired haematopoietic differentiation, characterised a distinct MDS subtype with a poor overall survival. This exploratory study demonstrates the prognostic value of known and previously unexplored CD34+ populations and suggests the feasibility of dissecting MDS into a more indolent, a leukaemic and another unfavourable subtype.

Table S5. MDS classification based on K-means clustering
Table S6. Statistical summary of the differences in CD34+ populations between MDS subtypes

Figure S1. Flowchart of the study population
Normal bone marrow (NBM) was collected from cardiothoracic surgery patients. Patients with reactive conditions, nutritional deficiencies or non-myeloid clonal disorders were considered pathological controls (PCs). The PC cohort contained one patient with a monoclonal gammopathy of undetermined significance and three patients with a monoclonal B cell lymphocytosis (MBL). Samples were excluded in case of incomplete data, experimental errors, treatment before specimen collection and diagnoses that were not of interest for this study, i.e. inconclusive cases (n = 10), CMML (n = 5), myelofibrosis (n = 2), multiple myeloma (MM, n = 1), acute erythroid leukaemia (n = 1), polycythaemia vera (n = 1) and Diamond-Blackfan anaemia (n = 1). One NBM donor was considered a PC because of the presence of MBL. Two patients with MDS in addition to MM, with 10% and 12% plasma cells within the BM aspirate, were not excluded. The study population comprised 183 patients and controls. The sex distribution and the median age at diagnosis differed significantly between diagnostic groups, with the highest male predominance among MDS patients and the lowest age among AML patients. The median age at diagnosis did not differ significantly between MDS, NBM and PC. Differences in age and sex distribution between groups were tested for statistical significance using the Kruskal-Wallis and Chi-Square tests, respectively.

Sample preparation
Samples from AML patients and controls were analysed fresh. Samples from MDS patients were analysed either fresh (70%) or after cryopreservation (30%). Fresh samples were deprived of erythrocytes using an …

The 8-colour LSC tube includes common markers (CD45, CD34, CD38) next to lineage- and leukaemia-associated markers. (1) Antigens without expression on normal stem cells are combined within the PE channel (further referred to as "Combi"), since their cumulative expression on normal stem cells remains negative. CD45RA is absent on normal stem cells but is studied separately because this marker was added later, i.e. after the Combi channel had been validated. CD33, CD44 and CD123 are expressed by normal stem cells and should therefore be studied separately to define over- and underexpression. All antibodies were purchased from BD Biosciences, except for the CD366 antibody, which was obtained from R&D Systems.
Data pre-processing
Pre-gated FCS files were subjected to pre-processing steps to remove technical errors that could hinder biological interpretation. Quality control was performed using the R package flowAI, which accounts for abrupt fluctuations in the flow rate, signal acquisition instability, outliers and margin events. (2) Protein expression data were compensated using the spill-over matrix from the FCS file and transformed using the inverse hyperbolic sine (arcsinh) with a cofactor of 150. A manufacturing change in antibody concentration resulted in a time-related difference in CD34 expression. To minimise this batch effect and to account for any changes in antibody and scatter intensities over time, range scaling (1st to 99th percentile) between files was performed.
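The transformation and scaling steps described above can be sketched as follows. This is a minimal illustration in plain Python, not the R code used in the study: the function names are hypothetical, and the reading of "range scaling" as mapping each file's 1st and 99th percentiles to 0 and 1 (without clipping) is an assumption.

```python
import math


def arcsinh_transform(values, cofactor=150.0):
    """Inverse hyperbolic sine transform; the cofactor of 150 is taken from the text."""
    return [math.asinh(v / cofactor) for v in values]


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in 0..100) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("cannot take a percentile of an empty list")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def range_scale(values, low_q=1.0, high_q=99.0):
    """Assumed form of range scaling: the file's low_q percentile maps to 0
    and its high_q percentile maps to 1; values outside are not clipped."""
    ordered = sorted(values)
    lo = percentile(ordered, low_q)
    hi = percentile(ordered, high_q)
    if hi == lo:
        raise ValueError("percentile range is zero; cannot scale")
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    raw = [0.0, 150.0, 1500.0, 15000.0]  # made-up intensities for one marker
    transformed = arcsinh_transform(raw)
    print(range_scale(transformed))
```

Applying the scaling per file and per channel would align the bulk of each marker's distribution across files, which is the stated purpose of the step.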
The median cell count of all files was 0.83 × 10⁶ pre-gated and pre-processed mononuclear cells (MNCs) per file (Figure S3). Files were randomly subsampled to a maximum of 0.5 × 10⁶ MNCs and aggregated into a dataset of 81 × 10⁶ MNCs from 183 subjects.
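A minimal sketch of the subsampling and aggregation step, assuming each file is held in memory as a list of events. The function names, the seeding scheme and the `(file, event)` output layout are illustrative assumptions, not the authors' implementation.

```python
import random


def subsample_events(events, max_events=500_000, seed=0):
    """Return all events if the file is small enough, else a random subset
    of max_events events drawn without replacement."""
    if len(events) <= max_events:
        return list(events)
    rng = random.Random(seed)
    return rng.sample(events, max_events)


def aggregate_files(files, max_events=500_000, seed=0):
    """Subsample every file and concatenate the results.

    files: dict mapping a file name to its list of events.
    Returns a list of (file_name, event) pairs so each event keeps its origin.
    """
    aggregated = []
    for index, name in enumerate(sorted(files)):
        kept = subsample_events(files[name], max_events, seed + index)
        aggregated.extend((name, event) for event in kept)
    return aggregated


if __name__ == "__main__":
    toy = {"sample_a": list(range(10)), "sample_b": list(range(3))}
    print(len(aggregate_files(toy, max_events=5)))
```

Capping the number of events per file keeps large files from dominating the aggregated dataset while leaving smaller files intact.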

Figure S3. MNC and CD34+ cell counts
The histograms illustrate the counts of the MNC and CD34+ compartments. The MNC counts are derived from manual pre-gating on the pre-processed files. In contrast, the CD34+ counts are derived from the computational selection of the CD34+ clusters within the FlowSOM tree applied on the MNC compartment. The red and blue lines indicate median and mean values, respectively.

Clustering analysis
The unsupervised algorithm FlowSOM was used to cluster the MNC and CD34+ datasets.

… S5). Note that 3 out of 26 principal components are shown, explaining only 43% of the variance.

Figure S2. Pre-gating strategy
Data pre-processing
Figure S3. MNC and CD34+ cell counts
Clustering analysis
Figure S4. Cell clustering by potential batch effects
Cell cluster interpretation
Figure S5. FlowSOM analysis of the MNC compartment
Figure S6. Distribution of MNC and HSC populations over diagnostic groups
Table S3. Statistical summary of the diagnostic value of CD34+ populations

Figure S7. Clinical and prognostic value of CD34+ progenitors and LSCs
Table S4. Overview of MDS patients with sequential samples
Figure S8. MNC and CD34+ cell compartment changes during follow-up
Figure S9. K-means clustering on the principal components of the population abundancies
Table S5. MDS classification based on K-means clustering

Figure S2. Pre-gating strategy
The FACS plots illustrate the gating strategy of the mononuclear cells (MNC, in light grey). The remaining mature erythrocytes, debris and doublets were excluded in FSC-A/SSC-A and FSC-A/FSC-H plots (upper panel). Remaining CD45-negative cells were removed in CD45/SSC-A and CD45/CD34 plots in combination with a CD45 histogram (lower panel).

First, FlowSOM (3) was applied on the aggregated dataset of 81 × 10⁶ manually pre-gated MNCs using CD45, CD34 and the scatter properties as input for cell clustering. The MNC dataset was classified into 25 clusters and 10 metaclusters (or populations). Two CD34+ populations were identified. All cells within the CD34+ metaclusters were selected, ranging from 3.6 × 10² to 2.8 × 10⁶ CD34+ cells (median: 8.3 × 10⁵ CD34+ cells) per sample (Figure S3). The aggregated dataset of 16 × 10⁶ CD34+ cells was subjected to FlowSOM again, discriminating 36 clusters and 25 metaclusters (or HSPC populations) based on all markers apart from the Combi channel. No clustering was observed on potential batch effects, including the number of CD34+ cells, the flow cytometer used and the combined usage of fresh and cryopreserved samples. The moderate clustering over time matched the pattern of AML diagnosis, suggesting that these results were caused by biological signals rather than technical variations over time (Figure S4). FlowSOM version 1.18.0 was used in R. A general demonstration of the R code of the FlowSOM analysis pipeline is available on GitHub.

Figure S4. Cell clustering does not reveal potential batch effects
The five CD34+ FlowSOM trees are labelled by diagnosis and potential batch effects. The labels of the CD34+ count, the flow cytometer used and the use of cryopreserved-thawed samples are equally distributed among clusters. Labels indicating the time of the experiment match diagnostic patterns, suggesting an unequal inclusion of diagnoses over time rather than technical variation.

Figure S5. FlowSOM analysis of the MNC compartment
(A) FlowSOM tree of CD34+ cells. The height of each pie segment visualises the expression of the surface markers and the scatter properties. The size of the nodes is proportional to the fraction of cells mapped to the node. Two CD34+ populations [IV, VIII] with high CD34 expression (encircled by the dotted line) were selected and subjected to the second FlowSOM clustering procedure. (B) Heatmap summary of scatter properties and marker expressions for each of the MNC populations. (C) FlowSOM trees coloured by the median expression of CD34.

Figure S6. Distribution of MNC and HSC populations over diagnostic groups
(A) Stacked histograms illustrating the median percentages with the 95% confidence interval of the 10 MNC populations (see Figure S5B) relative to the total MNC compartment per diagnosis. (B) Stacked histograms illustrating the median percentages of 10 CD34+ populations that were classified as HSC subsets relative to the total HSC compartment per diagnosis. Compared to NBM, PCs show increased CD44dim HSCs [6], AML samples show increased LSCs [5], and MDS samples show both increased CD44dim HSCs [6] and LSCs [5]. Differences in the relative number of the populations were tested for statistical significance using the Mann-Whitney U test (P values are reported in the Results section).

Figure S7. Clinical and prognostic value of CD34+ progenitor and LSC frequencies
(A) Relationship between the number of CD34+ population [VIII] and platelet levels and BM blast percentages as enumerated by morphology. The CD34+ population [VIII] contains CD34+ haematopoietic stem and progenitor cells, whereas the blast compartment may also include CD34-negative cells. Median values with the 95% confidence interval are shown. Differences were tested for statistical significance by the Kruskal-Wallis test. (B) The Kaplan-Meier curves illustrate the prognostic value of the frequency of CD34+ population [VIII], stratified into three groups based on the 33rd and 67th percentiles in MDS patients, for leukaemic transformation and disease progression. Survival distributions were compared using the log-rank test. (C) MDS patients with transformation towards AML during follow-up appear to have a higher frequency of LSCs [5] at diagnosis than MDS patients with a stable disease, although this difference did not reach statistical significance in the Mann-Whitney U test. (D) The Kaplan-Meier curves illustrate the prognostic value of the frequency of LSCs [5], stratified into three groups based on the 33rd and 67th percentiles in MDS patients, for leukaemic transformation and disease progression. The log-rank test indicated that the difference between the survival distributions was not statistically significant. Abbreviations: AMLt, transformation towards AML; EFS, event-free survival; LFS, leukaemia-free survival; PLT, platelets.
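The stratification into three groups at the 33rd and 67th percentiles, as used for the Kaplan-Meier curves in panels B and D, can be illustrated with a short sketch. The function names, the linear-interpolation percentile and the handling of values that fall exactly on a cut-off are assumptions; the source does not specify them.

```python
import math


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in 0..100) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("cannot take a percentile of an empty list")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def stratify_by_tertiles(frequencies, low_q=33.0, high_q=67.0):
    """Label each population frequency as 'low', 'intermediate' or 'high'.

    Assumption: values equal to a cut-off fall into the lower group.
    """
    ordered = sorted(frequencies)
    low_cut = percentile(ordered, low_q)
    high_cut = percentile(ordered, high_q)
    groups = []
    for value in frequencies:
        if value <= low_cut:
            groups.append("low")
        elif value <= high_cut:
            groups.append("intermediate")
        else:
            groups.append("high")
    return groups


if __name__ == "__main__":
    # Made-up population frequencies (% of CD34+ cells) for ten patients.
    print(stratify_by_tertiles([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

The resulting group labels would then serve as the strata compared with the log-rank test.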

Figure S8. MNC and CD34+ cell compartment changes during follow-up (supplemental to Figure 4)
Samples at the time of diagnosis (n = 9) were compared with sequential samples during follow-up (n = 10). From one patient (MDS73), sequential samples were collected during disease progression and after chemotherapy. (A) Heatmap summary of the median percentages of MNC populations at the time of diagnosis and follow-up in MDS patients with a stable disease, disease progression or leukaemic transformation, and a residual disease or morphological complete remission after chemotherapy. The MNC populations included progenitors [IV, VIII], CD34-negative progenitors [II], SSClow granulocytes [III], monocytes [V-VII], lymphocytes [I, IX] and erythroid cells remaining after cell lysis [X]. (B) Principal component analysis applied on the CD34+ population frequencies for 19 samples derived from 9 MDS patients with a stable disease (n = 3), progressive disease or leukaemic transformation (n = 3) and residual disease or morphological complete remission after chemotherapy (n = 3). Only the diagnosis and follow-up samples from MDS20 cluster together, whereas the other samples are scattered throughout the plot, indicating that their CD34+ cell composition changed during the disease course. Note that the sample from patient MDS73, which was considered a morphological complete remission following chemotherapy despite subsequent disease progression, does not overlap with the samples from stable disease patients. Abbreviations: FU, follow-up; mCR, morphological complete remission; noCR, no complete remission; PC, principal components; PD/AML, progressive disease or leukaemic transformation; postCTx, after cytotoxic therapy (chemotherapy or allogeneic stem cell transplantation); SD, stable disease.

Table of Contents
Figure S1. Flowchart of the study population
Table S1. Clinical characteristics of study subjects
Table S2. Leukaemic stem cell tube

Table S3. Statistical summary of the diagnostic value of CD34+ populations
The table summarises the statistically significant differences in CD34+ population frequencies between diagnostic groups as determined by the Mann-Whitney U test. P-values are presented for significant values (P < 0.05) and for trends towards significance (P = 0.050 to 0.100), whereas P-values above 0.100 are presented as ns (not significant). Abbreviations: CLPs, common lymphoid progenitors; CMPs, common myeloid progenitors; GMPs, granulocyte-monocyte progenitors; HSCs, haematopoietic stem cells; LSCs, leukaemic stem cells; MEPs, megakaryocyte-erythroid progenitors; MDP, macrophage/dendritic progenitors; prog, progenitors.
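The reporting convention of this table can be expressed as a small helper. This is only an illustration of the stated thresholds; the function names and the three-decimal formatting are assumptions.

```python
def classify_p_value(p):
    """Categorise a P-value using the thresholds stated for Table S3."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a P-value must lie between 0 and 1")
    if p < 0.05:
        return "significant"
    if p <= 0.100:
        return "trend"
    return "ns"


def format_p_value(p):
    """Show the numeric value for significant results and trends, else 'ns'."""
    if classify_p_value(p) == "ns":
        return "ns"
    return f"{p:.3f}"  # three decimals is an assumed display format


if __name__ == "__main__":
    for p in (0.003, 0.072, 0.4):  # made-up P-values
        print(p, classify_p_value(p), format_p_value(p))
```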

Table S4. Overview of MDS patients with sequential samples

Table S5. MDS classification based on K-means clustering
K-means clustering with k = 9 was applied on the 36 principal components of the frequencies of the FlowSOM populations (see Figure S9). Using NBM and AML samples as reference categories, the clusters with MDS patients (excluding cluster 9) were classified as MDS-IND (indolent), MDS-UN-L (unfavourable, leukaemic), and MDS-UN-O (unfavourable, other). The percentages are proportional to the diagnostic group.
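For readers unfamiliar with K-means, the following is a minimal sketch of Lloyd's algorithm in plain Python. It is not the implementation used in the study, whose software, initialisation and stopping rule are not stated here; the function names, the random initialisation and the toy data are assumptions. In the study's setting, each point would be one sample's principal component scores and k would be 9.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids).

    Initial centroids are k distinct points drawn at random (an assumed,
    simple initialisation scheme).
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Toy two-dimensional data with two well-separated groups.
    toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans(toy, k=2)[0])
```

Because K-means depends on its starting centroids, results on real data are usually checked across several random initialisations.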