Deciphering the complex circulating immune cell microenvironment in chronic lymphocytic leukaemia using patient similarity networks

The tissue microenvironment in chronic lymphocytic leukaemia (CLL) plays a key role in the pathogenesis of CLL, but the complex blood microenvironment in CLL has not yet been fully characterised. Therefore, immunophenotyping of circulating immune cells in 244 CLL patients and 52 healthy controls was performed using flow cytometry and analysed by multivariate Patient Similarity Networks (PSNs). Our study revealed high inter-individual heterogeneity in the distribution and activation of bystander immune cells in CLL, depending on the bulk of the CLL cells. High CLL counts were associated with low activation of circulating monocytes and T cells, and vice versa. The highest activation of immune cells, particularly of intermediate and non-classical monocytes, was evident in patients treated with novel agents. PSNs revealed a low activation of immune cells in CLL progression, irrespective of IgHV status, Binet stage and TP53 disruption. Patients with high intermediate monocytes (> 5.4%) with low activation were 2.5 times more likely (95% confidence interval 1.421–4.403, P = 0.002) to have a shorter time-to-treatment than those with low monocyte counts. Our study demonstrated the association between the activation of circulating immune cells and the bulk of CLL cells. The highest activation of bystander immune cells was detected in patients with a slow disease course and in those treated with novel agents. The subset of intermediate monocytes showed predictive value for time-to-treatment in CLL.


CONTENTS

Tables:
Table S1. Combinations of surface and activation markers used to characterise the immune populations.
Table S2. Description of test tubes of the panel with used surface markers, clones and fluorochromes (conjugates).
Table S3. Correlations between absolute numbers of chronic lymphocytic leukaemia (CLL) cells and expression of activation markers on immune cell populations or percentages of immune cell subpopulations.
Table S4. Characteristics of clusters detected in PSN (presented in Figure 3A in the main manuscript) based on immune cell activation markers in CLL patients.
Table S5. Comparison of immune subset percentages, cell counts and activation markers expressed on immune cells in untreated CLL patients and CLL patients after the immunochemotherapy treatment with the same CLL cell counts (20–80 × 10^9/L).
Table S6. Comparison of immune subset percentages and activation markers expressed on immune cells in CLL patients after the immunochemotherapy treatment and patients on novel therapy with equal CLL cell counts (<10.0 × 10^9/L).
Table S7. Comparison of immune subset percentages and activation markers expressed on immune cells in CLL patients treated with ibrutinib, idelalisib and venetoclax.

Figures:
Figure S1. The silhouette of clusters detected in PSN (presented in Figure 3A in the main manuscript) based on immune cell activation markers in CLL patients.
Figure S2. The difference in HLA-DR expression on intermediate monocytes (MON) between CLL patients with lower and higher numbers of intermediate monocytes (cut-off 5.4%).

Text:
Regardless of CLL cell count, the activation of circulating immune cells is dependent on the treatment regimen.
Patient Similarity Network (PSN) analysis.
Use of network layout to analyse trends.
References

Table S1. Combinations of surface and activation markers used to characterise the immune populations.

Table S2. Description of test tubes of the panel with used surface markers, clones and fluorochromes (conjugates).

Table S4. Characteristics of clusters detected in PSN (presented in Figure 3A in the main manuscript) based on immune cell activation markers in CLL patients. The expressions of the used markers were normalised to the maximum value in the data set.

Regardless of CLL cell count, the activation of circulating immune cells is dependent on the treatment regimen
To identify differences in circulating cells between groups of patients on different treatment regimens, we compared the studied parameters in patients with comparable levels of CLL cells, thereby reducing the impact of CLL cell number on these parameters. We made two comparisons: untreated patients (n=36) versus patients treated with chemotherapy in the past (n=18), all with CLL cell counts from 20.0 to 80.0 × 10^9 CLL cells/L; and patients previously treated with immunochemotherapy (n=22) versus patients treated with novel drugs (n=53), all with CLL cell counts lower than 10.0 × 10^9 CLL cells/L. The untreated patient group was not compared with the group of patients treated with novel drugs because the CLL cell numbers in the two groups were mostly not comparable.

Comparison of treatment-naïve patients with patients after chemotherapy (Table S4)

When we compared the patients with past chemotherapy treatment with the patients on novel drug therapy, higher activation of immune cells was observed in patients treated with novel drugs (Table S5).

Table S5. Comparison of immune subset percentages, cell counts and activation markers expressed on immune cells in untreated CLL patients and CLL patients after the immunochemotherapy treatment with the same CLL cell counts (20–80 × 10^9/L).

Patient similarity network (PSN) analysis
The use of networks to analyse multivariate data is based on constructing a network from vector data. This construction uses the similarity between each pair of vectors in the dataset. In the case of a patient network, each vector describes one patient, represented by one network vertex; the components of that vector are the markers used.
The constructed network can be visualised using a layout algorithm that displays the pairwise relationships between patients in a 2D layout (for example, onscreen). In this layout, vertices that are sufficiently similar are connected by ties. In our case, one Patient Similarity Network (PSN) vertex represents one patient (hereafter, the term patient is used instead of the term vertex). A natural consequence of the similarity between patients in the network is that similar patients are close to each other in the network layout. Conversely, dissimilar patients are in distant parts of the network layout.
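As a minimal sketch of the general idea of building a network from patient vectors, the Python example below connects two patients whenever a pairwise similarity exceeds a threshold. This is not the LRNet algorithm used in the study; the Gaussian similarity, the names `gaussian_similarity` and `build_similarity_network`, the parameters `threshold` and `sigma`, and the toy marker values are all assumptions made for illustration.

```python
import math


def gaussian_similarity(u, v, sigma=1.0):
    """Similarity in (0, 1]: 1 for identical vectors, decaying with Euclidean distance."""
    return math.exp(-math.dist(u, v) ** 2 / (2 * sigma ** 2))


def build_similarity_network(patients, threshold=0.5, sigma=1.0):
    """Return a set of ties (i, j), i < j, between sufficiently similar patients.

    patients  : list of marker vectors, one vector per patient (vertex)
    threshold : hypothetical cut-off above which two patients are tied
    sigma     : hypothetical width of the Gaussian similarity
    """
    edges = set()
    for i in range(len(patients)):
        for j in range(i + 1, len(patients)):
            if gaussian_similarity(patients[i], patients[j], sigma) >= threshold:
                edges.add((i, j))
    return edges


# Toy data (invented): four patients, two normalised markers each.
patients = [(0.10, 0.20), (0.15, 0.25), (0.90, 0.80), (0.95, 0.85)]
edges = build_similarity_network(patients, threshold=0.5, sigma=0.2)
```

With these toy values, only the two pairs of near-identical patients are tied, so the network splits into two small groups of similar patients.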
Because of the similarities, groups (clusters) of similar patients can be identified in the network. Such clusters can be identified not only visually but also automatically. To detect non-overlapping clusters, we use the Louvain method [S1], which is based on the optimisation of the so-called modularity. Modularity measures the strength of the division of the network into clusters. The advantage of using methods that detect clusters in networks is that there is no need to estimate the number of clusters in advance, and the resulting clusters can differ significantly in size.
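To make the quantity optimised by the Louvain method more concrete, the sketch below computes the standard modularity Q of a given division of an undirected, unweighted network. It only evaluates a division; it does not perform the Louvain optimisation itself. The function name `modularity` and the toy network (two triangles joined by one tie) are assumptions for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of an undirected, unweighted network for a given division.

    edges     : iterable of (u, v) ties
    community : dict mapping each vertex to its cluster label
    Q = sum over clusters of [ ties inside / m - (sum of degrees / (2 m))^2 ],
    where m is the total number of ties.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    inside = defaultdict(int)      # number of ties with both ends in the cluster
    degree_sum = defaultdict(int)  # sum of vertex degrees in the cluster
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy network (invented): two triangles joined by a single tie.
toy_edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_clusters = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_cluster = {v: "A" for v in range(6)}

q_two = modularity(toy_edges, two_clusters)  # division along the bridge
q_one = modularity(toy_edges, one_cluster)   # no division at all
```

In this toy case the division into the two triangles has the higher modularity, which is the kind of division a modularity-optimising method would prefer.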
If patients in a cluster are highly similar, then the cluster is dense and is visually separated from the rest of the network. The more densely interconnected the cluster is and the more separated it is from its surroundings, the more specifically and unambiguously it can be interpreted as a common profile of the patients in this cluster. Such a profile can be obtained as a vector representing a virtual (average) patient whose individual markers are equal to the arithmetic averages of the markers of all patients in the cluster.
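The profile of a cluster described above, a virtual patient whose markers are the arithmetic averages over the cluster, can be sketched in a few lines of Python. The function name `cluster_profiles`, the cluster labels and the toy marker values are assumptions for illustration.

```python
def cluster_profiles(patients, labels):
    """Return {cluster label: mean marker vector} for the given assignment.

    patients : list of marker vectors, one per patient
    labels   : list of cluster labels, aligned with `patients`
    """
    grouped = {}
    for vector, label in zip(patients, labels):
        grouped.setdefault(label, []).append(vector)
    # zip(*vectors) iterates marker by marker across all patients in a cluster.
    return {
        label: [sum(column) / len(column) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


# Toy data (invented): three patients, two normalised markers each.
toy_patients = [(0.2, 0.4), (0.4, 0.6), (0.9, 0.1)]
toy_labels = ["A", "A", "B"]
profiles = cluster_profiles(toy_patients, toy_labels)
# profiles["A"] is approximately [0.3, 0.5]; profiles["B"] is [0.9, 0.1]
```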
The quality of such a profile can then be assessed from two perspectives. The first is the confidence intervals of the marker averages over all patients in the cluster. The second is the degree of unambiguity with which patients are assigned to individual clusters. For this purpose, the so-called silhouette is used in data mining [S2]; its value ranges from -1 to 1. The closer the silhouette value of an individual patient is to 1, the more clearly the patient is a member of the cluster to which they belong. However, in real datasets, there are situations in which some patients cannot be clearly assigned. In this case, their silhouettes may be less than 0. This does not mean that they are incorrectly assigned to the cluster; it only shows that such patients could also be assigned to another cluster. Therefore, generally, if the silhouette values are positive for all patients in the cluster (the higher the values, the better), then the cluster defines a clear common profile of the patients in the cluster. Conversely, the more negative silhouette values there are in the cluster, the more problematic it is to regard this profile as an unambiguous characteristic.
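The silhouette of an individual patient can be illustrated with the standard definition s = (b - a) / max(a, b), where a is the mean distance to the other patients in the same cluster and b is the smallest mean distance to the patients of any other cluster. The sketch below assumes Euclidean distance on the marker vectors, which may differ from the distance used in the study; the function name `silhouette_values` and the one-marker toy data are also assumptions.

```python
import math


def silhouette_values(points, labels):
    """Silhouette value in [-1, 1] for each point, using Euclidean distance.

    Points in a cluster of size one, or in a data set with a single cluster,
    get the conventional value 0.
    """
    values = []
    for i, p in enumerate(points):
        mean_dist = {}
        for c in set(labels):
            dists = [math.dist(p, q) for j, q in enumerate(points)
                     if labels[j] == c and j != i]
            if dists:
                mean_dist[c] = sum(dists) / len(dists)
        own = labels[i]
        if own not in mean_dist or len(mean_dist) < 2:
            values.append(0.0)
            continue
        a = mean_dist[own]                                     # cohesion
        b = min(d for c, d in mean_dist.items() if c != own)   # separation
        values.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return values


# Toy data (invented): one marker per patient; the third patient sits
# next to cluster 1 but is labelled as a member of cluster 0.
toy_points = [(0.0,), (1.0,), (9.0,), (10.0,), (11.0,)]
toy_clusters = [0, 0, 0, 1, 1]
s = silhouette_values(toy_points, toy_clusters)
```

In this toy example the third patient receives a negative silhouette, matching the interpretation above: the patient could equally (or better) be assigned to the other cluster.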
The basis for the above considerations is the PSN construction from vector patient data.
Networks can be constructed not only by different algorithms with different settings but also from different subsets of the studied markers. Generally, the key is to find a network that is good enough both in terms of modularity (separable clusters) and in terms of silhouette (unambiguous assignment of the patients). Therefore, our methodology is based on the automatic generation of different networks, automatically balancing the relationship between modularity and silhouette. All networks used in our manuscript were constructed by the LRNet algorithm [S3] and selected by applying this methodology.

Use of network layout to analyse trends
Normally, clusters in networks are distributed in more complex structures, where each cluster is adjacent in the layout to several other clusters. In both networks constructed from the vector data in our study, it can be seen that the clusters are arranged almost linearly, which means that they gradually follow each other. For example, if we display the layout horizontally, the clusters follow each other from left to right. Figure 3A (in the main manuscript) shows the network layout with coloured clusters. As can be seen from the silhouette in the column chart in Figure S1, patients in the first (blue) and third (green) clusters are the least unambiguously assigned. This is because these two clusters cannot be easily separated from the others (to a lesser extent, this also applies to the other two clusters). However, the profiles of individual clusters (see the right side of Figure 3A in the main manuscript) and their visualisation in the layout show differences in immune cell activation, CLL cell counts and treatment strategy. This, in most cases, corresponds well to the clusters detected.
Using visualisation, we can add one extra marker to each patient, related to the patient's horizontal position in the network layout. In our case, it is an x-coordinate, where we assigned the x-coordinate equal to zero to the centre of the layout; patients to the left of this centre have a negative x-coordinate, and patients to the right have a positive x-coordinate. The unit on the x-axis, in our case, is a pixel. Figure 4A (in the main manuscript) shows scatterplots expressing the relationships between selected markers and the patient's horizontal position in the layout. In each scatterplot, and in the network layout in Figure 4B (in the main manuscript), a linear trend emphasises the relationship between the patient's position in the PSN layout and the activation rate and treatment, respectively.

Figure S1. The silhouette of clusters detected in PSN (presented in Figure 3A in the main manuscript) based on immune cell activation markers in CLL patients. The silhouette shows that most patients were correctly assigned to the individual clusters. The y-axis represents the silhouette value; the x-axis represents patients in individual clusters ordered by decreasing silhouette value.
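The use of the horizontal layout position as an extra marker can be sketched as follows: x-coordinates in pixels are shifted so that the layout centre is zero, and a straight line is fitted to a marker against that position. The sketch assumes that the centre is the midpoint of the horizontal extent of the layout and that the linear trend is an ordinary least-squares fit; neither is stated in the text. The function names `centre_x` and `linear_trend` and all values are invented for illustration.

```python
def centre_x(xs):
    """Shift x-coordinates so that the layout centre becomes 0.

    Assumption: the centre is the midpoint between the leftmost and
    rightmost patient in the layout.
    """
    centre = (min(xs) + max(xs)) / 2
    return [x - centre for x in xs]


def linear_trend(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy data (invented): pixel positions of five patients and one normalised
# activation marker that decreases from left to right in the layout.
pixel_x = [120, 260, 400, 540, 680]
marker = [0.80, 0.65, 0.50, 0.35, 0.20]

x_centred = centre_x(pixel_x)  # patients left of centre get negative values
slope, intercept = linear_trend(x_centred, marker)
```

A negative slope in such a fit would indicate that the marker decreases from the left to the right side of the layout; the intercept is the fitted marker value at the layout centre.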