Single-cell characterization of leukemic and non-leukemic immune repertoires in CD8+ T-cell large granular lymphocytic leukemia

T cell large granular lymphocytic leukemia (T-LGLL) is a rare lymphoproliferative disorder of mature, clonally expanded T cells, where somatic-activating STAT3 mutations are common. Although T-LGLL has been described as a chronic T cell response to an antigen, the function of the non-leukemic immune system in this response is largely uncharacterized. Here, by utilizing single-cell RNA and T cell receptor profiling (scRNA+TCRαβ-seq), we show that irrespective of STAT3 mutation status, T-LGLL clonotypes are more cytotoxic and exhausted than healthy reactive clonotypes. In addition, T-LGLL clonotypes show more active cell communication than reactive clones with non-leukemic immune cells via costimulatory cell–cell interactions, monocyte-secreted proinflammatory cytokines, and T-LGLL-clone-secreted IFNγ. Besides the leukemic repertoire, the non-leukemic T cell repertoire in T-LGLL is also more mature, cytotoxic, and clonally restricted than in other cancers and autoimmune disorders. Finally, 72% of the leukemic T-LGLL clonotypes share T cell receptor similarities with their non-leukemic repertoire, linking the leukemic and non-leukemic repertoires together via possible common target antigens. Our results provide a rationale to prioritize therapies that target the entire immune repertoire and not only the T-LGLL clonotype.

T cell Large Granular Lymphocytic Leukemia (T-LGLL) is a rare lymphoproliferative disease characterized by the accumulation of abnormal, clonally restricted, and activated effector T cells in the blood, bone marrow, and spleen 1,2 . Although immune-mediated cytopenias (most frequently neutropenia, 70-80%) and autoimmune manifestations (most frequently rheumatoid arthritis [RA], 10-18%) are commonly associated with T-LGLL, it usually presents as an indolent disease that is manageable with low-dose immunosuppressive therapies [3][4][5] .
Although chronic antigenic stimulation has been suggested to drive cytotoxic T cell lymphoproliferation in T-LGLL, little has been reported about the function of non-leukemic populations in driving or aiding T-LGLL pathogenesis. Altered B-cell activities (dyscrasias, hypergammaglobulinemia, enhanced production of immunoglobulins, including autoantibodies) 18 , elevated levels of multiple cytokines (e.g., IL-15, TNF, IL-6) [19][20][21][22] , and the finding that IL-15 expression by monocytes can initiate T-LGLL in transgenic mice 19,20 all suggest a possible function for non-leukemic cells in the disease. As changes in leukemia cell burden cannot be associated with therapy responses 12,23,24 , and multiple symptoms can be attributed to elevated cytokine expression 3 , a holistic understanding of the total immune repertoire behind T-LGLL is an unmet need.
Here, we use single-cell RNA and TCR sequencing (scRNA+TCRαβ-seq) to separate T-LGLL clonotypes from their non-leukemic repertoire and compare them with healthy controls, other cancers, and autoimmune disorders to identify the position of T-LGLL in the intersection of cancer, autoimmune disorders, and chronic inflammation. We extend our findings with bulk-RNA-seq, TCRβ-seq, flow cytometry, serum protein profiling, and ex vivo validations. Our systems immunology analysis highlights the synergistic function of clonal and non-clonal immune repertoires in the pathogenesis of T-LGLL and suggests that future therapies should be geared toward attenuating the entire immune system and not the T-LGLL clone alone (Fig. 1a).

T-LGLL cells show elevated cytotoxicity and exhaustion.
To gain an unsupervised view of the immune system in T-LGLL, we analyzed over 150,000 flow cytometry-sorted CD45 + blood mononuclear cells (Supplementary Fig. 1a) from 11 T-LGLL samples from nine individuals and six age-matched healthy controls with scRNA+TCRαβ-seq (10X Genomics, Supplementary Data 1). After initial clustering of the entire dataset (Fig. 1b, Supplementary  Fig. 2a-d), we focused on cells expressing TCR and reclustered these (Fig. 1c, Supplementary Fig. 3a-d). Despite similarities with the clonally expanded CD8 + T cells (defined as at least two cells with identical TCR) from the healthy controls, the clonally expanded CD8 + T cells from T-LGLL patients also had unique T-LGLL-specific characteristics, and they were overrepresented in several CD8 + T cell clusters ( Supplementary Fig. 3e).
As expected, samples from patients with T-LGLL were more clonal and had more expanded cells than those from healthy controls (P < 0.01, Mann-Whitney test) (Fig. 1d). This was invariant to the chosen threshold for expanded clones (Supplementary Fig. 4a-b). To identify transcriptomic differences between T-LGLL cells and reactive cytotoxic clonotypes, we extracted the hyperexpanded clonotypes (defined as at least 10 cells with identical TCR) from patients with T-LGLL and healthy controls and annotated them with the previously calculated clusters (Fig. 1c, e). In healthy controls, the hyperexpanded cells preferentially had a CD8 + effector memory (CD8 + T EM ) phenotype (P < 0.0001, two-sided Fisher's test), whereas in T-LGLL, the hyperexpanded cells were phenotypically more heterogeneous (Fig. 1e, Supplementary Fig. 5a). In comparison with hyperexpanded reactive cells from healthy controls, the top upregulated genes in T-LGLL cells included multiple cytotoxicity-associated transcripts (GZMB, PRF1, KLRB1, KLRD1), of which the most significantly upregulated was NKG7, which is essential in the mobilization of cytotoxic vesicles 25 , and genes associated with T cell exhaustion (LAG3 and TIGIT) (Fig. 1g). Consistent with these differentially expressed (DE) genes, the top upregulated pathways in T-LGLL were cell killing, T cell activation, and response to IFNγ signaling (Fig. 1h, Supplementary Data 2). In healthy controls, the top DE genes, including other cytotoxic genes (GNLY, LYZ), genes forming calprotectin (S100A8 and S100A9), and CD52, were not enriched in any immune-associated pathway (Fig. 1f, g, Supplementary Fig. 5b).
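The clonotype size categories used above (expanded, at least two cells with an identical TCR; hyperexpanded, at least 10 cells) can be illustrated with a minimal Python sketch. The function name, the TCR identifiers, and the cell counts are hypothetical and serve only to show the thresholding; this is not the analysis code used in the study.

```python
from collections import Counter

def classify_clonotypes(cell_tcrs, expanded_min=2, hyperexpanded_min=10):
    """Label each clonotype by the number of cells sharing its TCR.

    Each clonotype receives its most specific label, so a hyperexpanded
    clonotype is not additionally reported as expanded.
    """
    counts = Counter(cell_tcrs)
    labels = {}
    for tcr, n_cells in counts.items():
        if n_cells >= hyperexpanded_min:
            labels[tcr] = "hyperexpanded"
        elif n_cells >= expanded_min:
            labels[tcr] = "expanded"
        else:
            labels[tcr] = "singleton"
    return labels

# Hypothetical per-cell TCR identifiers (one entry per sequenced cell)
cells = ["TCR_A"] * 12 + ["TCR_B"] * 3 + ["TCR_C"]
labels = classify_clonotypes(cells)
```

Because the thresholds are parameters, the same function could be rerun with other cut-offs to check that a result does not depend on the chosen definition of an expanded clone.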
To validate the higher cytotoxicity profile in T-LGLL in comparison to reactive cells, we performed flow cytometry analysis with six T-LGLL and six healthy control samples (Supplementary Data 1). Putative T-LGLL clonotypes (CD8 + CD57 + ) were confirmed to express more cytotoxic proteins (GZMA/GZMB P < 0.01, PRF1 P < 0.05, Mann-Whitney test) than the CD8 + CD57 + T cells from healthy controls (Fig. 1i, Supplementary Fig. 6a). In addition, the CD8 + CD57 + T-LGLL cells failed to respond well to anti-CD3/CD28/CD49 antibody-mediated TCR stimulation. Their degranulation responses (CD107a/b) were deficient (P < 0.01), and their cytokine production (TNF/IFNγ) in response to stimulation was diminished (P < 0.01) relative to the healthy CD8 + CD57 + cells (Fig. 1j, Supplementary Fig. 6a). However, basal levels of TNF/IFNγ were higher in T-LGLL ( Supplementary Fig. 6b). Since higher levels of TNF/IFNγ have been previously reported in T cells of patients with latent infections like CMV 26 , we hypothesized that this phenomenon is suggestive of antigen-experienced T cell exhaustion, which is concordant with DE genes.
Wild-type STAT3 T-LGLL clones are more cytotoxic than mutated clones.
Notably, STAT3 mutated and wild-type clonotypes imputed from scTCRαβ-seq and amplicon sequencing data were partly grouped separately in the dimensionality-reduced space (Fig. 2f). To validate our manual inference of STAT3 status, we analyzed off-target reads from the scRNA-seq data and identified 83 cells that expressed mutated STAT3 and 200 cells that expressed wild-type STAT3 (Fig. 2f, Supplementary Fig. 8a). STAT3 cells expressing Y640F, S614R, and D661Y mutations were enriched in the CD52 + memory-like cluster 3 (P < 0.001, two-sided Fisher's exact test), whereas the wild-type STAT3 cells were enriched in the cytotoxic cluster 2 (P < 0.0001). The largest cluster (cluster 0) contained both mutated and wild-type STAT3 cells.
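An enrichment test of this kind can be sketched in plain Python as a two-sided Fisher's exact test on a 2x2 table of cells (mutated or wild-type STAT3, inside or outside a given cluster). Only the row totals (83 mutated and 200 wild-type cells) come from the text; the per-cluster split below is hypothetical, and the function is an illustrative re-implementation rather than the software used in the study.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / denom
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    # Small relative tolerance guards against floating-point ties
    p_value = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))
    return min(1.0, p_value)

# Hypothetical split: rows are mutated / wild-type STAT3 cells,
# columns are cells inside / outside the cluster of interest.
p_enrichment = fisher_exact_two_sided(30, 53, 20, 180)
```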
DE gene and pathway analysis showed that the wild-type STAT3 cells were more activated and displayed increased cytotoxicity compared with the mutated STAT3 cells. The top DE genes in the wild-type STAT3 cells included GNLY, KLRG1, and CD5, and the most upregulated pathways included T cell activation, upregulated TCR signaling, and response to IFNγ (Fig. 2g, h, Supplementary Data 2). Furthermore, the wild-type STAT3 T-LGLL clonotypes also demonstrated a higher cytotoxicity score 28 (P < 0.0001, Mann-Whitney test) and a lower exhaustion score (P < 0.0001) than the mutated STAT3 T-LGLL clonotypes (Fig. 2i, Supplementary Fig. 8b). On the contrary, the upregulated genes in the mutated STAT3 clonotypes included genes associated with T cell survival (JUND, KLF2) and cytokine signaling (CCL3, CCL4L2, IL2RG), and the top upregulated pathways were associated with protein translation and response to type I interferons (IFNα, IFNβ) although none were significant after P-value adjustment ( Supplementary Fig. 8c).
To validate the differences between mutated and wild-type STAT3 T-LGLLs, we profiled additional patients with mutated STAT3 (n = 10) and wild-type STAT3 (n = 5) CD8 + T-LGLL together with CD8 + T cells from healthy donors (n = 5) with bulk-RNA-seq (Supplementary Data 1). The bulk-RNA-seq data confirmed that the wild-type STAT3 samples were separated from the mutated STAT3 ones in the dimensionality-reduced space by principal component 2 (PC2) which explained 15.85% of the variance (Fig. 2j). The bulk-RNA-seq data also validated the higher cytotoxicity scores of CD8 + T cells in the wild-type STAT3 patients compared with the mutated STAT3 patients (P < 0.05, Mann-Whitney test) (Fig. 2k).
Leukemic and non-leukemic TCRs share structural similarities.
Although direct evidence is lacking, it is generally hypothesized that the T-LGLL clones originate from antigen-specific immune responses 3 . As the underlying antigen specificities of the T-LGLL clonotypes remain largely unknown, we combined previously TCRβ-seq profiled T-LGLL clonotypes 24,29 together with our samples profiled with scTCRαβ-seq (n = 11), TCRβ-seq from CD8 + sorted samples (n = 8), and TCRαβs inferred from bulk-RNA-seq (n = 15) data (Supplementary Data 1) to form the largest described dataset of 199 T-LGLL clones from 170 patients (Supplementary Data 3). By genotyping or inferring the HLA types 30 from scRNA-seq and bulk-RNA-seq data, we were able to determine the HLA type for 31% of the clonotypes (62/199), and 69% (43/62) were HLA-A*02 + . All T-LGLL clonotypes were restricted to individual patients, and no structural amino acid-level similarities were identified by GLIPH2 31 , even when the analysis was focused only on the 43 HLA-A*02 + T-LGLL clones. This suggests the absence of shared target antigen(s) driving the clonal expansions in T-LGLL.
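To make the notion of amino acid-level TCR similarity concrete, the sketch below flags CDR3 sequences of equal length that differ from a query at no more than one position. This is a deliberately simplified stand-in and not GLIPH2 itself, which additionally evaluates enriched local motifs and statistical significance. The function names and all CDR3 sequences are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions; None if the lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))

def similar_cdr3s(query, repertoire, max_mismatches=1):
    """Return repertoire CDR3s of equal length within max_mismatches of query."""
    hits = []
    for cdr3 in repertoire:
        distance = hamming(query, cdr3)
        if distance is not None and distance <= max_mismatches and cdr3 != query:
            hits.append(cdr3)
    return hits

# Hypothetical CDR3 amino acid sequences, for illustration only
query_clone = "CASSLGQGNTEAFF"
other_clones = ["CASSLGQGSTEAFF",  # one mismatch: reported
                "CASSPGQGSTEAFF",  # two mismatches: not reported
                "CASSQDRGYEQYF"]   # different length: not compared
hits = similar_cdr3s(query_clone, other_clones)
```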
Next, we hypothesized that despite there being no shared antigen between patients, the non-leukemic clonotypes could target the same eliciting antigen in individual patients, which would be observed as shared TCR motifs between leukemic and non-leukemic clonotypes. Iterative GLIPH2 analysis performed on CD8 + (n = 8) and mononuclear cell (MNC) sorted (n = 17) TCRβ-seq patient samples indicated that the leukemic T-LGLL clones indeed shared amino acid-level similarities with their non-leukemic repertoire in 72% of patients with T-LGLL (6/8 CD8 + and 12/17 MNC-sorted samples, Fig. 3a, b, Supplementary Fig. 9, Supplementary Data 3). To avoid bias due to differences in the sequencing depth, the samples were subsampled to the same read depth (30,000 reads per sample) before analysis. Similar results were also obtained after excluding the leukemic clone and subsampling only the non-leukemic TCRs to 30,000 reads per sample (Supplementary Fig. 10a, Supplementary Data 3). These findings indicate that the majority of the leukemic T-LGLL clonotypes are likely to target the same antigen as clonotypes in their non-leukemic repertoire, and therefore, we termed them antigen-driven clonotypes.
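Subsampling every repertoire to a common read depth, as described above, can be sketched as drawing reads without replacement from a clonotype count table. The depth of 30,000 reads is taken from the text; the function name, the clone labels, and the read counts are hypothetical, and the study's own subsampling procedure may differ in detail.

```python
import random
from collections import Counter

def subsample_repertoire(clone_counts, depth, seed=0):
    """Draw `depth` reads without replacement from a clonotype count table."""
    total_reads = sum(clone_counts.values())
    if total_reads < depth:
        raise ValueError("sample has fewer reads than the target depth")
    # Expand the table to one entry per read, then sample reads uniformly
    reads = [clone for clone, n in clone_counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return Counter(rng.sample(reads, depth))

# Hypothetical read counts per clonotype in one sample
counts = {"leukemic_clone": 60000, "clone_2": 25000, "clone_3": 15000}
sub = subsample_repertoire(counts, depth=30000)
```

Excluding the leukemic clone before subsampling, as in the second analysis, would correspond to removing its entry from the table before the call.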
To understand whether the antigen drive is restricted to T-LGLL, we performed a similar analysis using TCRβ-seq samples from patients with RA 32 (n = 45), metastatic melanoma sampled from blood 33 (SKCM, n = 29), and healthy controls (CD8 + sorted 32 , n = 38; MNC-sorted n = 785 34 ) (Supplementary Data 1) with similar subsampling. The antigen drive was the most prevalent in T-LGLL (P < 0.05, two-sided Fisher's exact test) and the least seen in healthy controls (Fig. 3b, Supplementary Data 4). The antigen-driven clonotypes were detected in both mutated (n = 6) and wild-type STAT3 patients (n = 6) in equal proportions (Fig. 3c).
Fig. 1 [...] where the left panel denotes the different cohorts, and the right panel highlights the main findings. b Uniform Manifold Approximation and Projection (UMAP) representation of CD45 + sorted cells from T-LGLL (n = 11) and healthy donor (n = 6) samples profiled with scRNA+TCRαβ-seq. Different colors indicate clusters, and cells with detected TCR are highlighted in red. c UMAP representation of the reclustered T cells and their phenotypes (left). Cells with detected TCR (right) were divided into singletons (TCR detected once), expanded (TCR detected ≥2 times), and hyperexpanded (TCR detected ≥10 times) clonotypes. d Proportion of the cells from hyperexpanded (TCR detected ≥10 times) clonotypes as compared between T-LGLL (n = 11) and healthy (n = 6). The definition of box plot visualization is stated in the Methods section Data visualization. P-value was calculated with two-sided Mann-Whitney test. e Focused UMAP of the cells with hyperexpanded TCRs from panel c without reclustering (left). Distribution of the cells from patients with T-LGLL and healthy controls are shown separately (right). f Differentially expressed genes (P adj < 0.05, calculated with a Bonferroni corrected t-test) between hyperexpanded T-LGLL and healthy clonotypes. Top 30 differentially expressed genes from T-LGLL and top 10 from healthy are labelled. X-axis denotes the average log2 fold-change between the two conditions and Y-axis denotes the P adj -value in a negative log10 transformed scale. Dashed line denotes P adj = 0.05. g The scaled expression of the selected top differentially expressed genes highlighted using the same UMAP representation as in panel e. h Top upregulated GO-pathways (P adj < 0.05, Benjamini-Hochberg corrected Fisher's one-sided exact test on differentially expressed genes) in T-LGLL clonotypes in comparison to hyperexpanded clonotypes from healthy controls. Colors indicate whether the pathway can be associated to immune function by manual curation. i Protein level expression (mean fluorescence intensity, MFI) of cytotoxic proteins (GZMA, GZMB, and PRF1) from patients with T-LGLL (n = 6) and healthy controls (n = 6) in flow cytometry cohort. P-values were calculated with two-sided Mann-Whitney test. j Protein level expression (log2 fold-change of MFI) of cytokines (TNF and IFNγ) and degranulation markers (CD107a and CD107b) between TCR stimulated and unstimulated conditions. P-values were calculated with two-sided Mann-Whitney test.
ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-022-29173-z
We next asked whether these polyclonal responses in T-LGLL are caused by commonly encountered antigen epitopes. Less than half (74/199, 37.19%) of the T-LGLL clonotypes were found at least once in the healthy (n = 785) TCRβ repertoires 34 , and they explained <1% of the healthy [...] Interestingly, the antigen-driven clonotypes were more frequently observed in the healthy controls' (n = 785) TCRβ repertoires than the non-antigen-driven clonotypes (P < 0.01, Kruskal-Wallis test, Supplementary Fig. 10e), suggesting that antigen-driven clonotypes could recognize commonly encountered antigens. Therefore, we predicted the antigen specificities for T-LGLL clonotypes with the supervised machine-learning method TCRGP 35 against common viral epitopes from CMV, EBV, Influenza A, and HSV2. Only 2 of 199 (1.0%) T-LGLL clonotypes were predicted to recognize these antigens, and both clonotypes were predicted to target the CMV pp65 epitope (Supplementary Data 3). As the TCRGP models have been trained using data from HLA-A*02 donors, we next focused only on the 43 T-LGLL clonotypes detected in HLA-A*02+ patients. None of the T-LGLL clonotypes from either HLA-A*02-positive (n = 43) or HLA-A*02-negative (n = 19) patients were predicted to recognize these viruses (Supplementary Data 1 and 3). The two TCRs predicted to target CMV pp65 were from patients for whom the HLA type was not available. Overall, these results suggest that these four viruses do not contain major driver antigens for T-LGLL.
We next studied a cohort of T-LGLL patients from which follow-up samples were available 24 (n = 17, 38 samples). We noted that the same patient could harbor both antigen-driven and non-driven clones, and that the clonotypes with antigen drive were larger than the non-driven clonotypes during follow-up (P < 0.05, Mann-Whitney test, Fig. 3d, Supplementary Fig. 10f) suggesting that antigen drive can potentially provide a growth advantage for the clones.
To further understand the evolution of the antigen-driven and non-driven clones, we studied samples from a patient with both types of clones (patient 1). The analysis was performed using three peripheral blood samples collected seven years apart (2011-2018). Although the patient had no treatment for her T-LGLL disease, the dominant STAT3 mutated clone (78% → 10%) which was not antigen-driven, was replaced by an antigen-driven STAT3 wild-type clone (5% → 24%) carrying a different TCRαβ (Fig. 3e, f, Supplementary Fig. 11a-e). The non-antigen-driven STAT3 mutated and the antigen-driven STAT3 wild-type clones were phenotypically different and possibly two different maturation endpoints as suggested by trajectory analysis with Slingshot 36 (Fig. 3f). The expanding, antigen-driven wild-type STAT3 clone was more cytotoxic than the shrinking STAT3 mutated clone (Fig. 3f, Supplementary Fig. 11a-e). The top DE genes in the expanded clone included GZMH, GNLY, and FCGR3A (CD16) and the clone had a higher cytotoxicity score than the shrinking one (P < 0.0001, Mann-Whitney test, Fig. 3f, Supplementary Fig. 11d, Supplementary Data 2). Conversely, the STAT3 mutated shrinking clone presented an attenuated CD8 + T EM phenotype marked by the expression of GZMK and upregulated SOCS1 and SOCS3, which are known to inhibit JAK-STAT signaling 19 .
To conclude, our results suggest that a majority, but not all, T-LGLL clonotypes could originate from an initial polyclonal antigen-response. Also, the antigen-driven clonotypes are larger and often display a more cytotoxic phenotype than the non-driven T-LGLL clonotypes.
In T-LGLL, non-leukemic T cell populations are mature and clonal.
Having found that the clonal and non-clonal immune cell repertoires are possibly connected via antigen preferences, we sought to investigate the phenotypes of non-leukemic immune cell populations in T-LGLL in detail. As the presence of T-LGLL clonotypes biases the immune repertoire, we removed the clonally expanded T-LGLL cells from scRNA+TCRαβ-seq data and compared it with similar data from solid cancers 37 (n = 3), hematologic cancers 38 (n = 8), and healthy controls (n = 6) (Supplementary Data 1). After clustering (Fig. 4a, Supplementary Fig. 12a-c), we observed that, in comparison with the other cancers, the proportions of conventional dendritic cells (cDCs) (P < 0.01 and P adj = 0.052, Benjamini-Hochberg corrected Mann-Whitney test) and naïve B-cells (P < 0.05 and P adj = 0.16) were reduced, and the proportion of mature CD4 + T EM -cells (clusters 2 and 13, both P < 0.01 and P adj = 0.052) was increased in T-LGLL (Fig. 4b, Supplementary Fig. 13a-b). When including only patients with hematologic cancers (n = 8), CD4 + T EM cells were still markedly elevated in T-LGLL (Fig. 4b). Similar results were obtained when compared to healthy controls (n = 6, Supplementary Fig. 13c, d).
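The gap between nominal and adjusted P-values reported above (for example, P < 0.01 versus P adj = 0.052) arises because many cell populations are tested at once. A minimal sketch of the Benjamini-Hochberg adjustment is given below; the raw P-values are hypothetical and the function is an illustrative implementation, not the code used in the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw P-values for five cell-population comparisons
raw = [0.004, 0.008, 0.03, 0.2, 0.6]
adj = benjamini_hochberg(raw)
```

Each raw P-value is scaled by the number of tests divided by its rank, so a nominally significant result can lose significance when the number of tested populations is large.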
A patient cohort profiled with flow cytometry validated that patients with T-LGLL had a significantly higher percentage of mature terminally differentiated, antigen-experienced CD4 + CD57 + T cells 39,40 when compared with the healthy controls (P < 0.05, Mann-Whitney test, Fig. 4c, Supplementary Fig. 13e-g). To support the maturity of the CD4 + T cells in T-LGLL, we noted that the proliferative capacity of the total CD4 + T cell compartment, measured as carboxyfluorescein succinimidyl ester (CFSE) dilution upon TCR ligation and TLR stimulation, was reduced in T-LGLL compared with the healthy controls (P < 0.05, Fig. 4d, Supplementary Fig. 13f).
Fig. 2 T-LGLL clones are phenotypically heterogeneous and wild-type STAT3 clones are more cytotoxic than mutated STAT3 clones. a Clonal expansion of clonotypes in T-LGLL (n = 11) and one representative healthy control. Each box denotes a unique T cell clonotype in a sample as detected with scTCRαβ-seq and the size of the box corresponds to its frequency in the repertoire occupancy. b UMAP representation of the transcriptomes of the selected 18 T-LGLL clonotypes highlighted in panel a from 11 T-LGLL samples (n = 9 patients). c Scaled expression of selected differentially expressed genes between T-LGLL clusters highlighted in the same UMAP representation as in panel b. d Heatmap showing scaled expression of differentially expressed genes (P adj < 0.05, calculated with Bonferroni corrected two-sided t-test) between T-LGLL phenotype clusters. Cluster (cl) numbers referring to panel b are marked on right side of the heatmap. e Proportion of the different patient-specific T-LGLL clonotypes in different clusters. Each bar represents an individual T-LGLL clone. f The imputed and detected STAT3 mutation status presented in the UMAP representation. The inferred STAT3 mutation status was obtained clonotype-wise as in panel a (shown in the panel on the left) and the detected STAT3 mutation status was retrieved from the scRNA-seq data using a variant detection tool Vartrix (shown in the panel on the right). g Differentially expressed genes between (P adj < 0.05, calculated with Bonferroni corrected two-sided t-test) the mutated and wild-type STAT3 T-LGLL clones. Top 20 genes from each condition are labeled. X-axis denotes the average log2 fold-change between the two conditions and Y-axis denotes the P adj -value in a negative log10 transformed scale. h Top upregulated GO-pathways (P adj < 0.05, Benjamini-Hochberg corrected Fisher's one-sided exact test on differentially expressed genes) in wild-type STAT3 in comparison to mutated STAT3 clonotypes. i Cytotoxicity score of individual cells in wild-type STAT3 clones in comparison to mutated STAT3 clones in scRNA-seq. P-value was calculated with two-sided Mann-Whitney test. j Principal component analysis (PCA) plot from bulk-RNA-sequencing data from 10 mutated and 5 wild-type STAT3 T-LGLL patients' and 5 healthy donors' CD8 + -sorted T cells. k Cytotoxicity score of the wild-type STAT3 patients (n = 5) as compared to the STAT3 mutated (n = 10) patients' scores in the bulk-RNA-seq validation cohort. P-value was calculated with two-sided Kruskal-Wallis test.
Fig. 3 [...] T-LGLL patients have more antigen-driven cases than the rest of the conditions (P < 0.05, Fisher's one-sided exact test). All samples were subsampled to the same read-depth (30,000 reads per sample). Results where the T-LGLL clone was excluded before downsampling and in which the subsampling was only done for the non-leukemic library are shown in the Supplementary Fig. 10a. c The proportion of mutated (n = 6) and wild-type STAT3 patients (n = 6) where antigen-driven or no antigen-driven clonotypes were detected in the MNC-cohort. d The evolution of antigen-driven and non-antigen-driven T-LGLL clonotypes in multiple timepoints. Individual lines correspond to individual T-LGLL clonotypes while the bolded line shows the median. P-value was calculated with two-sided Mann-Whitney test. e Flow cytometry analysis of Vβ repertoires and variant allele frequency (VAF) of STAT3 Y640F clone (located in the shrinking Vβ1 clone) are used to demonstrate T-LGLL clonal dynamics (clonal drift) in patient 1. f UMAP representation of the CD8 + T cells from patient 1 from two different timepoints. The left panel highlights the different clusters, and the superimposed lines correspond to predicted maturation trajectories (pseudotime) calculated with a pseudotime algorithm Slingshot. The middle panel illustrates the expanding and shrinking clones. The panel on the right highlights the previously defined cytotoxicity score in T-LGLL clonotypes.
Besides increasing T cell maturity, antigen-driven processes increase T cell repertoire clonality. Therefore, we compared the clonality of the non-leukemic CD8 + T cells in T-LGLL to CD8 + sorted healthy and RA samples 24,32 and found that the non-leukemic CD8 + T cells in patients with T-LGLL had a more restricted TCR repertoire than patients with RA (P < 0.01, Mann-Whitney test) and healthy controls (P < 0.05), latter of which was validated in the MNC-cohort (P < 0.0001, Fig. 4e). In addition, the non-leukemic repertoires of wild-type STAT3 patients were more clonal than those of the mutated STAT3 patients (P < 0.0001) (Fig. 4e).
IFNγ drives activation of the non-leukemic immune cell repertoire.
Besides changes in cell abundances, scRNA-seq also showed the activation of different non-leukemic subsets in T-LGLL in comparison with patients with other cancers and healthy controls.
For example, the expression of different cytokines (CCL2/3/4/5), co-stimulatory genes (CD27, TNFRSF4 [OX40], TNFRSF14 [HVEM], TNFRSF25 [DR3]), and IFNγ response genes (e.g., B2M, TAP1, HLA molecules) was upregulated in different non-leukemic NK-cell, monocyte, and B-cell clusters in comparison with healthy controls (Fig. 5a), other cancers (Supplementary Fig. 14a), and patients with blood cancers (Supplementary Fig. 14b, Supplementary Data 2). Notably, the expression of different cytotoxic genes (GZMA/B/H, PRF1, NKG7) was upregulated in non-leukemic CD8 + , CD4 + , and NK-cell clusters. The phenotype of the non-leukemic T cells was validated in the flow cytometry cohort, where CD8 + CD57 − and total CD4 + populations in T-LGLL expressed higher levels of GZMA/B (P < 0.01 for CD8 + CD57 − and CD4 + , Mann-Whitney test) and PRF1 (P < 0.01 for CD8 + CD57 − and CD4 + ) than in healthy controls (Fig. 5b).
To understand the pathways driving immune activation in T-LGLL, we performed pathway analysis among patients with T-LGLL, patients with other cancers, and healthy subsets. The most upregulated pathways in T-LGLL included IFNγ-response (upregulated in 12 (Fig. 5c, Supplementary Fig. 14c, d).
We focused on the IFNγ response, as it was among the most upregulated pathways in all comparisons, and quantified its effect by calculating an IFNγ response module score 41 in all immune subsets in individual patients. The strongest IFNγ response was seen in different myeloid subsets (CD16 + monocytes, CD16 − monocytes, cDCs), NK-cells, and CD8 + T EM cells (Fig. 5d). In unsupervised clustering, the samples were split into two groups, high IFNγ and low IFNγ, based on their IFNγ-response scores. The samples from the T-LGLL group were enriched in the high-IFNγ group (P < 0.05, Fisher's one-sided exact test), confirming that IFNγ response is more strongly activated in T-LGLL than in other cancers. A similar analysis with the NF-κB pathway did not show enrichment of T-LGLL samples (Supplementary Fig. 14e). Interestingly, T-LGLL cells expressed higher amounts of IFNG than non-leukemic cells (P < 0.0001, Mann-Whitney test), where the highest expression was seen in cytokine-secreting T-LGLL cluster 4 (Fig. 5e).
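The idea behind a gene module score can be sketched as the mean expression of the module genes minus the mean expression of a set of control genes. This is a simplified illustration under stated assumptions: the scoring procedure of the cited method 41 is more elaborate, the control genes are chosen here only for illustration, and all expression values are hypothetical.

```python
from statistics import mean

def module_score(expression, module_genes, control_genes):
    """Mean expression of module genes minus mean expression of control genes."""
    module = [expression[g] for g in module_genes if g in expression]
    control = [expression[g] for g in control_genes if g in expression]
    if not module or not control:
        raise ValueError("module or control genes missing from the profile")
    return mean(module) - mean(control)

# IFNγ response genes of the kind named in the text; HLA-A stands in for
# "HLA molecules". Control genes are an assumption made for this example.
ifng_module = ["B2M", "TAP1", "HLA-A"]
controls = ["ACTB", "GAPDH"]

# Hypothetical normalized expression profiles of two cells
cell_high = {"B2M": 4.0, "TAP1": 2.0, "HLA-A": 3.0, "ACTB": 2.0, "GAPDH": 2.0}
cell_low = {"B2M": 2.0, "TAP1": 0.5, "HLA-A": 1.5, "ACTB": 2.0, "GAPDH": 2.0}

score_high = module_score(cell_high, ifng_module, controls)
score_low = module_score(cell_low, ifng_module, controls)
```

Per-sample scores obtained by averaging such cell-level values could then be grouped into high and low responders, as done with unsupervised clustering in the analysis above.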
In addition to being the most important cytokine producers, monocytes were also the most transcriptionally altered subpopulations between T-LGLL and other conditions in the DE gene analysis (first, CD16 + monocytes; fourth, CD16 − monocytes) (Supplementary Fig. 15b-g). Flow cytometry analysis also confirmed that although the total number of monocytes was reduced (P < 0.05, Mann-Whitney test), the distribution of different monocyte subsets was altered, and T-LGLL patients had a larger proportion of CD16 + cells (P < 0.05, Supplementary Fig. 16a, b) out of the CD14 + monocytes. The upregulated DE genes in monocyte populations included multiple HLA molecules and classical scavenging receptors (e.g., CLEC10A, CD44, CLEC2B, CLEC9A, MRC1), translating into upregulated HLA class II 28 (P < 0.0001, Kruskal-Wallis test) and scavenging scores (P < 0.0001, Fig. 6c, Supplementary Fig. 16c, Supplementary Data 2). To analyze the antigen-presenting function of the monocytes, we incubated blood MNCs with fluorescent microspheres and found that the proportions of bead-adhering CD14 + CD16 + and CD14 dim CD16 + monocytes were increased in T-LGLL compared with healthy controls (P < 0.05, Mann-Whitney test, Fig. 6d, Supplementary Fig. 16d), which may indicate higher scavenging potential.
Next, we calculated ligand-receptor interactions with CellPhoneDB 43 between T-LGLL clonotypes and other immune cells and compared that to the interactome of hyperexpanded clonotypes from healthy controls. The interactome analysis implicated an increased number of interactions between T-LGLL clonotypes and other immune cells in comparison with healthy hyperexpanded clonotypes (Fig. 6e). The majority of the differences could be tracked to T-LGLL-monocyte interactions, and many of the predicted interactions could be attributed as costimulatory (e.g., CD2-CD58, CD48-CD244, CLEC2B-KLRF1, TNFSF14-TNFRSF14); while only a few interactions were inhibitory (e.g., LGALS9-HAVCR2) (Fig. 6f). Based on the number of ligand-receptor interactions, T-LGLL clonotypes formed three clusters: (1) strongly interacting (the highest number of interactions), (2) interacting, and (3) immune independent (the lowest number of interactions), which was also evident in the focused clustering of T-LGLL clonotypes (Fig. 6g). Immune independent STAT3 mutated T-LGLL clonotype from patient 2 had the lowest number of interactions, and during the 1-year follow-up, the size and the phenotype of this clone were stable ( Supplementary Fig. 17a-e).
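The counting of ligand-receptor interactions between two cell populations can be illustrated with the simplified sketch below, which reports a pair as present when both partners are expressed above a threshold. This is not the CellPhoneDB procedure, which also assesses significance by permutation; the threshold, the assignment of each partner to a population, and all expression values are assumptions made for illustration. The gene pairs are those named in the text.

```python
def count_interactions(expr_a, expr_b, pairs, threshold=0.1):
    """List pairs whose first partner is expressed in population A and whose
    second partner is expressed in population B, both above the threshold."""
    found = []
    for partner_a, partner_b in pairs:
        if expr_a.get(partner_a, 0.0) > threshold and expr_b.get(partner_b, 0.0) > threshold:
            found.append((partner_a, partner_b))
    return found

# Interaction pairs named in the text
pairs = [("CD2", "CD58"), ("CD48", "CD244"), ("CLEC2B", "KLRF1"),
         ("TNFSF14", "TNFRSF14"), ("LGALS9", "HAVCR2")]

# Hypothetical mean expression in a T cell clonotype and in monocytes
clone_expr = {"CD2": 1.2, "CD48": 0.8, "CLEC2B": 0.05, "TNFSF14": 0.3, "LGALS9": 0.0}
monocyte_expr = {"CD58": 0.9, "CD244": 0.0, "KLRF1": 0.4, "TNFRSF14": 0.6, "HAVCR2": 0.5}

interactions = count_interactions(clone_expr, monocyte_expr, pairs)
```

The number of interactions found per clonotype is the kind of quantity that could then be compared between clonotypes or used to group them.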

Discussion
The asset of scRNA+TCRαβ-seq in this study is its accuracy in identifying the TCR-sequence-restricted clonal expansions. In other non-T cell-malignancies, similar cell-specific markers are rarely available or require simultaneous DNA sequencing to define clonal cells. With scRNA+TCRαβ-seq, we were able to perform detailed and precise characterizations of the expanded T-LGLL clones from the oligo- and polyclonal CD8 + T cell repertoires and show evidence of a strong antigen-driven immune response that shapes the entire immune cell repertoire in T-LGLL.
T-LGLL clonotypes are known to overexpress cytotoxic and T cell activation-associated genes in comparison with their healthy reactive counterparts 44,45 . Here, we also found exhaustion-associated genes, such as LAG3 and TIGIT, among the most upregulated genes between T-LGLL and hyperexpanded cells in healthy controls, but not the previously found PDCD1 (PD1) and HAVCR2 (TIM-3) 44,45 . Our in vitro validation suggested that TCR ligation fails to trigger normal degranulation and cytokine production in the T-LGLL clonotypes. These findings may partly explain why T-LGLL usually presents only with moderate lymphocytosis and rarely develops into a more aggressive (proliferative) disease despite highly activating STAT3 and STAT5B mutations 46,47 .
Our data demonstrate inter- and intrapatient heterogeneity, where T-LGLL clones with the same TCR rearrangement can harbor multiple phenotypes. Importantly, the CD16 + CCL4 + LAG3 + TOX + phenotype was identified as the dominant phenotype in most clonotypes (13/18, 72.2%) and was seen in patients with either mutated or wild-type STAT3, further unifying these diseases besides the noted shared JAK-STAT activity 19 . As this phenotype differs significantly from the effector memory phenotype of hyperexpanded CD8 + T cells in healthy controls, it could aid the diagnostic process, particularly in the distinction of wild-type STAT3 cases from reactive processes. However, the finding that wild-type STAT3 clonotypes have higher T cell activity, cytotoxicity, and non-leukemic clonality than those with mutated STAT3, which was not seen in a previous publication with flow cytometry 14 , suggests that mutated STAT3 T-LGLL, wild-type STAT3 T-LGLL, and reactive processes arise from different pathogeneses.
The eliciting antigen in T-LGLL has remained elusive. We analyzed TCRs in both unsupervised and supervised manners with the current best-practice bioinformatics tools 31,35,48 but found no evidence of common putative, known or unknown antigens, even in individual patients. The obvious limitation in these analyses is that they were done independently of HLA-genotype or involved only T-LGLL clones from patients with HLA-A*02+ background. Unfortunately, HLA information was not available from all patients that were included in the previously published T-LGLL TCR datasets 24 . The supervised TCRGP tool has been shown to outperform other similar methods 35 when the genotype of the analyzed TCR repertoire is known. However, as the training data of epitope-specific TCRs is limited, it is probable that the existing TCR analysis tools do not capture the full heterogeneity of the antigen-specific repertoire, resulting in false negatives even in the cases of HLA-A*02 + patients. Nevertheless, our results imply that the common denominator underlying T-LGLL patients is perhaps not the antigen, but rather the environmental, genetic, and/or immunological factors that support the expansion and persistence of T-LGLL clonotypes. These results are in accordance with Gao et al. 49 , who profiled alemtuzumab-treated T-LGLL patients with scRNA+TCRαβ-seq and observed no shared T-LGLL clonotypes or T-LGLL clonotypes targeting known antigens.
Our results do not, however, contradict that T-LGLL is driven by an abnormal response to an antigen. On the contrary, in an analysis that is invariant to HLA genotypes, we observed that most (72%) of the T-LGLL clonotype TCRs share structural similarities with TCRs from the same patients' non-leukemic repertoires. Our antigen-drive analysis supports the view that the antigen response in T-LGLL is poly- or oligoclonal, rather than monoclonal. Our results are in line with the previous data suggesting that STAT3 mutation follows the initial clonal expansion and is an event that solidifies the clonal dominance 3 . The antigen-driven clonotypes in T-LGLL patients were larger, and they could occur concomitantly with non-antigen-driven clones. Interestingly, in one patient with follow-up samples, the mutated STAT3 clone was replaced by a more cytotoxic wild-type STAT3 clone. Further, the non-leukemic CD8 + and CD4 + T cell repertoires in T-LGLL were more mature, cytotoxic, and clonally restricted than in other cancers, in RA, and in healthy controls, suggesting the strong immune-editing capacity of a driving antigen. The advent of high-throughput epitope-MHC-TCR screening tools 50 and their use in T-LGLL will provide invaluable information about the antigen-specific response in general.
Findings beyond the non-leukemic CD8 + and CD4 + T cells further support the idea of an aberrant oligoclonal immune response against a patient-specific antigen as a disease-inducing and evolution-driving trigger in T-LGLL. We noted increased costimulatory cell-cell interactions between T-LGLL clonotypes and monocytes and enhanced antigen-presenting cell function of monocytes. The immunological factor driving these differences was a response to IFNγ, which was most evident in monocyte populations. IFNG was preferentially expressed by T-LGLL clonotypes, and not by monocytes, linking the leukemic and non-leukemic repertoires into a vicious cycle. With only incidental cases of clonal drift seen in our data, we cannot pinpoint whether the non-leukemic immune repertoire caused the transformation of a T cell clonotype to a T-LGLL clone or vice versa, which needs to be addressed in future studies.
Current therapies in T-LGLL, including corticosteroids, methotrexate, and cyclosporine A, offer unsatisfactory results, as over half of patients eventually relapse 18 , posing a need for combined or sequential therapies. Current salvage therapies include T cell depleting anti-CD52 (alemtuzumab) and anti-CD3 (anti-thymocyte globulin) regimens 3,51,52 . These approaches also target non-T-LGLL clones, which could explain why the TCR repertoire does not diversify after alemtuzumab treatment 49 and why treatment responses do not correlate with the STAT3 mutation status or clonal burden 23 . Moreover, treatments that attenuate the entire immune system have shown encouraging results, both as first-line (cyclophosphamide, >70% response rate) 53 and salvage therapies (tofacitinib, a JAK3 inhibitor, >60% response rate) 54 .
In conclusion, our study highlights how the entire immune cell repertoire, including hyperexpanded CD8 + T-LGLL cells, non-leukemic CD8 + cells, CD4 + cells, and monocytes, contributes to the CD8 + T-LGLL disease phenotype. An aberrant antigen-driven immune response shapes the repertoire and maintains the persistence of the hyperexpanded T-LGLL clonotypes. Our results imply that future therapies should not only target the T-LGLL clonotypes but also other immune cell types and their interactions to transform the outcome of patients with T-LGLL.
Fig. 5 IFNγ secretion by T-LGLL clonotypes drives the activation of the non-leukemic immune cell repertoire. a Expression of selected differentially expressed genes (P adj < 0.05, calculated with Bonferroni corrected two-sided t-test) grouped by their functional pathways between the non-leukemic CD45 + sorted cells from patients with T-LGLL (n = 9) and healthy controls (n = 6). Values are presented as log2 fold-change (log2fc). b Left: Protein level expression (MFI, mean fluorescence intensity) of cytotoxic proteins GZMA/B and PRF1 in CD8 + CD57 − cells in T-LGLL patients (n = 6) and healthy controls (n = 6). Right: The proportion of GZMA/B and PRF1 + CD4 + cells in the flow cytometry cohort. P-values calculated with two-sided Mann-Whitney test. c Upregulated HALLMARK-category pathways (P adj < 0.05, Benjamini-Hochberg corrected Fisher's one-sided exact test on differentially expressed genes) in non-leukemic cells from T-LGLL (n = 9) in comparison with healthy (n = 6). d Median expression of the IFNγ response module score in different immune subsets in patients with T-LGLL (n = 9), healthy controls (n = 6), and patients with other cancers (n = 11). The T-LGLL samples were enriched in the IFNγ high cluster (P < 0.05, Fisher's one-sided exact test). Clustering was performed with Ward's linkage. e Left: Scaled expression of IFNG in leukemic (red) and non-leukemic (green) populations. P-value was calculated with two-sided Mann-Whitney test. Right: Scaled expression of IFNG in different leukemic (red) and non-leukemic populations (green). Cluster numbers refer to Fig. 2b (leukemic clusters) and Fig. 4a (non-leukemic clusters).
Amplicon sequencing. To detect STAT3 mutations, locus-specific primers were designed covering the Src homology 2 (SH2) domain of STAT3 (exons 19-24) as reported previously 6 . The list of primers used in this study is provided in Supplementary Data 1. Sequencing was performed on the Illumina HiSeq System as described previously 55 . Briefly, a two-step PCR protocol (Illumina) was used, with coverage of over 100,000× and a variant allele frequency detection sensitivity of 0.5%. The amplicons were then sequenced using the Illumina HiSeq Reagent Kit v4 100 cycles kit or the Illumina MiSeq System using the MiSeq 600 cycles kit (Illumina, San Diego, CA, USA).
Single-cell RNA and TCRαβ-sequencing and data analysis. Viably frozen cells from 11 T-LGLL samples from 9 T-LGLL patients and 6 age-matched healthy samples were thawed in PBS with 2 mM EDTA and stained with anti-CD45 APC-H7 (Cat#: 560178 BD Biosciences) antibody. CD45 + cells were selected with Sony SH800 (Sony Biotechnology Inc.). Single cells were partitioned using a Chromium Controller (10X Genomics), and scRNA-seq and TCRαβ-libraries were prepared using Chromium Single Cell 5′ Library & Gel Bead Kit (10X Genomics) (CG000086 Rev D) as done in Kim et al. 56 . Briefly, 17,000 cells from each sample were suspended in 0.04% BSA and then loaded onto a Chromium Single Cell A Chip. After generation of single-cell barcoded cDNA, the remaining steps were performed in bulk. To amplify full-length cDNA, 14 cycles of PCR (Veriti, Applied Biosystems) were run. Chromium Single Cell Human T cell V(D)J Enrichment Kit (10X Genomics) was used to amplify TCR cDNA. Illumina NovaSeq, S1 flowcell (read length configuration: Read1 = 26, i7 = 8, i5 = 0, Read2 = 91) was used for sequencing gene expression libraries. Illumina HiSeq2500 in Rapid Run (read length configuration: Read1 = 150, i7 = 8, i5 = 0, Read2 = 150) was used for sequencing TCR-enriched libraries. The raw data were processed using Cell Ranger (ver 2.1.1) with GRCh38 as the reference genome. Additional scRNA-seq data from CD45 + sorted samples from patients with chronic myeloid leukemia (n = 4), chronic lymphocytic leukemia (n = 4), non-small cell lung carcinoma (n = 1), and renal cell carcinoma (n = 3) were also gathered as stated in Supplementary Data 1.
For the T-LGLL samples, sample-specific quality control thresholds were used to retain the T-LGLL cells, since T-LGLL samples varied considerably in heterogeneity and viability during library preparation (Supplementary Data 1). For the healthy samples and for the non-leukemic analyses of T-LGLL with comparison data from CLL, CML, RCC, and NSCLC, cells with >15% mitochondrial transcripts, <10% or >50% ribosomal transcripts, <250 or >4,500 expressed genes, or <1,000 or >20,000 UMI counts were removed from the analysis. For the non-leukemic analysis, the leukemic cell populations from T-LGLL, CLL, and CML samples were removed, as well as a cluster that was specific to the T-LGLL and healthy samples produced for this project.
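The fixed thresholds quoted above for the healthy and non-leukemic analyses can be expressed as a simple per-cell filter. The following is a minimal sketch and not the pipeline used in the study: the `CellQC` record and its field names are hypothetical, and the per-cell percentages and counts are assumed to have been computed beforehand.

```python
from dataclasses import dataclass


@dataclass
class CellQC:
    """Per-cell QC summary (hypothetical field names, for illustration only)."""
    pct_mito: float  # % of transcripts that are mitochondrial
    pct_ribo: float  # % of transcripts that are ribosomal
    n_genes: int     # number of expressed genes
    n_umis: int      # total UMI count


def passes_qc(cell: CellQC) -> bool:
    """Return True if the cell survives the thresholds quoted in the text."""
    if cell.pct_mito > 15:
        return False
    if cell.pct_ribo < 10 or cell.pct_ribo > 50:
        return False
    if cell.n_genes < 250 or cell.n_genes > 4500:
        return False
    if cell.n_umis < 1000 or cell.n_umis > 20000:
        return False
    return True


# Toy cells with made-up values.
cells = [
    CellQC(5.0, 25.0, 1800, 6000),   # within all limits
    CellQC(22.0, 25.0, 1800, 6000),  # too many mitochondrial transcripts
    CellQC(5.0, 25.0, 200, 6000),    # too few expressed genes
]
kept = [c for c in cells if passes_qc(c)]
```

Because the text states the limits as strict inequalities, cells sitting exactly on a boundary (for example 15% mitochondrial transcripts) are retained in this sketch.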
To overcome batch-effect, we used scVI (ver 0.5.0) 57 with default parameters where each sample was treated as a batch. The obtained latent embeddings were then used for graph-based clustering and uniform manifold approximation and projection (UMAP) dimensionality reduction implemented in Seurat (ver 3.0.0) 58,59 . The datasets were scaled with the 3,000 most highly variable genes with the FindVariable-function and ScaleData-functions with default parameters. For each different clustering, the genes related to V(D)J-recombination were removed and the resolution values in FindClusters-function were inspected visually within the range of 0.1-3 with intervals of 0.1, where the chosen values were within 0.2-0.5 to prevent overclustering (0.2 for Fig. 1b, 0.5 for Fig. 1c and the same clusters are in Fig. 1e, 0.2 for Fig. 2b, 0.2 for Fig. 3g, 0.5 for Fig. 4a, and 0.3 for Supplementary Fig. 17a). Clusters are named in descending order (cluster 0 contains the most cells) and were annotated by analysis of canonical markers, differentially expressed genes, relationship to other clusters, signature scores, T cell receptor repertoire clonalities, and reference-based cell-type annotation with SingleR 60 (ver 1.2.4) with Blueprint 61 as a reference. For UMAP-dimensionality reductions, the default parameters in RunUMAP-function were used throughout. Pseudotime analyses were done with Slingshot (ver 1.1.4) 36 on unsupervised mode on precalculated UMAP coordinates with default parameters.
Differential expression analyses were performed based on the t-test, as suggested by Soneson et al. 62 , and P-values were adjusted with Bonferroni correction. Enrichment analyses were performed with the up or downregulated genes (P adj < 0.05) with hypergeometric testing implemented in ClusterProfiler (3.16.0) 63 with GO-and HALLMARK-categories gathered from MSigDB. GO-categories were inspected manually, and redundant pathways were removed from visualizations but retained in Supplementary Data 2.
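To make the two statistical steps above concrete, the sketch below shows a Bonferroni adjustment and a one-sided hypergeometric over-representation test written with the Python standard library only. It illustrates the general technique and is not the ClusterProfiler implementation; the function names and the example numbers are hypothetical.

```python
from math import comb


def bonferroni(pvals):
    """Multiply each P-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric.

    M: number of background genes
    n: background genes that belong to the pathway
    N: number of differentially expressed (DE) genes
    k: DE genes that belong to the pathway
    """
    upper = min(n, N)
    total = comb(M, N)
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return tail / total


# Made-up example: 12 of 150 DE genes fall in a 200-gene pathway,
# with a background of 15,000 genes.
p_enrichment = hypergeom_sf(12, 15000, 200, 150)
p_adjusted = bonferroni([0.001, 0.02, 0.4])
```

The tail probability is computed exactly with integer binomial coefficients, which keeps the sketch dependency-free at the cost of speed for very large gene universes.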
Different scores were calculated with the AddModuleScore-function, as suggested by Tirosh et al. 41 , which briefly considers the expression of a given set of genes and subtracts a similarly counted expression of a randomly selected gene set. Cytotoxicity score was calculated with genes defined by Dufva and Pölönen et al. 28 . Ligand-receptor interaction analyses were performed with CellPhoneDB (ver 2.0.0) 43 with default parameters for subsets with at least 50 cells and 1,000 iterations for the permutation testing. The co-stimulatory and coinhibitory receptor-ligand pairs were gathered from Dufva and Pölönen et al. 28 .
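The module-score idea described above (mean expression of a gene set minus the mean expression of a randomly chosen control set) can be sketched for a single cell as follows. This is a deliberately simplified illustration: it draws control genes uniformly from the remaining genes, whereas Seurat's AddModuleScore, to our understanding, additionally matches control genes by expression level. All gene names and values are placeholders, and `module_score` is a hypothetical helper.

```python
import random
from statistics import mean


def module_score(expr, gene_set, n_control=5, seed=0):
    """Mean expression of gene_set minus mean expression of random control genes.

    expr: dict mapping gene name -> normalized expression in one cell
    gene_set: genes that define the module
    """
    rng = random.Random(seed)
    background = sorted(g for g in expr if g not in gene_set)
    control = rng.sample(background, min(n_control, len(background)))
    return mean(expr[g] for g in gene_set) - mean(expr[g] for g in control)


# Placeholder genes: the module genes are expressed at 2.0, everything else at 1.0.
expr = {f"GENE_{i}": 1.0 for i in range(20)}
module = ["MOD_A", "MOD_B", "MOD_C"]
for g in module:
    expr[g] = 2.0

score = module_score(expr, module)
```

In this toy cell the score equals the difference between the module and background expression levels, whichever control genes are drawn.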
To calculate regulons for T-LGLL clonotype phenotypes, the SCENIC 27 (ver 1.2.4) vignette was followed with the default parameters.
Heat maps were drawn with the ComplexHeatmap package (ver. 2.4.2), where different clustering analyses were performed with Ward's linkage with default parameters and seed as 123. For clustering based on the IFNγ and NF-κB scores, k was chosen as 2 and for the interactome analysis as 4 after visual inspections for values of k between 2 and 10.
For scTCRαβ-seq, only productive full-length TCR sequence information was considered, and all ambiguous cells with multiple TCRα and/or TCRβ chains were removed. Clones were defined by the exact same CDR3 amino acid sequence in both TCRαβ-chains, if available, or in the TCRβ-chain alone. The clonotypes for individual samples have been named in descending order (clonotype 1 contains the most cells). T-LGLL clonotypes were inferred as stated in the manuscript by manually curating data from scTCRαβ-seq, Vβ flow cytometry, and STAT3 amplicon sequencing data. In the scTCRαβ-seq data, a wild-type T-LGLL clonotype had to explain >5% of the total TCR repertoire (at any time point, if multiple time points were present). For patient 1, clonotype 4 was seen in both time points in scTCRαβ-seq data but was filtered during quality control in scRNA-seq data in time point 2015.
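The clonotype definition above can be illustrated with a short sketch: cells are grouped by their CDR3 amino acid sequences (both chains when TCRα is available, TCRβ alone otherwise), clonotypes are ranked by size, and those exceeding 5% of the repertoire are flagged as candidates. This is an illustration under stated assumptions, not the study's code: the CDR3 strings are invented placeholders, cells with multiple chains are assumed to have been removed already, and in the study the final T-LGLL assignment also relied on manual curation with Vβ flow cytometry and STAT3 amplicon data.

```python
from collections import Counter


def clonotype_key(cdr3_alpha, cdr3_beta):
    """Use both chains when TCR-alpha is available, otherwise TCR-beta only."""
    return (cdr3_alpha, cdr3_beta) if cdr3_alpha else (None, cdr3_beta)


def clonotype_fractions(cells):
    """Return [(clonotype, fraction of repertoire)] sorted from largest to smallest.

    cells: iterable of (cdr3_alpha or None, cdr3_beta) tuples, one per cell
    """
    counts = Counter(clonotype_key(a, b) for a, b in cells)
    total = sum(counts.values())
    # most_common() sorts in descending order, so index 0 is "clonotype 1".
    return [(key, n / total) for key, n in counts.most_common()]


# Invented placeholder CDR3 sequences; 100 cells in total.
cells = (
    [("CAV_ALPHA_1", "CAS_BETA_1")] * 60
    + [("CAV_ALPHA_2", "CAS_BETA_2")] * 36
    + [(None, "CAS_BETA_3")] * 4
)
fractions = clonotype_fractions(cells)
candidates = [key for key, frac in fractions if frac > 0.05]
```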
Bulk-RNA sequencing and data analysis. Bulk-RNA-sequencing was performed as described by Savola et al. 32 . Briefly, Qiagen miRNeasy micro kit (cat. no 217084) and SMART-Seq v4 Ultra Low Input RNA Kit (cat. no. 634890) were used to extract RNA. Sequencing was conducted using Illumina Nextera XT kit (FC-131-1096). Data filtering was done using Trimmomatic (filtering parameters leading: 3, trailing: 3, sliding window: 4:15 and minlen: 36). STAR aligner was used for alignment using the human reference genome (Ensembl GRCh38). EdgeR (3.3.3) 66 was used to call the DEGs, where read counts were normalized with the Trimmed Mean of M-values (TMM) method and tested with the exactTest-function implemented in edgeR with dispersion = "common" option. Cytotoxicity scores were calculated as geometric means as suggested by Dufva and Pölönen et al 28 , with the same genes as above in scRNA-seq analysis. TCRαβ-sequences were gathered from bulk-RNA-sequencing data with MiXCR (ver 3.0.13) 67 with default parameters.
Fig. 6 T-LGLL clonotypes have increased amounts of predicted cell-cell interactions, especially with monocytes. a Differentially expressed (unadjusted two-sided Mann-Whitney test) plasma cytokines between patients with T-LGLL (n = 9) and healthy controls (n = 8), where cytokines P < 0.05 (horizontal line) are labeled. b Median expression of differentially expressed cytokines in the scRNA-seq data in non-leukemic and leukemic immune cell subsets in the patients with T-LGLL (cluster numbers refer to Fig. 4a). Heatmap clustering was performed with Ward's linkage. Values are scaled for each column. c HLA class II module score in T-LGLL in different monocyte clusters (as seen in Fig. 4a) in comparison with healthy controls and other disease cohorts. P-values were calculated with two-sided Kruskal-Wallis test. d Proportion of bead-adhering (fluorescent microspheres) CD16 + and CD16 + CD14 dim monocytes in patients with T-LGLL (n = 6) in comparison to healthy controls (n = 6). P-values were calculated with Mann-Whitney test. e Number of significant ligand-receptor interactions (P < 0.05, CellPhoneDB permutation test) between T-LGLL clonotypes (as shown in Fig. 2e) or the top hyperexpanded clonotypes (>50 TCRs) from healthy controls and different non-leukemic immune cell subpopulations, calculated with CellPhoneDB. Clustering was performed with Ward's linkage. Color scale from blue to red marks the number of predicted interactions between cell types. f Number of significant ligand-receptor interactions of T-LGLL clonotypes with different immune subpopulations. Shown receptor-ligand pairs are statistically significant interactions that have been attributed as co-stimulatory or inhibitory. The color indicates the number of T-LGLL clonotypes that have the interaction with the cell type. g Left: UMAP representation showing the distribution of the T-LGLL clonotypes with different numbers of interactions with their non-leukemic counterparts. Right: UMAP representation of the transcriptomes of the selected 18 leukemic T-LGLL clonotypes as seen in Fig. 2b.
TCRβ-sequencing and data analysis. TCRβ-sequencing from the genomic DNA was conducted with ImmunoSEQ assay by Adaptive Biotechnologies Corp as per manufacturer's guidance and as previously described by Savola et al. 32 . Additional TCRβ data from CD8 + sorted samples from patients with rheumatoid arthritis from diagnosis (n = 32), metastatic melanoma from diagnosis (n = 29), and healthy control samples (n = 38) from peripheral blood and MNC-sorted samples from patients with T-LGLL (n = 38) or healthy controls (n = 785) from peripheral blood were also gathered as stated in Supplementary Data 1.
Analyses were done with VDJtools (ver 1.2.1) 68 , where non-functional clonotypes were removed and diversity indices calculated with CalcDiversityStats-function. To allow reliable diversity metrics, all samples were subsampled to 30,000 reads and samples that had fewer reads were removed from further analyses (n = 28; 13 RA samples and 15 T-LGLL samples from Kerr et al.).
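The subsampling step can be illustrated with a minimal sketch: reads are drawn without replacement down to a fixed depth, samples below that depth are dropped, and a diversity index is computed on the subsample. This is not the VDJtools implementation; Shannon entropy is used here only as one example of a diversity index, and the helper names and toy counts are hypothetical.

```python
import math
import random
from collections import Counter


def downsample(clone_counts, depth, seed=0):
    """Draw `depth` reads without replacement; return None if the sample is too shallow."""
    total = sum(clone_counts.values())
    if total < depth:
        return None  # such samples were excluded from further analyses
    reads = [clone for clone, n in clone_counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return Counter(rng.sample(reads, depth))


def shannon_entropy(clone_counts):
    """Shannon entropy (natural log) of the clonotype frequency distribution."""
    total = sum(clone_counts.values())
    return -sum((n / total) * math.log(n / total) for n in clone_counts.values())


# Toy repertoire with made-up read counts per clonotype.
repertoire = {"clone_1": 40000, "clone_2": 5000, "clone_3": 5000}
sub = downsample(repertoire, depth=30000)
diversity = shannon_entropy(sub)
```

Fixing the depth before computing diversity matters because most diversity indices grow with sequencing depth, so unequal depths would otherwise confound comparisons between cohorts.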
TCRs were grouped based on amino acid-level similarities decided by GLIPH2 (1.0.0) 31 , with default parameters and CD8 as reference sets for CD8 + -sorted samples and CD4CD8 for MNC-sorted samples. To detect antigen-driven clonotypes, the subsampled TCRβ-seq or scTCRαβ-seq samples were inputted individually to GLIPH2. To avoid biases, the analysis was also repeated after removing the T-LGLL clones (T-LGLL samples) or the largest clone (other cohorts) and subsampling the remaining repertoire to the same read depth of 30,000 reads. The following were assumed to be T-LGLL clones (Supplementary Data 3): in the CD8 + TCRβ-seq data, clonotypes explaining >5% of the repertoire; in the MNC data, the clonotypes reported in the original publication by Kerr et al. 24 ; and in the scTCRαβ-seq data, the clonotypes identified as in Fig. 2b. Similarly, for the other datasets, the antigen drive for the largest clone was analyzed. Antigen drive was considered present if GLIPH2 reported a statistically significantly enriched cluster with at least two TCRs against the reference dataset included in GLIPH2.
HLA genotyping and HLA phenotyping inference from sequencing data. The healthy samples profiled with scRNA+TCRαβ-seq (n = 6) were typed at the Histocompatibility Testing Laboratory, Finnish Red Cross Blood Service accredited by European Federation for Immunogenetics. The HLA specificities were reported based on the current World Health Organization (WHO) nomenclature for the HLA system. The typing for HLA-A, -B, -C, and -DRB1 loci was performed using the Luminex bead array technology together with sequence-specific oligonucleotide probes (Commercial LabType kits RSSO1A, RSSO1B, RSSO1C, RSSO2B1, One Lambda, Los Angeles, CA). The bead array data were interpreted according to the manufacturer's recommendations using the HLA Fusion software 3.2 (One Lambda).
HLA phenotypes were inferred from the paired-end scRNA-seq and bulk-RNA-seq data with PHLAT (v 1.1) and bowtie (v 2.7.0) with default parameters, run in paired-end mode. Reassuringly, PHLAT arrived at the same six-digit allele as the laboratory typing at the HLA-A, -B, -C, and -DRB1 loci for 47/48 (97.92%) of the alleles in the healthy donors, where the only difference was in one individual where the HLA-C*07:02 was predicted to be HLA-C*04:01. In the T-LGLL samples profiled with scRNA-seq, we had two time series samples to consider how reproducible the algorithm is for different samples from the same individual. When the two-digit accuracy was considered, the agreement between different time points for Pt1 was 8/8 (100%) and for Pt2 7/8 (87.5%), where the different HLAs were HLA-B*07 and HLA-B*40. When considering six-digit accuracy, the agreement was 7/8 (87.5%) for Pt1 and 5/8 (62.5%) for Pt2.
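The two-digit versus six-digit comparison above amounts to truncating each allele name to a given number of colon-separated fields before comparing calls. The sketch below illustrates this under simplifying assumptions: `truncate` and `concordance` are hypothetical helpers, the allele calls are invented example data rather than calls from the study, and the two lists are assumed to be ordered so that matching positions refer to the same allele.

```python
def truncate(allele, n_fields):
    """Keep the first n_fields fields, e.g. 'HLA-C*07:02:01' -> 'HLA-C*07' for n_fields=1."""
    locus, fields = allele.split("*")
    return locus + "*" + ":".join(fields.split(":")[:n_fields])


def concordance(calls_a, calls_b, n_fields):
    """Return (number of matching calls, number of compared calls)."""
    pairs = list(zip(calls_a, calls_b))
    hits = sum(truncate(a, n_fields) == truncate(b, n_fields) for a, b in pairs)
    return hits, len(pairs)


# Invented example calls for two samples from the same individual.
timepoint_1 = ["HLA-A*02:01:01", "HLA-B*07:02:01"]
timepoint_2 = ["HLA-A*02:01:01", "HLA-B*07:05:01"]

two_digit = concordance(timepoint_1, timepoint_2, n_fields=1)
six_digit = concordance(timepoint_1, timepoint_2, n_fields=3)

# Percentage corresponding to the 47/48 agreement reported in the text.
pct_healthy = round(100 * 47 / 48, 2)
```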
For the proliferation assay, cells were washed and incubated with CFSE Cell Division tracker kit (Biolegend Cat. 423801) for 20 min at 37°C protected from light. Fluorescence was quenched by adding RPMI, and the cells were washed, resuspended in complete RPMI, and incubated at room temperature for 10 min. Cells were then added to wells that had been pre-coated with CD3 okt-3 (BD, Cat. 555329) the day before. Stimulants for cell proliferation were added as follows: α-CD49d + α-CD28 or LPS.
For the phagocytosis assay, on day 2 cells were incubated with FluoSpheres fluorescent beads (FluoSpheres Carboxylate-Modified Microspheres, 1.0 µm, yellow-green fluorescent (505/515), 2% solids-F8823, Thermofisher) at a cell-to-bead ratio of 1:10. Cells were incubated for 30 min at 37°C protected from light in serum-containing medium only. Cells were trypsinized, washed with 1 ml of PBS-EDTA-BSA, and stained with the following markers: 2.5 µl of CD14 Pe-Cy7 (BD Cat. 562698), 2.5 µl of CD16 PerCP-Cy5.5 (BD Cat. 560717), and 2 µl of CD45 APC-H7 (BD Cat. 641417). Cells were acquired on BD FACSVerse, and all stained samples were analyzed with FlowJo software (Version v10.7, Becton Dickinson).
Statistical testing. P-values were calculated with nonparametric tests, including Mann-Whitney test (two groups), Kruskal-Wallis test (more than two groups), and Fisher's exact test where the alternative hypotheses are reported. P-values were corrected with Benjamini-Hochberg adjustment. All calculations were done with R (4.0.2) or Python (3.7.4).
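For readers unfamiliar with the Benjamini-Hochberg adjustment mentioned above, the following minimal sketch shows the standard step-up procedure in plain Python. It is an illustration only and `benjamini_hochberg` is a hypothetical helper; established implementations (such as `p.adjust` in R) should be used in practice.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending P-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P-value so that adjusted values
    # stay monotone: each is the minimum of p * m / rank over all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```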
Data visualization. In the box plots, the center line corresponds to the median, the box corresponds to the interquartile range (IQR), and the whiskers extend to 1.5 × IQR, while outlier points are plotted individually where present.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The processed scRNA-sequencing data for both the T-LGLL and healthy samples generated in this study are available at ArrayExpress under accession code E-MTAB-11170. The raw scRNA-sequencing and bulk-RNA-sequencing data are available in the European Genome-Phenome Archive under accession code EGAS00001005297. The TCRαβ-sequencing data, TCRβ-sequencing data, and Seurat-objects are available at Zenodo under: https://doi.org/10.5281/zenodo.4739231 [https://zenodo.org/record/4739231] with restricted access due to GDPR regulations, and data can be accessed by placing a request via Zenodo. The publicly available scRNA+TCRαβ-sequencing and TCRβ-sequencing data used in this study are listed in Supplementary Data 1. Source data are provided with this paper.