Identification of novel STAT5B mutations and characterization of TCRβ signatures in CD4+ T-cell large granular lymphocyte leukemia

CD4+ T-cell large granular lymphocyte leukemia (T-LGLL) is a rare subtype of T-LGLL with unknown etiology. In this study, we molecularly characterized a cohort of patients (n = 35) by studying their T-cell receptor (TCR) repertoire and the presence of somatic STAT5B mutations. In addition to the previously described gain-of-function mutations (N642H, Y665F, Q706L, S715F), we discovered six novel STAT5B mutations (Q220H, E433K, T628S, P658R, P702A, and V712E). Multiple STAT5B mutations were present in 22% (5/23) of STAT5B mutated CD4+ T-LGLL cases, either coexisting in one clone or in distinct clones. Patients with STAT5B mutations had increased lymphocyte and LGL counts when compared to STAT5B wild-type patients. TCRβ sequencing showed that, in addition to large LGL expansions, non-leukemic T cell repertoires were more clonal in CD4+ T-LGLL than in healthy controls. Interestingly, 25% (15/59) of CD4+ T-LGLL clonotypes were found, albeit at much lower frequencies, in the non-leukemic CD4+ T cell repertoires of the CD4+ T-LGLL patients. Additionally, we confirmed the previously reported clonal dominance of TRBV6-expressing clones in CD4+ T-LGLL. In conclusion, CD4+ T-LGLL patients have a typical TCR and mutation profile suggestive of aberrant antigen response underlying the disease.

Up to 55% of CD4+ T-LGLL patients have been shown to harbor STAT5B mutations [4,5]. In CD8+ T-LGLL, the most common mutated gene is STAT3 [6], whereas STAT5B mutations are rare and often associated with an aggressive disease form [7][8][9]. All reported STAT5B mutations in CD4+ T-LGLL are point mutations within the SH2 or transactivation domains of STAT5B. N642H and Y665F are the most common STAT5B mutations, and they both have been shown to increase STAT5B protein activity [4,10,11].
The etiology of CD4+ T-LGLL remains unknown. An initial antigen-driven expansion of CD4+ T cells, followed by the occurrence of oncogenic events (i.e., somatic mutations), has been suggested to lead to the persistence of abnormal T-cell clones [11]. Because CD4+ T-LGLL is not associated with autoimmune diseases, non-self-antigens rather than autoantigens [5] have been proposed as the targets of CD4+ T-LGL clones. In some earlier reports, CD4+ T-LGL clones have been suggested to recognize cytomegalovirus (CMV) antigens [12,13]. In CD4+ T-LGLL [3], enrichment of Vβ13.1 gene usage has also been reported, differentiating it from CD8+ T-LGLL, where no enrichment of specific Vβ gene usage has been observed [14,15]. Moreover, CD4+ T-LGLL patients with a monoclonal expansion of TCRVβ13.1 display a common HLA-DRB1*07:01 genotype and are reported to display an identical motif (QG) in the middle of the CDR3 sequence [16,17]. These observations suggest that the evolution of the expanded CD4+ T-LGLL clones is not a stochastic process, but rather a result of an antigen-driven immune response [18][19][20][21][22].
As CD4+ T-LGLL is a rare disease, previous studies evaluating the clinical impact of STAT5B mutations have been limited in size, and no deep TCR profiling has been performed with modern sequencing and bioinformatic tools [23][24][25]. Therefore, we aimed to collect a large cohort of CD4+ T-LGLL patients (n = 35) and examine by deep amplicon sequencing the STAT5B mutation status and correlate the genotype information with the clinical data. Additionally, by deep TCRβ sequencing, we studied the landscape of both the leukemic and non-leukemic T-cell repertoires in CD4+ T-LGLL and compared that to the landscapes from the healthy controls and patients with CD8+ T-LGLL.

MATERIALS AND METHODS

Patient cohort
CD4+ T-LGLL patients (n = 35) were recruited from the hematology units of 3 academic institutions, namely Padova (Italy; n = 22 patients), Helsinki (Finland; n = 6 patients) and Shinshu (Japan; n = 7 patients). Buffy coats from healthy blood donors (n = 37) were provided by the Finnish Red Cross Blood service (Helsinki, Finland). The study was conducted in accordance with the Ethics Committees (Padova University Hospital ethics committee, approval number 4213/AO/17; Helsinki University Hospital ethics committee, approval number 303/12/03/01/2011; Shinshu University Hospital ethics committee, approval number 581). All patients gave written informed consent prior to their inclusion in the study in accordance with the Declaration of Helsinki.

Sample preparation
Peripheral blood mononuclear cells (PBMC) were isolated with Ficoll-Hypaque (Sigma Aldrich) gradient centrifugation. Leukemic clones were purified using magnetic micro-beads coated with monoclonal anti-human CD4, CD57, or CD56 antibodies (Miltenyi Biotec). When available, cryopreserved cells (n = 15) were first subjected to depletion of CD8+ cells, after which the CD8-negative population (or CD8+ in case of CD4+ CD8+ double-positive LGL clone) was further purified with anti-human CD4-conjugated microbeads. Alternatively, CD57 (n = 8) or CD56 (n = 3) purified LGLs or PBMC (n = 9) were used depending on sample availability. A similar approach was used to purify CD4+ cells from buffy coats obtained from healthy controls (n = 37). Flow cytometry analysis was conducted to control the purity of the obtained cell fractions (>95% among lymphocytes) with FACSCanto II in Padova and FACS Verse in Helsinki; data were analyzed with FACS Diva and FACS Suite software, respectively (all Becton Dickinson). Surface staining was performed according to the purification: CD4 FITC CD8 PE or CD4 FITC CD57 PE or CD4 FITC CD56 APC (all Becton Dickinson). DNA was extracted using Gentra Puregene Cell and Tissue kit (Qiagen) and quantified using the Qubit 2.0 fluorometer with DNA high sensitivity kit (ThermoFisher Scientific).

Amplicon sequencing of STAT5B
STAT5B sequencing was performed from the LGL or PBMC fraction (Fig. 1) to detect somatic mutations. First, targeted sequencing was performed starting from 50 ng of DNA. Libraries, generated according to the manufacturer's instructions, were sequenced on an Illumina MiSeq instrument using MiSeq Reagent kit v2 or v3 and following the TruSeq Custom Amplicon Assay (Illumina) pipeline. Bioinformatic variant calling was performed using MiSeq Reporter software (version 2.5.1) with default settings, and Illumina Variant Studio 2.0 was used for variant annotation. Next, the observed mutations were confirmed using amplicon sequencing as previously described in Kim et al. [26]. Amplicon primers used for both techniques are reported in Supplementary Table 1. Germline variants were excluded by examination of a non-leukemic cell population from each patient. All identified variants were visually confirmed using the Integrative Genomics Viewer (IGV) [27].

Western blot assay and STAT5B luciferase reporter assay
Please refer to Supplementary methods and Supplementary Table 2.

TCRβ sequencing and data analysis
TCRβ sequencing from genomic DNA (the same DNA as used for amplicon sequencing, Fig. 1 and Table 1) of CD4+ T-LGLL patients (n = 27) was conducted with Adaptive Biotechnologies' ImmunoSEQ assay with "Survey" resolution [28]. Only productive, i.e., complete, in-frame, TCRβ rearrangements were included in the analyses. Sample clonality was assessed using the Simpson clonality as provided by the Adaptive Immunoseq Analyzer (ver 3.0). Healthy controls (n = 785) from peripheral blood [29] were also used after downsampling to 40,000 reads.
VDJtools was used to generate circos plots for V gene usage from TCRβ-sequencing data [30]. The query against the VDJ database (VDJdb; ver 2021-01-10) [24] was done for all available TCRs from CD4+ cells. To evaluate possible amino acid level similarities of TCRs, GLIPH (ver 1.0.0 and 2.0.0) [23,31] was used on a remote server with default parameters.
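The clonality and downsampling steps described above can be illustrated with a minimal sketch. It assumes that Simpson clonality is the square root of the sum of squared clonotype frequencies, which is the commonly used definition; the function names and the read-level downsampling strategy are illustrative assumptions, not the vendor's implementation.

```python
import math
import random
from collections import Counter


def simpson_clonality(counts):
    """Square root of Simpson's index over clonotype read counts.

    Assumed definition: sqrt(sum(p_i ** 2)), where p_i is the frequency
    of clonotype i. Returns 1.0 for a fully monoclonal sample.
    """
    counts = list(counts)
    total = sum(counts)
    if total <= 0:
        raise ValueError("repertoire must contain at least one read")
    return math.sqrt(sum((c / total) ** 2 for c in counts))


def downsample(clone_counts, depth, seed=0):
    """Randomly draw `depth` reads without replacement (hypothetical helper).

    clone_counts maps a clonotype label to its read count. Samples that
    already have `depth` reads or fewer are returned unchanged.
    """
    total = sum(clone_counts.values())
    if total <= depth:
        return dict(clone_counts)
    reads = [clone for clone, n in clone_counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(reads, depth)))


# Toy repertoire: one dominant clone plus many small ones.
toy = {"dominant": 60_000, **{f"small_{i}": 100 for i in range(400)}}
sub = downsample(toy, depth=40_000)
print(round(simpson_clonality(sub.values()), 3))
```

Downsampling every sample to a common depth before computing clonality keeps samples with different sequencing depths comparable, which is presumably why the healthy control data were reduced to 40,000 reads.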

Statistical methods
Statistical analysis for the evaluation of differential STAT5B transcriptional activities through luciferase reporter assay was performed with a two-way ANOVA followed by Dunn's multiple correction test. Fisher's exact, Chi-square, or Bonferroni corrected Mann-Whitney tests were used to analyze the relationship between STAT5B mutational status, TCRβ repertoire covariates, and relative clinical features of the cohort of patients profiled (see Table 2 for more details). Fisher's exact test with Bonferroni multiple correction was used to calculate V genes enrichment for CD4+ T-LGLL clones. All the analyses were performed

Legend (figure/table body not recovered): STAT5B mutation status, type of the STAT5B mutation and TCR Vβ chain sequences (>5%) with their relative percentage (%) are given for each patient. For patients #17 and #24, two additional clones <5% are reported considering the putative match with the low STAT5B mutation VAF (in italics). The CDR3 amino acid sequences are listed according to the % of the Vβ expansions (from the highest to the lowest). Color codes the cell of origin used in the sequencing: green, CD4+ sorted cells; orange, CD4+CD8+ sorted cells; blue, CD57+ sorted LGLs; yellow, CD56+ sorted LGLs; pink, PBMC. VAF variant allele frequency, wt wild-type, PBMC peripheral blood mononuclear cells.

By targeted amplicon sequencing, we discovered that 23 out of 35 patients (66%) harbored somatic STAT5B point mutations (Fig. 1). No differences were detected in the median age between the two groups, whereas STAT5B mutations were more frequent in males (Fisher's exact test, p = 0.035; Fig. 2a and Table 2b). Patients with STAT5B mutations displayed increased LGL counts and lymphocytosis when compared to the STAT5B wild-type patients (Mann-Whitney test, p = 0.006 for both variables; Fig. 2b, c and Table 1b). There was no statistically significant difference in the platelet counts between the mutated and wild-type patients, while a tendency towards higher ANC and Hb levels in the STAT5B mutated patients was noted (p = 0.07 and p = 0.06, respectively; Fig. 2d, e).
Besides the previously identified STAT5B mutations in CD4+ T-LGLL (N642H, Y665F, Q706L, and S715F [4]), we discovered six additional novel somatic STAT5B mutations in the cohort. T628S (n = 5) and P685R (n = 1) mutations are located in the SH2 domain, whereas the V712E mutation (n = 2) is in the

Legend fragment: The patients present a double expansion: in six cases the main clone is CD4+CD8− and the minor one is CD4+CD8+, whereas only one case presents a major clone CD4+CD8− and a minor one CD4−CD8+.

The most frequent mutation was N642H (9/23, 39%), followed by T628S (5/23, 22%), S715F (4/23, 17%) and Y665F (3/23, 13%). Two different base substitutions led to the same missense T628S mutation: the wild-type nucleotide triplet ACC was found to mutate to either AGC or TCC, both inducing the conversion from threonine (T) to serine (S). The first nucleotide alteration has already been listed in COSMIC [32], whereas the latter is novel. Interestingly, the two possible substitutions leading to the same amino acid conversion were both identified in different alleles (i.e., in different reads) of patient #12.

Fig. 2 legend (fragment): LGL count, c WBC, d ANC, e Hb. All clinical features were analyzed with Mann-Whitney U-test except for sex prevalence, which was analyzed with Fisher's exact test. WBC white blood cells, LGL large granular lymphocytes, ANC absolute neutrophil count, Hb hemoglobin, wt wild-type.

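The two T628S-encoding substitutions can be checked against the standard genetic code: both ACC to AGC and ACC to TCC convert threonine (T) to serine (S), each through a single-base change at a different codon position. The sketch below is a small illustration of that check; the codon table is deliberately partial, and the helper names are hypothetical.

```python
# Partial codon table (standard genetic code); only the codons discussed in the text.
CODON_TO_AA = {"ACC": "T", "AGC": "S", "TCC": "S"}


def base_changes(ref_codon, alt_codon):
    """List (codon position, reference base, altered base) for each mismatch."""
    return [
        (i + 1, r, a)
        for i, (r, a) in enumerate(zip(ref_codon, alt_codon))
        if r != a
    ]


def describe_missense(ref_codon, alt_codon, residue):
    """Return the protein-level change (e.g. 'T628S') and the base changes."""
    label = f"{CODON_TO_AA[ref_codon]}{residue}{CODON_TO_AA[alt_codon]}"
    return label, base_changes(ref_codon, alt_codon)


for alt in ("AGC", "TCC"):
    print(describe_missense("ACC", alt, 628))
```

Because the two substitutions affect different positions of the same codon, observing them on separate reads is consistent with two independent mutational events.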
E433K and V712E are novel activating STAT5B mutations associated with CD4+ T-LGLL
Functional characterization of E433K, V712E, and P685R mutations was performed to determine their impact on the transcriptional activity of the STAT5B gene; known activating variant N642H and the wild-type STAT5B constructs were used as positive and negative controls, respectively (Fig. 3a). Q220H and P702A mutations were excluded from the analysis because of their low VAF (2% and 3%, respectively). The T628S mutation has already been described as a gain-of-function mutation in T-prolymphocytic leukemia (T-PLL) [33]. As shown in Fig. 3a, both E433K and V712E were found to be highly activating when compared to wild-type STAT5B (p < 0.001, two-way ANOVA followed by Dunn's multiple correction test). There was at least a 10-fold increase in the luminescent signal for the E433K mutation and a 15-fold increase for the V712E mutation when compared to the wild-type STAT5B. The signal from the P685R mutated plasmid did not differ from the wild-type construct. This mutation was detected in the patient with coexisting N642H and E433K mutations. In addition, HeLa cells transiently carrying the STAT5B GOF mutants exhibited upregulated phosphorylation of STAT5B protein compared to the cells carrying WT STAT5B (Fig. 3b). Furthermore, STAT5B mutated cells had increased protein expression levels of c-MYC, Bcl-2, and Pim-1, which are downstream targets of STAT5 [34].

Both the leukemic and non-leukemic TCRβ repertoires are highly clonal in CD4+ T-LGLL
TCRβ sequencing (TCRβ-seq) was performed on samples from CD4+ T-LGLL patients (n = 27) and healthy controls (n = 37). From T-LGLL patients, genomic DNA was isolated from bead-separated cells (CD4+, CD4+CD8+, CD56+ or CD57+) or PBMC depending on sample availability (Table 1). In healthy controls, CD4+ selected fractions were used. As expected, CD4+ T-LGLL patients harbored larger T-cell clones than healthy controls (Fig. 4a). Both monoclonal and oligoclonal expansion patterns were observed. By focusing on clones with clone size exceeding 5% of the repertoire, we identified a total of 59 possible T-LGLL clones from the 27 patients (Supplementary Table 4).
For the CD4+ sorted samples, the overall TCRβ clonality was higher in CD4+ T-LGLL (n = 6) as compared to healthy controls (n = 37) (Fig. 4b, p adj < 0.05, Bonferroni corrected Mann-Whitney test). Even after removing the leukemic T-LGLL clones from the patient samples, the observed difference in the clonality persisted (Fig. 4c, p adj < 0.05, Bonferroni corrected Mann-Whitney test). Thus, the non-leukemic TCR repertoires in CD4+ T-LGLL patients are less diverse than the CD4+ TCR repertoires in the healthy controls. By repeating the analysis with thresholds varying from 3% to 7% for selecting the putative T-LGLL clones, we showed that this result did not depend on our chosen threshold of 5% (Supplementary Fig. 1).
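The threshold-based separation of leukemic and non-leukemic clones, and the sensitivity check over 3-7%, can be sketched as follows. This is an illustrative reconstruction under stated assumptions: clonality is taken as the square root of the sum of squared frequencies, the remaining repertoire is renormalized after the putative leukemic clones are removed, and all function names are hypothetical.

```python
import math


def simpson_clonality(freqs):
    """sqrt(sum(p_i ** 2)) after renormalizing the given frequencies (assumed definition)."""
    freqs = list(freqs)
    total = sum(freqs)
    return math.sqrt(sum((f / total) ** 2 for f in freqs))


def split_repertoire(clone_freqs, threshold=0.05):
    """Split clonotypes into putative leukemic (> threshold) and the remainder."""
    leukemic = {c: f for c, f in clone_freqs.items() if f > threshold}
    rest = {c: f for c, f in clone_freqs.items() if f <= threshold}
    return leukemic, rest


def nonleukemic_clonality(clone_freqs, thresholds=(0.03, 0.04, 0.05, 0.06, 0.07)):
    """Clonality of the non-leukemic remainder for each candidate threshold."""
    result = {}
    for t in thresholds:
        _, rest = split_repertoire(clone_freqs, t)
        result[t] = simpson_clonality(rest.values()) if rest else None
    return result


# Toy repertoire: one large clone (60%), one borderline clone (4%), 36 small clones (1% each).
toy = {"A": 0.60, "B": 0.04, **{f"s{i}": 0.01 for i in range(36)}}
for threshold, clonality in nonleukemic_clonality(toy).items():
    print(threshold, round(clonality, 3))
```

In the toy data the borderline clone is counted as leukemic at a 3% cutoff but not at 5%, which shows why repeating the comparison over a range of thresholds is a useful robustness check.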

Most CD4+ T-LGLL-associated STAT5B mutations are clonally restricted
For most of the patients (71%) presenting with a STAT5B mutated clone over 15%, the VAF of the STAT5B mutation corresponded to the size of the main T-LGLL clone: the mutation VAFs were half of the size of the TCR clones, suggesting heterozygous mutations (Table 1). Only in patients #2 and #15 were the mutation VAFs slightly higher, suggesting that there was more than one T-cell clone harboring the same mutation. In patient #21, the mutation VAF (23%) matched the sizes of the two main T-LGLL clones (24% and 22%), suggesting that the mutation is present either in one clone in a homozygous/hemizygous fashion or in both clones as heterozygous (Table 1). Interestingly, patients with low VAF (<5%) STAT5B mutations (#8, #13, #17, and #24) nevertheless had large T-LGLL clones (37-71%), suggesting that they had either subclonal STAT5B mutations or that the mutations were located outside the main clone (Table 1).
In patients with multiple STAT5B mutations, the clonal structure was evaluated based on the VAF and TCR clone size. In patient #9, the three mutations co-existed in the same clone based on their similar VAFs (47%) matching with the size of the main clone (97%) (Table 1). In patient #18, the Y665F (VAF 34%) and N642H (VAF 5%) mutations were detected in different subclones, and accordingly, two expanded TCR clonotypes were also detected (Table 1). Patient #11 had one large main TCR clone (90%) but two different STAT5B mutations with different VAFs (Y665F: 32% and V712E: 11%) (Fig. 1 and Table 1). This suggests possible clonal evolution and the presence of different genetic lesions in individual cells bearing the same TCR.
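The reasoning that links a mutation VAF to a TCR clone size can be made explicit with a small worked example. The sketch assumes a diploid STAT5B locus without copy-number changes and that the TCR clone fraction approximates the fraction of sequenced cells carrying the mutation; the function name and parameters are hypothetical.

```python
def expected_vaf(clone_fraction, mutant_alleles=1):
    """Expected VAF if every cell of a clone carries the mutation.

    clone_fraction: fraction of sequenced cells belonging to the clone (0-1).
    mutant_alleles: 1 for heterozygous, 2 for homozygous, at a diploid locus.
    """
    if not 0 <= clone_fraction <= 1:
        raise ValueError("clone_fraction must be between 0 and 1")
    if mutant_alleles not in (1, 2):
        raise ValueError("mutant_alleles must be 1 or 2")
    return clone_fraction * mutant_alleles / 2


# Patient #9: main clone 97%, observed VAF about 47%.
print(expected_vaf(0.97))                       # heterozygous in the main clone

# Patient #21: clones of 24% and 22%, observed VAF 23%.
print(expected_vaf(0.24 + 0.22))                # heterozygous in both clones
print(expected_vaf(0.24, mutant_alleles=2))     # homozygous in the larger clone only
```

Both scenarios for patient #21 predict a VAF close to the observed 23%, which is why the VAF alone cannot distinguish between them.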

CD4+ T-LGLL clonotypes are predominantly restricted to individual patients
To understand whether the 59 identified CD4+ T-LGLL clonotypes (Supplementary Table 4) could be a result of convergent evolution, we examined if these clones were shared between patients.
We performed a similar analysis with our CD8+ T-LGLL cohort. None of the CD8+ leukemic clones (defined similarly as clones above 5%) was shared between the patients (Supplementary Fig. 2a). Half (50%) of these clones were found in the Emerson et al. [29] cohort at low frequencies (Supplementary Fig. 2b).
Next, we calculated the generation probabilities of the leukemic TCRs with OLGA [25], as high generation probabilities could indicate a bias in the recombination, whereas low generation probabilities could point towards convergent evolution. The TCR generation probabilities were higher in clones that were shared between healthy individuals. Conversely, the generation probabilities in clones that were exclusive to CD4+ T-LGLL were low (Supplementary Fig. 3).

Skewed Vβ gene usage in CD4+ T-LGLL clones
Previous studies using Vβ flow cytometry analysis have suggested a biased usage of the Vβ13.1 family (consisting of TRBV06-05, TRBV06-06, and TRBV06-09 genes) in CD4+ T-LGLL [16]. In our data, the predominant V-gene of CD4+ T-LGLL clonotypes belonged to the TRBV06 family in 20% (12/59) of CD4+ T-LGLL clonotypes (Supplementary Table 6, Supplementary Fig. 4). By selecting the single most expanded clones, we found that 8 out of 27 T-LGLL clones belonged to the TRBV06 family, in comparison to only 2 out of the 35 clones belonging to the TRBV06 family in healthy controls (Fig. 6a, p = 0.01, Fisher's two-sided exact test). Representative examples from CD4+ T-LGLL, with both monoclonal and polyclonal expansions, and from healthy controls are shown in Fig. 6b.

Fig. 6 legend (fragment): LGLL as compared to one representative healthy sample. c Logo plot showing the non-conservative middle part of TCRs CDR3, which is usually within 5 Å of the antigen. The "QG" motif that was found in the middle CDR3 in multiple CD4+ T-LGLL clones is shown in CDR3s that are 13 amino acids long.

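The TRBV06 enrichment comparison above is a 2x2 contingency test, which can be sketched with a standard-library implementation of the two-sided Fisher's exact test. The implementation below sums the probabilities of all tables that are no more likely than the observed one, which is a common convention; the value it returns need not match the reported p exactly, since the software and exact counts behind the reported figure are not stated here. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums hypergeometric probabilities of all tables with the same margins
    whose probability does not exceed that of the observed table.
    Integer weights are compared exactly before the final division.
    """
    row1, row2, col1 = a + b, c + d, a + c
    lowest = max(0, col1 - row2)
    highest = min(col1, row1)

    def weight(k):
        # Unnormalized probability of seeing k "hits" in the first row.
        return comb(row1, k) * comb(row2, col1 - k)

    observed = weight(a)
    extreme = sum(
        weight(k) for k in range(lowest, highest + 1) if weight(k) <= observed
    )
    return extreme / comb(row1 + row2, col1)


# Counts from the text: 8 of 27 patient clones vs 2 of 35 healthy clones use TRBV06.
p_value = fisher_exact_two_sided(8, 27 - 8, 2, 35 - 2)
print(f"p = {p_value:.4f}")
```

When several V genes are tested in the same way, a Bonferroni correction multiplies each p-value by the number of tests (capped at 1) before comparing it with the significance level.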
To test whether the CD4+ T-LGLL clones target a known viral antigen (from e.g., CMV, EBV, or Influenza A), we queried the 59 T-LGLL clones against the largest database containing known antigen specificities (VDJdb [24]) but no exact matches for TCRs recognizing epitope bound to class II HLAs were found. The antigen specificities of the non-leukemic TCRs returned multiple matches, with Influenza A as the most common target (Supplementary Fig. 5a).
Since structurally similar TCRs are known to recognize similar antigens [23,31,37], we used GLIPH to identify significantly enriched motifs within CD4+ T-LGLL clonotypes in comparison to healthy naïve repertoires. Interestingly, we identified two conserved motifs within the 59 T-LGLL clonotypes: the first, "SDP", in three T-LGLL clones, all of which shared the TRBV24 gene, and the second, "SLRG", in three other T-LGLL clones with different V genes (Supplementary Table 7). No correlation was observed between motifs and the mutational statuses of the patients. Next, we analyzed whether the T-LGLL clonotypes shared motifs within their repertoire, i.e., whether the T-LGLL responses could be polyclonal antigen-specific responses. In 33% (2/6) of CD4+ sorted patients' samples, we could identify a motif that was shared between leukemic and non-leukemic repertoires. Structurally similar TCRs from patient #18 are shown in Supplementary Fig. 5b, where the leukemic clone can be seen to share similarities with multiple non-leukemic TCRs.
Here, we performed a detailed molecular characterization of 35 CD4+ T-LGLL patients, which is the largest cohort described so far. We identified novel gain-of-function STAT5B mutations and found that leukemic clonotypes are predominantly private to the patients.
At least one STAT5B mutation was detected in 66% of the 35 CD4+ T-LGLL patients. This is the highest frequency of STAT5B mutations thus far reported in the literature. While the previously published studies have mostly focused on the mutational hotspot regions of STAT5B [5,9], our sequencing assay covered the whole protein-coding sequence of STAT5B. Moreover, the use of a highly sensitive method allowed us to detect low-frequency mutations down to 1% VAF.
Functional analyses of the newly discovered STAT5B mutations confirmed STAT5 pathway activation, as the downstream targets of STAT5 (c-MYC, Bcl-2, and Pim-1 proteins) [34] were overexpressed. Similarly, a previous study has shown that STAT5B mutant-transduced cell lines upregulate Bcl-2 [43]. c-MYC is an essential regulator of cell proliferation and survival [34], and Pim-1, an anti-apoptotic gene, is known to synergize with c-MYC in leukemogenesis [44]. Altogether, this data suggests that the STAT5B GOF mutations are likely to be involved in clonal dominance and maintenance, probably by conferring proliferation advantage for the mutated cells. Accordingly, STAT5B mutated CD4+ T-LGLL patients had higher lymphocyte and LGL counts compared to wild-type patients.
In addition to the earlier described N642H, S715F, and Y665F mutations, we discovered the T628S mutation to be common in CD4+ T-LGLL (22% of our cohort). This mutation has been previously reported as a highly activating mutation in T-PLL [33,45] and hepatosplenic T-cell lymphoma (HSTCL) [42]. Although SH2 and TAD domains are the known mutation hotspot area in the STAT5B gene, we also identified novel variants in the CCD (Q220H) and DBD (E433K/G) of STAT5B. Similarly, the majority of the STAT3 mutations are in the SH2 domain in CD8+ T-LGLL but additional mutations are also found in other domains [4] underlining the importance of covering all mutation hotspots in the diagnostic tests of LGLL.
Among the STAT5B mutated CD4+ T-LGLL patients, 22% harbored multiple STAT5B mutations either in the main T-LGLL clone or in separate smaller clones. Interestingly, mutation VAFs did not always correlate with the T-cell clone size based on TCRβ sequencing, suggesting sub-clonal heterogeneity. Similarly, 17% of CD8+ T-LGLL patients have been reported to harbor multiple STAT3 mutations [46]. Our results are consistent with previous data, suggesting that in T-LGLL additional somatic mutations can arise in the pre-expanded clonotypes [47].
In CD8+ T-LGLL, STAT3 mutations have been associated with autoimmune manifestations such as rheumatoid arthritis and neutropenia [2,5,8,48,49]. In CD4+ T-LGLL, neutropenia is uncommon, and similarly, in our cohort STAT5B mutated patients had normal ANC levels but increased lymphocyte and LGL counts. Irrespective of the mutation status, the phenotype of the CD4+ T-LGLL clone was in most (75%) cases CD16−CD56+. Although the CD16+CD56+ phenotype was detected in only four cases, all of these had mutated STAT5B. The CD56+ immunophenotype has also been associated with STAT5B mutations in CD8+ T-LGLL [9], in which the LGLs are usually CD56−.
TCRβ-seq has provided new insights into the nature of clonal expansions in CD8+ T-LGLL [14] but has not been performed in CD4+ T-LGLL earlier. In accordance with the previous flow cytometry-based data, in CD4+ T-LGLL the TRBV06 family was enriched in the most hyper-expanded clones. Interestingly, 25% of T-LGLL clonotypes were also found among the non-expanded TCR repertoire in CD4+ T-LGLL patients and 27% were found in healthy individuals. This was not observed in the CD8+ T-LGLL disease, consistent with previous results [14]. Although our data were not HLA-matched, this could hint that the eliciting antigen in CD4+ T-LGLL is commonly encountered. Further, in CD4+ T-LGLL the non-leukemic T-cell compartment was also significantly more clonal than in healthy controls. It has been shown previously that CD4+ T cells in healthy controls are much richer in their diversity as compared to CD8+ T cells [50]. Hence, it was interesting to note that the overall clonality was higher in CD4+ T-LGLL compared to CD8+ T-LGLL, but this could also be due to the relatively small sample size. However, the eliciting antigen(s) remain unknown, as no exact matches were found for the known antigen-specific TCRs, which could be due to the small number of known class II pMHC-TCR pairs and potential bias in the HLA genotypes of our samples. Additionally, since previous literature has indicated that a significant proportion of CD4+ T-LGLL patients have an underlying disease or malignancy [3], it cannot be excluded that the triggering event is abnormal antigen stimulation, such as by a tumor antigen.
More research is warranted to clarify whether CD4+ T-LGLL is a true hematological malignancy or a reactive disorder of the immune system as clonal T cell proliferations and somatic mutations are also observed in healthy controls and non-malignant diseases [26,51]. Following the initial event, activating STAT5B mutations may boost the aberrant proliferation and clonal persistence. Despite the STAT5B hyperactivation and the presence of multiple STAT5B mutations, the clinical course of CD4+ T-LGLL is indolent, and patients rarely have symptoms, differentiating it from CD8+ T-LGLL and other mature and immature CD4+ T-cell leukemias and lymphomas.