## Introduction

The signal transducer and activator of transcription (STAT) family plays key regulatory roles in cellular proliferation, metabolism, differentiation, and survival, and STAT proteins are expressed at significantly higher levels than the corresponding cytokine receptor chains and upstream kinases1. The canonical cycle of STAT cytosolic-nuclear shuttling is initiated by the binding of a cytokine or growth factor to its specific extracellular receptor. Unphosphorylated STATs exist in an anti-parallel dimer/monomer equilibrium and, upon efficient recruitment to the cognate receptor–kinase complex, are phosphorylated on a specific tyrosine residue, which promotes parallel dimerization through conformational rearrangement2. The activated STAT dimer translocates to the nucleus, where it initiates transcription of target genes and alters chromatin looping in enhancer–promoter landscapes. Tyrosine dephosphorylation terminates STAT activation and facilitates cytosolic shuttling/recycling, whereas downregulation of the STAT-activating Janus kinases (JAKs) and cytokine receptor chains is an intrinsically slower process involving ubiquitination and degradation through suppressor of cytokine signaling (SOCS) ubiquitin ligases. This pleiotropic cascade is tightly regulated by a variety of ligands and effector proteins, providing extensive control over key cellular processes3. Hyperactivation of upstream effectors, including all JAK family members, most kinase translocations in childhood or adult cancers such as breakpoint cluster region-Abelson (BCR/ABL), and hyperactivated growth factor receptors such as FMS-like tyrosine kinase receptor-3 internal tandem duplication (FLT3-ITD) or v-erb-b2 erythroblastic leukemia viral oncogene homolog 2 (ERBB2), results in malignant transformation1,2,4.

Leukemogenic driver mutations that hyperactivate STAT proteins arise frequently during tumorigenesis and are most prominent in T-cell prolymphocytic leukemia (T-PLL) and other mature T-cell neoplasms5. Notably, STAT5BN642H was identified and validated as a potent oncogenic driver mutation5,6,7,8. We have shown that transgenic mice harboring STAT5BN642H have a lower threshold for cytokine activation and rapidly develop aggressive mature CD8+ T-cell neoplasia5. This is consistent with in vitro models suggesting that STAT5BN642H promotes prolonged STAT phosphorylation and dimerization7. Clinically, STAT5BN642H has been associated with increased drug resistance, risk of relapse and poor outcomes9. Although novel therapeutic avenues have been proposed, including multimodal treatments combining FDA-approved JAK inhibitors (ruxolitinib, baricitinib, and tofacitinib) with small molecules targeting Aurora kinases10, there are currently no clinical candidates targeting STAT5B or its driver mutations. This is partly due to the limited structural information available on STAT proteins and to the general challenges of targeting transcription factors, which include dynamic, non-contiguous interaction interfaces with shallow binding sites.

Herein, we further explore the aggressive transforming capacity of STAT5BN642H and reveal widespread organ infiltration by proliferative mature T-cells, comprising not only CD8+ but also CD4+ and γδ T-cell subsets, in the STAT5BN642H transgenic mouse model. Notably, we demonstrate STAT5BN642H-driven transformation of γδ T-cells in an in vivo syngeneic transplant model, which may serve as an important pre-clinical model for aggressive human γδ T-cell disease. Moreover, we report the crystal structures of human STAT5B and STAT5BN642H. Together with biochemical, biophysical and computational data, our findings suggest that the SH2 domain of STAT5BN642H adopts two distinct conformations in the solid state and that the mutant has an enhanced affinity for self-dimerization coupled with a marked resistance to dephosphorylation. These studies provide a rationale for understanding the molecular basis of the aggressive STAT5BN642H driver mutation, as well as structural information and pre-clinical models to support targeted therapeutic intervention against hyperactivated STAT5B.

## Results

### STAT5B expression and mutations in hematological cancers

Clinical and physiological studies have highlighted the functional dichotomy between the STAT5A and STAT5B gene products, and the increasingly recognized role of STAT5B in tumorigenesis and proliferation. An important role for STAT5B, but not STAT5A, has been demonstrated in disease driven by oncogenes such as BCR/ABL11,12 and NPM-ALK13, in which context STAT5A was described as a tumor suppressor14. Sequencing of patient samples for gain-of-function (GOF) variants of STAT5B is becoming more frequent and has revealed multiple recurrent mutation hot-spots within the SH2 and C-terminal domains5 (Fig. 1a). Most prominently, the N642H mutation has been detected in over 150 patients with various hematological malignancies, predominantly of T-cell origin. The most common T-cell diseases, together representing over 60% of patients harboring the N642H mutation, are T-cell acute lymphoblastic leukemia (T-ALL), T-PLL, and monomorphic epitheliotropic intestinal T-cell lymphoma (MEITL; previously classified as enteropathy-associated T-cell lymphoma type II, EATL-II) (Fig. 1a). In addition to GOF mutations, mining of publicly available gene expression datasets revealed significantly increased levels of STAT5B mRNA in patients with chronic lymphocytic leukemia (CLL), T-ALL, B-cell acute lymphoblastic leukemia (B-ALL), and adult T-cell leukemia/lymphoma (ATLL) (Fig. 1b). Indeed, increased expression of STAT5B protein was previously reported in CLL patients and correlated with poorer overall survival15. Furthermore, transgenic mice overexpressing STAT5B in the lymphoid compartment develop T-cell lymphomas16. Together, these data highlight the oncogenic potential of STAT5B in lymphoid neoplasia. It is therefore important to understand and characterize the molecular mechanisms of oncogenesis driven by hyperactivated STAT5B in order to devise better treatment strategies for these largely incurable diseases.

### STAT5BN642H is a strongly aggressive oncogene

Recently, we confirmed STAT5BN642H as a driver mutation in T-cell neoplasia5. Transgenic mice expressing STAT5BN642H in the hematopoietic compartment rapidly succumb to mature T-cell lymphoma/leukemia, in which the dominant disease-causing cells are effector memory CD8+ cytotoxic T-cells5. We therefore sought to characterize these mice further and assess their suitability as a pre-clinical model for human T-cell neoplasia. As previously observed, and in line with human patients suffering from mature peripheral and cutaneous T-cell neoplasias, STAT5BN642H mice develop skin lesions resulting from disease-cell infiltration (Fig. 2a), as well as lymphadenopathy and splenomegaly (Fig. 2a, b). These mice also have significantly increased liver weight (Fig. 2b). Immunophenotyping of T-cell subtypes in the lymph nodes confirmed a shift in T-cell populations, with an increase in the proportion of CD8+ T-cells and a corresponding decrease in CD4+ T-cells (Fig. 2c). We also quantified γδ T-cells in the lymph nodes, although no change was observed in the proportion of these relatively rare cells (Fig. 2c). Notably, the STAT5BN642H mutation rendered bone marrow (BM) cells hypersensitive to various cytokines, resulting in significantly increased colony formation compared with BM cells from human STAT5B transgenic or wild-type mice (Fig. 2d). Additionally, in the absence of any cytokine, BM cells from STAT5BN642H mice consistently formed a small number of colonies, in contrast to BM cells from human STAT5B transgenic or wild-type mice (Fig. 2d). These data demonstrate that the aggressive N642H mutation can support cytokine-independent proliferation and render cells hypersensitive to cytokine signaling.

### STAT5BN642H transforms multiple T-cell subsets

STAT5BN642H has been detected in patients with diseases of varying immunophenotypes. In our STAT5BN642H transgenic mouse model, mature CD8+ T-cells were identified as the predominant disease-initiating cells (Fig. 2c)5. We therefore investigated the transforming and infiltrative capacity of other mature T-cell subsets, since patients with GOF mutations in the JAK1/3-STAT3/5B pathway also present with mature γδ or CD4+ T-cell diseases5,17. Immunohistological analysis of various peripheral organs from STAT5BN642H mice revealed severe, proliferative T-cell infiltrations not only into the skin and lung, as previously observed5, but also into the liver and brain (Fig. 3a–d). These infiltrations appeared largely as foci distributed throughout the organs. The infiltrating T-cells within different organs were subsequently immunophenotyped by flow cytometry. In addition to the dominant CD8+ T-cell infiltrations, CD4+ and γδ T-cells were also found in increased numbers in various organs of STAT5BN642H mice (Fig. 3e–h). Whereas the proportion of infiltrating CD4+ T-cells was approximately half that of CD8+ T-cells across all organs examined, the proportion of infiltrating γδ T-cells was notably higher in the liver of STAT5BN642H mice than in other organs (Fig. 3e–h).

These observations are interesting in the context of the different disease phenotypes of patients carrying the STAT5BN642H mutation. In our model, where STAT5BN642H is expressed in all hematopoietic cells, CD8+ T-cells appear to be the most sensitive to transformation and consequently become highly proliferative and invasive. Surprisingly, we observed significant T-cell infiltration into the brain of STAT5BN642H mice, with TCRβ+ CD8+ cells as the main T-cell phenotype (Fig. 3d, f). T-ALL, one of the diseases with the highest number of STAT5BN642H-positive patients, is associated with infiltration into the central nervous system (CNS) upon relapse18. Moreover, T-ALL patients with STAT5BN642H were found to be at higher risk of relapse and to have poorer event-free survival than other leukemia patients6. Furthermore, a patient from a primary CNS T-cell lymphoma cohort was reported to harbor the STAT5BN642H mutation and was diagnosed with CD8+ CD4− peripheral T-cell lymphoma, not otherwise specified (PTCL, NOS)19. An increasing number of patients with T-cell large granular lymphocytic leukemia (T-LGLL) have also been reported to carry the STAT5BN642H mutation (Fig. 1a), most of whom had a CD4+ disease burden20,21, although one STAT5BN642H-positive CD8+ T-LGLL case was reported to be particularly aggressive and fatal22. T-PLL also has a predominantly mature CD4+ T-cell phenotype23, and in line with the considerable number of T-PLL patients carrying the STAT5BN642H mutation (Fig. 1a), we also observed CD4+ T-cells infiltrating peripheral organs in the STAT5BN642H transgenic mice (Fig. 3e–h).

Hepatosplenic T-cell lymphoma (HSTCL) is predominantly a disease of γδ T-cell origin that primarily affects the liver, spleen and often the BM24. The STAT5BN642H mutation was detected in 33.3% (7/21) of patients in an HSTCL cohort24. It is therefore interesting that γδ T-cells from the STAT5BN642H transgenic mice were observed in a relatively greater proportion in the liver than in the other organs examined (Fig. 3e–h). It was previously reported that the diseased γδ T-cells in around one third of HSTCL patients harboring the N642H mutation were also CD8+24, and we indeed observed a large proportion of γδ-TCR+ CD8+ cells in the liver of STAT5BN642H mice (Supplementary Fig. 1). No STAT5BN642H-positive patients have been reported with γδ-TCR+ CD4+ disease cells, again consistent with our results (Supplementary Fig. 1).

### STAT5BN642H-transformed γδ T-cells reconstitute disease

To verify whether the observed organ infiltration by these T-cell subsets represents true transformation by STAT5BN642H, we carried out a syngeneic transplant experiment, as previously performed with neoplastic CD8+ T-cells from STAT5BN642H mice5. We chose γδ T-cells as our model transplant system because human T-cell diseases of γδ origin are among the most aggressive, with extremely poor survival rates, and no pre-clinical models for these rare entities currently exist17. Strikingly, 3 months after transplantation of γδ T-cells from STAT5BN642H mice into immunocompetent recipients (Fig. 4a), an aggressive disease characterized by lymphadenopathy and splenomegaly developed in a subset of the transplanted mice (Fig. 4b). Importantly, flow cytometric analysis of the lymph nodes of recipient mice confirmed a substantial increase in neoplastic γδ T-cells (Fig. 4c) but no considerable change in levels of CD8+ or CD4+ T-cells (Supplementary Fig. 2). Immunohistological analyses revealed infiltration of proliferative T-cells into the liver of recipient mice (Fig. 4d), consistent with observations in the STAT5BN642H transgenic mice (Fig. 3c, e) and with the pathology observed in patients with aggressive γδ-driven HSTCL.

These data provide evidence of the transforming capacity and oncogenic nature of STAT5BN642H in different T-cell subsets. Because pre-clinical models do not currently exist for many rare and aggressive human T-cell neoplasia entities, including those of γδ T-cell origin, further development and characterization of transplant models that recapitulate the pathologies of these diseases will be valuable for testing much-needed novel treatment strategies.

### Crystal structure of human STAT5B

Given the aggressive nature of a single point mutation in transforming T-cell subsets, we sought to identify the molecular and structural basis of STAT5BN642H activation. To date, the structure of unphosphorylated mouse STAT5A has enabled several computational studies of human STAT5B, modeled in the activated state on the basis of crystal structures of activated STAT1/3 lacking the N- and C-terminal domains25,26. These studies have provided insights into the distinct molecular mechanisms of action of each STAT5 gene product. To gain insight into the structural basis of the N642H mutation, and to validate modeling studies of the differences between STAT5A and STAT5B, we crystallized human STAT5B. We subjected full-length STAT5B (M1-S787) to a range of crystallization conditions; however, the resulting crystals did not yield high-quality diffraction. We therefore prepared a more stable construct (A136-Q703, also referred to as the unphosphorylated STAT core fragment) that eliminated highly flexible regions of the protein and corresponded to domain boundaries similar to those used to crystallize STAT5A27 and other STAT proteins28,29. With these domain boundaries, human STAT5B crystallized and diffracted to 3.29 Å resolution (Fig. 5a). The structure was solved by molecular replacement using mouse STAT5A (PDB: 1Y1U) and refined to Rwork and Rfree values of 0.266 and 0.306, respectively. Data collection and refinement statistics are summarized in Supplementary Table 1.
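
For readers less familiar with refinement statistics, Rwork and Rfree are both instances of the crystallographic R-factor, R = Σ| |Fobs| − |Fcalc| | / Σ|Fobs|, evaluated on the reflections used in refinement and on a small held-out test set excluded from refinement, respectively. The sketch below is illustrative only: it assumes Fcalc has already been scaled to Fobs, and the helper names and the boolean free-set flags are hypothetical rather than the output format of any particular refinement program.

```python
"""Illustrative crystallographic R-factor calculation (hypothetical helper names)."""


def r_factor(f_obs, f_calc):
    """Return R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    Assumes f_calc has already been scaled to f_obs.
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("f_obs and f_calc must be non-empty and of equal length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_work_and_r_free(f_obs, f_calc, is_free):
    """Split reflections by a free-set flag and return (R_work, R_free)."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


if __name__ == "__main__":
    # Toy amplitudes for illustration only (not data from this study).
    f_obs = [120.0, 85.0, 240.0, 60.0, 150.0]
    f_calc = [110.0, 90.0, 230.0, 70.0, 140.0]
    is_free = [False, False, False, False, True]
    print(r_work_and_r_free(f_obs, f_calc, is_free))
```

Because the free set never drives the refinement, a large gap between Rfree and Rwork is the usual warning sign of overfitting.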

The overall architecture of human STAT5B is analogous to that of mouse STAT5A (rmsd of 1.21 Å), with rmsd values of 1.89 Å and 2.03 Å relative to mouse STAT3 and human STAT1, respectively. All canonical domains of the STAT family are present: the coiled-coil domain (CCD, S138-T331), DNA-binding domain (DBD, F332-V470), linker domain (H471-D591), and SH2 domain (Q592-P685). Within these domain boundaries, there are 29 amino acid differences between human STAT5B and mouse STAT5A, located predominantly within disordered regions and the SH2 domain (Fig. 5a–c, Supplementary Fig. 3).
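
The rmsd values above compare equivalent Cα positions after the two structures have been superposed. The minimal sketch below shows only the final step, assuming the Cα coordinates (in Å) have already been paired residue by residue and superposed (for example, by a least-squares fit such as the Kabsch algorithm, which is not implemented here); the function name is hypothetical.

```python
"""Minimal Cα rmsd between two pre-superposed, residue-matched coordinate sets."""
import math


def ca_rmsd(coords_a, coords_b):
    """Return sqrt(mean squared distance) between paired 3D points (in Å).

    Assumes both lists are already superposed and in matched residue order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Toy coordinates for illustration only.
    model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    reference = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
    print(ca_rmsd(model, reference))  # every pair is 1 Å apart, so rmsd is 1.0
```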

Archetypal STAT-type SH2 domains comprise an anti-parallel β-sheet formed by three β-strands (conventionally labeled βB–βD) surrounded by three α-helices (αA, αB’, and αB)30, all of which are also observed in STAT5B (Fig. 5d). Conventional phospho-peptide binding occurs perpendicular to the central β-sheet, with the phospho-Tyr accessing the N-terminal (pY) pocket on one face of the β-sheet and the flanking residues occupying the C-terminal (pY + 3) cavity on the opposite face31. The pY-binding pocket is positively charged and harbors a number of evolutionarily conserved residues, such as an invariant arginine (R618 on βB5), that form part of the complex phosphate coordination network. Notably, a critical His at the βD4 position, found in 80 of 121 human SH2 domain-containing proteins31, is absent from all STAT proteins. The hydrophobic pY + 3 cavity serves as the peptide selectivity pocket and is predominantly formed by the βD strand and αB helix. The surface-exposed βD strand therefore controls access both to the phosphate-coordinating βB5 position and to the selectivity residues located deeper in the N- and C-terminal pockets. Several of the differences between STAT5A and STAT5B occur within the βD strand and CD loop (Q636, M639, F640, M644), which may underlie differences in peptide selectivity between the isoforms.

### N642 is a critical site for modulating STAT5B interactions

As illustrated in Fig. 1a, the SH2 domain is a hot-spot for several mutations, with the most prevalent occurring either on the βD strand (such as N642H) or in close proximity to it (<5 Å apart), suggesting a common mechanism of action. To investigate whether N642H perturbs the protein structure, we crystallized STAT5BN642H using the same domain boundaries as the wild-type protein (A136-Q703). The crystal structure of STAT5BN642H was solved to 3.21 Å with a space group similar to that of the wild-type protein (Supplementary Table 1 and Fig. 5c). The βD strand containing H642 was well defined in the electron density and adopted two distinct conformations (Fig. 5e–j). In one conformation, the βD strand associates tightly with the βC strand, forming additional hydrogen bonds (Fig. 5f, i) not observed in the wild-type structure (Fig. 5e, h) and thereby completing the anti-parallel β-sheet. In this conformation, H642 likely coordinates the phosphate, similar to what is observed in other SH2 domains such as those of Src and phosphatidylinositol-4,5-bisphosphate 3-kinase (PI3K). The N642H substitution may thus fulfill a role similar to that of the βD4 His, which, as described above, is critical for peptide binding in these SH2 domain-containing proteins31 but absent from all STATs. Mutation of the βD4 His in these proteins abolishes peptide binding31, further suggesting that H642 in STAT5BN642H directly coordinates the phosphorylated species, consistent with previous computational predictions26. In the second conformation, the βD strand containing residue 642 is dissociated from the βC and βB strands, resulting in an incomplete β-sheet (Fig. 5g, j). This partially formed β-sheet places the SH2 domain in an open, potentially inactive, state.
Alternatively, this conformation may be more capable of activating STAT5B, either by facilitating interactions with the C-terminal region and dimer interface or by providing greater access to the pY pocket and thereby accelerating phospho-ligand association. In either case, it is notable that STAT5BN642H can adopt either a hyperactivated or a hyperinactivated state.

To investigate whether the conformational changes between STAT5B and STAT5BN642H result from modified intramolecular or intermolecular interactions, we examined protein binding by fluorescence polarization (FP) using a fluorescein-labeled phospho-Tyr peptide derived from the EPO receptor32. Notably, the affinity of the phospho-peptide for full-length STAT5BN642H (Kd = 15 ± 1 nM) was ~6–7-fold higher than for wild-type STAT5B (Kd = 102 ± 11 nM; Fig. 6a). This change in peptide affinity is consistent with our structural findings on the SH2 domain pocket, as well as with previous surface plasmon resonance (SPR) studies of STAT5BN642H with phosphorylated peptide-dimer interface mimetics7. Interestingly, peptide binding increased significantly with the corresponding STAT5B core fragments (Kd = 28.9 ± 1.9 nM for STAT5B and 2.6 ± 0.4 nM for STAT5BN642H; Supplementary Fig. 4). The tighter affinity of the core fragments (S136-Q703) compared with the full-length proteins (M1-S787) is likely related to the increased accessibility of the SH2 domain in the absence of the C-terminal domain.
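
The fold differences follow directly from the reported constants: 102 nM / 15 nM ≈ 6.8 for the full-length proteins and 28.9 nM / 2.6 nM ≈ 11 for the core fragments. The sketch below also illustrates one common way FP titrations are modeled, a single-site binding isotherm that accounts for depletion of the labeled peptide (tracer). This is an assumption for illustration; the fitting model behind the reported Kd values is not stated here, and the function names, tracer concentration and mP limits are hypothetical.

```python
"""Single-site binding model for a fluorescence polarization (FP) titration (illustrative)."""
import math


def fraction_tracer_bound(protein_nM, tracer_nM, kd_nM):
    """Fraction of labeled peptide bound, with ligand depletion (quadratic solution).

    Uses the numerically stable form 2P / (b + sqrt(b^2 - 4PL)), with b = P + L + Kd,
    where P and L are the total protein and total tracer concentrations.
    """
    b = protein_nM + tracer_nM + kd_nM
    disc = b * b - 4.0 * protein_nM * tracer_nM  # always >= 0 because b >= P + L
    return 2.0 * protein_nM / (b + math.sqrt(disc))


def polarization_mP(protein_nM, tracer_nM, kd_nM, mp_free, mp_bound):
    """Observed mP as a linear mix of free and bound states.

    Assumes free and bound tracer have equal fluorescence intensity.
    """
    fb = fraction_tracer_bound(protein_nM, tracer_nM, kd_nM)
    return mp_free + (mp_bound - mp_free) * fb


# Kd values (nM) reported in the text for the full-length proteins.
KD_NM = {"STAT5B": 102.0, "STAT5B N642H": 15.0}

if __name__ == "__main__":
    fold = KD_NM["STAT5B"] / KD_NM["STAT5B N642H"]
    print(f"fold tighter binding: {fold:.1f}")  # 6.8
    # Hypothetical assay: 10 nM tracer, 30 mP free, 200 mP bound.
    for p in (1, 10, 100, 1000):
        print(p, round(polarization_mP(p, 10.0, KD_NM["STAT5B"], 30.0, 200.0), 1))
```

When the tracer concentration is well below Kd, the quadratic form reduces to the simple hyperbola P / (P + Kd); keeping the depletion term matters most for tight binders such as the STAT5BN642H core fragment.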

Previous molecular docking simulations have proposed favorable electrostatic interactions between the mutant imidazole and the phosphate moiety of the peptide7. To examine the contribution of electrostatics to this interaction, we varied the pH of the buffer and determined the dissociation constants (Fig. 6a). If the interaction were driven mainly by electrostatics, lowering the pH, which increases protonation of the histidine imidazole, would be expected to increase the affinity for the peptide. While lower pH did increase peptide affinity for wild-type STAT5B, the binding affinity of STAT5BN642H remained unchanged, suggesting that the stabilizing interaction is not solely electrostatic. To investigate the importance of sterics and electrostatics at this position, we generated a series of mutants and examined their phospho-peptide binding affinity. STAT5BN642Q is analogous to the wild-type protein, with a slight perturbation arising from an extra methylene group in the side chain. Nevertheless, the N642Q substitution reduced the binding affinity more than twofold (Kd = 229 ± 35 nM). Removing the side-chain carboxamide in STAT5BN642A likewise reduced peptide binding (Kd = 612 ± 72 nM), further suggesting that this site does not tolerate steric changes. In contrast, STAT5BN642D retains the shape of the wild-type asparagine but introduces a negative charge; this substitution also reduced peptide binding affinity (Kd = 261 ± 29 nM). Finally, STAT5BN642E, which combines the extra steric bulk of glutamine with the negative charge of aspartate, abolished detectable protein–peptide binding (Kd > 1 µM). All mutants also exhibited increased peptide affinity at lower pH values, similar to wild-type STAT5B. STAT5BN642D showed the largest increase in peptide binding affinity at pH 6.8 of all the mutants generated, likely due to increased protonation and neutralization of the negative charge.
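
One way to place these substitutions on a common scale is to convert each Kd ratio into an apparent binding free-energy difference, ΔΔG = RT ln(Kd,mutant / Kd,wild-type), where positive values indicate weaker binding than wild-type. The sketch below applies this to the Kd values reported in the text; the assay temperature is not stated, so 25 °C (298.15 K) is an assumption, and N642E is omitted because only a lower bound (Kd > 1 µM) is available.

```python
"""Apparent binding free-energy differences from Kd ratios (illustrative)."""
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1
TEMPERATURE_K = 298.15  # assumed assay temperature (25 °C); not stated in the text

# Kd values (nM) for full-length proteins as reported in the text.
KD_NM = {
    "WT": 102.0,
    "N642H": 15.0,
    "N642Q": 229.0,
    "N642D": 261.0,
    "N642A": 612.0,
}


def ddg_kj_per_mol(kd_mut, kd_wt, temperature_k=TEMPERATURE_K):
    """Return RT ln(Kd_mut / Kd_wt); values > 0 mean weaker binding than wild-type."""
    return R_KJ_PER_MOL_K * temperature_k * math.log(kd_mut / kd_wt)


if __name__ == "__main__":
    for name, kd in sorted(KD_NM.items(), key=lambda item: item[1]):
        ratio = kd / KD_NM["WT"]
        ddg = ddg_kj_per_mol(kd, KD_NM["WT"])
        print(f"{name:6s} Kd/Kd_WT = {ratio:5.2f}  ddG = {ddg:+.2f} kJ/mol")
```

On this scale the changes are modest (a few kJ/mol), which is typical for single side-chain substitutions at a binding interface.
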
In contrast to the FP binding experiments, thermal denaturation profiles revealed the reverse trend in the stability of the STAT5B mutants (Fig. 6b). The negative charge of STAT5BN642D and the additional steric bulk of STAT5BN642Q significantly stabilized the protein (Tm = 44 °C for STAT5B, 47 °C for STAT5BN642D, and 47 °C for STAT5BN642Q). Combining both changes in STAT5BN642E increased protein stability further (Tm = 51 °C). Consistent with the lack of binding observed in the FP experiments, addition of peptide did not increase the thermal stability of STAT5BN642E (Tm = 51 °C), but stabilized STAT5BN642D (Tm = 50 °C) and STAT5BN642Q (Tm = 49.5 °C). These results are interesting in the context of the two STAT5BN642H conformations observed in the crystal. Increased steric bulk and negative charge at the N642 site promote a more stable and rigid STAT structure, likely corresponding to the completed β-sheet conformation, but reduce the ability to bind phosphorylated peptides/species and possibly reduce overall STAT activity. The open conformation of the H642-containing βD strand may allow increased access to the peptide binding pocket. Once phospho-peptide is bound, STAT5BN642H may adopt a more stabilized structure, resulting in tighter binding. In the absence of phospho-peptide, STAT5BN642H likely samples both conformations, which would explain the minimal changes in protein stability, but it may favor the open state in the presence of peptide, allowing tighter binding.
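
The Tm shifts above are simple differences: relative to wild-type, ΔTm is +3 °C for N642D and N642Q and +7 °C for N642E, and peptide addition shifts N642D by +3 °C, N642Q by +2.5 °C and N642E by 0 °C. How Tm was extracted from the denaturation profiles is not described in this section; a common approach in fluorescence-based thermal shift assays, sketched below under that assumption, takes Tm as the temperature of the maximum first derivative of the melt curve. The function names and the synthetic test curve are hypothetical.

```python
"""Estimate Tm as the temperature of maximal dF/dT in a melt curve (illustrative)."""
import math


def tm_from_melt_curve(temperatures, signal):
    """Return the temperature at which the central-difference derivative peaks.

    Assumes temperatures are sorted ascending and the signal rises on unfolding.
    """
    if len(temperatures) != len(signal) or len(temperatures) < 3:
        raise ValueError("need at least three matched points")
    best_t, best_slope = None, -math.inf
    for i in range(1, len(temperatures) - 1):
        slope = (signal[i + 1] - signal[i - 1]) / (temperatures[i + 1] - temperatures[i - 1])
        if slope > best_slope:
            best_t, best_slope = temperatures[i], slope
    return best_t


def boltzmann(t, tm, width):
    """Two-state sigmoid, used here only to generate a synthetic test curve."""
    return 1.0 / (1.0 + math.exp((tm - t) / width))


# Tm values (°C) reported in the text, without and with phospho-peptide.
TM_APO = {"WT": 44.0, "N642D": 47.0, "N642Q": 47.0, "N642E": 51.0}
TM_PEPTIDE = {"N642D": 50.0, "N642Q": 49.5, "N642E": 51.0}

if __name__ == "__main__":
    temps = [25.0 + 0.5 * i for i in range(91)]  # 25-70 °C in 0.5 °C steps
    curve = [boltzmann(t, tm=47.0, width=1.5) for t in temps]
    print("recovered Tm:", tm_from_melt_curve(temps, curve))
    for name, tm in TM_APO.items():
        print(name, "dTm vs WT:", tm - TM_APO["WT"])
```

With real, noisy data the derivative is usually smoothed, or a two-state sigmoid is fitted directly, before the midpoint is read off.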

### STAT5BN642H enhances anti-parallel dimer kinetic stability

To examine the effects of the N642H mutation on the conformational mobility of STAT5B, we carried out molecular dynamics (MD) simulations of the unphosphorylated STAT5B and STAT5BN642H anti-parallel dimers. Three independent simulations were performed for each dimer (Supplementary Movies 1 and 2). In all three simulations, the STAT5B dimer was markedly unstable and dissociated rapidly, whereas the STAT5BN642H dimer remained intact throughout (Fig. 7a, b). The STAT5BN642H dimer was nevertheless highly flexible, populating different dimer interfaces. The interactions between the CCD of chain A and the DBD and linker domains of chain B (interface 1, Fig. 7c) form the most stable dimer interface and are possibly responsible for the integrity of the anti-parallel dimeric structure. The second dimer interface, comprising the CCD of chain B and the DBD and linker domains of chain A, is highly flexible, exchanging between multiple conformations in each of the three simulation trajectories (interface 2, Fig. 7c). Analysis of the frequency of inter-chain contacts revealed several key residues in the CCD and linker domains responsible for the sustained inter-chain interactions in the anti-parallel STAT5BN642H dimer (Fig. 7d and Supplementary Table 2). Because the CCD and linker domains contain the majority of these hot-spot residues, they appear to play a more significant role than the DBD in maintaining interface stability. Notably, these contacts are distant from the N642H mutation, suggesting a possible allosteric communication pathway between the CCD, linker and SH2 domains. These simulations also suggest that STAT5B and STAT5BN642H are both regulated allosterically but in different ways, influenced by the collective and local motions of residue 642, its environment, and the interdependence of these motions.
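
Contact frequencies of the kind summarized in Supplementary Table 2 are typically computed as the fraction of trajectory frames in which a given inter-chain residue pair lies within a distance cutoff. The sketch below illustrates that bookkeeping under stated assumptions: each residue is represented by a single coordinate (for example, its Cα), frames are plain Python dictionaries rather than the output of a particular MD package, and the 8 Å cutoff and 0.5 persistence threshold are hypothetical choices, not necessarily the criteria used in this study.

```python
"""Fraction of MD frames in which inter-chain residue pairs are in contact (illustrative)."""
import math
from collections import Counter
from itertools import product


def contact_frequencies(frames, cutoff_angstrom=8.0):
    """Return {(res_a, res_b): fraction of frames with distance <= cutoff}.

    Each frame is a (chain_a, chain_b) pair of {residue_id: (x, y, z)} dicts,
    with one representative atom per residue (assumed to be Cα here).
    """
    if not frames:
        raise ValueError("need at least one frame")
    counts = Counter()
    for chain_a, chain_b in frames:
        for (res_a, xyz_a), (res_b, xyz_b) in product(chain_a.items(), chain_b.items()):
            if math.dist(xyz_a, xyz_b) <= cutoff_angstrom:
                counts[(res_a, res_b)] += 1
    return {pair: n / len(frames) for pair, n in counts.items()}


def hot_spots(frequencies, min_fraction=0.5):
    """Residue pairs in contact in at least min_fraction of frames, most persistent first."""
    kept = [(pair, f) for pair, f in frequencies.items() if f >= min_fraction]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```

Pooling the frames from all three trajectories before counting gives a single persistence score per residue pair; comparing trajectories separately instead would expose interfaces, such as interface 2, that switch between conformations.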

To validate the in silico observations on anti-parallel dimer stability, we measured the hydrodynamic size of STAT5B and STAT5BN642H by dynamic light scattering (DLS) (Fig. 7e). The particle size distribution of STAT5B suggests that the full-length protein exists predominantly as a dimer (d = 21.42 ± 0.29 nm), which is also supported by gel filtration experiments (Supplementary Fig. 5). STAT5BN642H yielded a slightly smaller average particle size (d = 18.89 ± 0.12 nm), suggesting a more compact or stable anti-parallel dimer with slower dissociation kinetics, consistent with the MD simulations (Fig. 7e).
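
DLS does not measure size directly: it measures a translational diffusion coefficient D and converts it to a hydrodynamic diameter through the Stokes–Einstein relation, dH = kBT / (3πηD), so a smaller dH corresponds to faster diffusion. The sketch below shows the conversion in both directions, assuming the reported d values are hydrodynamic diameters; the temperature (25 °C) and the use of water viscosity (about 0.89 mPa·s) are further assumptions, since the measurement buffer and temperature are not given here.

```python
"""Stokes-Einstein conversion between diffusion coefficient and hydrodynamic diameter."""
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K (exact SI value)


def hydrodynamic_diameter_nm(diffusion_m2_s, temperature_k=298.15, viscosity_pa_s=0.89e-3):
    """d_H = k_B T / (3 pi eta D), returned in nm.

    Default temperature (25 °C) and water viscosity are assumptions.
    """
    return K_B * temperature_k / (3.0 * math.pi * viscosity_pa_s * diffusion_m2_s) * 1e9


def diffusion_coefficient_m2_s(diameter_nm, temperature_k=298.15, viscosity_pa_s=0.89e-3):
    """Inverse relation: D = k_B T / (3 pi eta d_H), with d_H given in nm."""
    return K_B * temperature_k / (3.0 * math.pi * viscosity_pa_s * diameter_nm * 1e-9)


if __name__ == "__main__":
    for label, d_nm in (("STAT5B", 21.42), ("STAT5B N642H", 18.89)):
        print(label, f"D ~ {diffusion_coefficient_m2_s(d_nm):.2e} m^2/s")
```

Because dH scales as 1/D, the roughly 12% smaller diameter of STAT5BN642H corresponds to a correspondingly faster apparent diffusion under identical buffer conditions.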

Because simulations of the anti-parallel STAT5B dimer were terminated upon dimer dissociation, we studied the dynamics of the wild-type and mutant proteins by carrying out three independent simulations each of the unphosphorylated STAT5B and STAT5BN642H monomers. The root mean square fluctuation (RMSF) of the α-carbon atoms of the SH2 domain (Fig. 7f) indicates that the N642H mutation leads to a more stable and rigid SH2 domain, except for residues 630 to 645, which are near the mutation site. The N642H mutation has little to no effect on the flexibility of the CCD, DBD, and linker domains (Supplementary Fig. 6). We hypothesize that a more rigid SH2 domain may reduce the entropic cost of phospho-peptide binding and thereby increase peptide-binding affinity.
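
For reference, the per-residue RMSF is the root-mean-square deviation of each atom from its own time-averaged position, RMSFi = sqrt(⟨|xi(t) − ⟨xi⟩|²⟩t). The minimal sketch below computes it from a list of frames, assuming the frames have already been superposed onto a common reference so that overall rotation and translation do not inflate the fluctuations. It uses plain tuples rather than any MD analysis library, and the function name is hypothetical.

```python
"""Per-atom RMSF from a pre-aligned trajectory (illustrative)."""
import math


def rmsf(trajectory):
    """Return one RMSF value per atom.

    trajectory is a list of frames; each frame is a list of (x, y, z) tuples
    in the same atom order (e.g., SH2-domain Cα atoms), already superposed.
    """
    if not trajectory or not trajectory[0]:
        raise ValueError("trajectory must contain at least one non-empty frame")
    n_frames, n_atoms = len(trajectory), len(trajectory[0])
    # Time-averaged position of each atom.
    means = [
        tuple(sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3))
        for i in range(n_atoms)
    ]
    # Root-mean-square distance of each atom from its own average position.
    return [
        math.sqrt(sum(math.dist(frame[i], means[i]) ** 2 for frame in trajectory) / n_frames)
        for i in range(n_atoms)
    ]
```

Lower values across the SH2 domain in the mutant would correspond to the increased rigidity described above.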

### The STAT5BN642H parallel dimer resists dephosphorylation

We probed the physiological relevance of the tighter mutant–peptide interactions by examining the interaction of STAT5B and STAT5BN642H with the upstream ABL1 kinase33 by SPR (Fig. 8a). ABL1 kinase was immobilized on the chip surface, and the response to increasing concentrations of STAT5B or STAT5BN642H was recorded and corrected using blank injections. A significantly improved fit (χ² = 8.6 for STAT5B and 7.1 for STAT5BN642H) was obtained with a two-state kinetic model using the global data analysis option in BIAevaluation 3.0 software. This suggests that the ABL1–STAT5B interaction involves a two-step sequential process or a conformational change. The kinetics of the first step appear to differ substantially between the wild-type and mutant proteins, but the overall KD values do not differ to an extent that would be relevant at physiological concentrations. Additionally, no significant difference in total in vitro phosphorylation was observed between the wild-type and mutant proteins by Pro-Q Diamond gel staining (Supplementary Fig. 7).
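
In the two-state (conformational change) reaction model, analyte A binds immobilized ligand B to form AB (rate constants ka1, kd1), which then isomerizes to a more stable complex AB* (ka2, kd2). The overall equilibrium dissociation constant combines both steps, KD = (kd1/ka1) · kd2/(kd2 + ka2), which is how markedly different first-step kinetics can still yield similar overall KD values. The sketch below computes KD and numerically integrates the association phase at a constant analyte concentration; all rate constants in the example are hypothetical and are not taken from this study.

```python
"""Two-state (A + B <-> AB <-> AB*) SPR binding model: overall KD and association phase."""


def overall_kd(ka1, kd1, ka2, kd2):
    """KD = (kd1/ka1) * kd2 / (kd2 + ka2); ka1 in M^-1 s^-1, other rates in s^-1."""
    return (kd1 / ka1) * kd2 / (kd2 + ka2)


def simulate_association(conc_m, ka1, kd1, ka2, kd2, r_max=1.0, t_end_s=600.0, dt_s=0.1):
    """Forward-Euler integration of the association phase at constant analyte concentration.

    Returns the (AB, AB*) responses at t_end_s, in the same units as r_max.
    """
    ab, ab_star = 0.0, 0.0
    for _ in range(int(round(t_end_s / dt_s))):
        free_sites = r_max - ab - ab_star
        d_ab = ka1 * conc_m * free_sites - kd1 * ab - ka2 * ab + kd2 * ab_star
        d_ab_star = ka2 * ab - kd2 * ab_star
        ab += d_ab * dt_s
        ab_star += d_ab_star * dt_s
    return ab, ab_star


if __name__ == "__main__":
    # Hypothetical rate constants for illustration only.
    rates = dict(ka1=1e5, kd1=1e-2, ka2=5e-3, kd2=1e-3)
    kd = overall_kd(**rates)
    conc = 1e-7  # 100 nM analyte
    ab, ab_star = simulate_association(conc, t_end_s=20000.0, dt_s=0.5, **rates)
    print(f"KD = {kd:.2e} M")
    print(f"bound fraction: simulated {ab + ab_star:.3f}, equilibrium {conc / (conc + kd):.3f}")
```

At steady state the total bound fraction equals C / (C + KD), which the long simulation reproduces; this is a quick consistency check on the combined KD expression.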

To examine the stability of the activating phosphorylation site of STAT5B, we performed dephosphorylation kinetic experiments. IL-3-dependent murine Ba/F3 cells were stably transduced with human wild-type STAT5B, STAT5BN642H, STAT5BY665F (the second most frequent STAT5 mutation), or a vector-GFP control. All stable cell lines were confirmed positive for GFP expression by flow cytometry (Supplementary Fig. 8A). Consistent with our biochemical observations, the N642H mutation promoted retention of Y699 phosphorylation in the absence of IL-3, as did STAT5BY665F, whereas neither overexpressed wild-type STAT5B nor the endogenous protein retained phosphorylation (Fig. 8b). In time-course experiments, STAT5BN642H maintained Y699 phosphorylation for up to 12 h after IL-3 removal, whereas endogenous STAT5 and exogenous wild-type STAT5B were dephosphorylated by 4 h post IL-3 removal (Fig. 8c); a similar trend was observed in murine 32D cells (Supplementary Fig. 8B) and in previous reports using other cellular models5,7. These observations for both STAT5BN642H and STAT5BY665F may point to a convergent mechanism for STAT5 SH2 domain GOF mutants, involving stabilization of the dimerization interface that enables prolonged activation.
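
One way to summarize dephosphorylation time courses of this kind quantitatively, assuming that loss of the pY699 signal after IL-3 withdrawal follows single-exponential (first-order) kinetics, is a least-squares fit of ln(signal) against time, which gives the rate constant k and the half-life t½ = ln 2 / k. The sketch below is a hypothetical analysis of normalized densitometry values, not the analysis performed in this study, and real immunoblot data may deviate from a single exponential.

```python
"""Half-life of a phospho-signal from a log-linear first-order fit (illustrative)."""
import math


def first_order_half_life(times_h, signal):
    """Fit ln(signal) = ln(S0) - k t by least squares; return (k per h, half-life in h).

    Assumes strictly positive signal values (e.g., pY699/total STAT5 ratios
    normalized to t = 0) and single-exponential decay.
    """
    if len(times_h) != len(signal) or len(times_h) < 2:
        raise ValueError("need at least two matched time points")
    if any(s <= 0 for s in signal):
        raise ValueError("signal values must be positive for a log-linear fit")
    logs = [math.log(s) for s in signal]
    n = len(times_h)
    t_mean = sum(times_h) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times_h)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_h, logs))
    k = -sxy / sxx  # slope of ln(signal) vs time is -k
    if k <= 0:
        raise ValueError("no decay detected; half-life undefined")
    return k, math.log(2) / k
```

A log-linear fit weights late, low-intensity time points heavily, so a direct non-linear fit of S0·exp(−kt) is often preferred when background signal is appreciable.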

Notably, the 32D cells displayed a considerably different immunoblot banding pattern for the GOF mutants, which appeared to run predominantly as lower-molecular-weight species (~80 kDa) yet still seemed to influence the activation of full-length STAT5 proteins (Supplementary Fig. 8B). This phenomenon has been described previously for STAT5 proteins in 32D cells34, where it was proposed not to occur naturally but rather to represent an artifact of cell extract processing, caused by the release of high levels of nuclear cathepsin G and proteolytic cleavage of STAT5, which is not observed in Ba/F3 cells. However, based on our observations, it is tempting to speculate that expression of oncogenic STAT5B mutants in these myeloid cells may specifically trigger the activity of this protease as a negative regulatory mechanism to prevent cell transformation. This hypothesis could explain the lower incidence of oncogenic STAT5B mutations in myeloid diseases compared with T-cell neoplasia, but it will need to be tested experimentally. Overall, together with the in silico modeling, our in vitro and in cellulo experiments suggest that STAT5BN642H maintains a tighter dimer interface that resists dephosphorylation, allowing prolonged stimulatory gene regulation as a consequence of a longer activation lifetime.

Overexpression of STAT5B is strongly correlated with the proliferation of multiple hematopoietic cancers, and several oncogenic activating mutations have been identified. Here, we have revealed the structural origins and biochemical consequences of how an aggressive driver mutation such as STAT5BN642H, arising during cancer cell evolution, enhances phospho-Tyr:SH2 domain interactions and escapes negative regulatory phosphatase attack. STAT5BN642H crystallized with the SH2 domain βD strand in two different conformations, which may result in hyperactivation or hyperinactivation of STAT5B and thereby suggest a possible mechanism for its altered oncogenic activity. Moreover, we shed light on the transforming capacity of STAT5BN642H in different T-cell lineages, particularly γδ T-cells, allowing us to better understand and model human disease. Overall, these studies offer insight into the molecular activation of STAT5BN642H, and the structural data and pre-clinical in vivo models described here will assist the future development of novel therapeutic strategies, which are urgently needed for rare but aggressive mature T-cell diseases with limited treatment options.

## Methods

### Human patient mutation and gene expression data

Data from patients harboring the STAT5BN642H mutation were collected from previously published studies6,7,8,19,20,21,22,24,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52. Gene expression analysis of STAT5B was performed for various human hematopoietic cancers using public gene expression datasets from the Oncomine database53. For the analysis, the p-value threshold was set to 0.01, and both the fold-change and gene rank thresholds were set to "All".

### Mammalian expression plasmids

Mammalian expression constructs containing FLAG-tagged human STAT5B or GOF variants in a pMSCV-IRES-GFP plasmid were previously generated5,39. The integrity and orientation of all complementary DNAs and the presence of point mutations were verified by Sanger sequencing.

### Transgenic animals and in vivo transplant experiments

Transgenic mice expressing either human STAT5B or STAT5BN642H under the control of the Vav1 promoter were previously described5. For transplant experiments, γδ T-cells from hSTAT5BN642H mice or wild-type littermates were isolated by FACS from pooled lymph node and spleen cell suspensions, and the purity of sorted cells was re-checked by flow cytometry. Cells (1.2 × 10⁴) were transplanted by lateral tail vein injection into nonirradiated C57BL/6NRj mice. Recipient mice were monitored daily and evaluated at the first sign of disease onset.

All animal studies were discussed and approved by the institutional ethics committee of the University of Veterinary Medicine Vienna, and animal experiment licenses were granted under GZ 68.205/0166-WF/V/3b/2015 and 68.205/0103-WF/V/3b/2015 (Austrian Federal Ministry of Science, Research and Economy). All animals were bred and maintained in a specific pathogen-free environment in the experimental mouse facility at the University of Veterinary Medicine (Vienna, Austria).

### Cell culture and generating stable cell lines

32D murine myeloid cells (#ACC 411) and Ba/F3 murine pro-B cells (#ACC 300) were purchased from the German Collection of Microorganisms and Cell Cultures (DSMZ). Cells were cultured in RPMI 1640 supplemented with 10% FBS, 2 mM l-glutamine, 10 U mL–1 penicillin-streptomycin (all from Gibco, Thermo Fisher Scientific) and 1 ng mL–1 murine Interleukin-3 (mIL-3; PeproTech). To generate retrovirus, Platinum-E retroviral packaging cells (Cell Biolabs) were transfected with plasmids using Lipofectamine 2000 (Invitrogen, Thermo Fisher Scientific), as per the manufacturer’s protocols. 32D and Ba/F3 cells were transduced with viral particles by spinfection, and pools of cells stably expressing the transgenes of interest were selected after 48 h by sorting for GFP-positive cells using fluorescence-activated cell sorting (FACS).

### Immunoblotting

Immunoblotting was performed using standard protocols. The following antibodies were used: polyclonal rabbit anti-phospho-STAT5 (Y694) (#71-6900; Invitrogen, Thermo Fisher Scientific; 1:1000), monoclonal mouse anti-STAT5 (#610191; BD Biosciences; 1:1000), monoclonal mouse anti-HSC70 (#sc-7298; Santa Cruz Biotechnology; 1:10000) and monoclonal mouse anti-α-tubulin (#sc-32293; Santa Cruz Biotechnology; 1:5000). Images of membranes were obtained using IRDye fluorescent secondary antibodies and an Odyssey CLx imaging system (LI-COR).

### Flow cytometry analyses

Whole-body perfusion with phosphate-buffered saline (PBS) was performed on human STAT5BN642H, wild-type, or transplant recipient mice, and organs were then collected and passed through a 70 μm cell strainer (BD Biosciences). For organ T-cell infiltration experiments, leukocytes were isolated using 40% and 78% Percoll gradients (GE Healthcare). Cells were then stained for FACS analysis using various antibodies purchased from eBioscience, BioLegend or BD Biosciences (see Supplementary Table 3 for antibody list). Counting beads (BioLegend) were added before acquisition to quantify absolute cell numbers. All analyses were performed on a FACSCanto II flow cytometer using FACSDiva software (BD Biosciences). Further analyses were performed using FlowJo software. The gating strategies employed are shown in Supplementary Figs. 9 and 10.

### Immunohistochemistry

Tissues were incubated for 24 h in 4% phosphate-buffered formaldehyde solution (Roti-Histofix; Carl Roth), dehydrated, paraffin-embedded, and cut (4-μm-thick sections). For immunohistochemical staining, heat-mediated antigen retrieval was performed in citrate buffer at pH 6.0 (Dako) and sections were stained with monoclonal rabbit anti-CD3 (#RM-9107; Thermo Fisher Scientific; 1:300) or monoclonal mouse anti-Ki67 (#NCL-Ki67p; Novocastra, Leica Biosystems; 1:1000) antibodies, using standard protocols. Images were obtained using an Olympus BX 53 LED light microscope with an Olympus SC50 camera.

### Hematopoietic colony assays

Bone marrow cells were isolated from the hind limbs of human STAT5B, human STAT5BN642H, or wild-type mice, and 5 × 10⁴ cells were plated in duplicate into 30 mm plates in base methylcellulose (#M3231; MethoCult; Stemcell Technologies) according to the manufacturer’s protocols, in the absence or presence of 20 ng mL–1 mIL-3, erythropoietin (hEPO), thrombopoietin (mTPO), granulocyte colony-stimulating factor (mG-CSF), or granulocyte-macrophage colony-stimulating factor (mGM-CSF). Plates were incubated at 37 °C for 10 days, and colonies were counted manually using a light microscope.

### Protein expression

The gene (NCBI: NM_012448.3) encoding full-length human STAT5B (M1-S787) or the core fragment (A136-Q703) was synthesized and cloned into a pET-28b(+) vector using the restriction enzymes NheI and XhoI, with an N-terminal His-SUMO tag. STAT5BN642 point mutations were generated by site-directed mutagenesis for both the full-length protein and the core fragment. Molecular cloning was performed by GenScript. Protein expression and purification of all STAT5B proteins were carried out as described previously32,54. Briefly, BL21 (DE3) RILP cells were transformed with plasmid containing His-SUMO-STAT5B, and single colonies were selected and cultured in 5 mL of lysogeny broth containing kanamycin (50 µg mL–1) and chloramphenicol (34 µg mL–1). The cultures were grown with continuous shaking at 37 °C for 3–4 h and used to inoculate 1 L of Super broth containing 10 mM MgSO4, 0.1% (w/v) glucose, 3% (v/v) ethanol, kanamycin (50 µg mL–1) and chloramphenicol (34 µg mL–1). When the culture reached an OD600 of 2.0, the incubation temperature was reduced to 18 °C and expression was induced with 0.5 mM IPTG. The cells were harvested after 18–20 h and stored at –80 °C. All reagents were purchased from BioShop.

### Protein purification and crystallization

STAT5B (wild-type and N642 mutants) protein purification was carried out by identical procedures. The cell pellets were lysed by sonication (Q55-QSonica) and the cell lysate was cleared by centrifugation. The fusion protein, His-SUMO-STAT5B, was isolated by Ni2+-nitrilotriacetic acid column chromatography (GE Healthcare). The fractions containing the STAT protein were treated with 0.1% (v/v) His-Ulp1 protease. His-Ulp1 protease was expressed and purified from E. coli using the same procedure as for STAT5B55. The plasmid containing Ulp1 protease, pFGET19_Ulp1, was a gift from Hideo Iwai (Addgene plasmid # 64697). The cleaved STAT protein was isolated by size exclusion chromatography (SEC650, Bio-Rad). The sample was dialyzed into 100 mM HEPES pH 7.4, 2% (v/v) glycerol. Protein concentrations were determined using a Pierce Bicinchoninic (BCA) Protein assay kit (Thermo Fisher Scientific).

For crystallography, the isolated STAT5B protein was dialyzed overnight into 20 mM HEPES pH 7.5, 200 mM NaCl, 1 mM TCEP and subsequently concentrated to 100–200 µM using a 10 kDa MWCO concentrator (Amicon, Millipore). Crystals of wild-type STAT5B were grown for 7–10 days in 200 mM lithium citrate, 20% (w/v) PEG3350. STAT5BN642H crystals were grown at 20 °C in 200 mM lithium sulfate, 100 mM Tris-HCl pH 8.3, 25% (w/v) PEG3350, using 40–50 μM protein. Crystals were harvested from the drops using 0.05–0.1 mm Mounted CryoLoops–10 micron (Hampton Research) and stored in liquid nitrogen. Initial diffraction images were screened at the Dana-Farber Cancer Institute and the Structural Genomics Consortium (STAT5B) or at York University (STAT5BN642H).

### Data collection with structure solution and refinement

X-ray diffraction data for STAT5B were collected on NE-CAT beamline 24-ID-C at the Advanced Photon Source on a Pilatus 6M detector with 0.2 s exposures and 0.2° oscillation per frame (λ = 0.979 Å). Diffraction data for STAT5BN642H were collected at the Canadian Macromolecular Crystallography Facility (CMCF), beamline 08ID-1, at the Canadian Light Source in Saskatoon, SK, Canada, at a wavelength of 0.979 Å on a Pilatus 6M detector with 0.3 s exposures and 0.3° oscillation per frame. Diffraction images were processed using the xia2 pipeline56. The structure of STAT5B was solved by molecular replacement with Phaser57 using mouse STAT5A (PDB: 1Y1U) as the search model; the STAT5BN642H structure was solved using STAT5B as the search model. The structures were refined in Phenix58,59,60,61, with manual examination and rebuilding against |2Fo| − |Fc| and |Fo| − |Fc| maps in Coot62. Both structures contained two molecules per asymmetric unit with identical packing. The stereochemical quality of the final refined structures was assessed with MolProbity63, and the structures were deposited in the PDB as 6MBW (STAT5B) and 6MBZ (STAT5BN642H); the corresponding statistics are provided in Supplementary Table 1. Structures were visualized using UCSF Chimera64.

### Fluorescence polarization

Fluorescence polarization experiments were conducted on a Tecan Infinite M1000 instrument using 384-well black plates (Greiner, medium binding). Reaction samples (50 µL) were prepared in 20 mM HEPES (pH 6.8, 7.25, 7.5, or 7.9), 50 mM NaCl, 0.1% (v/v) NP-40 substitute, 10% (v/v) DMSO containing 10 nM STAT5B-binding peptide labeled with 6-carboxyfluorescein (FAM-GpYLVLDKW; CanPeptide) and varying concentrations of STAT5B proteins. Each sample was incubated for 1 h at room temperature, and fluorescence polarization was measured at 530 nm following excitation at 470 nm (slit width of 5 nm in each case). Kd values were determined by fitting the data to the following quadratic binding equation, in which FPo and FP∞ are the polarization values of the free and fully bound peptide, respectively, and [PeptideTotal] and [STAT5Total] are the total concentrations of labeled peptide and STAT5B protein:

$$\mathrm{FP} = \mathrm{FP}_{\mathrm{o}} + \frac{\mathrm{FP}_{\infty} - \mathrm{FP}_{\mathrm{o}}}{2\left[\mathrm{Peptide}_{\mathrm{Total}}\right]}\left[\left(\left[\mathrm{Peptide}_{\mathrm{Total}}\right] + \left[\mathrm{STAT5}_{\mathrm{Total}}\right] + K_{\mathrm{d}}\right) - \sqrt{\left(\left[\mathrm{Peptide}_{\mathrm{Total}}\right] + \left[\mathrm{STAT5}_{\mathrm{Total}}\right] + K_{\mathrm{d}}\right)^{2} - 4\left[\mathrm{Peptide}_{\mathrm{Total}}\right]\left[\mathrm{STAT5}_{\mathrm{Total}}\right]}\right] \tag{1}$$

All experiments were repeated in triplicate to ensure reproducibility.
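To make Eq. 1 concrete, the sketch below implements it in plain Python and fits Kd to a titration. It is not the authors' fitting code: the grid search over log10(Kd), with FPo and FP∞ solved by linear least squares at each grid point, is a dependency-free stand-in for the nonlinear regression normally used, and all function names are hypothetical. The bound fraction is computed in the algebraically equivalent form 2PS/(b + √disc), which avoids numerical cancellation at low protein concentrations.

```python
import math

PEPTIDE_TOTAL = 10e-9  # 10 nM FAM-labeled tracer peptide, as stated in the Methods


def fraction_bound(stat5_total, kd, peptide_total=PEPTIDE_TOTAL):
    """Fraction of tracer bound, from the exact quadratic (ligand-depletion) solution."""
    b = peptide_total + stat5_total + kd
    disc = max(b * b - 4.0 * peptide_total * stat5_total, 0.0)
    # (b - sqrt(disc)) / 2 rewritten as 2PS / (b + sqrt(disc)) to avoid cancellation
    bound = 2.0 * peptide_total * stat5_total / (b + math.sqrt(disc))
    return bound / peptide_total


def fp_model(stat5_total, kd, fp0, fp_inf, peptide_total=PEPTIDE_TOTAL):
    """Eq. 1: polarization as a function of total STAT5B concentration (M)."""
    return fp0 + (fp_inf - fp0) * fraction_bound(stat5_total, kd, peptide_total)


def fit_kd(conc, fp, log_kd_min=-10.0, log_kd_max=-4.0, step=0.01):
    """Grid search over log10(Kd); FP0 and FP_inf solved exactly at each Kd."""
    best = None
    n_steps = int(round((log_kd_max - log_kd_min) / step))
    for i in range(n_steps + 1):
        kd = 10.0 ** (log_kd_min + i * step)
        f = [fraction_bound(c, kd) for c in conc]
        u = [1.0 - fi for fi in f]  # weight on FP0; f is the weight on FP_inf
        # Normal equations for the linear model y = FP0 * u + FP_inf * f
        suu = sum(x * x for x in u)
        sff = sum(x * x for x in f)
        suf = sum(x * y for x, y in zip(u, f))
        suy = sum(x * y for x, y in zip(u, fp))
        sfy = sum(x * y for x, y in zip(f, fp))
        det = suu * sff - suf * suf
        if abs(det) < 1e-12:
            continue
        fp0 = (suy * sff - suf * sfy) / det
        fp_inf = (suu * sfy - suf * suy) / det
        sse = sum((y - (fp0 * ui + fp_inf * fi)) ** 2 for y, ui, fi in zip(fp, u, f))
        if best is None or sse < best[0]:
            best = (sse, kd, fp0, fp_inf)
    return best[1], best[2], best[3]
```

For example, `fit_kd(conc, fp_values)` with concentrations in molar returns `(Kd, FPo, FP∞)`. Solving the two plateau values linearly means only one parameter is searched, which keeps the sketch short; a real analysis would also report confidence intervals.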

### Differential scanning fluorimetry (DSF)

DSF was performed using a Bio-Rad CFX96 Real-Time PCR System (C1000 Thermal Cycler). Reactions (50 µL) contained 100 mM HEPES (pH 6.8, 7.25, 7.5, or 7.9), 150 mM NaCl, 0.2 mg mL–1 wild-type or mutant STAT5B, and SYPRO Orange (Invitrogen) at a final concentration of 5× (1:1000 dilution). The temperature was increased by 0.5 °C per cycle from 20 to 80 °C, and fluorescence readings were collected at 30 s intervals. Fluorescence data and their second derivatives were processed with CFX Manager Software using the melt curve function of the software’s protocol editing tools. The temperature scan curves were fitted to a Boltzmann sigmoid function, and Tm values were obtained from the midpoint of the transition. All experiments were repeated in triplicate to ensure reproducibility.
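The Boltzmann fit used to extract Tm can be sketched as follows. This is an illustrative, dependency-free stand-in for the fitting performed in the instrument software: a coarse-to-fine grid over Tm and slope, with the baseline and plateau solved by linear least squares at each grid point. Truncating the curve at the fluorescence maximum (to exclude the post-unfolding decrease typical of SYPRO Orange melts) is an assumption, as are all function names.

```python
import math


def _logistic(x):
    """Overflow-safe 1 / (1 + exp(x))."""
    if x >= 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))


def boltzmann(t, tm, slope, f_min, f_max):
    """Boltzmann sigmoid: fluorescence rises from f_min to f_max, midpoint at Tm."""
    return f_min + (f_max - f_min) * _logistic((tm - t) / slope)


def _linear_fit(temps, signal, tm, slope):
    """For fixed (Tm, slope) the model is linear in f_min and f_max: solve exactly."""
    s = [_logistic((tm - t) / slope) for t in temps]
    u = [1.0 - si for si in s]
    suu = sum(x * x for x in u)
    sss = sum(x * x for x in s)
    sus = sum(x * y for x, y in zip(u, s))
    suy = sum(x * y for x, y in zip(u, signal))
    ssy = sum(x * y for x, y in zip(s, signal))
    det = suu * sss - sus * sus
    if abs(det) < 1e-12:
        return float("inf"), 0.0, 0.0
    f_min = (suy * sss - sus * ssy) / det
    f_max = (suu * ssy - sus * suy) / det
    sse = sum((y - (f_min * ui + f_max * si)) ** 2 for y, ui, si in zip(signal, u, s))
    return sse, f_min, f_max


def _grid(temps, signal, tms, slopes):
    """Return the (Tm, slope) pair with the lowest sum of squared residuals."""
    best = (float("inf"), None, None)
    for tm in tms:
        for slope in slopes:
            sse = _linear_fit(temps, signal, tm, slope)[0]
            if sse < best[0]:
                best = (sse, tm, slope)
    return best[1], best[2]


def fit_tm(temps, signal):
    """Coarse-to-fine Boltzmann fit; returns Tm in the units of `temps` (°C)."""
    # Keep only the rising part of the melt curve, up to the fluorescence maximum
    i_max = max(range(len(signal)), key=signal.__getitem__)
    temps, signal = temps[: i_max + 1], signal[: i_max + 1]
    # Coarse pass: 1 °C steps in Tm, 0.25 °C steps in slope
    tm, slope = _grid(temps, signal,
                      [temps[0] + k for k in range(int(temps[-1] - temps[0]) + 1)],
                      [0.25 * k for k in range(1, 21)])
    # Fine pass: ±0.5 °C around Tm in 0.02 °C steps, ±0.25 around the slope
    tm, slope = _grid(temps, signal,
                      [tm - 0.5 + 0.02 * k for k in range(51)],
                      [max(0.05, slope - 0.25 + 0.05 * k) for k in range(11)])
    return tm
```

With readings every 0.5 °C from 20 to 80 °C, `fit_tm(temps, signal)` returns the midpoint of the fitted transition, which is how Tm is defined in the text above.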

### Molecular dynamic simulations

The crystal structures of the STAT5B and STAT5BN642H dimers were subjected to all-atom MD simulations. Missing atoms in the crystal structures were added using Swiss-PDB Viewer65, and missing loops were built using the loop model module of Modeler66. The resulting loop models were refined using the "refine fast" option of the loop model module. Loop modeling and refinement yielded three different structures for the loop regions, which were used as initial structures for three independent simulations of each dimer. Each simulation system consisted of the dimer with charged termini (NH3+ at the N-terminus and COO- at the C-terminus) in a rhombic dodecahedron box with a minimum distance of 10 Å to all box edges, CHARMM-modified TIP3P water molecules67 (~129,000 and ~134,750 water molecules for the STAT5B and STAT5BN642H dimers, respectively) and 150 mM NaCl. All simulations were carried out using GROMACS68 (version 2016.5). Energy minimization was carried out using the steepest descent algorithm. Prior to equilibration, a 2 ns simulation was carried out with position restraints on the protein heavy atoms. Virtual sites69 were used, allowing an integration time step of 4 fs, and periodic boundary conditions were applied. Short-range electrostatic and van der Waals interactions were calculated with a cutoff of 0.95 nm. Long-range electrostatic interactions were evaluated using smooth particle-mesh Ewald summation70 with 0.12 nm grid spacing and fourth-order interpolation. The Verlet cutoff scheme was used for neighbor searching. Bonds involving hydrogen atoms were constrained using the LINCS algorithm71, and water molecules were constrained using the SETTLE algorithm72. The velocity rescaling thermostat73 was used to maintain the temperature at 298 K.
Equilibration simulations were carried out in the NPT ensemble for 10 ns using Berendsen pressure coupling74, followed by a further 10 ns using the Parrinello-Rahman barostat75. Production runs were carried out in the NPT ensemble at 298 K and 1 bar using the Parrinello-Rahman barostat and the velocity rescaling thermostat. The CHARMM36m force field76 with the CHARMM-modified TIP3P water model was used for all simulations. Production simulations of the STAT5BN642H dimers were run for 1 µs. In the simulations of the STAT5B dimers, the inter-monomer contacts were lost at 8 ns, 213 ns, and 75 ns for systems 1, 2, and 3, respectively, and these simulations were terminated shortly after dimer dissociation was observed. To study the dynamics of wild-type STAT5B, we carried out three independent 1 µs simulations of the unphosphorylated STAT5B monomer, using chain A of the corresponding STAT5B dimer systems as the starting structures and following the procedure described above. In all simulations, the simulation box was sufficiently large to avoid interactions between periodic images.
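The simulation parameters listed above map onto a GROMACS .mdp file roughly as in the sketch below, which renders the options as text from Python. Only the values stated in the Methods (4 fs step, 0.95 nm cutoffs, PME with 0.12 nm spacing and fourth-order interpolation, Verlet lists, LINCS on bonds to hydrogen, v-rescale at 298 K, Berendsen then Parrinello-Rahman at 1 bar) come from the source; the single coupling group, tau-t, tau-p, and compressibility are assumed typical values, output-control options are omitted, and the helper name is hypothetical. Virtual sites are not an .mdp setting: in GROMACS they are introduced when the topology is built (for example with `gmx pdb2gmx -vsite hydrogens`), and SETTLE is applied to water through the water-model topology.

```python
DT_PS = 0.004  # 4 fs integration step, as stated in the Methods


def md_params(length_ns, pcoupl="Parrinello-Rahman", dt_ps=DT_PS):
    """Hypothetical helper: partial .mdp text for a run of `length_ns` nanoseconds."""
    nsteps = int(round(length_ns * 1000.0 / dt_ps))  # ns -> ps -> integration steps
    params = {
        "integrator": "md",
        "dt": dt_ps,
        "nsteps": nsteps,
        "pbc": "xyz",
        "cutoff-scheme": "Verlet",
        "coulombtype": "PME",
        "rcoulomb": 0.95,           # nm, short-range electrostatics
        "rvdw": 0.95,               # nm, van der Waals cutoff
        "fourierspacing": 0.12,     # nm, PME grid spacing
        "pme-order": 4,             # fourth-order interpolation
        "constraints": "h-bonds",   # bonds involving hydrogen
        "constraint-algorithm": "lincs",
        "tcoupl": "V-rescale",
        "tc-grps": "System",        # ASSUMED: one coupling group
        "tau-t": 0.1,               # ps, ASSUMED (not stated in the Methods)
        "ref-t": 298,               # K
        "pcoupl": pcoupl,           # "Berendsen" for the first equilibration stage
        "tau-p": 2.0,               # ps, ASSUMED
        "compressibility": 4.5e-5,  # bar^-1, ASSUMED (water)
        "ref-p": 1.0,               # bar
    }
    return "\n".join(f"{key:<22} = {value}" for key, value in params.items())
```

For instance, `md_params(1000)` corresponds to a 1 µs production run and sets `nsteps = 250000000` (1,000,000 ps / 0.004 ps), while `md_params(10, pcoupl="Berendsen")` corresponds to the first 10 ns equilibration stage.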

Visual molecular dynamics (VMD)77 was used to visualize the simulation trajectories. The GROMACS utility mindist was used to calculate inter-chain distances and contacts; an inter-chain contact was considered present if any two atoms of the two chains were within 6 Å. The frequency of sustained inter-chain contacts was calculated using the VMD tool contactFreq.tcl. A contact between two non-hydrogen atoms within 4.5 Å was defined as sustained if it was present for at least 40% of the simulation time in at least one of the systems. The GROMACS utility rmsf was used to calculate the RMSF of the α-carbon atoms of the individual domains. For the unphosphorylated STAT5B monomer, the statistical uncertainty was calculated over three independent simulation systems. For the STAT5BN642H dimer, RMSF was calculated for each SH2 domain individually, and the statistical uncertainty was reported over six SH2 domains in three independent systems. All-to-all RMSD of the non-hydrogen atoms of the SH2 domain, calculated using the RMSD trajectory tool of VMD, was used as a convergence measure (Supplementary Fig. 11). Conformational states visited by the SH2 domain appear as blocks of similar RMSD values along the diagonal. RMSD values in the off-diagonal regions are generally below 1 nm, indicating that conformational states are revisited in independent simulations. This analysis shows that the conformational space of the SH2 domain is adequately sampled in both the unphosphorylated STAT5B monomer and the STAT5BN642H dimer systems. Notably, RMSD values are higher in the STAT5B monomer systems, indicating greater flexibility of the wild-type monomer compared with the mutant dimer.
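The two contact definitions above (6 Å for any inter-chain contact, 4.5 Å for at least 40% of the time for a sustained heavy-atom contact) can be sketched in plain Python. The input format (per-frame dictionaries of minimum heavy-atom distances in Å, keyed by residue pairs) and the function names are hypothetical; in practice these distances come from gmx mindist or VMD, and contactFreq.tcl may count contacts at a different granularity.

```python
from collections import defaultdict

DISSOCIATION_CUTOFF = 6.0  # Å: chains are in contact if any two atoms are this close
SUSTAINED_CUTOFF = 4.5     # Å: heavy-atom contact distance
SUSTAINED_FRACTION = 0.40  # contact must be present in >= 40% of frames


def dissociation_time(times, min_interchain_dist):
    """First time at which the minimum inter-chain distance exceeds 6 Å (None if never)."""
    for t, d in zip(times, min_interchain_dist):
        if d > DISSOCIATION_CUTOFF:
            return t
    return None


def contact_frequencies(frames):
    """Fraction of frames in which each pair is within SUSTAINED_CUTOFF.

    frames: list of dicts mapping (residue_a, residue_b) -> minimum heavy-atom distance (Å).
    """
    counts = defaultdict(int)
    for frame in frames:
        for pair, dist in frame.items():
            if dist <= SUSTAINED_CUTOFF:
                counts[pair] += 1
    return {pair: n / len(frames) for pair, n in counts.items()}


def sustained_contacts(systems):
    """Pairs reaching SUSTAINED_FRACTION in at least one independent system."""
    sustained = set()
    for frames in systems:
        for pair, freq in contact_frequencies(frames).items():
            if freq >= SUSTAINED_FRACTION:
                sustained.add(pair)
    return sustained
```

Taking the union over systems mirrors the "in at least one of the systems" criterion; requiring the threshold in every system would instead be an intersection.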

### Dynamic light scattering (DLS)

DLS experiments were performed on a Malvern Zetasizer Nano ZS instrument with 50 µM samples of STAT5B and STAT5BN642H (200 µL) in 20 mM HEPES pH 7.4, 150 mM NaCl, 2% (v/v) glycerol. Standards of 1 mM lysozyme (BioShop) and bovine serum albumin (BSA; Sigma-Aldrich) were prepared in the same buffer. All samples were centrifuged for 10 min at 14,800 rpm, and data were collected in a quartz cuvette (1 cm path length). The hydrodynamic radius was determined as the average of the particle size frequency distribution from three independent experiments for each sample.
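DLS instruments obtain the hydrodynamic radius by converting the measured translational diffusion coefficient D through the Stokes-Einstein relation, Rh = kBT/(6πηD). The sketch below shows that conversion and the averaging over three experiments; the helper names are hypothetical, and the 25 °C measurement temperature and the viscosity of pure water (0.89 mPa·s) are assumptions, since the Methods do not state them and 2% glycerol raises the viscosity slightly.

```python
import math
from statistics import mean, stdev

K_B = 1.380649e-23  # Boltzmann constant, J K^-1 (exact SI value)


def hydrodynamic_radius_nm(diff_m2_per_s, temp_k=298.15, viscosity_pa_s=0.89e-3):
    """Stokes-Einstein: Rh = kB*T / (6*pi*eta*D), converted from m to nm.

    Defaults (25 °C, water viscosity) are ASSUMPTIONS, not values from the Methods.
    """
    r_m = K_B * temp_k / (6.0 * math.pi * viscosity_pa_s * diff_m2_per_s)
    return r_m * 1e9


def summarize_replicates(radii_nm):
    """Mean and sample SD of the per-experiment average radii."""
    return mean(radii_nm), stdev(radii_nm)
```

Under these assumptions, a diffusion coefficient of 6 × 10⁻¹¹ m² s⁻¹ corresponds to an Rh of roughly 4.1 nm; because Rh scales with 1/η, a higher buffer viscosity would give a proportionally smaller radius for the same measured D.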

### Surface plasmon resonance (SPR)

SPR experiments were performed at room temperature on a Biacore T200 (GE Healthcare) using a dextran-coated gold sensor chip functionalized with nitrilotriacetic acid (Series S Sensor Chip NTA, GE Healthcare). His-tagged ABL1 was immobilized via Ni2+-NTA affinity. Measurements were performed in PBS running buffer containing 0.05% Tween 20 and 50 µM EDTA. The surfaces of flow cells two (Fc2) and one (Fc1) were conditioned with 350 mM EDTA (GE Healthcare) for 60 s at a flow rate of 30 µL min–1. Fc2 was then prepared for ligand immobilization by injecting 0.5 mM NiCl2 (GE Healthcare) at 30 µL min–1. His-tagged ABL1 kinase (50 nM) was injected over Fc2 at 30 µL min–1 until at least 400 RU was reached, and the immobilized kinase was allowed to stabilize for 30 min. Kinetic data were collected by single-cycle kinetics: full-length STAT5B or STAT5BN642H in running buffer was injected over both flow cells at increasing concentrations (0.0124, 0.0370, 0.111, 0.333, and 1 µM) at a flow rate of 30 µL min–1 with a contact time of 120 s, and the complex was allowed to dissociate for 600 s. The surfaces were regenerated with a 120 s injection of 350 mM EDTA, followed by a 120 s injection of 500 mM imidazole. Data were double-referenced by subtracting the responses of the reference flow cell (Fc1) and of buffer-only injections. Data were collected at 10 Hz and fitted globally to a two-state kinetic model using BIAevaluation 3.0 software.
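The two-state (conformational change) model describes A + B ⇌ AB ⇌ AB*, with rate constants ka1 and kd1 for binding and ka2 and kd2 for the conformational step; the overall affinity is KD = (kd1/ka1)/(1 + ka2/kd2). The sketch below, with hypothetical function names and parameter values supplied by the caller, integrates this model for a single-cycle injection series at the concentrations listed above. Forward-Euler integration and the absence of buffer gaps between injections are simplifying assumptions; it illustrates the model, not the BIAevaluation fitting routine.

```python
# Concentrations from the Methods, converted from µM to M
CONCS_M = [0.0124e-6, 0.0370e-6, 0.111e-6, 0.333e-6, 1.0e-6]


def overall_kd(ka1, kd1, ka2, kd2):
    """Overall KD of the two-state model: (kd1/ka1) / (1 + ka2/kd2)."""
    return (kd1 / ka1) / (1.0 + ka2 / kd2)


def run_phase(r1, r2, conc, duration_s, params, dt=0.05):
    """Forward-Euler integration at constant analyte concentration `conc` (M).

    r1: response from AB, r2: response from AB* (RU); params = (ka1, kd1, ka2, kd2, r_max).
    """
    ka1, kd1, ka2, kd2, r_max = params
    for _ in range(int(round(duration_s / dt))):
        free = r_max - r1 - r2  # unoccupied immobilized ligand
        dr1 = ka1 * conc * free - (kd1 + ka2) * r1 + kd2 * r2
        dr2 = ka2 * r1 - kd2 * r2
        r1, r2 = r1 + dr1 * dt, r2 + dr2 * dt
    return r1, r2


def single_cycle(params, concs=CONCS_M, contact_s=120.0, dissoc_s=600.0):
    """Sequential injections without regeneration, then one dissociation phase.

    Returns the total response at the end of each injection and after dissociation.
    """
    r1 = r2 = 0.0
    responses = []
    for conc in concs:
        r1, r2 = run_phase(r1, r2, conc, contact_s, params)
        responses.append(r1 + r2)
    r1, r2 = run_phase(r1, r2, 0.0, dissoc_s, params)
    responses.append(r1 + r2)
    return responses
```

At steady state the total response approaches r_max·C/(C + KD), so the conformational step lowers the apparent KD relative to kd1/ka1 whenever ka2 is appreciable compared with kd2.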

### Statistics

Weight measurements and flow cytometry data are reported as mean ± SEM and differences were assessed for statistical significance by unpaired, two-tailed Student’s t-tests. Colony quantifications are reported as mean ± SEM and statistical differences were assessed by one-way ANOVA with Bonferroni’s correction. A p-value of <0.05 was considered statistically significant.
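For reference, the unpaired Student's t-test (pooled variance, as opposed to Welch's test), mean ± SEM, and the Bonferroni adjustment reduce to the short computations sketched below. Function names are hypothetical, and the sketch stops at the t statistic and its degrees of freedom; the two-tailed p-value is 2·P(T > |t|) for a t distribution with df degrees of freedom, which `scipy.stats.ttest_ind` (with its default `equal_var=True`) returns directly. Bonferroni-corrected post hoc tests after a one-way ANOVA usually use the ANOVA's pooled error term, so the simple p × m adjustment here is a simplification.

```python
import math
from statistics import mean, stdev


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def student_t(a, b):
    """Unpaired Student's t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return t, df


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

Pooling the variances is what distinguishes Student's test from Welch's test, which would instead use separate variances and an adjusted df.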

### Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.