The transcription factor STAT5 catalyzes Mannich ligation reactions yielding inhibitors of leukemic cell proliferation

Protein-templated fragment ligations have been established as a powerful method for the assembly and detection of optimized protein ligands. Initially developed for reversible ligations, the method has been expanded to irreversible reactions, enabling the formation of super-additive fragment combinations. Here, protein-induced Mannich ligations are discovered as a biocatalytic reaction furnishing inhibitors of the transcription factor STAT5. STAT5 protein catalyzes multicomponent reactions of a phosphate mimetic, formaldehyde, and 1H-tetrazoles, yielding protein ligands with greatly increased binding affinity and ligand efficiency. Reactions are induced under physiological conditions selectively by native STAT5 but not by other proteins. Formation of ligation products and (auto-)inhibition of the reaction are quantified, and the mechanism is investigated. Inhibitors assembled by STAT5 specifically block the phosphorylation of this protein in a cellular model of acute myeloid leukemia (AML), DNA binding of STAT5 dimers, expression of downstream targets of the transcription factor, and the proliferation of cancer cells in mice.
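Ligand efficiency is commonly defined as the binding free energy per heavy (non-hydrogen) atom. The sketch below assumes that standard definition, with ΔG = RT ln K_d at 298.15 K, and uses hypothetical K_d values and heavy-atom counts (not data from this study) to show how a ligation product can gain ligand efficiency even though it is larger than the starting fragment.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def ligand_efficiency(kd_molar: float, heavy_atoms: int, temp_k: float = 298.15) -> float:
    """Binding free energy per heavy atom, in kcal/mol.

    Assumes LE = -dG / N_heavy with dG = R*T*ln(Kd), Kd in mol/L.
    """
    if kd_molar <= 0 or heavy_atoms <= 0:
        raise ValueError("Kd and heavy-atom count must be positive")
    delta_g = R_KCAL * temp_k * math.log(kd_molar)  # negative when Kd < 1 M
    return -delta_g / heavy_atoms


# Hypothetical values for illustration only:
fragment_le = ligand_efficiency(100e-6, 12)  # 100 uM fragment, 12 heavy atoms
product_le = ligand_efficiency(0.1e-6, 20)   # 0.1 uM ligation product, 20 heavy atoms
print(f"fragment LE = {fragment_le:.2f}, product LE = {product_le:.2f} kcal/mol per heavy atom")
```

A gain in LE on ligation means the added atoms contribute more binding energy per atom than the original fragment did.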

A full description of the statistics, including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)

For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted. Give P values as exact values whenever suitable.

For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings

For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes

Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
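As an illustration of reporting how an effect size was calculated, Cohen's d for two independent groups can be computed as the difference in means divided by the pooled sample standard deviation. The sketch below assumes that common pooled-SD variant (other variants, such as Hedges' g, add a small-sample correction) and uses made-up triplicate values.

```python
import math
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (n - 1 denominators)."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    s_a, s_b = stdev(group_a), stdev(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * s_a**2 + (n_b - 1) * s_b**2) / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd


# Hypothetical triplicates: control vs. treated readout
control = [10.0, 12.0, 14.0]
treated = [4.0, 6.0, 8.0]
print(f"Cohen's d = {cohens_d(control, treated):.2f}")
```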

Clearly defined error bars
State explicitly what error bars represent (e.g. SD, SE, CI).
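For triplicate measurements (n = 3), the three common error-bar choices differ considerably in width, which is why the type of error bar must be stated. The sketch below computes SD, SEM and a t-based 95% confidence interval; the t critical values are hard-coded standard table values (an assumption made to avoid non-standard-library dependencies), and the data are hypothetical.

```python
import math
from statistics import mean, stdev

# Two-sided 95% t critical values, t(0.975, df), from standard tables
T_975 = {2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def error_bar_summary(values):
    """Return mean, SD, SEM and 95% CI for a small sample (n = 3 to 6)."""
    n = len(values)
    df = n - 1
    if df not in T_975:
        raise ValueError("t critical value not tabulated for this sample size")
    m = mean(values)
    sd = stdev(values)            # sample SD (n - 1 denominator)
    sem = sd / math.sqrt(n)       # standard error of the mean
    half_width = T_975[df] * sem  # CI half-width
    return {"mean": m, "sd": sd, "sem": sem, "ci95": (m - half_width, m + half_width)}


summary = error_bar_summary([10.0, 12.0, 14.0])  # hypothetical triplicate
print(summary)
```

With n = 3, the 95% CI half-width is about 4.3 × SEM, so an SEM bar visually understates the uncertainty compared with a CI bar.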

Software and code
Policy information about availability of computer code

Data collection
Magellan™ for data collection on Tecan plate readers; BD CellQuest™ Pro Software for data acquisition on the FACScan flow cytometer; AutoDock Vina and SYBYL 8.1 for molecular docking; 6200 series TOF and 6500 series Q-TOF LC-MS systems; Syngene PXi for gel documentation and blot visualization; Bio-Rad S1000™ Thermal Cycler for cDNA synthesis, CETSA and ITDRF; Roche LightCycler® 480 Real-Time PCR System for RT-PCR and TSA.
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers upon request. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

April 2018
Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
- A description of any restrictions on data availability
The datasets generated and analyzed in the current study are available from the corresponding author upon reasonable request.

Field-specific reporting
Please select the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

Life sciences
Behavioural & social sciences
Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/authors/policies/ReportingSummary-flat.pdf

Life sciences study design
All studies must disclose on these points even when the disclosure is negative.

Sample size
For comparisons between one or more groups, at least three replicate samples (n ≥ 3) were used, followed by statistical analysis. For animal experiments, the group size (number of animals per group) was calculated with a statistical program (e.g. SigmaStat, SAS) for an ANOVA, based on the expected difference in the target parameter, the variance of the target parameter, and the number of groups. The target parameter is the tumor area, calculated from the measured length and width of the implanted tumors. Based on the variance (standard deviation) of tumor growth, n = 6 animals per group were used to ensure that statistically significant differences could be detected.
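The group-size reasoning above (expected difference, variance, number of groups) can be approximated by hand. The sketch below is not the ANOVA calculation performed in SigmaStat or SAS: it swaps in the simpler normal-approximation formula for two-group comparisons, n = 2σ²(z₁₋α/₂ + z₁₋β)²/Δ², with an optional Bonferroni correction when several groups are each compared with a control. The tumor-area formula (length × width) and all numbers are assumptions for illustration.

```python
import math
from statistics import NormalDist


def tumor_area(length_mm: float, width_mm: float) -> float:
    # Assumption: area approximated as length x width (mm^2); other
    # formulas (e.g. an ellipse, pi/4 * L * W) are also in use.
    return length_mm * width_mm


def animals_per_group(delta, sd, alpha=0.05, power=0.8, n_comparisons=1):
    """Normal-approximation group size for a two-sided two-group comparison.

    delta: expected difference in mean tumor area between groups
    sd: assumed standard deviation of tumor area
    n_comparisons: number of comparisons for a Bonferroni-adjusted alpha
    """
    z = NormalDist()
    adj_alpha = alpha / n_comparisons
    z_alpha = z.inv_cdf(1 - adj_alpha / 2)
    z_beta = z.inv_cdf(power)
    n = 2 * (sd * (z_alpha + z_beta) / delta) ** 2
    return math.ceil(n)


# Hypothetical: expected difference of 50 mm^2 with SD of 25 mm^2
print(animals_per_group(delta=50.0, sd=25.0))
```

Because the normal approximation ignores the extra uncertainty of small-sample t statistics, it tends to slightly underestimate the required group size; a t- or F-based power calculation adds a margin on top of this estimate.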

Data exclusions
No data were excluded from the analyses.

Replication
The data were reliably reproduced in repeated experiments. Animal data were collected and analyzed from n = 6 mice per group.
Randomization
Animals were randomly allocated to treatment or control groups using Microsoft Excel. For the mouse xenograft assay (Figure 6f), mice were randomized and then treated with or without compound 16.
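Spreadsheet randomization typically assigns each animal a random number and sorts on it. The sketch below shows an equivalent shuffle-and-split allocation; the animal IDs, group names and seed are hypothetical, and the study itself used Microsoft Excel rather than this code.

```python
import random


def randomize(animal_ids, groups, seed=None):
    """Shuffle animals and split them into equally sized groups."""
    if len(animal_ids) % len(groups) != 0:
        raise ValueError("animals must divide evenly into groups")
    rng = random.Random(seed)  # fixed seed makes the allocation reproducible
    shuffled = list(animal_ids)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(groups)
    return {g: shuffled[i * size:(i + 1) * size] for i, g in enumerate(groups)}


# Hypothetical: 12 mice split into control and treatment groups of 6
allocation = randomize([f"M{i:02d}" for i in range(1, 13)], ["control", "compound 16"], seed=1)
for group, mice in allocation.items():
    print(group, mice)
```

Recording the seed lets the allocation be reproduced and audited later.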

Blinding
Investigators were not blinded to group allocation during data collection and analysis, because the absolute values of tumor volume and tumor weight were recorded for statistical analysis.
Reporting for specific materials, systems and methods
Detailed sample preparation is described in the Supplementary Methods section.