Sequence specificity analysis of the SETD2 protein lysine methyltransferase and discovery of a SETD2 super-substrate

SETD2 catalyzes methylation at lysine 36 of histone H3 and it has many disease connections. We investigated the substrate sequence specificity of SETD2 and identified nine additional peptide and one protein (FBN1) substrates. Our data showed that SETD2 strongly prefers amino acids different from those in the H3K36 sequence at several positions of its specificity profile. Based on this, we designed an optimized super-substrate containing four amino acid exchanges and show by quantitative methylation assays with SETD2 that the super-substrate peptide is methylated about 290-fold more efficiently than the H3K36 peptide. Protein methylation studies confirmed very strong SETD2 methylation of the super-substrate in vitro and in cells. We solved the structure of SETD2 with bound super-substrate peptide containing a target lysine to methionine mutation, which revealed better interactions involving three of the substituted residues. Our data illustrate that substrate sequence design can strongly increase the activity of protein lysine methyltransferases.

N umerous covalent post-translational modifications (PTMs) occur on the tails of histone proteins including acetylation of lysine residues, methylation of lysine and arginine residues, and phosphorylation of serine or threonine residues 1 . Together with other chromatin modifications, histone PTMs play very important roles in the regulation of chromatin structure and gene expression [2][3][4] . The methylation of the ε-amino group of lysine residues is catalyzed by protein lysine methyltransferases (PKMTs), which can introduce up to three methyl groups with high specificity using S-Adenosyl-L-methionine (AdoMet) as methyl group donor. One group of PKMTs contains a SET (Su(var) 3-9, Enhancer of Zeste (E(z)) and Trithorax (trx)) domain, which harbors the catalytically active center for the transfer of the methyl group [5][6][7] .
The SET-domain containing protein 2 (SETD2, also called KMT3A) PKMT was first identified as huntingtin interacting protein 1 (HYPB, HIP-1) 8 . It has a size of 230 kDa corresponding to 2564 amino acids and can add up to three methyl groups to K36 of histone H3 9 . The ability for trimethylation of H3K36 makes SETD2 unique, because no other trimethyltransferase for this lysine residue is known in human cells 9,10 . The structures of SETD2 in complex with H3K36 variant peptides and S-Adenosyl-L-homocysteine (AdoHcy) revealed that the substrate peptide is positioned in a deep binding channel in the catalytic SET domain where several contacts between the enzyme and the substrate peptide are formed 11,12 . H3K36me3 is enriched in the gene bodies of expressed genes, because SETD2 is recruited to its genomic target sites by the elongating RNA Polymerase II 3,8,13,14 . It has a role in the maintenance of repressed chromatin in transcribed regions by recruitment of histone deacetylases, the H3K4 demethylase LSD2, chromatin remodelers, and DNA methylation via the PWWP domain of DNMT3 enzymes. Moreover, H3K36me3 in gene bodies has been associated with alternative splicing 3,13,14 .
Many studies have shown that lysine methylation not only occurs on histones, but it is a very abundant PTM on non-histone proteins as well, where it has diverse regulatory roles [15][16][17] . Park et al. 16 showed that SETD2 methylates α-tubulin at lysine 40 and that SETD2 is crucial for mitosis and cytokinesis 18 . Later studies reported that the signal transducer and activator of transcription 1-alpha/beta (STAT1) is also methylated by SETD2 at lysine 525 and this modification is an important signal for IFNα-dependent antiviral immune response 19 .
Previous investigations revealed many somatic mutations of the SETD2 gene in cancer tissues, for example, in pediatric and adult high grade gliomas 20 . Another cancer type in which SETD2 plays an important role is Sporadic clear cell renal cell carcinoma (cRCC), where frameshift, non-sense and missense mutations in SETD2 were observed, suggesting a loss-of-function mechanism 21 . This model was supported by the observation that the level of H3K36 trimethylation is reduced in cRCC, but H3K36me2 is not affected 22 . Another mechanism of SETD2 inhibition in cancer cells is the expression of the so-called H3K36M oncohistone, frequently found in chondroblastomas 23 . Structural and kinetic studies have shown that the H3K36M mutation inhibits the methyltransferase activity of SETD2, because the methionine residue binds into the lysine binding channel of the active center and thereby functions as a competitive enzyme inhibitor 11 . The inhibition of H3K36 PKMTs by the K36M oncohistone leads to massive perturbations of genome wide H3K36me3 and H3K27me3 patterns 11,24 . Genetic mutations of SETD2 cause an overgrowth syndrome 25 related to the SOTOS syndrome, which is majorly caused by mutations in the H3K36 mono-and dimethyltransferase NSD1 26 .
While the crystal structures of SETD2 in complex with H3 peptides provide the interaction details between the SETD2 substrate binding channel and the bound peptide 11,12 , the substrate specificity of SETD2 for peptide methylation has not yet been systematically analyzed. Using a peptide array based approach, we have determined the amino acid specificity profile of the substrate sequence for SETD2 in this study. Based on this information, we identified 9 novel and strongly methylated nonhistone peptide substrates and a methylation site at K666 of Fibrillin-1 at the protein level. Strikingly, we observed that SETD2 prefers other amino acids than those present in the H3K36 sequence at several places of the specificity profile. Based on this observation, a super-substrate of SETD2 was designed and shown to be methylated much more efficiently than H3K36 in vitro and in cells. Structural analysis pinpointed improved molecular interactions as potential reasons for the increased methylation efficiency of the super-substrate. However, the human proteome does not contain a protein fully matching the optimized SETD2 substrate sequence indicating that additional work will be needed to understand this unexpected property of SETD2.

Results
SETD2 expression and purification. The His 6 -tagged catalytic SET domain of SETD2 was expressed in E. coli cells and purified through Ni-NTA-beads ( Supplementary Fig. 1a). To examine the activity of SETD2, the enzyme was incubated with reconstituted mononucleosomes (rec. MN), native mononucleosomes isolated from HEK293 cells (nat. MN) and recombinant H3.1 protein as substrates in methylation buffer supplemented with radioactively labeled [methyl-3 H]-S-Adenosyl-L-methionine (AdoMet) as cofactor. After the methylation reaction, the samples were separated by SDS-PAGE and the transfer of the radioactively labeled methyl groups from the cofactor to the respective target lysine was visualized by autoradiography ( Supplementary Fig. 1b). While no activity could be detected on the recombinant H3.1 protein, strong methylation was observed with the nucleosomal substrates. The activity of SETD2 on reconstituted mononucleosomes was higher than on the native mononucleosomes, presumably because the native mononucleosomes were already partially methylated at K36.
Next, the SETD2 activity was tested on peptide SPOT arrays containing fifteen amino acid long peptides immobilized on a cellulose membrane. As reported SETD2 substrates, the H3K36 (29-43), α-tubulin (33)(34)(35)(36)(37)(38)(39)(40)(41)(42)(43)(44)(45)(46)(47) and STAT1 (518-532) peptides (with the target lysine centered in each case) were included, and peptides with the corresponding target lysine mutated to alanine were used as negative controls. The peptide array was incubated in methylation buffer supplemented with SETD2 and radioactively labeled AdoMet. The methylation of the peptides was visualized by autoradiography ( Supplementary Fig. 1c) showing detectable methylation of the H3K36 (29-43) peptide spot. The K36A spot and the other investigated peptide substrates (α-tubulin and STAT1) did not show a detectable peptide methylation signal. These data confirm the methylation activity of SETD2 at the H3K36 site in nucleosomal and peptide substrates. Methylation of the previously reported non-histone substrates α-tubulin and STAT1 was not detectable at the peptide level and, therefore, they were excluded from further investigation.
Substrate specificity analysis of SETD2 in an H3K36 context. After confirmation of the SETD2 activity, its substrate specificity was determined by using SPOT peptide arrays 27 . The H3K36 (aa 29-43) fragment was used as a template sequence in this analysis, and each single position of the template sequence was exchanged against 18 other natural amino acids creating single amino acid mutants for each position. Trp was excluded in this analysis because of its unfavorable coupling properties. Cys was omitted because it can be methylated itself 28 . This setup resulted in 270 peptide spots whose methylation could be investigated in a single experiment (Fig. 1a). The methylation experiments of the peptide arrays were performed as described above in duplicates, which were normalized and averaged (Fig. 1b). The standard deviations (SD) of the methylation activity of each single spot were calculated, which showed that most spots had an SD smaller than ±10% ( Supplementary Fig. 2a) indicating a high reproducibility of the results. For visualization of the substrate specificity motif of SETD2, discrimination factors were calculated as previously described 27 . The discrimination factors show the preference of SETD2 for one specific amino acid over all other amino acids in each single position of the peptide sequence (Fig. 1c). In addition, a Weblogo of the substrate specificity profile was calculated as well 29 .
Our data revealed that SETD2 is a highly specific PKMT with specific sequence readout between P30 (position −6, if K36 is annotated as position 0) and R42 (+6). The strongest specificity was observed at G33 (−3), G34 (−2), V35 (−1) and P38 (+ 2). Surprisingly, we observed that SETD2 prefers amino acids different from the canonical sequence surrounding K36 in H3 at five sites of the profile. At position −5, SETD2 prefers R, G, and L over the naturally occurring alanine residue in the H3K36 substrate. At the −4 position, F, Y, L, I, and R are more preferred than the natural T. At position +1, SETD2 has a higher preference for R, H, I, L, F, Y, and V compared to the natural K.
At position +3, R, N, G, K, F, and Y are more favored than the naturally occurring H, and finally R and K are preferred over Y at position +5.
Design of a super-substrate for SETD2. Based on these observations, we aimed to explore, if better substrates than H3K36 can be designed for SETD2. We synthesized peptide SPOT arrays including H3K36 (29-43) as a positive control and the H3K36A mutant as a negative control, in which up to 5 combined mutations were systematically introduced that converted the original H3K36 residues into residues that are more favored (Fig. 2a and  Supplementary Table 1). This peptide array was methylated by SETD2 as described above and the methylation signal was captured by autoradiography. Strikingly, we observed that most of the mutated peptides were indeed more strongly methylated than the H3K36 peptide indicating that the mutant peptides are preferred substrates. We observed a stepwise improvement of activity with increasing number of mutations starting with T32F, going over A31R/T32F, A31R/T32F/K37R, and finally to A31R/T32F/ K37R combined with H39N/K. Among the investigated peptides, the peptide sequence of spot B10 was selected as a super-substrate (ssK36) peptide, because it was one of the most strongly methylated peptides (sequence: A P R F G G V K R P N R Y R P, with the altered amino acids printed in blue; the red labeled residue represents the target lysine K36). The Y to R exchange at the +5 site was not included to avoid the generation of a triple-R a d e  Peptide SPOT array substrate sequence specificity investigation of SETD2. a Substrate sequence specificity scan using H3K36 (29-43) as template sequence. The template sequence is represented on the horizontal axis, K36 is printed in red. The vertical axis represents the 18 amino acids, which were used to create single amino acid mutants of the template sequence. The peptide array was methylated with SETD2 using radioactively labeled AdoMet and the transfer of methyl groups was detected by autoradiography. b Analysis of two independent specificity scan experiments. The signals of both independent replicates were quantified, normalized and averaged. c Discrimination factors showing the preference of SETD2 for specific amino acids over all others at each position 27 and visualization of the substrate specificity motif of SETD2 as Weblogo (https://weblogo.berkeley.edu/logo.cgi) 29 . sequence and because it did not result in a clear increase in activity. Densitometric analysis of the intensities and shapes of 5 pairs of H3K36 and ssK36 peptide spots on independent arrays revealed a 70 ± 10 (SD) fold stronger methylation of ssK36 than H3K36.
Peptide methylation and inhibition studies with ssK36. To compare the methylation rates of the H3K36 and super-substrate peptides directly, purified ssK36 and H3K36 were methylated by SETD2, resolved on a Tricine gel and the methylation activity captured by autoradiography (Fig. 2b). To allow direct comparison of the methylation levels of both substrates, methylated ssK36 was loaded in different dilutions on the same gel. Densitometric analysis of three independent experiments showed that the ssK36 peptide was methylated 50 ± 4 (SD) fold more efficiently than the wild type H3K36 peptide, which is in good qualitative agreement with the estimations from the peptide spot methylation experiments.
To further characterize the methylation kinetics of the supersubstrate peptide, steady-state methylation experiments were performed using the super-substrate peptide in different concentrations ranging from 20 to 320 µM ( Supplementary Fig. 3a). The derived data were fitted to the Michaelis-Menten model revealing a K M of 45 µM. Hence, the apparent K M value of the methylation of the super-substrate was only slightly improved when compared with the K M of the H3K36 peptide previously determined under the same conditions (65 µM 30 ) suggesting that most of the methylation rate enhancement observed with the ssK36 is due an increase in k cat . It has been previously shown that H3K36M is a substrate competitive inhibitor of SETD2 11,30 . To further study the difference in peptide binding, inhibition of SETD2 by ssK36M and H3K36M inhibitor peptides was investigated ( Supplementary Fig. 3b and c). The results were fitted to a competitive inhibition model revealing K i -values of 128 µM for the ssK36M peptide and 447 µM for H3K36M indicating a roughly 3.5-fold increase in binding of the ssK36M peptide.
To determine the methylation rates of H3K36 and ssK36 more quantitatively, the time course of ssK36 methylation was determined under single turnover conditions revealing a methylation rate constant of k st = 4.07 h −1 (Fig. 2c). Finally, the    Table 1) were synthesized on a cellulose membrane and methylated by SETD2. The signal of the transferred radioactive methyl groups was detected by autoradiography. Spots A1 and B15 contain the H3K36 sequence, A2 and B16 the H3K36A peptide. Two films with different exposure times are shown to visualize weaker methylated spots as well. b Comparison of the methylation of purified super-substrate (ssK36) and the H3K36 peptides. The upper panel shows a Tricine-SDS-gel of the ssK36 and H3K36 peptides stained with Coomassie BB. Methylation reactions were conducted using 20 µM peptide and 6 µM enzyme. The lower panels display autoradiography films after different exposure times allowing to compare the methylation levels of the different samples. For comparison, different dilutions of the methylated ssK36 were loaded as indicated. c Time course of ssK36 peptide methylation determined under single turnover conditions using 5 µM substrate and 6 µM enzyme. Fit of the data to an exponential reaction progress curve revealed a single turnover rate constant of 4.07 ± 0.39 h −1 (average ± SEM). d Direct comparison of the time courses of methylation of the ssK36 and H3K36 peptides determined using 20 µM substrate and 6 µM enzyme. The gels with the radioactive ssK36 (5 µl loaded per well) and H3K36 peptides (20 µl loaded per well) were exposed together to the same x-ray film. The figure shows exemplary autoradiography images and the quantitative analysis of two experiments revealing a 290 ± 70 (average ±SEM) higher initial slope for ssK36 methylation. An enlargement of the H3K36 data with adjusted y-axis scale is shown in Supplementary Fig. 3d.
time courses of methylation of ssK36 and H3K36 were determined under matching conditions and quantified together by joined exposition of the gels to the same x-ray film ( Fig. 2d and Supplementary Fig. 3d). Comparison of the initial slopes of the methylation reactions revealed that ssK36 was methylated 290fold faster than H3K36.
Substrate specificity analysis of SETD2 in an ssK36 context. Based on the sequence of the super-substrate peptide, a second substrate specificity analysis was performed ( Fig. 1d-f and Supplementary Fig. 2b). In this experiment, only few amino acid exchanges led to slightly increased methylation efficiencies (most prominently I instead of V at position −1 and R instead of Y at the +5 site). This finding confirms that the super-substrate sequence is close to the optimal recognition sequence of SETD2. As before, a very specific interaction with the target peptide was observed. At the −4 position, SETD2 only accepts F, R, I, L, and Y. At the −3 position, SETD2 is highly specific and only tolerates G. At position −2, a strong preference of SETD2 was observed as well, where four residues are allowed with F and G being favorably accepted and Y and A to a lesser extent. At position −1, I is stronger recognized by SETD2 than V. In addition, L, F, and Y are weakly tolerated at this position. At position +1, E, G, and P are not tolerated. The position +2 is also highly specifically recognized and only A, G, I, L, S, T, and V are tolerated apart from P, with clear preferences for I, A, and P. Altogether, the following specificity motif based on the super-substrate (29-43) sequence of SETD2 was determined (the amino acids of the ssK36 peptide are underlined and red represents the target lysine 36): Interestingly, both substrate specificity profile analyses resulted in very similar amino acid preferences, except that the phenylalanine at the −4 position and glycine at the −3 position showed stronger readout in the more favorable super-substrate context than the corresponding residues in H3K36.
Identification of SETD2 non-histone peptide substrates. A Scansite database search with the super-substrate peptide sequence revealed that this sequence is not present in the human proteome. However, we could identify 166 possible targets, which contain partial matches to the substrate specificity profile. The possible methylation of these 166 putative substrates was investigated at the peptide level. For this, fifteen amino acid long peptides containing the sequence of the non-histone targets surrounding the predicted target lysine were synthesized on a membrane. The H3K36 (29-43) peptide (spots A1, I9), the supersubstrate peptide (spots A3, I11) and the corresponding lysine to alanine mutation peptides (spot A2, I10, and A4, I12) were included as positive and negative controls (Fig. 3a, Supplementary  Fig. 4 and Supplementary Table 2). The methylation reactions and the detection of the methylation signals were performed in duplicates as described above. We observed that all of the investigated peptides were more weakly methylated than the super-substrate peptide. However, several of the methylated nonhistone substrates were methylated at least as strong as the H3K36 peptide suggesting that their methylation could be relevant in cells. In the next step, 20 of the peptides were selected for further investigation based on the methylation intensity and the known functions of the proteins to confirm the methylation at the predicted target site. The peptides were synthesized on another membrane, now including the target lysine to alanine mutation  4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 Fig. 3 Identification of non-histone substrates of SETD2. a Candidate non-histone peptide substrates were identified in the human proteome based on the SETD2 super-substrate specificity profile. The peptides were synthesized on a peptide SPOT array (Supplementary Table 2) and methylated by SETD2. As controls, the H3K36, H3K36A, ssK36 and ssK36A peptides were included on spots A1-A4 and I9-I12. The autoradiography image is an example of the obtained results. Note the strong methylation of several spots when comparing with the H3K36 control spots (A1 and I9). The quantification of the data of two experiments is shown in Supplementary Fig. 4. b Analysis of target lysine methylation of selected non-histone substrate peptides. Peptide SPOT arrays were synthesized containing the non-histone peptide substrates identified in panel A in addition to the lysine to alanine mutations presented on the array as every second spot (Supplementary Table 3). As controls, the H3K36, H3K36A, ssK36, and ssK36A mutant peptides were included on spot A1-A4 and C5-C8. The autoradiography image is an example of the obtained results. The quantification of the data of three experiments is shown in Supplementary  Fig. 5. Peptides selected for furhter analysis are highlighted by blue circles. c Protein methylation of FBN1 and confirmation of the target lysine methylation. Both wild type (WT) and mutant (K666R) FBN1 were purified and equal amounts were methylated by SETD2 (2 µM control peptides in each case. The methylation reactions and the detection of the signal were conducted as described above (Fig. 3b, Supplementary Fig. 5 and Supplementary Table 3). This analysis confirmed strong methylation at the target site for 9 of the novel non-histone target peptides ( Table 1).
Investigation of non-histone protein methylation by SETD2. The 9 new putative non-histone substrates of SETD2 were cloned into a pGEX-6p2 bacterial expression vector for protein expression. Six of them could be expressed and purified through glutathione sepharose beads (Table 1). For comparison, the supersubstrate sequence was cloned with a C-terminal GST-tag (ssK36-GST) to mimic the natural setting of the H3K36 site in the Nterminal tail of histone 3. Similar amounts of the purified nonhistone substrate proteins and the ssK36-GST protein were loaded on an SDS-gel and stained with Coomassie BB (Supplementary Fig. 6a). Afterward protein methylation reactions were conducted using radioactively labeled AdoMet and the radioactive signal of the methyl group transfer was detected by autoradiography ( Supplementary Fig. 6b). Our data revealed methylation of the FBN1 protein, although more weakly than ssK36-GST. Next, we aimed to confirm the FBN1 methylation at the predicated target lysine K666. For this, a K666R mutation was introduced and the wildtype and mutant FBN1 proteins were purified by affinity chromatography. Equal amounts of wildtype and mutant FBN1 proteins were methylated by SETD2, revealing a clear methylation signal of FBN1 that was lost in the FBN1 K666R mutant (Fig. 3c). This result confirms SETD2 methylation of the FBN1 protein at K666.
Crystal structure of the SETD2-ssK36M peptide complex. To understand the molecular basis of the super-substrate methylation by SETD2, we determined the crystal structure of the human SETD2 catalytic domain (aa 1435-1711) bound to AdoHcy and the super-substrate peptide containing a target lysine to methionine mutation (ssK36M) ( Table 2). The SETD2 catalytic domain comprises an N-terminal AWS motif coordinating two zinc ions (aa 1494-1548), as well as SET (aa 1550-1667) and post-SET (aa 1674-1690) subdomains 31 . The SET domain assumes a triangular barrel-like structure (β1-β2; β3-β8-β7; β4-β6-β5-α6) as previously reported 11,12,31 (Fig. 4a). We observed clear electron density for the ssK36M peptide and a cofactor product corresponding to AdoHcy, feasibly formed from AdoMet under the crystallization conditions. The ssK36M peptide is in an extended conformation, and AdoHcy is surrounded by residues of the SET and post-SET domains (Fig. 4a, b, Supplementary  Fig. 7). The overall SETD2 protein structure as well as the The target lysine residues are printed in bold and underlined. MW refer to the molecular weight of the cloned GST fusion proteins. substrate-binding mode were not notably different from the previously reported ternary complex with H3K36M and AdoMet (PDB 5V21) 12 , as we could see from the aligned enzyme-substrate interaction surfaces of these two complexes (Fig. 4c). We use single and three letter amino acid codes for the residues of the peptide and SETD2, respectively. Residues 29-APRFGGVM-36 of ssK36M are inserted into a surface channel of the SETD2 protein and covered by the loops between α6 and β5; and post-SET and α8 (Fig. 4b) and the peptide conformation closely mimics the previously described structures of the H3K36M and H3.3 K36M peptide bound to SETD2 (PDB 5V21 and 5JJY) 11,12 .
Structural basis for the preferable ssK36 methylation. Interaction analysis by LIGPLOT 32 based on the ssK36M-SETD2 complex structure indicated 18 hydrogen-bonds and 31 spots for hydrophobic interactions across the ssK36M-SETD2 interface, ignoring solvent atoms, but only 14 and 28 for the corresponding H3K36M structure (PDB 5V21) 12 ( Supplementary Fig. 8). As expected, the majority of hydrophobic interactions occur in the SET domain channel that is occupied by the substrate residues 29 through 36. The super-substrate carries four amino acid exchanges (A31R, T32F, K37R, and H39N) with respect to the canonical H3K36 peptide. To compare the substrate interaction differences mediated by these four residues, we superimposed ssK36M and H3K36M bound (PDB 5V21) SETD2 crystal structures (Fig. 4c) by overlaying the SETD2 molecules. Although not completely resolved by electron density, the side chain of ssK36M R31 (in its most likely conformation ttt180 33,34 ) may form a salt bridge with the side chain of Glu1674 (Fig. 5a). The aromatic F32 side chain of ssK36M is inserted into a shallow pocket formed by the side chains of Glu1674 and Gln1676 making more extensive hydrophobic interactions than T32 of H3K36M (Fig. 5b). Neither the H3K36M complex structure 12 nor the present ssK36M structure clearly resolves the side chains of K37 or R37, respectively, yet in the ssK36 complex a difference density near the Ala1699 and Ala1700 carbonyl groups suggests possible hydrogen bonds with the R37 guanidinium moiety 35 (Fig. 5c). We could not infer any specific interactions involving the side chain of N39, equivalent to H3-H39, from the complex structure. In conclusion, the substituted residues in the super-substrate ssK36M interact with SETD2 mainly through the post-SET domain loop and the α6 helix of the SETD2 catalytic domain (Fig. 5d). Slotting of the F32 phenyl ring into a pocket between the Glu1674 and Gln1676 and possible interactions involving the guanidine moieties of R31 and R37 provide a plausible rationale for the enhanced methylation activity by SETD2 due to a tighter binding affinity between the super-substrate peptide and SETD2.
H3K36 and ssK36 methylation at the protein level.  Afterwards, equal amounts of ssK36-GST, H3K36-GST and ssK36M-GST were used for methylation by SETD2 (Fig. 6a). Native mononucleosomes (MN) isolated from HEK293 cells were used as a positive control. We observed that the ssK36-GST protein and the MN were strongly methylated, but no signal was detected for the H3K36-GST protein. Therefore, the experiment was repeated with a longer exposure time of the film. In addition, different dilutions of ssK36-GST were loaded onto the gel to allow for a better comparison of the activities (Fig. 6b). Densitometric analysis of the band intensities revealed an about 50-65 fold higher methylation activity of SETD2 on ssK36-GST than on H3K36-GST.
For analysis of cellular methylation of the super-substrate, we first validated that an H3K36me3 antibody also detects lysine methylation in the context of the ssK36 sequence ( Supplementary  Fig. 9a). Next, YFP-tagged ssK36 and H3K36 (29-43) substrates were transfected into HEK293 cells together with DsRed-tagged SETD2 or empty vector expressing DsRed. The SETD2 expression (or DsRed expression in the empty vector control) was analyzed by FACS showing roughly equal expression levels ( Supplementary Fig. 9b-d). The YFP-tagged ssK36 and H3K36 substrates were isolated by GFP-trap and quantified by Western Blot with anti-GFP antibody (Fig. 6c). Next, the methylation of the substrates was analyzed using the H3K36me3 antibody revealing strong methylation of the supersubstrate after co-transfection with SETD2 (Fig. 6c). No methylation was observed without SETD2 co-transfection, suggesting that the endogenous SETD2 is limited. Moreover, the H3K36 sequence was not methylated even after cotransfection with SETD2, indicating that the super-substrate is methylated much more efficiently than the H3K36 sequence in cells as well. These results were confirmed in three independent transfection series.

Discussion
SETD2 is a PKMT initially in 2005 identified to methylate H3K36 8 . So far, lysine 36 in H3 was still the best substrate known for this enzyme at the peptide level. At the protein level, H3 methylation by SETD2 is weak, but it is strongly stimulated in the context of a nucleosome as shown here by us and previously by others 8 . Besides H3K36, two non-histone substrates of SETD2 are known, α-tubulin methylated at K40 and STAT1 methylated at K525, but methylation of these substrates was reported to be weaker than H3K36 18,19 , and in our hands both were not methylated on peptide arrays. In this work, the substrate specificity profile of SETD2 was determined. Our results indicate that SETD2 is a very specific PKMT with recognition of G33 and G34, readout of hydrophobic residues at V35 and P38 and a preference for arginine at R40. The weak methylation of the α-tubulin (DGQMPSDKTIGGGDD) and STAT1 (LNMLGEKLLGPNA) peptides is in agreement with the specificity profile, because both of them contain amino acid residues that are not among the preferred ones. Based on our specificity profiles, additional nonhistone peptide substrates of SETD2 were identified and 9 of them showed strong methylation at the predicted target lysine. One non-histone protein (Fibrillin 1, FBN1) was also identified as a SETD2 substrate and the target site was validated to be K666. Based on our results and literature data 18,19 , FBN1 is more efficiently methylated than H3, α-tubulin and STAT1, but methylation of H3K36 in the nucleosomal context is even stronger. Methylation of FBN1 is in agreement with the reported cellular localization of SETD2 and FBN1 (www.proteinatlas.org 36 , retrieved in March 2020), because in addition to the nuclear localisation of SETD2 and extracellular localization of FBN1, cytoplasmic localization has been observed for both proteins as well.
FBN1 is a major component of the 10-12 nm diameter microfibrils of the extracellular matrix (ECM) 37 . These microfibrils are highly conserved macromolecular assemblies 38 and mutations of FBN1 cause different diseases, for example, the Marfan syndrome 39 . FBN1 contains 42 Calcium-binding EGFlike (EGFCA) domains together with EFG, EGFL and TB domains. The methylation site K666 is located between the EGFCA domain 6 (aa 613-653) and a Cys-rich TB domain (aa 670-710). K666 has already been found to be acetylated, ubiquitylated and sumoylated (Phosphosite plus 40 , retrieved in May 2020) indicating that post-translational modification of this residue is possible. While further work needs to be done to confirm that FBN1 is a cellular target of SETD2 and identify the biological role of its methylation, SETD2 depletion has been associated with decreased bone matrix formation 41 and reduced interaction of endothelial cells with the surrounding cell and matrix 42 . For the other strong non-histone peptide substrates, where we could not detect protein methylation, it remains to be seen whether they are methylation substrates of SETD2. However, it is interesting to note that COMA and HMCN1 have roles in cell-cell adhesion and extracellular matrix as well and FBN2 is a close homolog of FBN1.
Strikingly, our data revealed that the substrate specificity preferences of SETD2 differ from H3K36 at five sites; R is favored at position −5 instead of A31, F instead of T32 at position −4, R instead of K37 at position +1, N instead of H39 at position +3 and R instead of Y at +4. Based on these findings, a supersubstrate peptide was designed and shown to be methylated about 290-fold better than the original H3K36 peptide. This is a striking result illustrating the power of PKMT substrate design on the basis of specificity profiles. The super-substrate fused to GST (ssK36-GST) was also methylated much more strongly than the equivalent H3K36-GST protein, confirming the distinctive methylation at the protein level. In fact, methylation of ssK36-GST was comparable to the methylation of H3 in a nucleosomal context, making ssK36-GST the best protein substrate of SETD2 that is known so far. Highly preferred methylation of the supersubstrate fused to YFP by SETD2 was also confirmed in human cells.
The kinetic results with SETD2 and the super-substrate are in good agreement with the structures of SETD2 in complex with different H3K36 peptides 11,12 and the structure of SETD2 with bound super-substrate K36M inhibitor determined here. The high overall specificity of SETD2 including readout of substrate residues from P30 to R42 is in agreement with the structural data showing that the ssK36M and H3K36M peptides are deeply engulfed by the enzyme. G33 and G34 are bound to narrow binding sites, which do not leave space for larger amino acid side chains and V35 is embedded in a hydrophobic pocket. These residues are present in the normal H3K36 sequence and in the super-substrate sequence. The 3.5-fold improved inhibition constant of ssK36M when compared with the H3K36M inhibitor observed in the steady-state inhibition kinetics can be explained by possible additional interactions observed in the SETD2-ssK36M complex structure; ssK36M R31 could interact with Glu1674, F32 is sandwiched into a pocket generated by Glu1674 and Gln1676, and R37 could interact with the peptide carbonyl groups of Ala 1699 and Ala1700.
As shown in the kinetic analysis, the increased methylation of the super-substrate is mainly due to an increase in the rate of catalysis and only to a smaller degree to improved binding. This observation suggests that conformational changes occur after substrate binding which are leading to the increased catalytic rate of SETD2. This conclusion is also in agreement with the finding that methylation of H3K36 in a nucleosomal context is much faster than methylation of the H3 protein. To identify parts of the structure that might undergo conformational changes, we superimposed the SETD2-ssK36M complex with the H3K9 complex of the highly active DIM5 H3K9 PKMT (pdb 1PEG) 43 . While the conserved SET domains of both proteins superimposed well, the C-terminal part of α-helix 6 of SETD2 next to R31 and F32 is shifted leading to a more open peptide binding cleft (Fig. 7a). Interestingly, movement of the same helix has been linked to regulation of the activity of the MLL1 (KMT2A) PKMT previously 44 . In addition, the C-terminal part of the bound ssK36M peptide including R37 and N39 is shifted when compared with the conformation observed in the DIM5-H3K9 complex (Fig. 7b). Hence, our data suggest that the amino acids exchanged in the super-substrate either allow for the formation of additional contacts to the enzyme or they influence the conformational dynamics and by this they lead to the massive rate enhancement of methylation of the super-substrate.
In conclusion, our data show the importance of mapping the substrate specificity of PKMTs, because based on specificity profiles, new substrates can be identified and mechanistic insights can be gained as demonstrated here for SETD2. The discovery of novel non-histone substrates of PKMTs is particularly important to gain more insights into their biological roles. Lysine methylation can cause downstream effects, like changing the protein stability or modulation of protein/protein interaction, which then lead to biological outcomes. The most surprising result of our work is the design of an artificial SETD2 substrate that is methylated about 290-fold faster than H3K36, the best substrate of this enzyme known so far. These data illustrate the great potential in substrate sequence design to increase the specific activity of PKMTs. Strikingly, the SETD2 super-substrate sequence identified here does not exist in the human proteome. It will be interesting to find out the reason for this unexpected specificity of this important PKMT. One evolutionary scenario to explain this surprising finding could be that SETD2 has been simultaneously optimized in evolution for the methylation of more than one protein substrate leading to a "mixed" adaptation. Then, the amino acid sequence that interacts best with the binding site may differ from all individual substrates and it may even not correspond to an existing protein sequence as observed here for SETD2.
Cell culture, transfection, and immunoprecipitation. HEK293 cells were grown in Dulbecco's Modified Eagle's Medium (Sigma) supplemented with 5% fetal bovine serum, penicillin/streptomycin, and L-glutamine (Sigma). The DsRed fused SETD2 was co-transfected with YFP-fused H3K36 (29)(30)(31)(32)(33)(34)(35)(36)(37)(38)(39)(40)(41)(42)(43) 43 . DIM5 is shown in light blue, SETD2 in tan. The H3K9 peptide bound to DIM5 is shown in dark blue with K36 in green. The ssK36M bound to SETD2 is shown in yellow with the substituted positions in orange and M36 in green. a While the conserved SET domains of SETD2 and DIM5 superimpose well, the C-terminal part of α-helix 6 next to R31 and F32 is shifted in SETD2 leading to a more open peptide binding cleft. b The C-terminal part of the bound ssK36M peptide including R37 and N39 is shifted when compared with the conformation observed in the DIM5-H3K9 complex.
Peptide array methylation. Peptide arrays containing fifteen amino acid long peptides were synthesized using the SPOT synthesis method 48 with an Autospot peptide array synthesizer (Intavis AG, Köln, Germany). After synthesis, the obtained membranes were pre-incubated in methylation buffer (20 mM Tris/HCl pH 9, 0.01% Triton X-100, 10 mM DTT, 1.5 mM MgCl 2 ) for 5 min on a shaker. Thereafter, the membranes were incubated in methylation buffer supplemented with 0.76 µM radioactively labeled AdoMet (PerkinElmer) and 3-6 µM SETD2 enzyme for 60 min on a shaker. After methylation, the membranes were washed 5 times for 5 min in wash buffer (100 mM NH 4 HCO 3 , 1% SDS) and incubated in amplify NAMP100V (GE Healthcare) for 5 min. This was followed by the exposure of the membranes to a hyperfilm TM high performance autoradiography (GE Healthcare) film at −80°C in the dark. The films were developed with an Optimax Typ TR machine after different exposue times. For quantification the signal intensities were measured with the Phoretix TM software and analyzed with Microsoft Excel.
Time courses of ssK36 methylation were analyzed by fitting to an exponential reaction progress curve (Eq. 1). Initial rates of single turnover reactions were determined using the first derivative of eq. 1 at t = 0.
: Exponential reaction progress curve used to fit single turnover methylation kinetics (BL, baseline; F, Signal factor; t, time; k st , single turnover rate constant).
: Michaelis-Menten model used to fit multiple turnover kinetics (v, reaction rate; c S , concentration of the substrate peptide; K M , Michaelis-Menten constant; v max , turnover number).
Native mononucleosome isolation. HEK293 cells were grown in Dulbecco's Modified Eagle's Medium (Sigma) supplemented with 5% fetal bovine serum, penicillin/streptomycin, and L-glutamine (Sigma). HEK293 cell pellets was thawed on ice and afterward resuspended in ice cold resuspension buffer (RB) (15 mM Tris/HCl pH 8, 15 mM NaCl, 60 mM KCl, 250 mM sucrose, 5 mM MgCl 2 , 1 mM CaCl 2 , 1 mM DTT, 200 µM PMSF). Thereafter, the cells were centrifuged at 530 g for 5 min at 4°C and the supernatant was discarded. These two steps were repeated. Afterward, the cells were resuspended in a 1:1 mixture of ice cold RB and lysis buffer (15 mM Tris/HCl pH 8, 15 mM NaCl, 60 mM KCl, 250 mM sucrose, 5 mM MgCl 2 , 1 mM CaCl 2 , 1 mM DTT, 200 µM PMSF, 0.6% NP-40) and lysed for 10 min on ice. Then, the cells were centrifuged at 530 × g, for 5 min at 4°C and the supernatant was discarded. The nuclei were resuspended in RB and transferred into a new chilled eppendorf tube. The nuclei were centrifuged at 600 × g, for 5 min at 4°C and the supernatant was discarded. This step was followed by a washing step of the nuclei with RB and another centrifugation of the nuclei at 600 × g, for 5 min at 4°C. Thereafter, the nuclei were resuspended in RB and the DNA concentration was measured with the NanoDrop spectrometer. After this step, the nuclei were digested with micrococcal nuclease (1 µl per 300-350 ng/µl) (NEB, M0247S) at 37°C for 10 min. The reaction was stopped by the addition of EGTA-EDTA buffer (1/10 of the volume) and this was followed by centrifugation at maximum speed, for 10 min at 4°C. Thereafter, the supernatant was transferred into a new chilled eppendorf tube and the DNA concentration was measured with the NanoDrop spectrometer.
Histone purification. Individual histone proteins were overexpressed in BL21 DE3 codon plus cells. The cells were grown to an OD 600 of 0.6, induced for 4 h with 1 mM isopropyl-β. -D-thiogalactopyranoside (IPTG), and harvested by centrifugation (5000 × g, 20 min). The cell pellet was resuspended in lysis buffer containing 40 mM sodium acetate pH 7.5, 1 mM EDTA, 10 mM lysine, 5 mM β-mercaptoethanol, 6 M urea and 200 mM NaCl, and disrupted by sonication (5 min of sonication, with 40% power). The lysate was cleared by centrifugation (20,000 × g, 1 h) and passed through a 0.45 µm syringe filter to remove particulate matter. The solution was then passed over a 5 ml SP HP cation exchange column (GE Healthcare) which was equilibrated with lysis buffer. After washing with additional 5 column volumes (CV) of lysis buffer, the histone proteins were eluted using a salt gradient from 200 to 800 mM NaCl. Fractions containing pure histone proteins were dialyzed against water and subsequently lyophilized and stored at 4°C.
Octamer reconstitution. Lyophilized aliquots of histone proteins were dissolved in unfolding buffer containing 7 M guanidiniumchloride, 20 mM Tris/HCl pH 7.5, and 5 mM DTT for 30 min. Concentration was determined by spectrophotometry and the proteins were pooled according to a molar ratio of 1 (H3, H4) to 1.2 (H2A, H2B) to prevent the formation of free (H3-H4) 2 tetramers. The mixture was dialyzed against refolding buffer containing 2 M NaCl, 10 mM Tris/HCl pH 7.5, 1 mM EDTA, and 5 mM β-mercaptoethanol. Fully reconstituted histone octamers were separated from (H2A-H2B) dimers by size exclusion chromatography using a Superdex 200PG column (GE Healthcare) equilibrated with refolding buffer. Fractions containing pure histone octamers were concentrated tenfold using Amicon spin filters (Merck), flash frozen in liquid nitrogen and stored at −80°C.
Nucleosome reconstitution. A 240 bp sequence containing the Widom 601 sequence was amplified by PCR and purified using the Nucleospin PCR cleanup kit (Macherey-Nagel). Purified DNA and histone octamers were mixed in equimolar ratio together with 5 M NaCl to a final DNA concentration of 0.5 mg/ml and 2 M NaCl. The sample was dialyzed against high salt buffer (2 M NaCl, 10 mM Tris/HCl pH 7.5, 1 mM EDTA, 1 mM DTT) which was continuously replaced by low salt buffer (same with 250 mM NaCl) over a time of 24 h. Successful nucleosome assembly was confirmed by electrophoretic mobility shift assay (EMSA), aliquots were flash frozen in liquid nitrogen and stored at −80°C.
Protein purification and crystallization. Wild-type human SETD2 catalytic domain (residues 1435-1711) was expressed in Escherichia coli (E. coli) cells and purified in the form of His-tagged proteins as previously described 31 . Briefly, Nterminal His-tagged plasmids were transformed into E. coli (DE3) RIL cells and, protein expression was induced by 0.5 mM IPTG at 14°C for 16 h. Proteins were purified using Ni-NTA affinity agarose resin (QIAGEN), followed by running through cation-exchange (HiTrap SP HP 5 mL, GE Healthcare) and size-exclusion (Superdex 200, GE Healthcare) columns in a final buffer of 25 mM Tris/HCl (pH 8.0), 200 mM NaCl. For ternary complex crystal growth, the SETD2 protein (10 mg/ml) was incubated with AdoMet and the ssK36M peptide at a molar ratio of 1:10:5. The crystallization condition was optimized based on a previously reported condition for the crystal growth of a SETD2 ternary complex 11 . Crystals were obtained via the hanging drop method by mixing 1 μL of the protein complex and 1 μL of reservoir solution (0.2 M KSCN, 0.1 M Tris/HCl pH 8.5, and 26% PEG 3350) at 18°C. Next, the crystals were mounted and soaked in a cryo-protectant composed of reservoir solution supplemented with 10-20% glycerol and were flash-frozen in liquid nitrogen for data collection.
Data collection and structure refinement. Diffraction experiments were conducted initially on a rotating copper anode and, later, at Advanced Photon Source beam line 24-ID-E. Diffraction images were processed with XDS 49 and AIM-LESS 50 . Coordinates from isomorphous PDB entry 5JLB chain A were refined against these diffraction data using DIMPLE/REFMAC 51 . The model was interactively rebuilt in COOT 34 and further refined in REFMAC, AUTOBUSTER (BUSTER. Cambridge, United Kingdom: Global Phasing Ltd.) and PHENIX 52 . The latest round of refinement was performed with BUSTER against the weaker, but earlier collected, copper anode data, which we preferred due to its lower radiation load and concerns about radiation-induced decarboxylation of glutamate residues 53 . Two conformations were modeled for the side chain of ssK36M R37 to denote uncertainty in the true coordinates, not to denote the observation of any specific conformation. We failed to fit a favored arginine rotamer to the R37 difference electron density. Diffraction data were deposited at the Integrated Resource for Reproducibility in Macromolecular Crystallography (proteindiffraction.org) as entry SETD2_HR606-6vdb (https://proteindiffraction.org/project/ SETD2_HR606_6vdb/). The current atomic model has been deposited in the Protein Data Bank (https://www.rcsb.org/) as entry 6VDB (https://www.rcsb.org/ structure/6VDB). The PDB2PQR webserver has been used to prepare some of the illustrations 54 .
Statistics and reproducibility. All experiments were conducted in replicates as indicated.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The datasets generated during the study are available from the corresponding authors upon request. Diffraction data were deposited at the Integrated Resource for Reproducibility in Macromolecular Crystallography (proteindiffraction.org) as entry SETD2_HR606-6vdb. The current atomic model has been deposited in the Protein Data Bank (https://www.rcsb.org/) as entry 6VDB. All biochemical data generated or analyzed during this study are included in the published article and its supplementary files. Source data underlying plots shown in figures and full blots are provided in Supplementary Data 1.