Site-specific characterization of endogenous SUMOylation across species and organs

Small ubiquitin-like modifiers (SUMOs) are post-translational modifications that play crucial roles in most cellular processes. While methods exist to study exogenous SUMOylation, large-scale characterization of endogenous SUMO2/3 has remained technically daunting. Here, we describe a proteomics approach facilitating system-wide and in vivo identification of lysines modified by endogenous and native SUMO2. Using a peptide-level immunoprecipitation enrichment strategy, we identify 14,869 endogenous SUMO2/3 sites in human cells during heat stress and proteasomal inhibition, and quantitatively map 1,963 SUMO sites across eight mouse tissues. Characterization of the SUMO equilibrium highlights striking differences in SUMO metabolism between cultured cancer cells and normal tissues. Targeting preferences of SUMO2/3 vary across different organ types, coinciding with markedly differential SUMOylation states of all enzymes involved in the SUMO conjugation cascade. Collectively, our systemic investigation details the SUMOylation architecture across species and organs and provides a resource of endogenous SUMOylation sites on factors important in organ-specific functions.

quantified in response to MG132 treatment, in this study (HEK cells) compared to a study using exogenous K0-SUMO (HeLa cells) 1. Every dot represents a protein quantified in both studies. The line represents the best-fit linear correlation, with the Pearson correlation (R) and p-value indicated. The p-value was determined by two-tailed Student's t-testing, n=2,674 quantified protein ratio pairs. (B) As A, but for SUMO2/3 sites LFQ quantified in response to MG132 treatment in both studies. n=7,534 quantified site ratio pairs.
Immunoblot analysis of SUMO-immunoprecipitation performed using the 8A2 antibody, from Lys-C digested mouse organs (liver, kidney, testis, and brain), after desalting the peptides. An amount of Lys-C digest corresponding to 0.1% of an entire organ was loaded (excepting liver, for which half the organ was used), or 0.1% of the Lys-C digest corresponding to the protein content of one HEK cell culture replicate (50 mg). Immunoblot probing was performed using the 8A2 antibody; long exposure. The asterisk indicates the fragment resulting from Lys-C digestion of free SUMO,

Optimization of the SUMO-IP
Whereas other SUMO enrichment methods rely on denaturing buffer conditions, or seek to dilute the concentration of denaturants prior to enrichment, we chose to entirely eliminate denaturing agents for the purification phase. Following Lys-C digestion, peptides were desalted on C8 resin to eliminate guanidine from the samples and to efficiently capture larger peptides, as the SUMO mass remnant generated upon Lys-C digestion is 5.6 kDa. Moreover, we investigated at which percentage of solvent SUMOylated peptides elute from the resin (Supplementary Fig. 1C-D), and adapted the protocol to include a highly selective elution range. Importantly, this minimized the presence of smaller unrelated peptides in the sample, and also excluded precipitate (typically undigested proteins) and polymers that may elute from the C8 resin when using very high (>50%) concentrations of acetonitrile.
For our SUMO-IP strategy, desalted Lys-C-digested peptides were lyophilized and redissolved in a very mild buffer. In this manner, the SUMO purification was carried out in buffer conditions analogous to western blotting, where the 8A2 antibody is known to recognize the SUMO epitope in a robust and specific manner. To corroborate this, depletion of SUMOylated peptides upon enrichment from a Lys-C digested lysate was confirmed by immunoblot (IB) analysis (Supplementary Fig. 2A-B). Compared to a previous study in which the 8A2 antibody was used to immunoprecipitate endogenously SUMOylated proteins, we used 20 times less antibody per milligram of starting material 3, while achieving a 20-fold greater depth of sequencing (Fig. 1E).

Choosing the optimal SUMO mass remnant
Although our approach successfully enriches peptides modified by endogenous SUMO2/3, the single largest complication of proteomics analysis of endogenous site-specific SUMO2/3 relates to the large mass remnant that remains after Lys-C or trypsin digestion. Thus, knowing that a Lys-C digestion followed by purification and a second enzymatic digestion can generate samples of sufficient purity for proteomics analysis 1,4, we generated initial SUMO-IP samples from HeLa and U2OS cells (Supplementary Fig. 2C), and digested equal amounts with either trypsin, endoproteinase Glu-C (cleaves C-terminal of E and D residues), or wildtype alpha-lytic protease (WALP; preferentially cleaves C-terminal of V, A, T, and S residues). The tryptic digest was expected to leave a 3.5 kDa SUMO remnant, while Glu-C digest theoretically leaves a 1.3 kDa remnant, and WALP should yield a di-glycine remnant.
For identification of endogenous SUMO modification sites we employed liquid chromatography coupled with high-resolution mass spectrometry (LC-MS/MS) on the Q Exactive HF instrument 5. For identification of SUMO-modified lysine residues and associated SUMO mass remnants, all peptides were fragmented using higher-energy collisional dissociation (HCD) 6. To ensure high confidence in identified peptides and modification sites, acquired data were filtered at both the peptide and protein level to a false-discovery rate (FDR) of <1%. For identification of modified lysines we assessed the localization score for each site using the MaxQuant software suite 7, and only considered sites identified with an Andromeda score above 40, a delta score above 20, and a localization score above 6 (approximately >80% localization probability).
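The site-level acceptance criteria can be written as a simple filter. The sketch below is illustrative only. The record fields (`andromeda_score`, `delta_score`, `localization_score`) are hypothetical names, not MaxQuant column headers. The conversion from localization score to probability assumes the score is a log-odds value in 10·log10 units; this assumption was chosen because it reproduces the quoted correspondence between a score of 6 and roughly 80% probability.

```python
from dataclasses import dataclass


@dataclass
class SiteEvidence:
    """Hypothetical record for one candidate SUMO site (field names are illustrative)."""
    andromeda_score: float
    delta_score: float
    localization_score: float


def localization_probability(score: float) -> float:
    """Convert a localization score to a probability.

    Assumption: score = 10 * log10(p / (1 - p)). This is not taken from the
    source text; it is used because a score of 6 then maps to about 0.80.
    """
    odds = 10 ** (score / 10)
    return odds / (1 + odds)


def passes_site_filters(site: SiteEvidence) -> bool:
    """Apply the three thresholds stated in the text as strict inequalities."""
    return (
        site.andromeda_score > 40
        and site.delta_score > 20
        and site.localization_score > 6
    )
```

Under this assumption, `localization_probability(6)` evaluates to just under 0.80, in line with the approximate figure given in the text.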
Although the tryptic SUMO mass remnant is commonly considered too cumbersome for successful identification, we nonetheless identified 918 unique SUMOylated lysines using HCD.
In conclusion, while optimizing the SUMO site identification strategy, we utilized four different enzymes in the second stage, all following Lys-C as the first-stage enzyme. Comparatively, we found that Asp-N identified four times more sites than WALP and trypsin did, when using equal amounts of sample. Unlike Glu-C, Asp-N did not generate a multitude of mass remnants when digesting the SUMO2/3 C-terminus. Somewhat to our surprise, the tryptic remnant could readily be identified using our approach in combination with the freely available MaxQuant

Comparison of the identified endogenous SUMOylome to exogenous analyses
As our endogenous analysis was performed using the same treatments as our recent K0-SUMO study, we investigated in more detail the differences in identified SUMOylation sites between the endogenous and exogenous analyses. Under control conditions and in response to heat shock we identified 70% and 86% of the number of previously identified exogenous SUMO sites, respectively (Fig. 1D). [...] (Fig. 3D).

SUPPLEMENTARY NOTE 4 Analysis of SUMO consensus sub-motifs and sites-per-protein
When investigating the sequence context surrounding the endogenously SUMOylated lysine residues we identified, other known sequence trends could be observed (Fig. 2A), including enrichment for bulky hydrophobic amino acids at −1, acidic residues from +5 to +8, and the inverted [ED]xK motif 10. To further validate the motifs observed within our data, we performed Motif-X analysis to extract high-confidence sequence motifs (Supplementary Data 1 and Supplementary Fig. 4), confirming the presence of all well-established SUMO motifs within our dataset. KxE-type sites were globally more abundantly modified and thus more confidently identified, and the top 2,000 sites identified retained over 50% KxE adherence (Fig. 2B), demonstrating that the SUMO2/3 sites and the overall SUMOylation pattern identified by our method follow the previously described preferences of the SUMO enzymatic machinery.
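To make the motif classes concrete, the sketch below labels a modified lysine by its immediate sequence context and computes the fraction of sites adhering to KxE. It is a simplified illustration and not the Motif-X procedure used in the analysis. It checks only the core positions (glutamate at +2 for KxE; glutamate or aspartate at −2 for the inverted motif). The function names and the 0-based indexing are assumptions made for this example.

```python
def classify_sumo_motif(sequence: str, lysine_index: int) -> str:
    """Label the lysine at 0-based lysine_index by its local sequence context.

    Returns "KxE" when glutamate sits at +2, "inverted [ED]xK" when E or D
    sits at -2, and "non-consensus" otherwise. The forward motif takes
    precedence when both are present.
    """
    if sequence[lysine_index] != "K":
        raise ValueError("residue at lysine_index is not a lysine")
    plus2 = sequence[lysine_index + 2] if lysine_index + 2 < len(sequence) else ""
    minus2 = sequence[lysine_index - 2] if lysine_index >= 2 else ""
    if plus2 == "E":
        return "KxE"
    if minus2 in ("E", "D"):
        return "inverted [ED]xK"
    return "non-consensus"


def kxe_adherence(sites) -> float:
    """Fraction of (sequence, lysine_index) pairs classified as KxE."""
    sites = list(sites)
    if not sites:
        return 0.0
    hits = sum(1 for seq, idx in sites if classify_sumo_motif(seq, idx) == "KxE")
    return hits / len(sites)
```

Ranking sites by abundance or score and calling `kxe_adherence` on the top N entries would give an adherence curve of the kind described for the top 2,000 sites.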
Since our endogenous method identified sites following known preferences of SUMOylation, we investigated the number of SUMO sites identified per protein. Our endogenous analysis identified ~3 SUMOylation sites per protein under standard growth conditions and in response to MG132, and ~4 sites per protein in response to heat shock (Fig. 2C-D). In response to stress, we observed a sharper increase in the number of SUMO-modified lysines than in the number of SUMO target proteins (Fig. 2C-D), suggesting that primarily the same proteins are being increasingly modified in response to proteotoxic stress and heat shock, in line with previous SUMO proteomics reports 1,4.
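The sites-per-protein figure is an average over target proteins, so it rises when the number of sites grows faster than the number of proteins. A minimal sketch of that calculation follows, assuming each site is represented as a hypothetical `(protein_id, position)` pair.

```python
from collections import Counter


def sites_per_protein(sites) -> float:
    """Mean number of unique SUMO sites per target protein.

    `sites` is an iterable of (protein_id, position) pairs; duplicate pairs
    are counted once.
    """
    unique_sites = set(sites)
    per_protein = Counter(protein for protein, _ in unique_sites)
    if not per_protein:
        return 0.0
    return sum(per_protein.values()) / len(per_protein)
```

Applying the function separately to the control, MG132, and heat shock site lists would yield the per-condition averages compared in the text.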

SUPPLEMENTARY NOTE 5 Comparison between SUMOylation phenomena in human and mouse
We compared identified SUMO2/3 target proteins, grouped by their respective protein-coding genes, in order to approximate global similarities and differences between the SUMOylomes we identified from cultured human cells and mouse organs. Overall, 663 proteins were found to be [...] observed enrichment for proteins involved in telomere maintenance, response to ionizing radiation, helicases, and rRNA processing. In mouse organs, exclusively SUMOylated proteins were strongly enriched for many functions associated with organs, including glycolysis, oxygen transport, and muscle contraction. SUMO2/3 target proteins found in cell culture exclusively in response to stress were primarily enriched for tRNA processing, the Golgi apparatus, and both phosphatase and kinase function. Interestingly, 112 proteins SUMOylated only in response to stress in cell culture were baseline SUMOylated in mouse organs, with functions including regulation of muscle adaptation to stress, oxidation-reduction activity, and nutrient response, suggesting that the conditions required for this stress-induced SUMOylation in cell culture may exist naturally within certain mouse tissues.

Subcellular localization of endogenous SUMO substrates
SUMOylation is canonically a modifier of nuclear proteins 11, and proteomics studies have extensively supported this 12. In order to assess whether endogenous and in vivo SUMOylation has a similar tendency for targeting nuclear proteins, we investigated the subcellular localization annotation for all SUMO2/3 proteins and sites identified. In the human cell line data, 76% of all SUMO-modified proteins were localized in the nucleus, increasing to 88% when considering SUMO sites individually (Fig. 5A). Under control conditions, 95% of SUMO sites occurred on nuclear proteins, confirming that steady-state endogenous SUMOylation is predominantly nuclear.
Moreover, 29% of SUMO sites were mapped to proteins located at the chromatin; 5-fold greater than randomly expected. We performed the same analysis for the mouse organs, and found a large degree of similarity to human cell culture (Fig. 5B). Of all SUMO-modified proteins in mouse organs, 79% were nuclear-localized and 20% chromatin-localized, with no notable differences when considering SUMO target proteins or sites. Lung and spleen displayed the highest fraction of nuclear-localized SUMO target proteins, at 94% and 96%, respectively. Testis and spleen displayed the highest fraction of chromatin-localized SUMO sites, at 29%.

Crosstalk between phosphorylation and SUMOylation
Phosphorylation at the +4 or +5 position relative to the SUMOylated lysine has been previously established, as well as a tendency for this phosphorylation to be proline-directed 1. Using our endogenous SUMO2/3 method in HEK cells, we mapped 526 unique SUMO2/3 and phosphorylation co-modified peptides, directly identifying and localizing both PTMs in the MS/MS spectra (Supplementary Data 11). Proline-directed phosphorylation was observed in 70% of all cases, in agreement with what was described previously. In humans, proline-directed phosphorylation occurs with 34.7% frequency 13, highlighting the significance of our finding with a p-value of 2.5 × 10^−23 using Fisher's exact testing. An enrichment for phosphorylation at the +4 and +5 positions was also confirmed (Fig. 5C); however, we also observed significantly more phosphorylation events at both the −3 and +2 positions and significantly fewer phosphorylation events at both the −1 and +1 positions. Overall, we found 3.5% of total SUMOylation proximal to phosphorylation under standard growth conditions, and 2.7% when also considering proteotoxic stress.
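The comparison of an observed proline-directed fraction against a background frequency can be illustrated with a one-sided exact binomial test. This is a swapped-in simplification and not the Fisher's exact test reported in the text, which needs background counts that are not given here, so the resulting p-value should not be expected to match the published one. The count of 368 is reconstructed from the reported 70% of 526 peptides and is itself an assumption.

```python
import math


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), summing terms computed in log space."""
    log_p = math.log(p)
    log_q = math.log1p(-p)
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * log_p + (n - i) * log_q
        )
        total += math.exp(log_term)
    return total


n_comodified = 526                       # co-modified peptides reported in the text
k_proline = round(0.70 * n_comodified)   # reconstructed from the reported 70% (assumption)
background = 0.347                       # background proline-directed frequency from the text

p_value = binomial_upper_tail(k_proline, n_comodified, background)
```

Treating the background frequency as a fixed constant ignores its own sampling uncertainty, which is one reason a contingency-table test such as Fisher's exact test gives a more conservative answer.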
In mouse organs, we identified 49 unique SUMO2/3 and phosphorylation co-modified peptides in total (Supplementary Data 12). Phosphorylation occurred significantly at +5 in relation to the SUMOylated lysine (Fig. 5D), in accordance with the canonical phosphorylation-dependent SUMOylation motif 14. We also observed a higher occurrence of phosphorylation at −5, −4, and −3 relative to the SUMOylated lysine. Notably, we could not detect any co-modified peptides in brain and heart, and the overall contribution of SUMO-phospho to the total SUMOylome was only ~0.1% in muscle, lung, and kidney. In spleen, 0.7% of SUMOylation occurred proximal to phosphorylation, increasing further to 1.5% for both liver and testis. Globally, 73% of SUMO-proximal phosphorylation occurred in a proline-directed manner, suggesting that phosphorylation occurring proximal to SUMO in organs is potentially cell-cycle regulatory. In mice, proline-directed phosphorylation occurs with 37.5% frequency 15, showing the significance of our finding with a p-value of 0.0022 using Fisher's exact testing. Interestingly, 76% of all co-modified peptides could be identified in liver, with the rest mapped primarily in testis, and otherwise in spleen. In testis, ~90% of the total phosphorylation-proximal SUMOylation occurred on Mediator of DNA damage checkpoint 1 (Mdc1). Although Mdc1 is expressed at a somewhat higher level in testis as compared to the other organ types we investigated 16, Mdc1 was nonetheless found to be co-modified to a higher degree than the canonical phosphorylation-dependent SUMOylation target protein Nop58 17. Conversely, Nop58 was SUMO-phospho co-modified to a higher degree in other organ types.
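The positional statements above (for example +5 or −3) describe the offset of the phosphorylated residue from the SUMOylated lysine, and "proline-directed" means the phosphorylated serine or threonine is immediately followed by a proline. The sketch below shows both definitions on a made-up peptide. The function names, the 0-based indices, and the example sequence are hypothetical and not taken from the dataset.

```python
def phospho_offset(sumo_lysine_index: int, phospho_index: int) -> int:
    """Signed distance of the phosphosite from the SUMOylated lysine (positive = C-terminal)."""
    return phospho_index - sumo_lysine_index


def is_proline_directed(sequence: str, phospho_index: int) -> bool:
    """True when the phosphorylated S or T at phospho_index is directly followed by P."""
    if sequence[phospho_index] not in ("S", "T"):
        return False
    next_index = phospho_index + 1
    return next_index < len(sequence) and sequence[next_index] == "P"


# Made-up peptide: lysine at index 2, serine at index 6, proline at index 7.
example_peptide = "AVKQEASPLR"
example_offset = phospho_offset(2, 6)
example_proline_directed = is_proline_directed(example_peptide, 6)
```

Tallying `phospho_offset` over all co-modified peptides gives a positional histogram of the kind summarized in Fig. 5C-D.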

Structural preference of endogenous SUMOylation
Exogenous SUMOylation generally targets lysines residing in disordered and exposed protein regions, contrasting ubiquitylation which frequently targets lysines in globular and buried protein regions 1. To investigate whether the same preference is observed for endogenous SUMOylation in human and mouse, we performed structural-predictive analyses on all human and mouse SUMO2/3 target proteins identified. In human cells, we observed that SUMO2/3 sites identified under standard growth conditions were 60% more likely to occur in disordered regions (Fig. 6A). In response to heat shock, there was still enrichment for SUMOylation in disordered regions, but notably less than in the control. In response to MG132, and especially when considering MG132-exclusive SUMO sites, the structural preference of SUMOylation was completely inverted, essentially mimicking ubiquitylation. In mouse organs, we observed a global preference for SUMOylation to target disordered regions, positioned in between the control and stress-induced enrichments observed for human cells (Fig. 6B). When considering specific organs, we noted that in brain there was no significant enrichment for any structural regions, in contrast to all other organs. Conversely, SUMOylation in lung and spleen was more likely to be targeted to disordered regions as compared to other organ types.
To assess whether these slight differences in structural preferences are owing to SUMO, or to a more general aspect of PTMs in tissue, we mapped tissue-specific ubiquitylation sites onto the SUMO2/3 proteins identified across organs. Collectively, the structural preferences of ubiquitin were highly similar across tissues and in human cells. Lysines targeted by both PTMs in mouse organs did not display a notable type of structural preference. To further disentangle potential organ-specific differences, we additionally compared SUMO to ubiquitin in individual organs (Fig. 6C); however, only for the five organs where ubiquitylation was previously mapped 18. Here, we observed the same trend, with SUMO targeting disordered regions in heart, kidney, liver, and muscle, whereas ubiquitin predominantly targeted structured regions in all five organs, and most notably in liver. Overall, this demonstrates the intriguing propensity for SUMO2/3 and ubiquitin to target different structural regions of the same proteins. With these preferences appearing to be globally different between organ types, this highlights the need for future investigation into the in vivo regulation of these PTMs.

SUPPLEMENTARY NOTE 6 SUMO density in cells and organs
To assess the purity of our SUMO-IP strategy, we quantified the total amount of SUMO2/3-modified peptides and proteins relative to the background signal. In HEK cells, 93-96% of MS/MS-identified peptide signal originated from SUMOylated proteins, corresponding to the overall sample purity after SUMO-IP (Supplementary Fig. 9A), and demonstrating the remarkable efficiency of our purification strategy. In mouse organs, the SUMO-IP purity ranged from 42-62%, indicating either a larger relative presence of background proteins, or more putative target proteins where we did not manage to identify a SUMO site. Because the samples were further digested after the SUMO-IP, not all identified peptides correspond to modified peptides, and in HEK cells 34-50% of SUMO target protein signal was derived from SUMO sites. In mouse, this number was considerably lower, ranging from 3-4% in muscle, heart, and brain, to 16-22% in spleen and testis. As modified peptides were harder to identify and filtered more stringently than unmodified peptides, the observed peptide-level purity correlated closely with the overall number of SUMO sites identified across the samples.
By normalizing the absolute amount of SUMO protein purified to the amount of input material used for each experiment, we observed a much greater yield of SUMO target proteins per milligram of starting material in HEK cells, as compared to the mouse organs (Supplementary Fig. 9B). Even under standard growth conditions, we found a >10-fold higher density of SUMO-modified proteins in HEK cells, increasing to >20-fold in response to heat shock. No large differences were observed between organ types, with only heart having a lower yield (Supplementary Fig. 9C). At the SUMO site level, the gap was considerably bigger, with control condition HEK cells having a 30-fold higher yield of SUMO sites than liver, increasing to a 70-fold higher yield in response to heat shock. At the SUMO site level, some differences were observed between the organs, mostly mirroring the sample purity as outlined above.
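The purity and yield comparisons in this note reduce to two ratios: SUMO-derived signal over total signal, and SUMO-derived signal over milligrams of input. The sketch below spells these out. The function names are hypothetical, the signal values are arbitrary units, and no numbers from the dataset are used.

```python
def purity(sumo_signal: float, total_signal: float) -> float:
    """Fraction of the total identified peptide signal that derives from SUMO targets."""
    if total_signal <= 0:
        raise ValueError("total_signal must be positive")
    return sumo_signal / total_signal


def normalized_yield(sumo_signal: float, input_mg: float) -> float:
    """SUMO-derived signal per milligram of starting material."""
    if input_mg <= 0:
        raise ValueError("input_mg must be positive")
    return sumo_signal / input_mg


def fold_difference(yield_a: float, yield_b: float) -> float:
    """How many times higher yield_a is than yield_b."""
    if yield_b <= 0:
        raise ValueError("yield_b must be positive")
    return yield_a / yield_b
```

Normalizing before comparing matters because the experiments start from very different amounts of material, so raw signal alone would conflate input size with SUMO density.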
Finally, to ascertain whether the decreased SUMO yield obtained from mouse organs could be because of technical limitations, we investigated the amount of background proteins identified across all experiments, in relation to the amount of starting material (Supplementary Fig. 9D). Reassuringly, even though we observed very large differences in the number of SUMO sites and proteins between the experiment types, the amount of normalized absolute background signal did not vary significantly between samples. This suggested that the decreased amount of SUMO observed in mouse organs is due to biological reasons.

Validation of SUMO equilibrium using immunoblot analyses
Whereas immunoblotting can be used to a certain extent to assess the distribution of conjugated and free SUMO, differences in blotting methods and antibody preferences can bias towards detection of either conjugated SUMO or free SUMO, and such methods are not always applicable across all vertebrate model systems. Nonetheless, to validate the large difference in SUMO equilibrium we observed when comparing HEK cells to mouse organs, we performed immunoblot analysis on whole organ extracts and HEK whole cell lysates (Supplementary Fig. 11). To facilitate immunoblot analysis and prevent potential deconjugation of SUMO2/3 during sample preparation, these samples were prepared in a denaturing buffer including a high concentration of sodium dodecyl sulfate, broad-range protease inhibitors, and a high concentration of N-ethylmaleimide to specifically inhibit SUMO proteases 2.
In agreement with our MS data, we observed a much higher density of SUMOylation in HEK cells compared to all mouse organs we tested, exemplified by similar or higher levels of SUMO2/3 signal observed in HEK cells despite loading 10 times less total protein (Supplementary Fig. 11). Reassuringly, we also observed comparable distributions of conjugated SUMO2/3 and free SUMO2/3, with SUMO2/3 predominantly conjugated in HEK cells and large pools of free SUMO2/3 existing natively in most mouse organs. Although immunoblot analysis can be used to probe the SUMO equilibrium, potential differences in transfer efficiency of small and large proteins during electrophoretic transfer, in addition to potential off-target effects when using different antibodies to probe for SUMO2/3 in distinct organ types, could lead to variable results. As our MS strategy is not sensitive to these limitations, we believe that using MS to quantify the SUMO equilibrium is more accurate and reproducible.

SUPPLEMENTARY NOTE 8 Comparison to contemporary endogenous SUMO proteomics methods
Whereas other SUMO proteomics strategies have been published that facilitate studying endogenous SUMOylation to some degree 3,19-21, none of them display the same sensitivity as our method, and none of them can specifically enrich and uniquely identify lysines modified by endogenous SUMO-2/3. Two contemporary methods purify SUMOylated proteins, but do not strive to identify modified lysine residues 3,19.

Comparison to site-specific endogenous SUMO strategies
Recently, a study was published that facilitates identification of endogenously SUMOylated lysines, through digestion of total lysate with the WALP enzyme followed by di-glycine enrichment 21. Another recent study identified 53 lysines endogenously modified by SUMO1 in mouse testis, applying a strategy of trypsin/Lys-C digest followed by SUMO1 peptide purification and subsequent digestion with Glu-C 20. Although not reaching the same depth as our method and not being applicable to SUMO2/3, the method described by Cai et al. is currently the only one to facilitate specific study of lysine residues endogenously modified by SUMO1.

In-depth analysis of endogenous SUMO sites identified by Lumpkin et al.
The highest number of reported endogenous SUMO sites detected by mass spectrometry is 1,209, with the authors using heat shock treated HeLa cells, digesting the total lysate with the WALP enzyme, and subsequently performing di-glycine enrichment to purify putative SUMO sites 21. In theory, with WALP cleaving C-terminal of threonine, this would allow identification of lysines modified by endogenous SUMO-1 and SUMO-2/3, although these distinctly regulated SUMO family members cannot be distinguished by their method. In our initial optimization phase, we also tested WALP, and noticed that the enzyme is highly promiscuous. While WALP preferentially cleaves C-terminal of valine, alanine, threonine, serine, leucine, and glycine residues, the enzyme can cleave virtually anywhere at somewhat lower efficiency. This is reflected by Lumpkin et al. performing a non-specific search on their data, allowing WALP to cleave anywhere for generation of their theoretical peptide library. Moreover, the authors found multiple SUMOylated peptides that were cleaved C-terminal of arginine, and mention that up to 8% of their peptides were cleaved C-terminal of arginines 21. This in turn strongly suggests that ubiquitin, which has an RGG motif at the [...] Fig. 13A). In HEK cells, we routinely identified 1,500

Validation of matching between runs
We investigated the intensity assignments for MS/MS and matching identifications within all the mouse organ SUMO data. For this, we utilized the "evidence.txt" file as written by MaxQuant, which is available at the ProteomeXchange Consortium database via the Proteomics Identifications (PRIDE) partner repository, under dataset ID PXD008003. Overall, we found that 16.7% of the total assigned SUMO signal was derived from matching between runs, with the large majority of all experimental evidence derived from direct MS/MS identifications. Upon inspection of all matched peptides, we found that 79.5% of these occurred within replicates of the same organ type, and 96.5% matched to the same chromatographic fraction. The 3.5% of matches corresponding to neighboring fractions could potentially be false-positive, which would correspond to 0.6% of the total experimental intensity, leaving the maximum matching FDR below 1%. Moreover, fraction-mismatched hits are not necessarily wrong, as peptides frequently appear in neighboring chromatographic fractions.
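The worst-case estimate above follows from multiplying the share of signal assigned by matching between runs by the share of matches that landed in a neighboring fraction. A short worked version of that arithmetic is given below, using only the two percentages stated in the text; the function name is hypothetical.

```python
def worst_case_matching_error(matched_share: float, mismatched_share: float) -> float:
    """Upper bound on the fraction of total signal that could stem from false matches.

    matched_share: fraction of total signal assigned via matching between runs.
    mismatched_share: fraction of those matches that fall in a neighboring fraction.
    """
    return matched_share * mismatched_share


matched_share = 0.167      # 16.7% of SUMO signal came from matching between runs
mismatched_share = 0.035   # 3.5% of matches mapped to a neighboring fraction

worst_case = worst_case_matching_error(matched_share, mismatched_share)
worst_case_percent = round(worst_case * 100, 1)  # about 0.6% of total intensity
```

The bound treats every fraction-mismatched hit as false, so the true error contribution is likely lower.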

Evolutionary conservation of SUMO
We investigated the evolutionary conservation of SUMOylation, and found that SUMOylation occurred on less conserved lysine residues in ordered regions. In support of our findings, bioinformatics analysis of PTM co-evolution previously revealed that SUMOylation entailed the fastest evolution of all analyzed PTMs 25 . Contrary to what we found, Minguez et al. reported that SUMOylation sites were more conserved compared to background residues. However, the data used by Minguez et al. for this analysis was based upon a limited number of SUMOylation sites, mostly derived from studies employing single mutagenesis analyses, which overall does not reflect an unbiased overview of the SUMOylome. Intriguingly, whereas SUMO2/3 targeted evolutionarily less conserved lysines, most PTMs are described to target more conserved residues 25 , and specifically phosphorylation is considered to occur on evolutionarily conserved residues 26 . Thus, our data highlights a unique property of SUMOylation to rapidly evolve, suggesting that SUMOylation may be regulated or potentially misregulated to a considerable extent in the context of disease.

Endogenous and in vivo SUMO chain topology
SUMO2/3 chain topology similarly differed between cell culture and organs. In cell culture, SUMO chains were focused on the KxE-type K11 in SUMO2, and increasingly so in response to stress. Conversely, K11 modification of SUMO2 was rarer in organs, with chain formation more often occurring on K21 and K33. Interestingly, this would shift the chain topology observed in organs in the opposite direction from the stress-induced topology in cell lines, thus hinting at a 'low stress' state in organs. In a recent study using the lysine-deficient K0-SUMO, nearly 90% of K0-SUMO modification occurred on K11 in endogenous SUMO2, with almost no modification on the other lysines 1. Since the K0-SUMO would effectively prevent a chain from being extended, this suggests that K11 may be the first linkage formed in a branched chain, or alternatively that overexpressed or mutated SUMO is more likely to be targeted to K11, or less likely to be removed from K11. With no SUMO E3 ligases known to facilitate specific types of SUMO chain linkages, it otherwise remains unknown how these observed differences in chain topology may have arisen. However, with the amount of SUMO residing in chains being globally relatively low, SUMO chains may only have niche functions in vertebrate cells and organs. With two SUMO-specific proteases specifically targeted at reducing SUMO chains 27, it may be possible that SUMO is primarily intended to laterally SUMOylate multiple lysines in proteins, as opposed to vertically forming SUMO chains on singular lysines. In turn, this could explain the large degree of overlap observed between SUMOylated lysines identified using either the endogenous method described here, or using the lysine-deficient K0-SUMO 1,4.

SUMOylation patterns across distinct organ types
Our findings demonstrate overlapping SUMOylation patterns between certain organ types, and suggest that shared biological processes may be commonly governed by SUMOylation. Specifically, heart and skeletal muscle were enriched for multiple overlapping functions, which is logical considering both are a type of muscle. Moreover, the similar SUMOylation patterns observed in liver and kidney may reflect the cooperation of these organs in certain biological regulation through tight control of neural and humoral mechanisms, while similarity of the SUMOylome across spleen and lung may reflect commonly shared immune activities between splenic lymphocytes and airway epithelia. In support of this, SUMO is generally known to regulate the immune system, and to be involved in auto-immune diseases 28.