First comprehensive proteome analysis of lysine crotonylation in seedling leaves of Nicotiana tabacum

Histone crotonylation is a new lysine acylation type of post-translational modification (PTM) enriched at active gene promoters and potential enhancers in yeast and mammalian cells. However, lysine crotonylation in nonhistone proteins and plant cells has not yet been studied. In the present study, we performed a global crotonylation proteome analysis of Nicotiana tabacum (tobacco) using high-resolution LC-MS/MS coupled with highly sensitive immune-affinity purification. A total of 2044 lysine modification sites distributed on 637 proteins were identified, representing the most abundant lysine acylation proteome reported in the plant kingdom. Similar to lysine acetylation and succinylation in plants, lysine crotonylation was related to multiple metabolism pathways, such as carbon metabolism, the citrate cycle, glycolysis, and the biosynthesis of amino acids. Importantly, 72 proteins participated in multiple processes of photosynthesis, and most of the enzymes involved in chlorophyll synthesis were modified through crotonylation. Numerous crotonylated proteins were implicated in the biosynthesis, folding, and degradation of proteins through the ubiquitin-proteasome system. Several crotonylated proteins related to chromatin organization are also discussed here. These data represent the first report of a global crotonylation proteome and provide a promising starting point for further functional research of crotonylation in nonhistone proteins.


Results
Detection of lysine-crotonylated proteins in tobacco leaves. To characterize the global crotonylation proteome of tobacco, a proteomic method based on sensitive immune-affinity purification and high-resolution LC-MS/MS was applied to identify crotonylated proteins and their modification sites. An overview of the experimental procedures is shown in Fig. 1a. A total of 2044 lysine crotonylation sites distributed in 637 proteins were identified, representing the most abundant lysine acylation proteome reported in the plant kingdom (Table 1). The MS/MS data for these crotonylated peptides were deposited in the iProX database under accession number IPX0000889000 (http://www.iprox.org). Detailed information for all identified crotonylated peptides and their corresponding proteins is shown in Supplementary Table S1, and the scores for protein and peptide identification are shown in Supplementary Table S2. Among the 637 crotonylated proteins, 357 (56%) contained one or two crotonylation sites, and 80 (13%) had 7 or more crotonylation sites (Fig. 1b). Most peptides ranged from 7 to 28 amino acids in length, consistent with the properties of tryptic peptides (Fig. 1c). To validate the MS data, the mass error of all identified peptides was assessed. The distribution of the mass error was centred near zero, and most mass errors were less than 0.02 Da, suggesting that the mass accuracy of the MS data met the requirement (Fig. 1d).

Motifs and secondary structures of lysine crotonylated peptides.
To evaluate the nature of the crotonylated lysines in tobacco, the sequence motifs in all identified crotonylated peptides were investigated using the Motif-X programme. As shown in Supplementary Table S3, a total of nine conserved motifs were retrieved. In particular, the motifs KcrE, EKcr and KcrD (Kcr indicates the crotonylated lysine) were strikingly conserved (Fig. 2a, Supplementary Table S4). Importantly, the significantly conserved amino acids in these motifs, namely E and D, are both negatively charged, a feature rarely identified in other PTMs. These motifs are likely to represent a feature of crotonylation in tobacco. Hierarchical cluster analysis was also performed to further analyse these motifs. As shown in the heat map (Fig. 2b), enrichment of positively charged K residues was observed in the −10 to −5 and +5 to +10 positions, while the negatively charged residues D and E were markedly enriched in the −4 to +4 positions. Short aliphatic A residues were frequently observed in the −10 to +10 positions, while the sulphur-containing C residue was not observed.
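The position numbering used above (−10 to +10 around the modified lysine) can be illustrated with a minimal Python sketch. It is not the Motif-X implementation; the function names, the `_` padding character for windows that run past a protein terminus, and the toy sequence are all assumptions made for illustration only.

```python
def site_window(sequence, site_index, flank=10, pad="_"):
    """Return the (2 * flank + 1)-residue window centred on a modified lysine.

    site_index is 0-based. Positions beyond the protein termini are filled
    with `pad` (an assumption of this sketch, not taken from the paper).
    """
    if sequence[site_index] != "K":
        raise ValueError("site_index does not point at a lysine")
    left = sequence[max(0, site_index - flank):site_index]
    right = sequence[site_index + 1:site_index + 1 + flank]
    return left.rjust(flank, pad) + "K" + right.ljust(flank, pad)


def neighbour_counts(windows, offset, flank=10):
    """Count the residues found at a given offset (-flank..+flank) from the site."""
    counts = {}
    for window in windows:
        residue = window[flank + offset]
        counts[residue] = counts.get(residue, 0) + 1
    return counts


# Hypothetical toy sequence, not taken from the data set.
toy = "MSTAEKEDLLGAVKDAAR"
windows = [site_window(toy, i) for i, aa in enumerate(toy) if aa == "K"]
plus_one = neighbour_counts(windows, +1)  # residues directly after each K
```

In this toy case both lysines are followed by an acidic residue, which is the kind of pattern that the KcrE and KcrD motifs describe.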
To explore the relationship between lysine crotonylation and protein secondary structures, a structural analysis of all crotonylated proteins was performed using the algorithm NetSurfP. As shown in Fig. 2c, approximately 47% of the crotonylated sites were located in α-helices, and 12% of the sites were located in β-strands. The remaining 42% of the crotonylated sites were located in disordered coils. However, considering the similarity of the distribution pattern between crotonylated lysines and all lysines, lysine crotonylation in tobacco showed no preference for a particular secondary structure. The surface accessibility of the crotonylated lysine sites was also evaluated. The results showed that 91% of the crotonylated lysine sites were exposed on the protein surface, close to the value for all lysine residues (Fig. 2d). Therefore, lysine crotonylation likely does not affect the surface properties of modified proteins.

Functional annotation and subcellular localization of crotonylated proteins.
To obtain an overview of the crotonylated proteins in tobacco, the Gene Ontology (GO) functional classification of all crotonylated proteins based on their biological processes, molecular functions and subcellular locations was investigated (Supplementary Table S5, Supplementary Table S6). Within the biological processes category, the majority of crotonylated proteins were related to metabolic processes, cellular processes, and single-organism processes, respectively accounting for 36, 27 and 24% of all the crotonylated proteins (Fig. 3a). For the molecular function category, 45 and 40% of the crotonylated proteins were associated with catalytic activity and binding functions, respectively (Fig. 3b). Subcellular localization analysis revealed that most of the crotonylated proteins were localized to the chloroplast (37%), cytosol (30%), nucleus (12%), and mitochondria (5%) (Fig. 3c).

Functional enrichment analysis.
To better understand the biological function of these crotonylated proteins, we performed an enrichment analysis of the GO (Supplementary Table S7 Table S9). The enrichment analysis of the cellular components revealed that the crotonylated proteins were significantly enriched in the proteasome complex, thylakoid membrane, and photosystem II oxygen evolving complex (Fig. 4a). Based on the enrichment results of the molecular function category, most crotonylated proteins were related to NAD binding, threonine-type peptidase activity, endopeptidase activity, and calcium ion binding (Fig. 4a). In the biological processes category, most of the crotonylated proteins were implicated in oxoacid metabolic processes, protein catabolic processes, cellular amino acid metabolic processes, protein folding, ubiquitin-dependent protein catabolic processes, and photosynthesis (Fig. 4a). The KEGG pathway enrichment analysis showed that a majority of the crotonylated proteins were related to carbon metabolism, carbon fixation in photosynthetic organisms, pyruvate metabolism, proteasome, amino acid biosynthesis, the citrate cycle, glycolysis, porphyrin and chlorophyll metabolism, and photosynthesis (Fig. 4b). Consistent with these observations, Pfam domains, including the NAD(P)-binding domain, ATPase core domain, chlorophyll a/b binding protein domain, aldolase-type TIM barrel, and thioredoxin domain, were significantly enriched in crotonylated proteins (Fig. 4c), implying an important role for lysine crotonylation in these processes.

Protein interaction network of the crotonylated proteins in tobacco.
To further identify the cellular processes regulated through crotonylation in tobacco, the crotonylated protein interaction network was established using an algorithm in Cytoscape software. A total of 264 crotonylated proteins were mapped to the protein interaction database (Supplementary Table S10), presenting a global view of the diverse cellular functions of crotonylated proteins in tobacco. As shown in Fig. 5, crotonylated proteins involved in the ribosome, proteasome, carbon metabolism, oxidative phosphorylation, and terpenoid backbone biosynthesis were retrieved, comprising a dense protein interaction network. The physiological interactions among these crotonylated protein complexes likely contribute to their cooperation and coordination in tobacco.

Figure legend note: The numbers on the x axes represent the significance values (−log10 of the p value). A value greater than 1.3 corresponds to a p value of less than 0.05, which means the data are statistically significant.

Discussion
Histone crotonylation is a new lysine acylation type of PTM enriched at active gene promoters and potential enhancers in mammalian cells 25 . Crotonylation is catalysed through histone acetyltransferase p300/CBP 28 , 'read' by YEATS2 and AF9, and 'erased' by Sirtuin family members SIRT1-3 in yeast and mammals 29,43-46 . However, the lysine crotonylation of nonhistone proteins and in plant cells has not yet been studied. To determine whether lysine crotonylation also exists in plants and to study its function in cellular processes, a global crotonylation proteome of tobacco was generated using high-resolution LC-MS/MS coupled with highly sensitive immune-affinity purification. A total of 2044 lysine crotonylation sites distributed in 637 proteins were identified, representing the most abundant lysine acylation proteome reported in the plant kingdom. These crotonylated proteins were associated with diverse biological processes, including multiple metabolic pathways, chromatin organization, protein biosynthesis, folding, and degradation. The protein interaction network analysis also suggested that a wide range of interactions involved in these biological processes was likely modulated through protein crotonylation.
Carbon is one of the most important macroelements, providing the backbone for biological macromolecules. Lysine acetylation and succinylation in plants have been implicated in carbon metabolism, glycolysis, pyruvate metabolism, the TCA cycle, the pentose phosphate pathway, and glyoxylate and dicarboxylate metabolism 32,33,37,40 . The results of the present study showed that numerous enzymes in these metabolic pathways were also modified through crotonylation. In plants, one of the most important metabolic processes is photosynthesis. In the present study, 236 crotonylated proteins were localized to the chloroplast. Among these proteins, a total of 72 proteins were involved in photosynthesis processes. For example, 10, 14, 4, 8, 2, 9, and 25 proteins were identified as members of the antenna proteins, photosystem II complex, cytochrome b6f complex, photosystem I complex, ferredoxin-NADP reductase, ATP synthesis complex, and the carbon fixation pathway, respectively.
Table 3. Crotonylated proteins involved in protein biosynthesis, folding, and ubiquitin-dependent degradation.
Significantly, 73% (8/11) of the enzymes in the Calvin cycle 42 were extensively crotonylated at multiple sites, with an average of 10.
For example, ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco), the key carbon fixation enzyme, was crotonylated at 15 amino acid sites. The key amino acid residues of Rubisco, K201 and K334, whose acetylation results in the downregulation of Rubisco activity 47 , were also modified through crotonylation. This result suggested that crotonylation might change Rubisco activity in coordination with acetylation. Moreover, the two Rubisco activase isoforms 48 , involved in the light activation of Rubisco, were crotonylated at 24 sites. In addition, 67% (10/15) of the enzymes involved in chlorophyll synthesis 49 were also modified through crotonylation. To our knowledge, there have been no previous reports of lysine acylation in chlorophyll metabolism. These results suggested that lysine crotonylation might play a role in regulating carbon metabolism and photosynthesis.
Proteins are macromolecules that, in addition to carbohydrates, perform a vast array of functions within organisms. Proteins are composed of amino acids and are synthesized through translation. In plants, proteins can be degraded in two ways: proteolysis in the vacuole or via the ubiquitin-proteasome system. The data in the present study revealed that lysine crotonylation was related to the synthesis and degradation of multiple amino acids, such as lysine, valine, leucine and isoleucine. The ribosome serves as the factory of protein synthesis. In the present study, we identified 47 crotonylated proteins associated with translation, including 29 ribosome subunits, 8 translation initiation factors, 7 elongation factors, and 3 aminoacyl-tRNA synthetases. After synthesis in the ribosome, the polypeptide chain rapidly folds from a random coil into its characteristic and functional three-dimensional structure.
This process is accomplished through the assistance of chaperones, such as the ER-resident molecular chaperone BiP, the HSP70 family, and the HSP90 family 50-55 . The data in the present study showed that lysine residues in members of HSP70 and HSP90 were extensively crotonylated in tobacco. Moreover, BiP 4 and BiP 5 were also extensively modified through crotonylation, suggesting an important role for lysine crotonylation in protein folding. If several rounds of chaperone-assisted folding are futile, unfolded or misfolded proteins are recognized and targeted by ubiquitin and subsequently degraded by proteasomes 56,57 . In the present study, we found that ubiquitin, ubiquitin extension protein, ubiquitin-conjugating enzyme, and ubiquitin-activating enzyme were all modified through crotonylation. Furthermore, 14 proteasome subunits were also crotonylated. These results indicated the likely involvement of lysine crotonylation in regulating protein synthesis, folding, and ubiquitin-dependent degradation.
The organization of the eukaryotic genome into nucleosomes dramatically impacts the regulation of gene expression. The structure of the nucleosome core is relatively invariant in eukaryotic organisms, and includes a 147-bp segment of DNA and two copies of each of the four core histone proteins 58 . Histone chaperone nucleosome assembly protein 1 (Nap1) has been implicated in nucleosome assembly by eliminating competing, nonnucleosomal histone-DNA interactions 59 . The data presented here showed that tobacco histones H1, H2A, H2B, H3, and H4, and nucleosome assembly proteins Nap1;2, Nap1;3, and Nap1;4, were modified through crotonylation, indicating a potential role for lysine crotonylation in nucleosome assembly or disassembly. As complementary evidence, topoisomerase I, required for efficient nucleosome disassembly at gene promoter regions 60 , was also crotonylated in the present study. Nucleosomes are folded through a series of higher-order structures to eventually form a chromosome. An important factor in higher-order organization is the nuclear matrix, which serves as a scaffold for loops of chromatin 61 . The nuclear matrix has been proposed to play a role in regulating transcription, DNA replication, and RNA processing 62 . Chromosomal DNA is anchored to the nuclear matrix by its matrix-associated regions (MARs), which are bound by matrix attachment region-binding proteins 63 . Histone acetyltransferase (HAT) p300 and deacetylase SIRT1 interact with the matrix attachment region-binding proteins SAF-A and SATB1, respectively, and thereby regulate gene expression 64,65 . Surprisingly, in the present study, a matrix attachment region binding filament-like protein (MFP1) was identified as crotonylated at 20 amino acid sites, and its homologue was crotonylated at 8 amino acid sites. MFP1 is a conserved nuclear and chloroplast DNA-binding protein in plants; however, its physiological function is not understood 66-68 .
Considering that p300 and SIRT1 possess crotonylation and decrotonylation activities, respectively, in animals 25,28,29 , it is an interesting hypothesis that the crotonylated or decrotonylated form of MFP1 is also associated with the regulation of gene expression. In addition to these crotonylated proteins that might be associated with the assembly of nucleosomes and chromatin, we identified a G-strand-specific single-stranded telomere-binding protein (GTBP), associated with maintaining telomere stability 69,70 , that was also modified through crotonylation. These results indicated the likely involvement of lysine crotonylation in chromatin organization and gene regulation, at least in tobacco.
In summary, the present study provided the first global lysine crotonylation proteome in tobacco. These data revealed numerous crotonylated proteins associated with diverse cellular processes, particularly carbon metabolism, photosynthesis, protein biosynthesis, folding, degradation, and chromatin organization. These findings raise the questions of whether the crotonylation of these proteins is related to their biological functions and whether crotonylation changes under different conditions. These questions should be addressed in future work. Nevertheless, the results presented here may provide a promising starting point for further functional research of crotonylation in nonhistone proteins.

Materials and Methods
Plant materials and growth conditions. Tobacco plants were grown in a greenhouse at 25 °C under a 16/8 h (light/dark) photoperiod. The leaves were excised from 4-week-old seedlings, with three biological replicates, and immediately used for protein extraction.
Protein Extraction. The samples were ground to powder in liquid nitrogen, and subsequently mixed with extraction buffer (8 M urea, 2 mM EDTA, 3 μM TSA, 50 mM NAM, 10 mM DTT and 1% Protease Inhibitor Cocktail, Millipore). The remaining debris was removed through centrifugation at 20,000 × g for 10 min at 4 °C. Finally, the proteins were precipitated using cold 15% TCA for 2 h at −20 °C. After centrifugation at 4 °C for 10 min, the supernatant was discarded. The remaining precipitate was washed three times with cold acetone. The protein was redissolved in buffer (8 M urea, 100 mM NH4HCO3, pH 8.0) and the protein concentration was determined using the 2-D Quant kit (GE Healthcare) according to the manufacturer's instructions.
Trypsin Digestion. For digestion, the protein solution was reduced with 10 mM DTT for 1 h at 37 °C and alkylated with 20 mM IAA for 45 min at room temperature in darkness. The protein sample was then diluted by adding 100 mM NH4HCO3 to a urea concentration of less than 2 M. Finally, trypsin was added at a 1:50 trypsin-to-protein mass ratio for the first digestion overnight and at a 1:100 trypsin-to-protein mass ratio for a second 4-h digestion.
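The dilution and enzyme amounts implied by this protocol can be worked through with a short Python sketch. The function names, the 100 µL sample volume and the 1000 µg protein amount are hypothetical values chosen for illustration and do not come from the study.

```python
def diluent_volume(sample_ul, urea_start_m, urea_target_m):
    """Volume of urea-free buffer (µL) needed to dilute a sample to urea_target_m."""
    if not 0 < urea_target_m <= urea_start_m:
        raise ValueError("target must be positive and not above the start value")
    final_volume = sample_ul * urea_start_m / urea_target_m
    return final_volume - sample_ul


def trypsin_mass(protein_ug, ratio):
    """Trypsin mass (µg) for a 1:ratio trypsin-to-protein mass ratio."""
    return protein_ug / ratio


# Hypothetical sample: 100 µL of protein solution in 8 M urea, 1000 µg protein.
buffer_for_2m = diluent_volume(100, 8, 2)   # volume that gives exactly 2 M urea
first_digest = trypsin_mass(1000, 50)       # overnight digestion at 1:50
second_digest = trypsin_mass(1000, 100)     # 4-h digestion at 1:100
```

Because the protocol calls for a urea concentration below 2 M, the computed buffer volume is a lower bound: at least three sample volumes of buffer must be added to an 8 M solution, and slightly more in practice.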
HPLC Fractionation. The sample was subsequently fractionated through high pH reverse-phase HPLC using an Agilent 300 Extend C18 column (5 μm particles, 4.6 mm ID, 250 mm length). Briefly, the peptides were separated into 80 fractions using a gradient of 2% to 60% acetonitrile in 10 mM ammonium bicarbonate, pH 10, over 80 min. Subsequently, the peptides were combined into 6 fractions and dried using vacuum centrifugation.
Affinity Enrichment. To enrich Kcr peptides, tryptic peptides dissolved in NETN buffer (100 mM NaCl, 1 mM EDTA, 50 mM Tris-HCl, and 0.5% NP-40, pH 8.0) were incubated with pre-washed antibody beads (PTM Biolabs) at 4 °C overnight with gentle shaking. The beads were washed four times with NETN buffer and twice with ddH2O. The bound peptides were eluted from the beads using 0.1% TFA. The eluted fractions were combined and vacuum-dried. The resulting peptides were cleaned with C18 ZipTips (Millipore) according to the manufacturer's instructions, followed by LC-MS/MS analysis.
Quantitative Proteomic Analysis by LC-MS/MS. The peptides were dissolved in 0.1% FA and directly loaded onto a reversed-phase pre-column (Acclaim PepMap 100, Thermo Scientific). Peptide separation was performed using a reversed-phase analytical column (Acclaim PepMap RSLC, Thermo Scientific). The gradient comprised an increase from 6% to 22% solvent B (0.1% FA in 98% ACN) for 24 min, 22% to 40% for 8 min and climbing to 80% in 5 min, subsequently holding at 80% for the last 3 min, all at a constant flow rate of 300 nl/min on an EASY-nLC 1000 UPLC system. The resulting peptides were analysed using the Q Exactive™ Plus hybrid quadrupole-Orbitrap mass spectrometer (ThermoFisher Scientific). The peptides were subjected to a nanospray ionization (NSI) source followed by tandem mass spectrometry (MS/MS) in the Q Exactive™ Plus (Thermo) coupled online to the UPLC. Intact peptides were detected in the Orbitrap at a resolution of 70,000. The peptides were selected for MS/MS using an NCE setting of 30; ion fragments were detected in the Orbitrap at a resolution of 17,500. A data-dependent procedure that alternated between one MS scan followed by 20 MS/MS scans was applied for the top 20 precursor ions above a threshold ion count of 5E3 in the MS survey scan with 15.0 s dynamic exclusion. The electrospray voltage applied was 2.0 kV. Automatic gain control (AGC) was used to prevent overfilling of the Orbitrap; 5E4 ions were accumulated for generation of MS/MS spectra. For MS scans, the m/z scan range was 350 to 1800. Fixed first mass was set as 100 m/z.
Database Search. The resulting MS/MS data were processed using MaxQuant with the integrated Andromeda search engine (v.1.5.1.8). Tandem mass spectra were searched against the UniProt tobacco database concatenated with a reverse decoy database. Trypsin/P was specified as the cleavage enzyme, allowing up to 4 missed cleavages, 5 modifications per peptide and 5 charges. Mass error was set to 10 ppm for precursor ions and 0.02 Da for fragment ions.
Carbamidomethylation on Cys was specified as fixed modification and oxidation on Met, crotonylation on Lys and crotonylation on protein N-terminal were specified as variable modifications. False discovery rate (FDR) thresholds for protein, peptide and modification sites were specified at 1%. Minimum peptide length was set at 7. All the other parameters in MaxQuant were set to default values. The site localization probability was set as >0.75.
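The idea behind searching a concatenated target/reverse-decoy database and applying a 1% FDR threshold can be illustrated with a simplified Python sketch. This is not MaxQuant's actual procedure; the function names, the (score, is_decoy) input format, the dictionary key and the toy scores are assumptions made for illustration. The second helper mirrors the site localization probability cut-off of >0.75 stated above.

```python
def score_threshold_at_fdr(psms, max_fdr=0.01):
    """Lowest score cut-off at which the estimated FDR stays within max_fdr.

    psms is an iterable of (score, is_decoy) pairs. The FDR among accepted
    matches is estimated as decoys / targets. Returns None if no cut-off works.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    best = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best = score
    return best


def accept_sites(sites, min_probability=0.75):
    """Keep modification sites whose localization probability exceeds the cut-off."""
    return [s for s in sites if s["localization_probability"] > min_probability]


# Hypothetical toy scores; True marks a hit in the reversed (decoy) database.
toy_psms = [(120, False), (110, False), (100, False), (90, True), (80, False)]
strict_cutoff = score_threshold_at_fdr(toy_psms)        # 1% FDR
loose_cutoff = score_threshold_at_fdr(toy_psms, 0.25)   # 25% FDR, for contrast
```

With the toy data, the strict threshold stops above the first decoy hit, while the looser threshold admits it together with the lower-scoring target match.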

Bioinformatics Methods.
Motif-X software (http://motif-x.med.harvard.edu/) was used to analyse the model of sequences constituted with amino acids in specific positions of crotonyl-21-mers (10 amino acids upstream and downstream of the site) in all protein sequences 71 . For further hierarchical clustering based on categories, all the crotonylation substrate categories obtained after enrichment were first collated along with their p-values, and subsequently filtered for those categories enriched in at least one of the clusters with a p-value < 0.05. This filtered p-value matrix was transformed by the function x = −log (p-value), and the x values for each category were z-transformed. These z scores were subsequently clustered using one-way hierarchical clustering (Euclidean distance, average linkage clustering) in the Genesis programme. The cluster membership was visualized using a heat map through the "heatmap.2" function in the "gplots" R-package. Secondary structures were predicted using NetSurfP. The Gene Ontology (GO) annotation of the proteome was derived from the UniProt-GOA database (http://www.ebi.ac.uk/GOA/). The proteins were classified using Gene Ontology annotation based on three categories: biological process, cellular component and molecular function. The protein subcellular localization was analysed using WoLF PSORT (http://www.genscript.com/wolf-psort.html). The KEGG database was used to annotate protein pathways. GO term, protein domain, and KEGG pathway enrichment were performed using the DAVID bioinformatics resources 6.7. Fisher's exact test was used to examine the enrichment or depletion (two-tailed test) of specific annotation terms among members of resulting protein clusters. Correction for multiple hypothesis testing was performed using standard false discovery rate control methods. Any terms with adjusted p-values below 0.05 in any of the clusters were treated as significant.
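The statistical steps described above can be sketched in plain Python. This is an illustration under stated assumptions, not the DAVID or Genesis implementation: the Benjamini-Hochberg procedure is assumed as the "standard" FDR method, the logarithm is assumed to be base 10, the z-transformation is assumed to use the sample standard deviation, and all function names and counts are hypothetical.

```python
from math import comb, log10, sqrt


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is at most as likely as the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def neg_log10_z_scores(p_values):
    """Transform p-values with x = -log10(p), then z-transform the x values."""
    x = [-log10(p) for p in p_values]
    mean = sum(x) / len(x)
    sd = sqrt(sum((v - mean) ** 2 for v in x) / (len(x) - 1))
    return [(v - mean) / sd for v in x]


# Hypothetical counts: 8 of 10 proteins in a cluster carry an annotation term,
# compared with 1 of 6 proteins outside the cluster.
p_term = fisher_exact_two_tailed(8, 2, 1, 5)
adjusted = benjamini_hochberg([p_term, 0.2, 0.5])
```

The two-tailed test sums the probabilities of all tables that are no more likely than the observed one, so it flags depletion as well as enrichment, in line with the description above.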
The Search Tool for Retrieval of Interacting Genes/Proteins (STRING) database (http://string-db.org/) was used for PPI analysis. Cytoscape (version 3.0) software was used to display the network 72 .