Global chemical modifications comparison of human plasma proteome from two different age groups

In this study, the plasma proteomes of two age groups (old and young) were compared for global chemical modifications, as determined by tandem mass spectrometry (MS/MS) combined with a non-limiting modification identification algorithm. A total of four modifications differed significantly and passed random grouping tests: succinylation and phosphorylation of cysteine (Cys, C) and the substitution of lysine (Lys, K) by threonine (Thr, T) were significantly higher in the old group than in the young group, while carbamylation of lysine was lower in the old group. We speculate that certain modified proteins increase in the blood of old people, which in turn changes the function of those proteins. This change may be one of the reasons why old people are at greater risk than young people for age-related diseases, such as metabolic diseases, cerebral and cardiovascular diseases, and cancer.

Scientific Reports | (2020) 10:14998 | https://doi.org/10.1038/s41598-020-72196-z | www.nature.com/scientificreports/
Amino acid substitution refers to a change in protein properties and functions caused by the replacement of an amino acid in the protein with another kind of amino acid. These changes are the types of modifications that significantly affect protein function. Mass spectrometry enables both large-scale data acquisition with in-depth mining and the accurate, targeted determination of specific protein modifications. With the continuous development of scientific instruments, ultrahigh-resolution and tandem mass spectrometry (MS/MS) provide more abundant data for proteomics and chemical modification research, which also facilitates the accurate identification of chemical modification sites in the protein chain. Proteomic data analysis usually requires searching the spectra against the protein database of the species studied, and high-resolution tandem mass spectrometry is the primary method for obtaining large amounts of protein modification information. When a search engine is used to search the database, the known types of protein chemical modification to consider are usually specified in advance. This approach is called a restricted search, and it makes it difficult to identify modifications whose type is not known beforehand [7]. Therefore, comprehensive and non-limiting modification identification plays an important role in understanding all the chemical modification information contained in the sample proteome. Open-pFind is an open-search algorithm that integrates the UniMod database and analyzes the collected mass spectrometry data through an open search to obtain global chemical modification information for a sample [8][9][10].
Plasma is an important part of the internal environment and of homeostasis; it transports the substances needed to maintain life activities and carries away the wastes the body generates. Plasma proteins are rich in variety and abundance, and their composition is easily affected by metabolism, physiology, and pathology. Because metabolites and wastes from cells and tissues are transported and exchanged through the blood, the study of plasma proteomics may reflect the physiological state of the body at a specific stage. Chemical modification levels are another important research area of plasma proteomics. A comprehensive comparison of changes in plasma proteome chemical modification levels will provide multiple dimensions of information for the study of physiological changes in the body [11]. At present, more than 1,500 kinds of chemical modifications are recorded in the UniMod [12], PSI-MOD [13], and RESID databases. Many kinds of chemical modifications occur in the human plasma proteome, such as N-terminal acetylation, side-chain phosphorylation, methylation, glycosylation, ubiquitination, and disulfide bonds between two chains. The plasma proteome can reflect the nuances that define the differences between age and ageing [14]. A study of proteome changes in another body fluid, urine, demonstrated that this fluid can be analyzed to elucidate body ageing [15]. Even the common physiological process of hunger can be characterized through the urine proteome [16]. However, a comparison of global proteome chemical modifications in the two body fluids (plasma and urine) showed that modifications differ between sample types [17].
Given the importance of chemical modifications in the human plasma proteome, this study set out to compare the global chemical modification levels of the plasma proteome between two age groups, as determined by high-resolution tandem mass spectrometry combined with non-limiting modification identification (Open-pFind).

Results
Identification of total protein using bottom-up proteomics. In the label-free proteomic analysis, 20 samples (10 per group) were analyzed by LC-MS/MS. After the raw data (.raw) were searched with pFind studio, the results were browsed and exported in pBuild studio. With a 120-min liquid chromatography gradient, 628-781 proteins (average 724) and 7,526-11,464 peptides (average 10,238) were identified in the plasma samples without high-abundance protein removal. The raw data were submitted to iProX Datasets (https://www.iprox.org/page/HMV006.html) under Project ID IPX0002313000, and the pFind results are shown in Supplementary Table 1.
Comparison of post-translational modifications of the plasma proteome between the young and old groups. A total of 1,169 modifications (including low-abundance modifications, that is, every modification identified at least once) were identified in the plasma samples of the old group, and 1,154 modifications (also including low-abundance modifications) were identified in the plasma samples of the young group. Eighty-eight percent of the modifications were shared by both groups, and the remaining 12% were identified in only one group (Fig. 1).
Among the 163 chemical modifications unique to one of the two groups, 158 had less than 50% reproducibility within their group, and only 5 exceeded 50% reproducibility. Non-low-abundance chemical modifications (i.e., modifications identified 10 times or more in a sample) were then counted, with the identification coverage in each group required to be greater than 50%. Of 1,080 kinds of chemical modifications, 120 met this condition. Unsupervised cluster analysis roughly distinguished the young-group and old-group samples, but 40% of the samples from the two groups clustered together rather than with the other samples of their own group. Figure 2 shows the unsupervised clustering results for the individual samples. The screening conditions for differential chemical modifications were as follows: the p-value between the two groups was less than 0.05, and the group mean for each modification was calculated from the per-sample counts normalized by the total number of chemical modification spectra identified. Four modifications had a fold change greater than 1.5 between groups: 2-succinyl [ less than 0.05, fold change greater than 1.5) were subjected to random grouping tests to verify the false-positive rate of each modification. The 10 young-group datasets and 10 old-group datasets were randomly redivided into two groups of 10 samples each, giving 92,378 different combinations ($\tfrac{1}{2}C_{20}^{10}$). Each combination was analyzed with the same differential screening conditions. After extensive calculation and statistical analysis, the random-combination results for the four differential modifications were obtained. See Table 2 for details.
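The screening rule described above (p-value below 0.05 and fold change above 1.5 on normalized counts) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names and example counts are hypothetical, the fold change is oriented as the larger group mean over the smaller one, and the p-value is taken as an input because the text does not name the statistical test used.

```python
from statistics import mean


def normalize(mod_counts, total_counts):
    """Per-sample fraction: spectra carrying one modification / all modified spectra."""
    return [m / t for m, t in zip(mod_counts, total_counts)]


def fold_change(group_a, group_b):
    """Ratio of the two group means, larger mean in the numerator (assumed orientation)."""
    hi, lo = sorted((mean(group_a), mean(group_b)), reverse=True)
    if lo == 0:
        # Ratio is unbounded when only one group shows the modification.
        return float("inf") if hi > 0 else 1.0
    return hi / lo


def is_differential(group_a, group_b, p_value, p_cut=0.05, fc_cut=1.5):
    """Apply both screening conditions; p_value comes from a test chosen by the user."""
    return p_value < p_cut and fold_change(group_a, group_b) > fc_cut


# Hypothetical counts for one modification in two samples per group.
old = normalize([30, 40], [1000, 1000])
young = normalize([10, 20], [1000, 1000])
print(is_differential(old, young, p_value=0.01))
```

Keeping the p-value outside the function makes the sketch independent of the choice of test, which would need to be fixed before reproducing the published numbers.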
The random grouping test showed that the randomness of the four differential modifications was approximately 5%, and the reliability was more than 90%. This indicates that the differences in these four modifications between the age groups are unlikely to have arisen at random. Details of the random grouping test are given in Supplementary 1-2.
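The size and logic of the random grouping test can be illustrated with a short Python sketch. It enumerates every unordered split of 20 samples into two groups of 10 and reports the fraction of splits that pass a screen. The names below are hypothetical, and the default screen is a simplified assumption that uses the fold change only; the p-value condition is omitted because the statistical test is not named in the text.

```python
from itertools import combinations
from math import comb
from statistics import mean


def balanced_splits(n_samples):
    """Yield each unordered split of range(n_samples) into two equal halves once."""
    if n_samples % 2:
        raise ValueError("n_samples must be even")
    half = n_samples // 2
    # Fixing sample 0 in the first half avoids counting (A, B) and (B, A) twice.
    for chosen in combinations(range(1, n_samples), half - 1):
        first = (0,) + chosen
        first_set = set(first)
        second = tuple(i for i in range(n_samples) if i not in first_set)
        yield first, second


def fold_change_passes(values, first, second, threshold=1.5):
    """Simplified screen: ratio of group means exceeds threshold in either direction."""
    a = mean(values[i] for i in first)
    b = mean(values[i] for i in second)
    if a == 0 or b == 0:
        return False  # undefined ratio is treated as not passing in this sketch
    return max(a / b, b / a) > threshold


def random_grouping_rate(values, passes=fold_change_passes):
    """Fraction of all balanced splits for which the screen passes."""
    total = hits = 0
    for first, second in balanced_splits(len(values)):
        total += 1
        hits += passes(values, first, second)
    return hits / total


print(sum(1 for _ in balanced_splits(20)))  # 92378
print(comb(20, 10) // 2)                    # 92378
```

A low rate from `random_grouping_rate` means that few arbitrary regroupings reproduce the difference, which is the sense in which the observed age-group difference is unlikely to be random.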

Discussion
Over their entire life cycle, from translation and assembly to final degradation, proteins are exposed to many harmful environments. These environments modify protein molecules and thereby change their structure and function, a process also called molecular ageing [18]. The reactions are mainly caused by the non-enzymatic binding of reactive small molecules to functional groups on proteins. Protein modifications can be divided into reversible and irreversible modifications. Through a non-limiting modification search, several significantly different protein chemical modifications were found in the plasma of the two age groups. These include succinamide modification of cysteine, substitution of lysine residues by threonine residues, phosphorylation of cysteine, substitution of proline residues by tryptophan residues, phosphorylation of serine, N-terminal succinamide modification, dehydration of serine, substitution of glutamic acid residues by aspartic acid residues, and carbamylation of lysine. Among these, succinylation and phosphorylation of cysteine and the replacement of lysine with threonine were significantly higher in the old group than in the young group, whereas carbamylation of lysine was higher in the young group than in the old group.
Cysteine is one of the less abundant amino acids in proteins. Statistics on protein residues show that the average frequency of cysteine in eukaryotes is approximately 2%, and 70% of the reduced sulfhydryl source protein is present in the in vivo environment [19]. Studies have shown that the abundance of cysteine in a protein is related to the function of the protein. The high reactivity of the sulfhydryl group facilitates its participation in many chemical modifications, and it also determines the distribution and topological properties of cysteine residues in protein structure. The sulfhydryl group is also important for forming disulfide bonds and maintaining protein structure [20].
Table 1. Differential molecular modification information of plasma proteome between the old and young group. The number is the database entry number, from https://www.unimod.org/; AA means amino acid.
The succinylation and phosphorylation of cysteine observed in this study both involve the sulfhydryl group and may therefore affect the formation of disulfide bonds. Fumaric acid adds to the free sulfhydryl groups of some Cys residues in proteins by a Michael addition reaction to form S-(2-succinic acid) cysteine [21]. This modification was initially detected in plasma proteins (including albumin) and is formed by an irreversible reaction. It has been reported in diabetes, obesity, fumaric acid hydratase-related diseases, and a model of Leigh syndrome. The content of succinic acid-modified protein was also found to increase in mouse 3T3-L1 adipocytes cultured in high-glucose medium (30 mM, versus a physiological level of 5 mM), as well as in rats treated with streptozotocin [21][22][23][24].
It has been reported that an excess of nutrients (sugars) leads to an increase in the ATP:ADP and NADH:NAD+ ratios and in the mitochondrial membrane potential. The increase in NADH:NAD+ inhibits oxidative phosphorylation, resulting in the continuous accumulation of mitochondrial intermediates (including fumaric acid) and hence an increase in protein succinylation [21]. The accumulation of succinic acid-modified protein is also caused by a decrease in fumarate hydratase activity [25,26]. Fumarate hydratase catalyzes the reversible hydration of fumaric acid to malic acid in the tricarboxylic acid cycle. It is important to note that loss of function of, and mutations in, fumarate hydratase are known to predispose affected individuals to multiple skin diseases and uterine leiomyomas, as well as hereditary leiomyomatosis and renal cell carcinoma (HLRCC) [19].
Although the exact role of succinic acid-modified cysteine and related proteins has not been fully elucidated, the modification has been associated with fumarate hydratase-related cancer [27][28][29]. An increase in succinic acid-modified protein was also described in the brain stem of ndufs4 knockout mice (a model of Leigh syndrome) [30], indicating that this type of protein chemical modification has a potential role in the pathogenesis of this mitochondrial disease. Park et al. found that, among the detected protein succinylation sites, 16 lay in cofactor-binding or enzyme catalytic regions and 74 lay around enzyme active sites [31]. Baynes et al. found that cysteine succinyl modification in human skin collagen increased with age [22,32,33]. Cysteine phosphorylation was recently found in prokaryotic and eukaryotic systems and is believed to play a key role in signal transduction and the regulation of cellular responses. Because thiophosphates in peptides have low chemical stability, the in vivo phosphorylation of cysteine side chains has rarely been studied [33,34]. Phosphorylation of cysteine is important to the function of cysteine-dependent protein phosphatase (CDP), which belongs to a subfamily of protein tyrosine phosphatases (PTPs) and catalyzes the hydrolysis of phosphate ester bonds through the formation of a phosphocysteine intermediate. Studies have shown that this reversible post-translational modification (PTM) is crucial in regulating the expression of virulence determinants and bacterial resistance to antibiotics. In addition, phosphorylation of cysteine, previously considered a rare modification, may be more common in nature and may play an important role in the biological regulation of various organisms [35][36][37].
The replacement of lysine with threonine changes the acid-base properties of a protein, which affects its activity and function. It has long been observed that the substitution of lysine residues by threonine residues in hemoglobin reduces its oxygen affinity [38]. Sun WY et al. found that a mutation at nucleotide 20,040 in exon 14, which replaces Lys with Thr at amino acid 556, reduced the procoagulant activity of prothrombin by 50% [39]. Decreases in oxygen affinity, coagulation activity, and other physiological functions also reflect a slowing of the body's metabolism, which may be an early manifestation of ageing. It has also been found that human and mouse embryonic stem cells need specific amino acids to proliferate. Mouse embryonic stem (mES) cells need threonine (Thr) metabolism to complete epigenetic histone modification: Thr is converted to glycine and acetyl coenzyme A, and glycine metabolism specifically regulates the trimethylation of lysine (Lys) residues in histone H3 (H3K4me3) [40]. In addition, we found that L-lysine carbamylation was lower in the old group than in the young group. Carbamylation is an irreversible non-enzymatic modification in which a decomposition product of urea reacts with the N-terminus of a protein or the side chain of a lysine residue, and it was previously reported to be related to the ageing of proteins [41]. L-lysine carbamylation can promote the coordination of metal ions required for specific enzyme activities. Some studies have pointed out that carbamylation in plasma is significantly increased in patients with elevated urea levels (such as nephrotic patients) [42].
Ageing is an inevitable and spontaneous process that organisms undergo over time. It is a complex natural phenomenon manifested by the degeneration of structure, the decline of function, and the loss of adaptability and resistance. Ageing is one of the largest known risk factors for most human diseases: of the roughly 150,000 people who die each day worldwide, about two-thirds die from ageing-related causes. At present, the cause of ageing has not been determined. The current mainstream theory explaining ageing is the damage theory, in which DNA damage is considered the common foundation of cancer and ageing. Some consider internally caused DNA damage to be the most important driving force of ageing. According to the waste accumulation theory, the accumulation of waste in cells may interfere with metabolism. For example, a waste product called lipofuscin is formed by a complex reaction that combines fat and protein in cells. This waste accumulates in cells as small particles whose size increases with age. Plasma proteins can reflect changes in the body during the process of ageing. At the same time, the accumulation of biological macromolecules whose structure is damaged or even inactivated may lead to the gradual failure of the organism and its systems, which is also considered a definition of ageing.
Through the research described above, we found several types of chemical modification and amino acid substitution in proteins that differ between the age groups. These chemical modifications change the structure and properties of proteins and thereby affect their function. We speculate that the gradual accumulation of certain harmful and irreversible protein modifications in the plasma of the elderly may reflect the ageing process of the body. This accumulation may be one of the reasons why the old are more likely than the young to suffer from ageing-related diseases, such as metabolic diseases, cardiovascular and cerebrovascular diseases, and tumors.

Materials and methods
Sample collection. Plasma samples from 20 individuals who had passed routine medical examinations were collected from the clinical laboratory of Beijing Hospital; all were residual samples that the clinical laboratory had discarded. All candidates fully understood and signed the informed consent. The samples were divided into two groups according to age: a young group and an old group. The samples were randomly selected from the clinical laboratory samples, and there were no restrictions or requirements on the diet, drugs, or other factors of the blood donors. The study was approved by Beijing Hospital and the ethics committee of Beijing Normal University, and all experiments were performed in accordance with relevant guidelines and regulations. Volunteers were given detailed information about the study, including its purpose, method, and process, and their personal information was kept strictly confidential. Only age and gender information is reported for the discarded laboratory samples. Table 3 shows the sample information. Raw data and pFind results were submitted to iProX Datasets (https://www.iprox.org/page/HMV006.html) under Project ID IPX0002313000.
Protein sample preparation and trypsin enzymolysis. Plasma was obtained by centrifugation of anticoagulant-treated whole blood. Each plasma sample (n = 20) was diluted 40-fold with Milli-Q water, and 100 μL was taken for subsequent experiments. The sample was reacted with 20 mmol/L dithiothreitol (DTT) at 37 °C for 1 h to reduce the disulfide bonds in the protein structure, and then 55 mmol/L iodoacetamide (IAM) was added and reacted in the dark for 30 min to alkylate the resulting sulfhydryl groups. The supernatant was precipitated with three volumes of precooled acetone at −20 °C for 2 h and then centrifuged at 4 °C and 12,000×g for 30 min to collect the protein precipitate. The precipitate was resuspended in an appropriate amount of protein lysis buffer (8 mol/L urea, 2 mol/L thiourea, 25 mmol/L DTT, and 50 mmol/L Tris). The protein concentration of the extract was measured by the Bradford assay. Using filter-aided sample preparation (FASP), 100 μg of protein from each sample was digested with Trypsin Gold (mass spec grade, Promega, Fitchburg, WI, USA) at a ratio of 50:1. The peptides were dried in a vacuum centrifugal concentrator and stored sealed at −80 °C.

Liquid chromatography-tandem mass spectrometry (LC-MS/MS) analysis.
Before analysis, the dried peptide samples were dissolved in 0.1% formic acid solution to a final concentration of 0.1 μg/μL. For each sample, 1 μg of peptide was loaded onto the precolumn and the analytical column of a Thermo EASY-nLC 1200 chromatographic system. Proteomic data were collected on an Orbitrap Fusion Lumos mass spectrometry system (Thermo Fisher Scientific, Bremen, Germany).
nese Academy of Sciences) was used to analyze the LC-MS/MS data with label-free quantification. The target database was the Homo sapiens database downloaded from UniProt (updated to October 2018). Raw files generated by the Orbitrap Fusion were searched directly using a ±20-ppm precursor mass tolerance and a ±20-ppm fragment mass tolerance. For the search, the instrument type was set to HCD-FTMS, the enzyme was trypsin with full specificity, up to two missed cleavage sites were allowed, and peptides with at least six amino acids were retained. Open search was selected. Screening conditions: the FDRs were estimated by the program from the number and quality of spectral matches to the decoy database; for all data sets, the FDRs at the spectrum, peptide, and protein levels were < 1%, and the Q-value at the protein level was less than 1%. Data were analyzed using both forward and reverse database retrieval strategies.
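The decoy-based FDR filtering mentioned above can be illustrated with a generic target-decoy sketch in Python. This is an assumption-laden illustration and not the pFind implementation: it uses the common simple estimate FDR = decoys / targets among matches at or above a score, treats higher scores as better, and all function names and example scores are hypothetical.

```python
def q_values(psms):
    """Assign q-values to matches given as (score, is_decoy) pairs.

    Returns (score, is_decoy, q_value) tuples sorted by descending score.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Simple estimate (assumed): decoy hits approximate false target hits.
        fdrs.append(decoys / targets if targets else 1.0)
    # q-value: the smallest FDR at this rank or at any lower-scoring rank.
    qs = []
    running_min = 1.0
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()
    return [(s, d, q) for (s, d), q in zip(ranked, qs)]


def accepted_targets(psms, q_cut=0.01):
    """Scores of target matches whose q-value is below the cut-off (1% here)."""
    return [s for s, d, q in q_values(psms) if not d and q < q_cut]


# Hypothetical matches: three target hits and one decoy hit.
example = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
print(accepted_targets(example))
```

The same filter can be applied separately at the spectrum, peptide, and protein levels by changing what each (score, is_decoy) pair represents.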