Introduction

The biological processes of living things/organisms can perform and operate in mutual coordination and high-efficiency cooperation simultaneously, relying on the synthesis, catalysis, and regulatory reactions in which proteins are involved. Proteins are biological macromolecule with complex structures, and the difference in the high-level structures of proteins determines their different biochemical activities, which can perform specific functions through a higher-level network formed by the combination of proteins1. Chemical modification of proteins refers to the covalent group reaction of amino acid residues or chain ends. In general, a few changes in chemical structure do not affect the biological activity of proteins, which are called modification of nonessential parts of the protein. However, in most cases, changes in molecular structure will significantly change the physical and chemical properties of the protein, change the conformation of the protein, and make the activity of the protein vary, and then the function that it performs is changed2. Therefore, even if there is no change in the protein content level, but some small changes in the level of chemical modification, the function of the proteins will change significantly, which means that the chemical modification of the proteins enriches the different functions of the protein in another dimension. The effects of chemical modification of proteins on the functions of proteins are mainly shown in the following three aspects: (1) Even if a modification occurs, the functions of proteins will be affected. (2) The same type or different type of modifications of the different amino acids have different effects on the same protein’s function. (3) The same protein may undergo many types of chemical modification, which makes the biological process in which it participates a more variable and complex process3. The main types of chemical modifications related to proteins are as follows: (1) Post-translational modifications (PTMs) refer to the chemical modification of a protein after translation4. The precursor of post-translationally modified protein often has no biological activity. Post-translational modification of a specific modified enzyme is usually required for it to become a functional mature protein and perform its specific biological functions5,6. (2) Chemical-derived modification of proteins refers to a kind of modification that introduces new groups or removes original/intrinsic groups in the protein side-chain, usually including spontaneous non-enzymatic modifications, as well as the modifications introduced by cross-linking agents and artificial agents. (3) Amino acid substitution refers to the change in protein properties and functions caused by the substitution of amino acids in the protein side-chain by other kinds of amino acids. These changes are the types of modifications that significantly affect protein function.

Mass spectrometry can not only realize the acquisition of large-scale data and in-depth mining but also achieve the accurate determination of specific protein targeted modifications. With the continuous development of scientific instruments, ultrahigh-resolution and tandem mass spectrometry (MS/MS) provide more abundant information or data for proteomics and chemical modification research, which also facilitates the accurate identification of chemical modification sites in the protein chain. In the process of proteomic data analysis, it is usually necessary to search and compare the proteomic database of species, and high-resolution tandem mass spectrometry is the primary method for obtaining a large amount of protein modification information. When using the search engine to search the database, known types of protein chemical modification were usually set. This type of search method is called a restricted search, but it is difficult to identify a new type of modification with a type in the product that is unknown7. Therefore, comprehensive and non-limiting modification identification plays an important role in understanding all the chemical modification information contained in the sample proteome. Open-pFind is an open sequence library search algorithm that integrates the UniMod database, analyses, and processes the collected mass spectrum data through an open search to obtain the global chemical modification information of samples8,9,10.

Plasma is an important part of the internal environment and homeostasis, and it plays a role in transporting the substances needed for maintaining the life activities and the wastes generated off the body. Proteins are rich in variety and content in the plasma, and the components are easily affected by metabolism, physiology, and pathology. All metabolites or wastes in cells and tissues are transported and exchanged through the blood; the study of plasma proteomics may reflect the physiological state of the body at a specific stage. Chemical modification levels are another important research area of plasma proteomics. A comprehensive comparison of the changes in plasma proteome chemical modification levels will provide multiple dimensions of information for the study of physiological changes in the body11. At present, there are more than 1,500 kinds of chemical modifications in the UniMod12, PSI-MOD13, and RESID databases. There are many kinds of chemical modifications in the human plasma proteome, such as N-terminal acetylation, phosphorylation in the side chain, methylation, glycosylation, ubiquitination, and disulfide bonds between two chains. The plasma proteome can reflect the nuances defining the differences between age and ageing14. A study of the proteome changes in another body fluid, urine, demonstrated that this fluid can be analyzed to elucidate body ageing15. Even the common physiological process of hunger can be analyzed to characterize the urine proteome16. However, the comparison of the global proteome chemical modifications in two kinds of body fluids (plasma and urine) showed the differences in modifications between different types of samples17. Based on the importance of chemical modifications in the human plasma proteome, this study attempted to compare the differences in the global chemical modification levels of the plasma proteome at two different age groups, as determined by high-resolution tandem mass spectrometry combined with non-limiting modification identification (Open-pFind).

Results

Identification of total protein by using the bottom-up proteomics technology

In the label-free proteomic analysis, 20 (10/10) samples were analyzed by LC–MS/MS. After retrieving data (.raw) based on pFind studio, the analysis results can be browsed and exported in pBuild studio. The 120-min liquid chromatogram gradient was analyzed, and 628–781 kinds of proteins (average 724) and 7,526–11,464 kinds of peptides (average 10,238) were identified in the plasma samples without high-abundance protein removal. About the raw data of pFind, we had submitted to iProX Datasets (https://www.iprox.org/page/HMV006.html) under the Project ID: IPX0002313000. And the results of pFind were showed in Supplementary Table 1.

Table 1 Differential molecular modification information of plasma proteome between the old and young group.

Comparison of post-translational modifications of the plasma proteome between the young and old groups

A total of 1,169 modifications (including low abundance modifications, that is, each modification is identified at least once) were identified in the plasma samples of the old group, and 1,154 modifications (including low abundance modifications) were identified in the plasma samples of the young group. Eighty-eight percent of the modifications are of the same type, and the remaining 12% are related (Fig. 1).

Figure 1
figure 1

Statistics of the global plasma proteome chemical modifications in the old and young groups. The old and young groups have 1,080 of the same type of chemical modification, of which 89 are unique to the old group, and 74 are unique to the young group. The above entire modifications include all low-abundance modifications.

Among the 163 unique protein chemical modifications in the two groups, 158 of them had less than 50% repeatability in each group, and only 5 of them met the conditions of more than 50% reproducibility. They are four kinds of modifications in the old group: Arg → Phe [R], Arg → Trp [R], Xlink_DMP [K], DTT_ST [S] and bisANS-sulphonate [T] modifications in the young group. In general, the unique chemical modification types are the molecular modifications with low identification quantity and poor reproducibility in the group.

Non-low-abundance chemical modifications (i.e., each modification is identified 10 times or more in the sample) are counted, and the identification coverage in each group is required to be greater than 50%. There are 120 kinds of chemical modifications in 1,080 kinds that meet this condition. Unsupervised cluster analysis can roughly distinguish young group and old group samples, but 40% of young group and old group samples are clustered into one group, which is not classified as being the same as other samples in the group. Figure 2 shows the unsupervised clustering results of specific samples. The screening conditions for different chemical modifications are as follows: the p-value between two groups is less than 0.05, and the mean value of the number of chemical modifications of each sample in the group is calculated by the normalization of the total number of chemical modification spectra identified. There are four modifications with a multiple of change greater than 1.5 between groups: 2-succinyl [C], Lys → Thr [k], phosphor [C], and carbamyl [k]. Table 1 shows the details.

Figure 2
figure 2

Unsupervised cluster analysis of overall plasma proteome molecular modifications in the young and old groups. The red mark (Y) is the sample of the young group, and the blue mark (O) is the sample of the old group. The abscissa is the unsupervised clustering and sample-specific information, and the ordinate is the specific molecular modification name.

Random grouping test

Four different modifications satisfying different screening conditions (p-value less than 0.05, fold change greater than 1.5) were performed by random grouping tests to verify the false-positive rate of each modification. Ten young group datasets and 10 old group datasets were randomly divided into two groups with 10 samples in each group and 92,378 different combinations (\(\frac{1}{2}{c}_{20}^{10}\)). Each combination was statistically analyzed by the same difference screening conditions. After extensive calculation and statistical analysis, four different modified random combinations were obtained. See Table 2 for details.

Table 2 Random grouping calculation results of 4 different chemical modifications in the plasma proteome.

Through the random grouping test, it is found that the randomness of the four kinds of different modifications is approximately 5%, and the reliability is more than 90%. It is shown that the difference between the four kinds of modifications in different age groups is less likely to be generated at random. About the random grouping test details, the Supplementary 12 will show.

Discussion

The entire life cycle of proteins, extending from translation assembly to final degradation, involves many toxic environments. These environments modify protein molecules to change the structure and function of the protein. This process is also called molecular ageing18. These reactions are mainly caused by the non-enzymatic binding of active small molecules on proteins with functional groups. The modification of proteins can be divided into reversible and irreversible modifications. Through a non-limiting modification search, several significantly different protein chemical modifications were found in the plasma of the two different age groups. These modifications include succinamide modification of cysteine, the substitution of lysine residues by threonine residues, phosphorylation of cysteine, the substitution of proline residues by tryptophan residues, phosphorylation of serine, N-terminal modification of succinamide, dehydration of serine, the substitution of glutamic acid residues by aspartic acid residues, and carbamylation of lysine. Among these changes, succinylation, phosphorylation of cysteine, and L-lysine replacement with threonine were significantly higher in the older group than in the younger group, and carbamylation of lysine was higher in the younger group than in the older group.

Cysteine is one of the amino acids with a lower abundance in the protein. The statistics of protein residues show that the average frequency of cysteine in eukaryotes is approximately 2%, and 70% of the reduced sulfhydryl source protein is present in the in vivo environment19. Studies have shown that the abundance of cysteine in protein is affected by the function of the protein. The high activity of the sulfhydryl group makes facilitates its participation in many chemical modifications. In contrast, this group also determines the distribution and topological properties of cysteine residues in protein structure. The sulfhydryl group is also an important group to form a disulfide bond and maintain protein structure20.

The succinylation and phosphorylation of cysteine in this study involve the participation of the sulfhydryl group, which may affect the formation of the disulfide bond. Fumaric acid was added to the dissociative sulfhydryl sites of some Cys residues in proteins by a Michael addition reaction to form S-(2-succinic acid) cysteine21. This modification was initially detected in plasma proteins (including albumin) and formed by irreversible reactions. This modification has been reported in diabetes, obesity, fumaric acid hydratase-related diseases, and the model of RIE's syndrome. At the same time, it was also found that the content of succinic acid protein increased in mouse 3T3-L1 adipocytes cultured in high glucose medium (30 mm, while the physiological level was 5 mm), as well as in rats treated with streptozotocin21,22,23,24. It has been reported that an excess of nutrients (sugars) will lead to an increase in ATP: ADP, NADH: NAD+ and mitochondrial membrane potential, while an increase in NADH: NAD + will inhibit oxidative phosphorylation, resulting in the continuous accumulation of mitochondrial intermediates (including fumaric acid), leading to an increase in protein succinylation21. The accumulation of succinate protein is also caused by a decrease in fumarate hydratase activity25,26. Fumaric acid hydratase catalyzes the reversal of fumaric acid to malic acid in the tricarboxylic acid cycle. It is important to note that loss of function and mutations in fumaric hydratase is known to predispose affected individuals to multiple skin diseases and uterine leiomyomas, as well as hereditary leiomyomas and renal cell carcinoma (HLRCC)19.

Although the exact role of succinic acid-modified cysteine and other related proteins has not been fully elucidated, it is related to the cancer response related to fumarate hydratase27,28,29. The increase in protein succinic acid was also described in the brain stem of ndufs4 knockout mice (a model of Leigh syndrome)30, indicating that this type of protein chemical modification has a potential role in the pathogenesis of this mitochondrial disease. Park et al. found that in the detected protein succinylation sites, 16 succinylation sites appeared in the cofactor binding area or enzyme catalytic area, and 74 succinylation sites existed around the enzyme active site31. Baynes et al. found that cysteine succinyl modification in human skin collagen increased with age22,32,33. Cysteine phosphorylation was recently found in prokaryotic and eukaryotic systems and is believed to play a key role in signal transduction and regulation of cellular responses. Due to the low chemical stability of thiophosphates in peptides, the in vivo phosphorylation of cysteine side chains is rarely studied33,34. Phosphorylation of cysteine is an important function of cysteine-dependent protein phosphatase (CDP), which belongs to a subfamily of protein tyrosine phosphatases (PTPs) and catalyzes the hydrolysis of phosphate ester bonds through the formation of phosphate cysteine intermediates. Studies have shown that this reversible post-translational regulation (PTM) is crucial in regulating the expression of virulence determinants and bacterial resistance to antibiotics. Besides, it has also been shown that phosphorylation of cysteine, previously considered a rare modification, may be more common in nature and may play an important role in the biological regulation of various organisms35,36,37.

The replacement of lysine with threonine changed the acid–base properties of the protein, which affected the activity and function of the protein. It has long been observed that the substitution of lysine residues with threonine residues on hemoglobin will reduce its oxygen affinity38. Sun WY et al. found that the mutation of nucleotide 20,040 in exon 14, with Thr replacing Lys at amino acid 556, would reduce the pro-coagulative activity of prothrombin by 50%39. The decrease in oxygen affinity, including some other recessions belonging to coagulation activity and other physiological conditions, also reflects the slow metabolism of the body, which may be the earlier manifestation of ageing. Also, it was found that human and mouse embryonic stem cells need specific amino acids to proliferate. MES cells need threonine (Thr) metabolism to complete epigenetic histone modification. Thr is converted to glycine and acetyl coenzyme A, and glycine metabolism specifically regulates the trimethylation of lysine (Lys) residues in histone H3 (H3K4me3)40. Besides, we also found that the modification of L-lysine carbamylation in the old group was lower than that in the young group. Carbamylation is an irreversible non-enzyme modification process. The process is the side chain reaction between the decomposition product of urea and the N-terminus of protein or lysine residue, which was previously reported to be related to the ageing of proteins41. L-lysine carbamylation can promote the coordination interaction of metal ions to specific enzyme activities. Some studies have pointed out that the amount of carbamylation in the plasma of patients with increased urea levels (such as nephrotic patients) is significantly increased42.

Ageing is an inevitable and spontaneous process undergone by the organism over time. Ageing is a complex natural phenomenon that is manifested by the degeneration of structure, the decline of function, and the recession of adaptability and resistance. Ageing is one of the largest known risk factors for most human diseases: approximately two-thirds of the world's 150,000 people die every day from ageing-related causes. At present, the cause of ageing has not been determined. The current mainstream theory explaining ageing is damage theory, and DNA damage is considered as the common foundation of cancer and ageing. Some people think that the internal cause of DNA damage is the most important driving force of ageing. According to the waste accumulation theory, the accumulation of waste in cells may interfere with metabolism. For example, a waste called lipofuscin is formed by a complex reaction of fat and protein combining in cells. These wastes accumulate in cells in the form of small particles, and their size will increase with age. Plasma protein can reflect changes in the body during the process of ageing. At the same time, the accumulation of biological macromolecules whose structure is destroyed or even inactivated may lead to the gradual failure of the biological body and system, which is also considered to be the concept of ageing.

Through the research described above, we found several types of chemical modifications and replacement of proteins in different age groups. These chemical modifications change the structure and properties of proteins and then affect the function of proteins. It is speculated that the gradual accumulation of some kinds of harmful and irreversible protein modifications in the plasma of the elderly may reflect the ageing process of the body. The accumulation of harmful protein modifications may be one of the reasons why the old are more likely than the young to suffer from ageing-related diseases, such as metabolic diseases, cardiovascular and cerebrovascular diseases, and tumor risks.

Materials and methods

Sample collection

The plasma samples of 20 candidates who had medical examinations with passing medical tests were collected from the clinical laboratory of Beijing Hospital, and all of the samples had been discarded from the clinical laboratory. All candidates fully understood and signed the informed consent. The samples were divided into two groups according to age: a young group and an old group. The samples were randomly selected from the clinical laboratory samples, and there were no restrictions or requirements on the diet, drugs, and other factors of the blood sampling donors. The study was approved by the Beijing Hospital and the ethics committee of Beijing Normal University and all experiments were performed under relevant guidelines and regulations. This experiment provided volunteers with detailed information about the study, including its purpose, method, and process, and kept the personal information of volunteers strictly confidential. Only age and gender information are mentioned for discarded samples from the laboratory. Table 3 shows the sample information. Raw data and results of pFind were submitted to iProX Datasets (https://www.iprox.org/page/HMV006.html) under the Project ID: IPX0002313000.

Table 3 Statistical data of plasma samples.

Protein sample preparation and trypsin enzymolysis

The plasma was centrifuged after whole blood anticoagulant treatment. The plasma sample (n = 20) was diluted 40 times with Milli-Q water, and then 100 μL was taken for subsequent experiments. A 20 mmol/L dithiothreitol (DTT) was used to react with the sample at 37 °C for 1 h to denature the disulfide bond in the protein structure, and then 55 mmol/L iodoacetamide (IAM) was added and reacted in the dark for 30 min to alkylate the disulfide bond sites. The supernatant was precipitated with three volumes of precooled acetone at − 20 °C for 2 h and then centrifuged at 4 °C and 12,000×g for 30 min to obtain protein precipitation. The precipitate was then resuspended in an appropriate amount of protein lysis buffer solution (8 mol/L urea, 2 mol/L thiourea, 25 mmol/L DTT, and 50 mmol/L Tris). The concentration of protein extract was measured by Bradford analysis. Using filter-assisted sample preparation (FASP), trypsin gold (mass spec grade, Promega, Fitchburg, WI, USA) was used to hydrolyze 100 μg protein at a ratio of 50:1 for each sample. The dried peptides were sealed at -80 °C after drying by a vacuum centrifugal concentrator.

Liquid chromatography-tandem mass spectrometry (LC–MS/MS) analysis

Before analysis, the dried polypeptide samples were dissolved in 0.1% formic acid solution, and the final concentration was controlled at 0.1 μg/μL. Each sample was analyzed according to 1 μg polypeptide quality: a Thermo Easy-nlc1200 chromatographic system was loaded on the precolumn and the analytical column. Proteomic data were collected by the Orbitrap Fusion Lumos mass spectrometry system (Thermo Fisher Scientific Bremen, Germany). Liquid chromatography: pre-column: 75 μm × 2 cm, nanoviper C18, 2 μm, 100 Å; analytical column: 50 μm × 15 cm, nanoviper C18, 2 μm, 100 Å; injection volume: 10 μL, flow rate: 250 nL/min, mobile phase: phase A: 100% mass spectrometry grade water (Fisher Scientific, span)/1‰ formic acid (Fisher Scientific), phase B: 80% acetonitrile (USA)/20% water/1‰ formic acid, 120 min gradient washing off: 0 min, 3% B phase; 0–3 min, 8% B phase; 3–93 min, 22% B phase; 93–113 min, 35% B phase; 113–120 min, B phase; mass spectrometry, ion source: spray voltage: 2.0kv, capillary temperature:320 °C, resolution setting: first-order(Orbitrap) 120,000 @m/z 200, second-order (Orbitrap) 30,000 (Orbitrap) @m/z 200,parent ion scanning range: m/z 350–1,350; sub ion scanning range: start from m/z 110, MS1 AGC: 4E5, charge range: 2–7, ion implantation time: 50 ms, MS 2 AGC: 1E5, ion implantation time: 50 ms, ion screening window: m/z 2.0, fragmentation mode: HCD, energy NCE 32, data dependent MS/MS: Top 20, dynamic exclusion time: 15 s, internal calibration mass number: 44 5.12003.

Database search

The pFind Studio software (version 3.1.3, Institute of Computing Technology, Chinese Academy of Sciences) was used to analyze the LC–MS/MS data with label-free quantification. The target retrieval database is from the Homo Sapiens database downloaded from UniProt (updated to October 2018). Raw files generated by the Orbitrap Fusion were searched directly using a ± 20-ppm precursor mass tolerance and a ± 20-ppm fragment mass tolerance. At the time of retrieval, the instrument type is HCD-FTMS, the full specificity of an enzyme is trypsin, and there are at most two missing sites, peptides with at least six amino acids were retained. Open-search is selected. Screening conditions: the FDRs were estimated by the program from the number and quality of spectral matches to the decoy database; for all data sets, the FDRs at spectrum, peptide, and protein level were < 1%, and the Q-value at the protein level is less than 1%. Data are analyzed using both forward and reverse database retrieval strategies.