Accurate and Efficient Resolution of Overlapping Isotopic Envelopes in Protein Tandem Mass Spectra

Xiao, Kaijie; Yu, Fan; Fang, Houqin; Xue, Bingbing; Liu, Yan; Tian, Zhixin

doi:10.1038/srep14755

Download PDF

Article
Open access
Published: 06 October 2015

Accurate and Efficient Resolution of Overlapping Isotopic Envelopes in Protein Tandem Mass Spectra

Kaijie Xiao¹,
Fan Yu¹,
Houqin Fang¹,
Bingbing Xue¹,
Yan Liu¹ &
…
Zhixin Tian¹

Scientific Reports volume 5, Article number: 14755 (2015) Cite this article

2772 Accesses
15 Citations
1 Altmetric
Metrics details

Subjects

Abstract

It has long been an analytical challenge to accurately and efficiently resolve extremely dense overlapping isotopic envelopes (OIEs) in protein tandem mass spectra to confidently identify proteins. Here, we report a computationally efficient method, called OIE_CARE, to resolve OIEs by calculating the relative deviation between the ideal and observed experimental abundance. In the OIE_CARE method, the ideal experimental abundance of a particular overlapping isotopic peak (OIP) is first calculated for all the OIEs sharing this OIP. The relative deviation (RD) of the overall observed experimental abundance of this OIP relative to the summed ideal value is then calculated. The final individual abundance of the OIP for each OIE is the individual ideal experimental abundance multiplied by 1 + RD. Initial studies were performed using higher-energy collisional dissociation tandem mass spectra on myoglobin (with direct infusion) and the intact E. coli proteome (with liquid chromatographic separation). Comprehensive data at the protein and proteome levels, high confidence and good reproducibility were achieved. The resolving method reported here can, in principle, be extended to resolve any envelope-type overlapping data for which the corresponding theoretical reference values are available.

Ultra-fast label-free quantification and comprehensive proteome coverage with narrow-window data-independent acquisition

Article Open access 01 February 2024

MaxDIA enables library-based and library-free data-independent acquisition proteomics

Article Open access 08 July 2021

A comprehensive spectral assay library to quantify the Escherichia coli proteome by DIA/SWATH-MS

Article Open access 12 November 2020

Introduction

Protein characterization is performed with several commercially successful techniques, such as soft ionization (e.g., electrospray ionization and ESI¹), high-resolution affordable mass analyzers (e.g., Orbitrap²), efficient gas-phase dissociation (e.g., collision-induced dissociation (CID), higher-energy collisional dissociation (HCD)³ and electron-transfer dissociation (ETD)⁴) and state-of-the-art bioinformatics tools^{5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27}. Tandem mass spectrometry has also become a state-of-the-art platform for the characterization of high-throughput peptides and proteins. The interpretation of tandem mass spectra is the final, yet paramount, step of this platform. Overlapping isotopic envelopes (OIEs) are very common in these peptide and protein tandem mass spectra because many isotopic envelopes peak in a very narrow m/z range. In addition, it has long been a challenge to accurately and efficiently resolve these OIEs to maximize matching product ions and identify proteins^{28,29,30,31,32}.

Few resolving algorithms using Averagine and its variant models have been reported. THRASH resolves OIEs through sequential subtraction of best-fit abundance distributions; in the FTMS spectrum of GluC digest of a 191 kDa protein, THRASH found 65 more isotopic clusters (out of 824)³³. In 2006, L. Chen et al. reported the AID-MS algorithm and a Lorentzian-based peak-subtraction technique to resolve OIEs in high-peak-density regions³⁴. In the analysis of the plasma electron-capture dissociation (ECD) spectra of ubiquitin and carbonic anhydrase, AID-MS found 509 and 611 isotopic envelopes, respectively. In 2008, K. Park et al. reported an isotopic peak-intensity ratio based on an algorithm for which all possible pseudo-envelopes were considered and OIEs could be identified³⁰. MS-Deconv efficiently finds optimal set of envelopes through explicitly scoring combinations of candidate envelopes³⁵.

It is worth noting that quite a few algorithms have been reported to interpret OIEs of peptide precursor ions in the MS spectra. The observed overlap here is due to the small mass differences from either the peptides themselves^36,37, post-translational modifications (PTMs, citrullination³⁸, deamidation and ¹⁸O-labeling³⁹, metal-binding and sulfhydryl reduction⁴⁰), or quantitative chemical labeling (dimethyl⁴¹, acrylamide⁴² and mTRAQ⁴³).

Here, we report an alternative resolving method for OIEs based on exact isotopic envelopes computed from the elemental composition of the product ions’ actual amino acids. This method is in contrast to the strategies existing in published work that are used to determine the theoretical isotopic envelopes, which are composed of Averagine units. In this method, the ideal experimental abundance of a particular overlapping isotopic peak (OIP) is first calculated for all the OIEs sharing this OIP. An OIP is an experimental isotopic peak with its m/z values matched by theoretical isotopic peak m/z values from two or more product ions within the m/z tolerance. The relative deviation (RD) of the observed experimental abundance of this OIP relative to the summed ideal value is then calculated. The final individual abundance of the OIP in each OIE is defined as its ideal experimental abundance multiplied by 1 + RD. Currently, this method has been implemented in our automated intact protein database search engine ProteinGoggle⁴⁴ with user-friendly graphical user interfaces. ProteinGoggle, which runs on personal computers with Windows operating systems, is currently freely available at http://proteingoggle.tongji.edu.cn/. Tandem mass spectra from both individual proteins (with direct infusion) and proteome mixtures (with liquid chromatography separation) can be interpreted automatically. Initial results from the HCD spectra of myoglobin and E. coli are presented in this manuscript. The method reported here is expected to perform equally well with the tandem mass spectra of small peptides and better with the tandem mass spectra of large proteins in comparison with the current methods reported in the literature.

Results

HCD of Myoglobin

Three technical replicate HCD spectra of myoglobin, with abundances of 1.65E6, 1.61E6 and 1.67E6, were acquired and searched using ProteinGoggle with a search tolerance of IPACO = 20%, IPMD = 15 ppm and IPAD = 50%. IPACO, IPMD and IPAD are acronyms for “isotopic peak abundance cutoff” (in %), “isotopic peak m/z deviation” (in ppm) and “isotopic peak-abundance deviation” (in %), respectively.

The subsequent matching b and y ions, sequence coverage, peptide bond coverage, interpreted isotopic peaks and interpreted abundance from the forward search of the three replicate spectra were listed in Table 1. The peptide bond coverage is defined as the percentage of peptide bonds that have at least one matching b or y ion. The sequence coverage is defined as the percentage of amino acids in the protein sequence that is covered by matching b and y ions. For a protein has n amino acids and the biggest matching b and y ions are b_j and y_k, respectively, if j + k ≥ n, the sequence coverage is 100%; if j + k < n, then the sequence coverage equals to (j + k)*100%/n. An average of 92 matching b or y ions was found after a full 100% sequence coverage. The average peptide bond coverage was 42.5%. Both experimental isotopic peaks and abundances were comprehensively interpreted: the percentages were 95.6 ± 0.1 and 99.2 ± 0.0, respectively.

Table 1 Database Search Results of Three Technical Replicate HCD Spectra (S1, S2 and S3) of Myoglobin.

Full size table

Besides forward search described above, random and reverse searches were also carried out for the three HCD of myoglobin. The average numbers of matching and non-matching b and y ions with standard deviation from the random and reverse searches vs. those from the forward search were plotted in Fig. 1; the detailed lists of these matching and non-matching b and y ions are provided in supplemental Tables S1 and S2, respectively.

RPLC-MS/MS of the E. coli Intact Proteome

Three technical replicates of reversed-phase liquid chromatography (RPLC)-MS/MS datasets from E. coli were acquired. The total HCD spectra were 16573, 16533 and 16591. The abundance of the MS-only base-peak chromatograms were 2.62E8, 2.92E8 and 2.77E8, respectively (Supplemental Figure S1). ProteinGoggle was used to search the datasets. Two distinct tolerance parameters for MS and MS/MS spectra, as described in the following Methods section, were used for this analysis.

The resultant protein spectrum matches (PrSMs), unique proteoforms together with their averaged sequence coverage, peptide bond coverage, interpreted isotopic peaks and interpreted abundance from each dataset were summarized in Table 2. With a spectrum-level false discovery rate (FDR) of 1%, an average of 5105 ± 544 PrSMs with PMPs ≥ 5 were identified from the three datasets. PMPs, the percentage of matching product ions, is defined as the minimum percentage of the experimental matching product ions for the identification of a PrSM. “Amino acid sequence” and the corresponding “PTMs” were used as criteria to group PrSMs from each dataset in Microsoft Excel to remove duplicates and obtain unique proteoforms. A proteofrom may be identified multiple times from the same precursor ion in different TopN cycles or different precursor ions of different charge states; only the PrSM with the most matching b and y ions are kept for the final protein ID (i.e., proteoform). With grouping, an average of 105 ± 2 unique proteoforms were identified from the three datasets. The detailed information (including retention time, protein ID, sequence length, PTMs, PTM Score, -log(P Score), sequence coverage, peptide bond coverage, interpreted isotopic peaks and interpreted abundance) for each unique proteoform in the three datasets was provided in Supplemental Tables S3, S4 and S5, respectively. PTM Score is defined as the total number of non-redundant matching product ions containing the PTM that independently define the unique localization of a PTM; a product ion with multiple charge states are only counted once. The proteoforms have an average sequence coverage of 74.6 ± 1.8 and a peptide bond coverage of 26.7 ± 0.7. A higher coverage of sequence peptide bonds led to more confident protein identification and a greater chance of unique localization of PTMs.

Table 2 Database Search Results of Three Technical Replicate RPLC-MSMS Datasets (D1, D2 and D3) of the E. coli Intact Proteome with a Spectrum-Level FDR of 1%. PrSMs = Protein Spectrum Matches.

Full size table

Among the 128 unique proteoforms that were identified from 3 technical replicates, 13 were modified with acetylation, biotinylation, monomethylation, trimethylation, O-(pantetheine 4′-phosphoryl), or a combination thereof. PTMs on 12 of these modified proteoforms were uniquely localized with high PTM scores. For example, in protein RL7_ECOLI (accession number P0A7K2), S1 acetylation and K81 methylation were identified with PTM scores of 25 and 4, respectively. These scores imply that 25 and 4 matching b or y ions independently defined the unique locations of these two modifications (Fig. 2).

Discussion

The aforementioned matching product ions (short for MPs, including both b and y ions) have ideal experimental isotopic envelopes. An experimental isotopic envelope is considered an ideal isotopic envelope if all of its experimental isotopic peaks (above IPACO) are observed and their m/z and relative abundance are within the tolerance of IPMD and IPAD, respectively. On the other hand, an experimental isotopic envelope is a non-ideal isotopic envelope if any of its experimental isotopic peaks (above IPACO) are not observed or if the relative abundance of any observed experimental isotopic peak is larger than IPAD. The product ions with non-ideal isotopic envelopes are defined as non-matching product ions (short for non-MPs) accordingly.

The presence of MPs vs. non-MPs was evaluated using two search methods: first, by performing random and reverse searches for the three myoglobin HCD spectra and second, by a forward database search (Fig. 1). In both search methods, the non-MPs exhibited much higher randomness than MPs. The number of non-MPs in both random and reverse searches is on the same order of magnitude as that of the forward search. It is, therefore, paramount to use only MPs for protein identification, as well as for PTM localization. Using the ProteinGoggle database, in which all theoretical ions are pre-stored, the search for a tandem mass spectrum leads to a match with a product ion only if its theoretically highest (i.e., 100%) isotopic peak is observed with an m/z deviation smaller than or equal to the IPMD tolerance. Whether this ion is matching or non-matching is further categorized using the IPACO and IPAD tolerances. Due to the nature of both the reverse and random databases, a greater number of matching and non-matching products ions are usually found in the corresponding forward search.

A benefit of using the OIE_CARE method to resolve OIEs is that the appropriate individual experimental abundance of the shared OIPs is retrieved. Therefore, certain otherwise non-MPs are turned into MPs. More MPs were generally found for both the myoglobin and E. coli proteoforms. Four more proteins were identified from the E. coli datasets.

For HCD of myoglobin, IPADs of OIP m/z 1142.617676 in y10-1+ and y72-7+ were reduced from 214 to –2 and 343 to –5, respectively (Table 3). Thus, these two non-MPs were converted into MPs. A total of 141 unique matching b or y ions were found from the three replicate spectra. The number reduced to 134 when the OIE_CARE method was disabled. This implies that 7 more matching b/y ions (b76-7+, y10-1+, y136-13+, y149-15+, y58-6+, y72-7+, y76-7+) were found by resolving OIPs using OIE_CARE. As an example, the iEF maps of y72-7+ without and with using OIE_CARE are shown in Fig. 3(A,B), respectively. The theoretical m/z, theoretical relative abundance, experimental m/z, experimental relative abundance before and after the resolution for each isotopic peak in Fig. 3 are provided in Table S6. It is worth noting that resolving the experimental abundance of OIPs in a tandem mass spectrum increases the number of matching ions (and decreases number of non-matching ones) in general. At the same time, the total number of ions (including both matching and non-matching) remains the same. When a, b and y ions and their neutral loss (NL) ions were included in the database search, 84.2 ± 0.3% of the interpreted isotopic peaks in myoglobin HCD spectra were found to be OIPs. The overlapping percentage of matching b and y ions was 96.8 ± 0.9%. When a stringent IPMD of 5 ppm was used in the search, the percentages of matching OIPs and overlapping b and y ions were 43.7 ± 2.3% and 60.8 ± 5.3%, respectively. Therefore, the efficient and accurate resolving of such a high percentage of OIPs and OIEs is indispensable to confidently maximizing the matching product ions and protein identification. The OIE_CARE and partition of overlapping abundance of OIPs were used to resolve the experimental relative abundance of all interpreted isotopic peaks with IPAD ≥ 0 from all matching and non-matching ions. The result was that these were also comprehensively brought very close to their corresponding theoretical values. Comparative results from one of the myoglobin HCD spectra with or without using OIE_CARE are presented in Fig. 3(C,D); where the experimental relative abundance of all interpreted isotopic peaks (in all matching and non-matching product ions) are plotted against the corresponding theoretical relative abundance. These abundance together with m/z values are default output of ProteinGoggle for both matching and non-matching product ions. It should be noted that isotopic peaks with IPAD > 0 are, in general, OIPs with a shared experimental abundance. Equivalent plots of the interpreted isotopic peaks with IPAD < 0 show no observed essential changes and are provided in Supplemental Figure S2. As seen from Table 3, to resolve an OIP with n OIEs (or product ions), only 2n + 1 simple arithmetic (addition, subtraction, multiplication, or division) calculations are necessary. This linear computation load relationship with the size of the OIEs is especially advantageous for OIPs with many OIEs. For the HCD spectra of myoglobin, the isotopic peak of m/z is 1123.608521 and is shared by 26 product ions (b111-2H₂O-11+, b111-H₂O-NH₃-11+, b111-2NH₃-11+, y142-2H₂O-14+, y142-H₂O-NH₃-14+, b142-2H₂O-14+, y142-2NH₃-14+, b142-H₂O-NH₃-14+, a111-11+, b142-2NH₃-14+, a92-2H₂O-9+, b70-7+, a152-H₂O-15+, y10-H₂O-1+, y71-7+, b152-2H₂O-15+, b152-H₂O-NH₃-15+, b152-2NH₃-15+, y142-H₂O-14+, b121-12+, y142-NH₃-14+, a71-2H₂O-7+, y103-2H₂O-10+, a152-15+ and y103-2NH₃-10+).

Table 3 Resolving the OIPs among y10-1+, y20-2+ and y72-7+ in the HCD Spectrum of Myoglobin Using the OIE_CARE Method.^*

Full size table

For E. coli protein identification at the proteome level, more matching b and y ions were found with the OIE_CARE resolving OIEs for most of the identified proteoforms (Fig. 4). For example, 3, 6 and 3 more matching b and y ions were found for GRCA_ECO45, IHFB_ECO24 and DBHB_ECO57, respectively. The corresponding labeled MS/MS spectra are provided in Supplemental Figure S3. With OIE_CARE and more matching b and y ions, four new proteins (ASR_ECOLU, C562_ECO57, YNFD_ECOLI and RNFH_ECO7I) were also identified. The graphical fragmentation maps, along with matching b and y ions of these four new proteins, are provided in Supplemental Figure S4.

Protein-level comprehensiveness, in terms of the percentage of interpreted experimental isotopic peaks and abundance, has been achieved when b, y and their NL ions (including “a” and “a-NL” ions) are included in the database search (Table 1). To evaluate the individual contribution of the various ion series (only b and y ions), the HCD spectra of myoglobin were also independently searched using the same set of tolerance parameters as described above. An increase in the interpreted isotopic peaks and abundance versus these two combinatorial ion series are shown in Fig. 5. The b or y ions are approximately 70% in the number of isotopic peaks and approximately 90% in abundance. This implies that the b or y ions are the most abundant ion series in the HCD spectra of myoglobin. The b or y-NL (including a and a-NL) are approximately 26% in the number of isotopic peaks but approximately 10% in the total abundance. The remaining less than 4% of the isotopic peaks belonged to internal ions or their NL ions. Their total abundance (<1%) is negligible in this case. For comprehensiveness at the proteome level, the identification rate of the E. coli tandem mass spectra from the three technical replicate RPLC-MSMS runs is 73.3 ± 3.4%. The identification rate is defined as the total number of PrSMs from the dataset divided by the total number of MS/MS spectra between the first and last PrSMs. Here the MS/MS spectra were acquired only for precursors with ≥5 or with unassigned charge states. This rate could be further improved by additional search of the proteolytic peptidome, as well as by more comprehensive annotation of PTMs. The current protein-annotation rate in terms of ‘MOD_RES’ in the flat text file was only 5.2%. These extra utilities for ProteinGoggle are under development.

The protein-level reproducibility was characterized with matching b and y ions, which are the two ion series used for protein identification scoring and PTM localization. The results are shown in Fig. 6(A). The shared matching b and y ions among all three replicates are more than 60%. The proteome-level reproducibility was characterized using identified unique proteoforms and those shared among the three technical replicates of the RPLC-MS/MS analysis of the E. coli intact proteome were more than 80% (Fig. 6(B)). Better reproducibility at the proteome level would be possible with additional dimension(s) of separation to increase the dynamic detection range.

Overall, OIEs are very common in protein tandem mass spectra and the efficient resolving of these OIEs is essential for maximizing the matching product ions, improving the confidence in protein identification and achieving unique localization of PTMs. Using theoretical isotopic envelopes as a reference, the OIE_CARE method, as implemented in ProteinGoggle, efficiently disentangles OIEs at the raw experimental data level. This not only produces good orthogonality between the experimental and theoretical data, but it also maximizes the matching product ions and confidence in the protein identification. This computationally efficient method could, in principle, be extended to resolve any envelope-type overlapping data for which the corresponding theoretical reference values are available.

Methods

Reagents

Myoglobin (from horse heart, M1882), acetonitrile (CHROMASOLV gradient grade, 34851) and formic acid (FA, eluent additive for LC-MS, 56302) were purchased from Sigma-Aldrich (St. Louis, MO, USA). Tryptone (TG217), yeast extract (G0961), NaCl (F20051212), PBS (SB0627), PMSF (PB0425) and a BCA Reagent Kit (SK3021) were bought from Sangon Biotech (Shanghai, China). Ultrapure water was produced in the laboratory using the Millipore Simplicity system.

Cell Culture of E. coli and Protein Extraction

A conical flask with 2 g tryptone, 1 g yeast extract, 2 g NaCl and 200 mL doubly distilled H₂O was covered with aluminum foil and sterilized in a Shen’an high-pressure steam sterilizer (LDZX-50FBS, Shanghai, China) at 121 °C for 21 min. After cleaning the outer wall with 75% alcohol, the flask was transferred into a Suzhou Antai ultraclean bench (SW-CJ-2FD, Suzhou, Jiangsu, China) and pre-disinfected with UV for 15 min while cooling down to room temperature. A fresh E. coli colony was then injected into the flask and cultured overnight at 37 °C and 220 rpm in a ZHICHENG shaker (ZWY-240, Shanghai, China). After centrifugation for 5 min at 8000 rpm and 4 °C (Eppendorf, Centrifuge 5804R, Hamburg, Germany), the cell pellet was washed three times with 20 mL PBS. The pellet was then re-suspended in 5 mL PBS with 50 μL PMSF. Cells were lysed in a 1.5 mL centrifuge tube over ice using a Ningbo Scientz (Ningbo, Zhejiang, China) ultrasonic cell disruptor. Each cycle consisted of running the sample for 5 s (at 300 J, 4 °C) and pausing for 10 s. This cycle was continuously run for 5 min. After centrifugation for 15 min at 10000 rpm and 4 °C (Eppendorf, Centrifuge 5804R, Hamburg, Germany), the protein concentration in the supernatant was measured using a BCA assay in TECAN Infinite F50 (Salzburg, Austria) according to the manufacturer’s protocol. The E. coli proteome solution was finally aliquoted into 1.5-mL centrifuge tubes and stored in a refrigerator (at −80 °C) for future use.

HCD of Myoglobin

HCD tandem mass spectra of myoglobin in the profile mode were acquired using a Thermo Scientific Q Exactive Orbitrap mass spectrometer (Waltham, MA, USA). The myoglobin solution (2 μM, CH₃OH/H₂O 3:1 (v/v), HCOOH 1%) was electrosprayed and a 15+ ion (m/z 1131), was isolated with an isolation width of 6.0 m/z and fragmented at an NCE of 24%. An AGC (automatic gain control) target of 5E5 was used and three technical replicate spectra (S1, S2 and S3) were acquired at a 70 K resolution using 10 microscans.

RPLC-MS/MS of E. coli Proteome

RPLC tandem mass spectrometry using HCD of the E. coli intact proteome was performed using a Thermo Scientific Q Exactive mass spectrometer coupled with a Dionex UltiMate 3000 RSLCnano high-performance liquid chromatography (HPLC) system. The analytical column (75 μm i.d., 60 cm long) was packed in-house with C4 (5 μm, 300 Å). The trap column was packed with the same particles, but with an i.d. of 200 μm and a length of 5 cm. Buffer A consisted of 5% ACN, 94.8% H₂O and 0.2% FA. Buffer B consisted of 95% ACN, 4.8% H₂O and 0.2% FA. After being trapped on the column, the E. coli proteome was eluted using the following linear gradient: 0 min, 1% B; 1 min, 15% B; 92 min, 65% B; 98 min, 75% B; and 103 min, 99% B. At 99% B, the system was held for an additional 15 min. The MS spectra of the precursor ions were acquired with the following settings: microscans, 2; resolution, 70,000 (m/z 200); AGC, 3E6; and scan range, m/z 600–2,000. The data-dependent Top10 tandem HCD spectra acquisition settings were as follows: microscans, 1; resolution, 35,000 (m/z 200); AGC, 5E5; maximum IT, 250 ms; isolation window, 10 m/z; NCE, 30%; charge exclusion, 1–4; and dynamic exclusion, 20.0 s. Both MS and MS/MS spectra were acquired in the centroid mode. Overall, three technical replicate RPLC-MS/MS datasets (D1, D2 and D3) were acquired.

Database Search Using ProteinGoggle

The intact protein database search using ProteinGoggle, implemented with the isotopic mass-to-charge (m/z) ratio and envelope fingerprinting (iMEF) search algorithm, has been fully reported elsewhere and only a brief description is given here. Theoretical precursor isotopic envelope databases were created for all possible charge states of every proteoform in the MS acquisition window. The theoretical product-ion isotopic envelope databases were created with ion series of a/b/y and a/b/y-NL (NL = NH₃, H₂O, NH₃ + H₂O, 2NH₃ and 2H₂O). H₂O loss was a result of product ions containing the amino acids D/E/S/T. NH₃ loss was a result of product ions containing the amino acids K/N/Q/R. For the above data-dependent spectra, both the precursor ions and product ions were “fished” from the theoretical isotopic envelope database and fully confirmed using isotopic m/z fingerprinting (iMF) and isotopic envelope fingerprinting (iEF), respectively. Two sets of values (40/15/100 and 20/15/50) were used in this study for the precursor and product ion search, respectively. These search parameters were pre-optimized at the proteome level for most protein IDs, with orthogonal combinatorial parameter design and FDR control (data not shown). In addition to the above search parameters, a value of PMPs ≥ 5 was used for the identification and output of PrSMs. Final protein identification with an FDR of 1% at the spectrum level was achieved through a decoy search using a random database and a P Score cutoff.

Flat text protein databases were downloaded from UniProt (www.uniprot.org). For myoglobin (145 AAs with initial methionine), the entry name is MYG_HORSE with an accession number of P68082. For the E. coli proteome, the text database includes 7,658 proteins (2589 unique proteins by the amino acid sequence). This was downloaded with the following criteria: ‘Organism [OS]’ = escherichia coli, ‘Sequence_Fragment’ = No, ‘Sequence_Sequence length’ = 1–200 and ‘Reviewed’ = Yes. The corresponding customized ProteinGoggle database was created using shotgun imagery. With all annotated PTMs (listed in Supplemental Table S7) treated dynamically, a total of 2,883 proteoforms for E. coli were created; i.e., 294 of these proteoforms have one or more PTM(s). For example, RL10_ECO7I has annotated acetylation (ac) on K37 and K105, respectively and a total of 4 individual proteoforms (no PTM, K37ac, K105ac and K37acK105ac) were created.

The resolving of OIPs using the OIE_CARE method proceeds according to the following three steps. Given an OIP shared by n OIEs, the ideal experimental abundance of this OIP for the ith ion (DEA_i) is first calculated using Equation 1, where TA_i is the theoretically relative abundance of the OIP in this ion; EA_r and TA_r are the experimental absolute abundance and theoretical relative abundance of the reference isotopic peak of this ion, respectively. The reference isotopic peak is the normalization isotopic peak used to transform the absolute experimental abundance of all isotopic peaks of this ion into the relative experimental abundance. Second, the RD of the observed experimental abundance of this OIP (EA_OIP) relative to the corresponding total ideal value is then calculated (Equation 2). The final partitioned individual abundance of the OIP in the ith ion is its ideal value multiplied by the sum of one plus the relative deviation (Equation 3).

An example of resolving the OIP of m/z 1142.617676 shared by y10-1+, y20-2+ and y72-7+ using the OIE_CARE method is illustrated step-by-step in Table 3. According to Equation 1, the ideal experimental abundance for this particular OIP in y10-1+ (E3), y20-2+ (F3) and y72-7+ (G3) is firstly calculated using B7*D6/B6 and B10*D11/B11 and B14*D15/B15, respectively; e. g., the DEA of this OIP in y10-1+ was calculated as B7*D6/B6 = 23.01*128926.921875/63.64 = 46615.469396. This brings the total ideal value of this OIP to be 529680.016342 (E3 + F3 + G3). Given the EA_OIP = 480992.312500 (D3), the RD is computed to be −0.09 (H3) using Equation 2. The final individual experimental abundance for this OIP in y10-1+ (D7), y20-2+ (D10) and y72-7+ (D14) is calculated using E3*(1 + H3), F3*(1 + H3), G3*(1 + H3), respectively; e. g., the final experimental abundance of this OIP in y10-1+ is calculated to be 42330.617979 from 46615.469396*(1–0.09) using Equation 3. After resolving with OIE_CARE, the IPAD values of m/z 1142.617676 are changed from 214, 15 and 343 to –2, –9 and –5 in y10-1+, y20-2+ and y72-7+, respectively. This shows that the total abundance of this OIP is efficiently partitioned into the individual OIEs. In addition to the OIP of m/z 1142.617676, these three ions in combination with y144-14+ share another four OIPs (m/z 1140.619995, 1143.622437, 1144.620605 and 1145.624756). The experimental abundance of these OIPs can be resolved independently using the same three steps as described above. The full steps of the above five OIPs shared by the four product ions are provided in Supplemental Table S6.

The OIE_CARE resolving strategy and steps have been implemented in our intact protein database search engine ProteinGoggle. The database can be searched for individual tandem mass spectra from standard proteins with direct infusion or datasets from a proteome mixture with liquid chromatographic separation. The computer code corresponding to this part of the functionality is provided in Supplemental Scheme S1.

Additional Information

How to cite this article: Xiao, K. et al. Accurate and Efficient Resolution of Overlapping Isotopic Envelopes in Protein Tandem Mass Spectra. Sci. Rep. 5, 14755; doi: 10.1038/srep14755 (2015).

References

Fenn, J. B., Mann, M., Meng, C. K., Wong, S. F. & Whitehouse, C. M. Electrospray ionization for mass spectrometry of large biomolecules. Science 246, 64–71 (1989).
Article CAS ADS PubMed Google Scholar
Hardman, M. & Makarov, A. A. Interfacing the orbitrap mass analyzer to an electrospray ion source. Anal Chem 75, 1699–1705 (2003).
Article CAS PubMed Google Scholar
Olsen, J. V. et al. Higher-energy C-trap dissociation for peptide modification analysis. Nature methods 4, 709–712, 10.1038/nmeth1060 (2007).
Article CAS ADS PubMed Google Scholar
Syka, J. E., Coon, J. J., Schroeder, M. J., Shabanowitz, J. & Hunt, D. F. Peptide and protein sequence analysis by electron transfer dissociation mass spectrometry. Proceedings of the National Academy of Sciences of the United States of America 101, 9528–9533, 10.1073/pnas.0402700101 (2004).
Article CAS ADS PubMed PubMed Central Google Scholar
Zamdborg, L. et al. ProSight PTM 2.0: improved protein identification and characterization for top down mass spectrometry. Nucleic Acids Res 35, W701–W706, 10.1093/Nar/Gkm371 (2007).
Article PubMed PubMed Central Google Scholar
LeDuc, R. D. et al. ProSight PTM: an integrated environment for protein identification and characterization by top-down mass spectrometry. Nucleic Acids Res 32, W340–W345, 10.1093/Nar/Gkh447 (2004).
Article CAS PubMed PubMed Central Google Scholar
Boyne, M. T. et al. Tandem Mass Spectrometry with Ultrahigh Mass Accuracy Clarifies Peptide Identification by Database Retrieval. J Proteome Res 8, 374–379, 10.1021/Pr800635m (2009).
Article CAS PubMed PubMed Central Google Scholar
Frank, A. M., Pesavento, J. J., Mizzen, C. A., Kelleher, N. L. & Pevzner, P. A. Interpreting top-down mass spectra using spectral alignment. Anal Chem 80, 2499–2505, 10.1021/Ac702324u (2008).
Article CAS PubMed Google Scholar
Liu, X. W. et al. Protein Identification Using Top-Down. Mol Cell Proteomics 11, 10.1074/mcp.M111.008524 (2012).
Liu, X. W. et al. Identification of Ultramodified Proteins Using Top-Down Tandem Mass Spectra. J Proteome Res 12, 5830–5838, 10.1021/Pr400849y (2013).
Article CAS PubMed PubMed Central Google Scholar
Karabacak, N. M. et al. Sensitive and Specific Identification of Wild Type and Variant Proteins from 8 to 669 kDa Using Top-down Mass Spectrometry. Mol Cell Proteomics 8, 846–856, 10.1074/mcp.M800099-MCP200 (2009).
Article CAS PubMed PubMed Central Google Scholar
Tsai, Y. S. et al. Precursor Ion Independent Algorithm for Top-Down Shotgun Proteomics. J Am Soc Mass Spectr 20, 2154–2166, 10.1016/j.jasms.2009.07.024 (2009).
Article CAS Google Scholar
Geer, L. Y. et al. Open mass spectrometry search algorithm. J Proteome Res 3, 958–964, 10.1021/Pr0499491 (2004).
Article CAS PubMed Google Scholar
Mortz, E. et al. Sequence tag identification of intact proteins by matching tandem mass spectral data against sequence data bases. Proceedings of the National Academy of Sciences of the United States of America 93, 8264–8267, 10.1073/pnas.93.16.8264 (1996).
Article CAS ADS PubMed PubMed Central Google Scholar
Fagerquist, C. K. et al. Web-Based Software for Rapid Top-Down Proteomic Identification of Protein Biomarkers, with Implications for Bacterial Identification. Appl Environ Microb 75, 4341–4353, 10.1128/Aem.00079-09 (2009).
Article CAS Google Scholar
Eng, J. K., Mccormack, A. L. & Yates, J. R. An Approach to Correlate Tandem Mass-Spectral Data of Peptides with Amino-Acid-Sequences in a Protein Database. J Am Soc Mass Spectr 5, 976–989, 10.1016/1044-0305(94)80016-2 (1994).
Article CAS Google Scholar
Perkins, D. N., Pappin, D. J. C., Creasy, D. M. & Cottrell, J. S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551–3567, 10.1002/(Sici)1522-2683(19991201)20:18<3551::Aid-Elps3551>3.0.Co;2-2 (1999).
Article CAS PubMed Google Scholar
Cox, J. et al. Andromeda: A Peptide Search Engine Integrated into the MaxQuant Environment. J Proteome Res 10, 1794–1805, 10.1021/Pr101065j (2011).
Article CAS ADS PubMed Google Scholar
Tanner, S. et al. InsPecT: Identification of posttransiationally modified peptides from tandem mass spectra. Anal Chem 77, 4626–4639, 10.1021/Ac050102d (2005).
Article CAS PubMed Google Scholar
Kim, S. et al. The Generating Function of CID, ETD and CID/ETD Pairs of Tandem Mass Spectra: Applications to Database Search. Mol Cell Proteomics 9, 2840–2852, 10.1074/mcp.M110.003731 (2010).
Article CAS PubMed PubMed Central Google Scholar
Bellew, M. et al. A suite of algorithms for the comprehensive analysis of complex protein mixtures using high-resolution LC-MS. Bioinformatics 22, 1902–1909, 10.1093/bioinformatics/btl276 (2006).
Article CAS PubMed Google Scholar
Sturm, M. et al. OpenMS-An open-source software framework for mass spectrometry. Bmc Bioinformatics 9, Artn 163 10.1186/1471-2105-9-163 (2008).
Wang, L. H. et al. PFind 2.0: a software package for peptide and protein identification via tandem mass spectrometry. Rapid Commun Mass Sp 21, 2985–2991, 10.1002/Rcm.3173 (2007).
Article CAS Google Scholar
Li, D. Q. et al. pFind: a novel database-searching software system for automated peptide and protein identification via tandem mass spectrometry. Bioinformatics 21, 3049–3050, 10.1093/bioinformatics/bti439 (2005).
Article CAS PubMed Google Scholar
Shilov, I. V. et al. The paragon algorithm, a next generation search engine that uses sequence temperature values and feature probabilities to identify peptides from tandem mass spectra. Mol Cell Proteomics 6, 1638–1655, 10.1074/mcp.T600050-MCP200 (2007).
Article CAS PubMed Google Scholar
Craig, R. & Beavis, R. C. TANDEM: matching proteins with tandem mass spectra. Bioinformatics 20, 1466–1467, 10.1093/bioinformatics/bth092 (2004).
Article CAS PubMed Google Scholar
Craig, R. & Beavis, R. C. A method for reducing the time required to match protein sequences with tandem mass spectra. Rapid Commun Mass Sp 17, 2310–2316, 10.1002/Rcm.1198 (2003).
Article CAS Google Scholar
Senko, M. W., Beu, S. C. & Mclafferty, F. W. Automated Assignment of Charge States from Resolved Isotopic Peaks for Multiply-Charged Ions. J Am Soc Mass Spectr 6, 52–56, 10.1016/1044-0305(94)00091-D (1995).
Article CAS Google Scholar
Zubarev, R. A., Kelleher, N. L. & McLafferty, F. W. Electron capture dissociation of multiply charged protein cations. A nonergodic process. J Am Chem Soc 120, 3265–3266, 10.1021/Ja973478k (1998).
Article CAS Google Scholar
Park, K. et al. Isotopic peak intensity ratio based algorithm for determination of isotopic clusters and monoisotopic masses of polypeptides from high-resolution mass spectrometric data. Anal Chem 80, 7294–7303, 10.1021/Ac800913b (2008).
Article CAS PubMed Google Scholar
Shin, B. et al. Postexperiment monoisotopic mass filtering and refinement (PE-MMR) of tandem mass spectrometric data increases accuracy of peptide identification in LC/MS/MS. Mol Cell Proteomics 7, 1124–1134, 10.1074/mcp.M700419-MCP200 (2008).
Article CAS PubMed Google Scholar
Shaw, J. B. et al. Complete Protein Characterization Using Top-Down Mass Spectrometry and Ultraviolet Photodissociation. J Am Chem Soc 135, 12646–12651, 10.1021/Ja4029654 (2013).
Article CAS PubMed PubMed Central Google Scholar
Horn, D. M., Zubarev, R. A. & McLafferty, F. W. Automated reduction and interpretation of high resolution electrospray mass spectra of large molecules. J Am Soc Mass Spectr 11, 320–332, 10.1016/S1044-0305(99)00157-9 (2000).
Article CAS Google Scholar
Chen, L., Sze, S. K. & Yang, H. Automated intensity descent algorithm for interpretation of complex high-resolution mass spectra. Anal Chem 78, 5006–5018, 10.1021/Ac060099d (2006).
Article CAS PubMed Google Scholar
Liu, X. W. et al. Deconvolution and Database Search of Complex Tandem Mass Spectra of Intact Proteins. Mol Cell Proteomics 9, 2772–2782, 10.1074/mcp.M110.002766 (2010).
Article CAS PubMed PubMed Central Google Scholar
Renard, B. Y., Kirchner, M., Steen, H., Steen, J. A. J. & Hamprecht, F. A. NITPICK: peak identification for mass spectrometry data. Bmc Bioinformatics 9, Artn 355, 10.1186/1471-2105-9-355 (2008).
Sun, Y. T., Zhang, J. Q., Braga-Neto, U. & Dougherty, E. R. BPDA - A Bayesian peptide detection algorithm for mass spectrometry. Bmc Bioinformatics 11, Artn 490, 10.1186/1471-2105-11-490 (2010).
De Ceuleneer, M., Van Steendam, K., Maarten, D., Elewaut, D. & Deforce, D. Quantification of Citrullination by Means of Skewed Isotope Distribution Pattern. J Proteome Res 11, 5245–5251, 10.1021/Pr3004453 (2012).
Article CAS PubMed Google Scholar
Dasari, S. et al. Quantification of Isotopically Overlapping Deamidated and O-18-Labeled Peptides Using Isotopic Envelope Mixture Modeling. J Proteome Res 8, 1263–1270, 10.1021/Pr801054w (2009).
Article CAS ADS PubMed PubMed Central Google Scholar
Rhoads, T. W. et al. Using Theoretical Protein Isotopic Distributions to Parse Small-Mass-Difference Post-Translational Modifications via Mass Spectrometry. J Am Soc Mass Spectr 24, 115–124, 10.1007/s13361-012-0500-1 (2013).
Article CAS Google Scholar
Cappadona, S. et al. Deconvolution of overlapping isotopic clusters improves quantification of stable isotope-labeled peptides. J Proteomics 74, 2204–2209, 10.1016/j.jprot.2011.04.022 (2011).
Article CAS PubMed Google Scholar
Faca, V. et al. Quantitative analysis of acrylamide labeled serum proteins by LC-MS/MS. J Proteome Res 5, 2009–2018, 10.1021/Pr060102+ (2006).
Article CAS PubMed Google Scholar
Yoon, J. Y. et al. Improved Quantitative Analysis of Mass Spectrometry using Quadratic Equations. J Proteome Res 9, 2775–2785, 10.1021/Pr100183t (2010).
Article CAS PubMed Google Scholar
Li, L. & Tian, Z. X. Interpreting raw biological mass spectra using isotopic mass-to-charge ratio and envelope fingerprinting. Rapid Commun Mass Sp 27, 1267–1277, 10.1002/Rcm.6565 (2013).
Article CAS Google Scholar

Download references

Acknowledgements

This research was financially supported by a China State Key Basic Research Program Grant (2013CB911203), Shanghai Science and Technology Commission (14DZ2261100), the China “Youth 1000-talents Program” and the Tongji University “985 Project”.

Author information

Authors and Affiliations

Department of Chemistry and Shanghai Key Laboratory of Chemical Assessment and Sustainability, Tongji University, Shanghai, 200092, China
Kaijie Xiao, Fan Yu, Houqin Fang, Bingbing Xue, Yan Liu & Zhixin Tian

Authors

Kaijie Xiao
View author publications
You can also search for this author in PubMed Google Scholar
Fan Yu
View author publications
You can also search for this author in PubMed Google Scholar
Houqin Fang
View author publications
You can also search for this author in PubMed Google Scholar
Bingbing Xue
View author publications
You can also search for this author in PubMed Google Scholar
Yan Liu
View author publications
You can also search for this author in PubMed Google Scholar
Zhixin Tian
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Z.X.T. conceived the study; K.J.X. implemented OIE_CARE into ProteinGoggle; F.Y. performed the RPLC-MS/MS experiment on E. coli; H.Q.F. performed the cell culture of E. coli; X.B.B. performed the HCD of myoglobin; and Y.L. maintained the mass spectrometer. Z.X.T., F.Y. and K.J.X. performed the data analysis. Z.X.T. wrote the manuscript.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Electronic supplementary material

Supplementary Information

Rights and permissions

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

Reprints and permissions

About this article

Cite this article

Xiao, K., Yu, F., Fang, H. et al. Accurate and Efficient Resolution of Overlapping Isotopic Envelopes in Protein Tandem Mass Spectra. Sci Rep 5, 14755 (2015). https://doi.org/10.1038/srep14755

Download citation

Received: 25 March 2015
Accepted: 09 September 2015
Published: 06 October 2015
DOI: https://doi.org/10.1038/srep14755

This article is cited by

Large-scale identification and visualization of human liver N-glycome enriched from LO2 cells
- Kaijie Xiao
- Yuyin Han
- Zhixin Tian
Analytical and Bioanalytical Chemistry (2018)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.