Informed-Proteomics: open-source software package for top-down proteomics

Park, Jungkap; Piehowski, Paul D; Wilkins, Christopher; Zhou, Mowei; Mendoza, Joshua; Fujimoto, Grant M; Gibbons, Bryson C; Shaw, Jared B; Shen, Yufeng; Shukla, Anil K; Moore, Ronald J; Liu, Tao; Petyuk, Vladislav A; Tolić, Nikola; Paša-Tolić, Ljiljana; Smith, Richard D; Payne, Samuel H; Kim, Sangtae

doi:10.1038/nmeth.4388

Article
Published: 07 August 2017

Nature Methods volume 14, pages 909–914 (2017)

An Author Correction to this article was published on 13 June 2018

This article has been updated

Abstract

Top-down proteomics, the analysis of intact proteins in their endogenous form, preserves valuable information about post-translational modifications, isoforms and proteolytic processing. The quality of top-down liquid chromatography–tandem MS (LC-MS/MS) data sets is rapidly increasing on account of advances in instrumentation and sample-processing protocols. However, top-down mass spectra are substantially more complex than conventional bottom-up data. New algorithms and software tools for confident proteoform identification and quantification are needed. Here we present Informed-Proteomics, an open-source software suite for top-down proteomics analysis that consists of an LC-MS feature-finding algorithm, a database search algorithm, and an interactive results viewer. We compare our tool with several other popular tools using human-in-mouse xenograft luminal and basal breast tumor samples that are known to have significant differences in protein abundance based on bottom-up analysis.

**Figure 1: LC-MS feature finding in ProMex.**

**Figure 2: Illustration of the sequence graph for 'KRATQKTRAM'.**

**Figure 3: Consistency of LC-MS feature detection in ten technical replicate analyses of an ovarian tumor sample.**

**Figure 4: Quantitative reproducibility for LC-MS features detected in ten technical replicate LC-MS analyses.**

**Figure 5: Protein identification and characterization results for a human ovarian tumor.**

**Figure 6: Differentially expressed proteoforms in CompRef breast tumor sample.**

Change history

13 June 2018
In the version of this article initially published, the authors erroneously reported the search mode that was used for ProSightPC 3.0 in the Online Methods and in Supplementary Table 3. The results presented in Fig. 5 were obtained with 'absolute mass' search mode, not 'biomarker discovery' search mode. The 'biomarker discovery' search mode of ProSightPC 3.0 looks for subsequences of those contained in the annotated proteoform database (e.g., truncated forms from degradation and/or cleavage). This search mode is expected to generate similar numbers of identifications as Informed-Proteomics, but is also expected to take dramatically longer (~480 CPU hours). Unfortunately, because of these heavy computational requirements, the authors were unable to complete an analysis using this search mode. They chose to use 'absolute mass' mode to illustrate the effect of search mode and database choice on the results. 'Absolute mass' mode is the most restrictive of the search modes illustrated in Fig. 5, as it searches only for proteoforms explicitly listed in the proteoform database within a user-defined mass tolerance. In addition, in the supplementary information originally published online, Supplementary Table 3 incorrectly stated that ProSightPC v3.0 was used in 'biomarker discovery' mode. 'Absolute mass' mode was the mode actually used in this comparison. These errors have been corrected in the HTML and PDF versions of this article and in the associated supplementary information.
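The cost difference described in this correction can be illustrated with a toy Python sketch. It is not ProSightPC code: the function names, the tolerance and the restriction to unmodified residues are illustrative assumptions. An 'absolute mass'-style lookup tests one mass per listed entry, whereas a subsequence-style search has to test every contiguous stretch of every entry.

```python
from itertools import accumulate

# Monoisotopic residue masses (Da); a free chain also carries one water.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def chain_mass(seq):
    """Neutral monoisotopic mass of an unmodified chain."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def absolute_mass_search(database, precursor_mass, tol_da):
    """'Absolute mass'-style lookup: only entries exactly as listed are tested (one mass each)."""
    return [(name, seq) for name, seq in database.items()
            if abs(chain_mass(seq) - precursor_mass) <= tol_da]


def subsequence_search(database, precursor_mass, tol_da):
    """Subsequence-style search: every contiguous stretch of every entry is a candidate.

    Returns matching (name, start, end, subsequence) tuples and the number of candidates
    examined, which grows as n*(n+1)/2 for a sequence of length n.
    """
    hits, examined = [], 0
    for name, seq in database.items():
        # prefix[i] is the residue mass of seq[:i]; any stretch is a difference of two prefixes.
        prefix = [0.0] + list(accumulate(RESIDUE_MASS[aa] for aa in seq))
        for start in range(len(seq)):
            for end in range(start + 1, len(seq) + 1):
                examined += 1
                if abs(prefix[end] - prefix[start] + WATER - precursor_mass) <= tol_da:
                    hits.append((name, start, end, seq[start:end]))
    return hits, examined
```

With a one-entry database `{"P1": "KRATQKTRAM"}` and the mass of the truncated form `ATQKT`, the absolute-mass lookup returns nothing, while the subsequence search finds the truncation after examining 55 candidates. A 300-residue protein would contribute 45,150 candidates on its own, which is why searches for truncated forms are so much slower.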

References

Garcia, B.A. What does the future hold for top down mass spectrometry? J. Am. Soc. Mass Spectrom. 21, 193–202 (2010).
Siuti, N. & Kelleher, N.L. Decoding protein modifications using top-down mass spectrometry. Nat. Methods 4, 817–821 (2007).
Smith, L.M. & Kelleher, N.L. Proteoform: a single term describing protein complexity. Nat. Methods 10, 186–187 (2013).
Zhang, Z., Wu, S., Stenoien, D.L. & Paša-Tolić, L. High-throughput proteomics. Annu. Rev. Anal. Chem. (Palo Alto Calif.) 7, 427–454 (2014).
Tran, J.C. et al. Mapping intact protein isoforms in discovery mode using top-down proteomics. Nature 480, 254–258 (2011).
Chait, B.T. Chemistry. Mass spectrometry: bottom-up or top-down? Science 314, 65–66 (2006).
Lanucara, F. & Eyers, C.E. Top-down mass spectrometry for the analysis of combinatorial post-translational modifications. Mass Spectrom. Rev. 32, 27–42 (2013).
Horn, D.M., Zubarev, R.A. & McLafferty, F.W. Automated reduction and interpretation of high resolution electrospray mass spectra of large molecules. J. Am. Soc. Mass Spectrom. 11, 320–332 (2000).
Zabrouskov, V., Senko, M.W., Du, Y., Leduc, R.D. & Kelleher, N.L. New and automated MSn approaches for top-down identification of modified proteins. J. Am. Soc. Mass Spectrom. 16, 2027–2038 (2005).
Liu, X. et al. Deconvolution and database search of complex tandem mass spectra of intact proteins: a combinatorial approach. Mol. Cell. Proteomics 9, 2772–2782 (2010).
Kou, Q., Wu, S. & Liu, X. A new scoring function for top-down spectral deconvolution. BMC Genomics 15, 1140 (2014).
LeDuc, R.D. et al. ProSight PTM: an integrated environment for protein identification and characterization by top-down mass spectrometry. Nucleic Acids Res. 32, W340–W345 (2004).
Zamdborg, L. et al. ProSight PTM 2.0: improved protein identification and characterization for top down mass spectrometry. Nucleic Acids Res. 35, W701–W706 (2007).
Liu, X. et al. Protein identification using top-down spectra. Mol. Cell. Proteomics 11, M111.008524 (2012).
Kou, Q., Xun, L. & Liu, X. TopPIC: a software tool for top-down mass spectrometry-based proteoform identification and characterization. Bioinformatics 32, 3495–3497 (2016).
Sun, R.-X. et al. pTop 1.0: a high-accuracy and high-efficiency search engine for intact protein identification. Anal. Chem. 88, 3082–3090 (2016).
Cai, W. et al. MASH Suite Pro: a comprehensive software tool for top-down proteomics. Mol. Cell. Proteomics 15, 703–714 (2016).
Guner, H. et al. MASH Suite: a user-friendly and versatile software interface for high-resolution mass spectrometry data interpretation and visualization. J. Am. Soc. Mass Spectrom. 25, 464–470 (2014).
Kim, S., Gupta, N. & Pevzner, P.A. Spectral probabilities and generating functions of tandem mass spectra: a strike against decoy databases. J. Proteome Res. 7, 3354–3363 (2008).
Kim, S. & Pevzner, P.A. MS-GF+ makes progress towards a universal database search tool for proteomics. Nat. Commun. 5, 5277 (2014).
Elias, J.E. & Gygi, S.P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 4, 207–214 (2007).
Pevzner, P.A., Dančík, V. & Tang, C.L. Mutation-tolerant protein identification by mass spectrometry. J. Comput. Biol. 7, 777–787 (2000).
Liu, X. et al. Identification of ultramodified proteins using top-down tandem mass spectra. J. Proteome Res. 12, 5830–5838 (2013).
Frank, A.M., Pesavento, J.J., Mizzen, C.A., Kelleher, N.L. & Pevzner, P.A. Interpreting top-down mass spectra using spectral alignment. Anal. Chem. 80, 2499–2505 (2008).
Tsur, D., Tanner, S., Zandi, E., Bafna, V. & Pevzner, P.A. Identification of post-translational modifications by blind search of mass spectra. Nat. Biotechnol. 23, 1562–1567 (2005).
Frank, A., Tanner, S., Bafna, V. & Pevzner, P. Peptide sequence tags for fast database search in mass-spectrometry. J. Proteome Res. 4, 1287–1295 (2005).
Domon, B. & Aebersold, R. Options and considerations when selecting a quantitative proteomics strategy. Nat. Biotechnol. 28, 710–721 (2010).
Nagaraj, N. & Mann, M. Quantitative analysis of the intra- and inter-individual variability of the normal urinary proteome. J. Proteome Res. 10, 637–645 (2011).
Zhu, W., Smith, J.W. & Huang, C.-M. Mass spectrometry-based label-free quantitative proteomics. J. Biomed. Biotechnol. 2010, 840518 (2010).
Qian, W.-J., Jacobs, J.M., Liu, T., Camp, D.G. II & Smith, R.D. Advances and challenges in liquid chromatography-mass spectrometry-based proteomics profiling for clinical applications. Mol. Cell. Proteomics 5, 1727–1744 (2006).
Li, S. et al. Endocrine-therapy-resistant ESR1 variants revealed by genomic characterization of breast-cancer-derived xenografts. Cell Rep. 4, 1116–1130 (2013).
Tabb, D.L. et al. Reproducibility of differential proteomic technologies in CPTAC fractionated xenografts. J. Proteome Res. 15, 691–706 (2016).
Ntai, I. et al. Integrated bottom-up and top-down proteomics of patient-derived breast tumor xenografts. Mol. Cell. Proteomics 15, 45–56 (2016).
Senko, M.W., Beu, S.C. & McLafferty, F.W. Determination of monoisotopic masses and ion populations for large biomolecules from resolved isotopic distributions. J. Am. Soc. Mass Spectrom. 6, 229–233 (1995).
Wang, X. et al. JUMP: a tag-based database search tool for peptide identification with high sensitivity and accuracy. Mol. Cell. Proteomics 13, 3663–3673 (2014).
Martens, L. et al. mzML--a community standard for mass spectrometry data. Mol. Cell. Proteomics 10, R110.000133 (2011).
Jones, A.R. et al. The mzIdentML data standard for mass spectrometry-based proteomics results. Mol. Cell. Proteomics 11, M111.014381 (2012).
Chambers, M.C. et al. A cross-platform toolkit for mass spectrometry and proteomics. Nat. Biotechnol. 30, 918–920 (2012).

Acknowledgements

Portions of this work were supported by the NIH National Institute of General Medical Sciences grant GM103493 (R.D.S.), the National Cancer Institute Clinical Proteomic Tumor Analysis Consortium (CPTAC) grant U24CA160019 (R.D.S.), the National Institute of Allergy and Infectious Diseases NIH/DHHS through interagency agreement Y1-A1-8401-01 (J. Adkins, PNNL), and the U.S. Department of Energy (DOE) Office of Science and Office of Biological and Environmental Research, under the Pan-omics program (R.D.S.). L.P.-T., N.T., M.Z., and J.B.S. were supported as part of the “High Resolution and Mass Accuracy Capability” development project at the Environmental Molecular Sciences Laboratory (EMSL), a U.S. DOE national scientific user facility at Pacific Northwest National Laboratory (PNNL) in Richland, Washington. Battelle operates PNNL for the DOE under contract DE-AC05-76RLO01830.

Author information

Sangtae Kim
Present address: Illumina Inc., San Diego, California, USA

Authors and Affiliations

Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA
Jungkap Park, Paul D Piehowski, Christopher Wilkins, Joshua Mendoza, Grant M Fujimoto, Bryson C Gibbons, Yufeng Shen, Anil K Shukla, Ronald J Moore, Tao Liu, Vladislav A Petyuk, Richard D Smith, Samuel H Payne & Sangtae Kim
Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington, USA
Mowei Zhou, Jared B Shaw, Nikola Tolić & Ljiljana Paša-Tolić

Contributions

J.P., P.D.P., S.H.P., and S.K. designed and executed the study. J.P., C.W., J.M., G.M.F., B.C.G., and S.K. implemented algorithms in software. T.L. contributed samples. P.D.P., Y.S., A.K.S., and R.J.M. performed LC-MS/MS experiments. J.P., P.D.P., J.B.S., V.A.P., M.Z., T.L., and N.T. analyzed data. L.P.-T. and R.D.S. provided technical leadership and oversight. J.P., P.D.P., and S.K. contributed to writing the manuscript with input from all authors.

Corresponding authors

Correspondence to Samuel H Payne or Sangtae Kim.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Top-down proteomics data analysis workflow in Informed-Proteomics.

Given an LC-MS dataset, ProMex detects LC-MS features, each of which represents a group of isotopomer envelopes corresponding to the same putative proteoform ion. The detected LC-MS features are passed to the database search tool MSPathFinder, which characterizes proteoforms from MS/MS spectra. In LcMsSpectator, users can visualize top-down proteomics data and further refine the results reported by ProMex and MSPathFinder. LC-MS features or identified proteoforms from multiple datasets can be aligned and grouped to find differentially expressed features or proteoforms.
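The last step of this workflow, aligning features from several datasets, can be pictured with a minimal sketch that greedily groups features whose monoisotopic masses agree within a ppm tolerance and whose elution times agree within a window. This is an illustrative assumption, not the alignment algorithm used in Informed-Proteomics; the `Feature` record, `group_features`, `abundance_table` and both default tolerances are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    dataset: str
    mono_mass: float      # monoisotopic mass (Da)
    elution_time: float   # apex elution time (min), assumed already aligned across runs
    abundance: float


def group_features(features, ppm_tol=10.0, time_tol=1.0):
    """Greedily cluster features from several datasets by mass and elution time.

    A feature joins the first group whose seed (first member) is within ppm_tol and
    time_tol and which does not yet contain a feature from the same dataset.
    """
    groups = []
    for feat in sorted(features, key=lambda f: f.mono_mass):
        for group in groups:
            seed = group[0]
            ppm = abs(feat.mono_mass - seed.mono_mass) / seed.mono_mass * 1e6
            if (ppm <= ppm_tol
                    and abs(feat.elution_time - seed.elution_time) <= time_tol
                    and all(f.dataset != feat.dataset for f in group)):
                group.append(feat)
                break
        else:  # no compatible group: start a new one
            groups.append([feat])
    return groups


def abundance_table(groups, datasets):
    """One row per group with its abundance in each dataset (0.0 where not detected)."""
    rows = []
    for group in groups:
        by_dataset = {f.dataset: f.abundance for f in group}
        rows.append([by_dataset.get(ds, 0.0) for ds in datasets])
    return rows
```

The resulting rows (one per putative feature, one column per dataset) are the kind of matrix a downstream differential-abundance test would consume.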

Supplementary Figure 2 Averaged Pearson correlation coefficients between observed and theoretical isotopomer envelopes of LC-MS features in Shewanella oneidensis samples.

Because ions are dispersed widely across LC elution time, charge states and isotope species, a single isotopomer envelope typically has a poor shape and correlates poorly with the expected profile. Aggregating isotopomer envelopes across adjacent charge states and elution times increases the similarity to the expected profile. The improvement is more pronounced for high-molecular-weight proteins (e.g., > 15,000 Da).
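A minimal sketch of the score plotted in Supplementary Figure 2, assuming each envelope is a vector of isotope-peak intensities indexed by the same theoretical isotope positions. Element-wise summation stands in here for ProMex's aggregation across charge states and elution times; that choice, the function names and the toy intensities (not measured data) are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length intensity vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def aggregate(envelopes):
    """Element-wise sum of envelopes indexed by the same isotope positions."""
    return [sum(values) for values in zip(*envelopes)]


# Toy intensities: two noisy observations of one theoretical isotope profile.
theoretical = [0.3, 0.7, 1.0, 0.9, 0.6, 0.3]
env_a = [0.5, 0.4, 1.1, 1.1, 0.3, 0.4]
env_b = [0.1, 1.0, 0.9, 0.7, 0.9, 0.2]
print(round(pearson(env_a, theoretical), 3))
print(round(pearson(aggregate([env_a, env_b]), theoretical), 3))
```

The single envelope correlates at about 0.77, while the sum correlates at essentially 1.0 because the toy noise was chosen to cancel exactly. With real data, aggregation only averages the noise down, but the direction of the effect is the same.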

Supplementary Figure 3 Sequence tag based search for multiply cleaved protein sequences.

Once a protein sequence matches a sequence tag, two sequence graphs originating from the two ends of the tag are generated and explored in opposite directions. The flanking masses of the sequence tag and the mass of the LC-MS feature are used to constrain the candidate proteoforms to be searched.
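A rough sketch of how the flanking mass and the feature mass narrow the search. Starting from a tag match, the code extends toward the N-terminus until the residue mass preceding the tag equals the observed N-terminal flanking mass, then extends toward the C-terminus until the intact mass equals the LC-MS feature mass. The function name, the 0.02 Da tolerance, the restriction to unmodified residues and the definition of `n_flank_mass` as a plain residue-mass sum (ion-type offsets already removed) are all assumptions. MSPathFinder itself explores sequence graphs rather than plain substrings.

```python
from itertools import accumulate

RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def candidate_proteoforms(protein, tag, n_flank_mass, feature_mass, tol=0.02):
    """(start, end) slices of `protein` consistent with a tag, its N-terminal flanking
    residue mass and the intact LC-MS feature mass (unmodified residues only)."""
    prefix = [0.0] + list(accumulate(RESIDUE_MASS[aa] for aa in protein))
    hits = []
    pos = protein.find(tag)
    while pos != -1:
        tag_end = pos + len(tag)
        # Extend toward the N-terminus until the flanking mass is reached.
        for start in range(pos, -1, -1):
            flank = prefix[pos] - prefix[start]
            if flank > n_flank_mass + tol:
                break  # residue masses are positive, so the flank only grows from here
            if abs(flank - n_flank_mass) > tol:
                continue
            # Extend toward the C-terminus until the intact mass matches the feature.
            for end in range(tag_end, len(protein) + 1):
                mass = prefix[end] - prefix[start] + WATER
                if mass > feature_mass + tol:
                    break
                if abs(mass - feature_mass) <= tol:
                    hits.append((start, end))
        pos = protein.find(tag, pos + 1)
    return hits
```

For example, with the protein `MSKRATQKTRAMGE`, the tag `TQK`, the flanking mass of `KRA` and the feature mass of `KRATQKTRAM`, only the slice `(2, 12)` survives both constraints.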

Supplementary Figure 4 Examples of data plots in LcMsSpectator.

(a) Matched fragment ion peaks in the MS2 spectrum view (left) and precursor ion peaks in the previous and next MS1 spectra (right); (b) peak error heat map for the fragment ions on the MS2 spectrum plot; (c) extracted ion chromatogram (XIC) view showing neighboring charge states of the precursor ion and the total area of all XIC points displayed.
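Panel (c) reports the total area of the displayed XIC points. Here is a minimal sketch of such a calculation. It assumes MS1 scans arrive as (retention time, [(m/z, intensity), ...]) tuples, takes the most intense peak within a ppm window of the target m/z in each scan, and integrates the trace with the trapezoidal rule. The function names, the 10 ppm window and the trapezoidal choice are assumptions, and LcMsSpectator's own calculation may differ.

```python
def extract_xic(scans, target_mz, ppm_tol=10.0):
    """[(rt, intensity)] using the most intense peak within ppm_tol of target_mz per scan."""
    tol = target_mz * ppm_tol * 1e-6
    xic = []
    for rt, peaks in scans:
        in_window = [inten for mz, inten in peaks if abs(mz - target_mz) <= tol]
        xic.append((rt, max(in_window, default=0.0)))  # 0.0 when the ion is absent
    return xic


def xic_area(xic):
    """Trapezoidal area under an XIC given as (rt, intensity) points sorted by rt."""
    return sum((t2 - t1) * (i1 + i2) / 2.0 for (t1, i1), (t2, i2) in zip(xic, xic[1:]))
```

For four scans 0.5 min apart whose in-window apex intensities are 100, 300, 200 and 0, the trapezoidal area is 275 intensity·min.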

Supplementary Figure 5 Example of refining a PTM from acetylation to tri-methylation at K36 in histone H3.

Peak error heat maps for the fragment ions. The magnitude of the mass error for each fragment ion is represented by the color of its grid cell; high mass errors appear as red or green cells, as indicated by the color bar on the right. (a) Initially identified proteoform with acetylation at the 27th Lys residue, and (b) refined proteoform after changing the acetylation to tri-methylation. LcMsSpectator allows users to visually check and refine MSPathFinder results.
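The heat map can separate these two modifications because their monoisotopic masses differ by only about 0.036 Da: acetyl (C2H2O) adds 42.010565 Da and trimethyl (C3H6) adds 42.046950 Da. A wrong assignment therefore appears as a systematic ppm offset on every fragment that contains the modified residue. The short sketch below shows the arithmetic; the fragment masses are arbitrary illustrative values.

```python
ACETYL = 42.010565     # C2H2O, monoisotopic (Da)
TRIMETHYL = 42.046950  # C3H6, monoisotopic (Da)


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


# Fragments containing the modified lysine: if the truth is trimethyl but the assignment
# is acetyl, each such fragment is observed ~0.0364 Da heavier than predicted.
for unmodified_mass in (2000.0, 5000.0, 10000.0):
    predicted = unmodified_mass + ACETYL
    observed = unmodified_mass + TRIMETHYL
    print(f"{predicted:10.4f} Da -> {ppm_error(observed, predicted):5.2f} ppm")
```

The offset is roughly 18 ppm at 2 kDa and about 3.6 ppm at 10 kDa. It is small for large fragments, but it is consistent across all fragments that carry the modification, which is what makes it visible as a colored band.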

Supplementary Figure 6 LC-MS feature-based analysis on CompRef Sample.

(a) PCA plot and (b) volcano plot. At 1% FDR and fold change > 2, 3604 features are up-regulated in P32 (WHIM2) and 3696 features are up-regulated in P33 (WHIM16).
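The thresholding behind a volcano plot like panel (b) can be sketched in a few lines. Per-feature p-values are converted to Benjamini-Hochberg q-values, and a feature counts as up-regulated in one sample when q < 0.01 and its fold change exceeds 2 in that direction. The caption does not say how the p-values were computed or which FDR procedure was used, so the Benjamini-Hochberg choice and the function names are assumptions.

```python
import math


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that q-values stay monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def classify(mean_a, mean_b, pvalues, fdr=0.01, min_fold=2.0):
    """Label each feature 'up_in_A', 'up_in_B' or 'ns' from mean abundances and p-values."""
    qvals = bh_qvalues(pvalues)
    cutoff = math.log2(min_fold)
    labels = []
    for a, b, q in zip(mean_a, mean_b, qvals):
        log2fc = math.log2(a / b)
        if q < fdr and log2fc > cutoff:
            labels.append("up_in_A")
        elif q < fdr and log2fc < -cutoff:
            labels.append("up_in_B")
        else:
            labels.append("ns")
    return labels
```

On a volcano plot, log2 fold change sits on the x-axis and -log10 of the p- or q-value on the y-axis. The two "up" labels correspond to the upper-left and upper-right corners cut off by these thresholds.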

Supplementary Figure 7 Comparison of top-down data analysis pipelines.

(a) Differentially expressed proteoforms reported in Ntai et al., Mol. Cell. Proteomics (2016), and (b) those found by the Informed-Proteomics software suite. The same dataset created for Study 3 in the article was used. The Informed-Proteomics analysis pipeline found 2.7 and 2.4 times more differentially expressed proteoforms and proteins, respectively.

Supplementary Figure 8 Bayesian network modelling LC-MS features to determine the probability of observing aggregated isotopomer envelopes E_i given mass M.

E_i is specified by four parameters: ratio abundance (A_i), isotopomer envelope similarity score (S_i), normalized intensity (I_i), and elution profile score (X_i). These four parameters are assumed to be conditionally independent of one another given the mass (M) and charge state (C_i).
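Under this conditional-independence assumption, the envelope likelihood factorizes as P(E_i | M, C_i) = P(A_i | M, C_i) · P(S_i | M, C_i) · P(I_i | M, C_i) · P(X_i | M, C_i). The following minimal sketch scores an envelope with that factorization. It assumes each conditional distribution has been learned as a histogram for the relevant mass and charge bin; the histogram representation, bin edges and probabilities are hypothetical and are not ProMex's trained model.

```python
import math
from bisect import bisect_right


def binned_prob(edges, probs, value):
    """Histogram lookup: probs[k] is the probability of the bin [edges[k], edges[k+1])."""
    k = bisect_right(edges, value) - 1
    k = min(max(k, 0), len(probs) - 1)  # out-of-range values fall into the end bins
    return probs[k]


def envelope_log_likelihood(tables, A, S, I, X):
    """log P(E_i | M, C_i) under conditional independence of A_i, S_i, I_i and X_i.

    `tables` maps "A", "S", "I", "X" to (edges, probs) histograms that are assumed to be
    already conditioned on the mass and charge bin of the envelope being scored.
    """
    values = {"A": A, "S": S, "I": I, "X": X}
    # The product of four conditional probabilities becomes a sum of logs.
    return sum(math.log(binned_prob(*tables[name], values[name])) for name in "ASIX")
```

Working in log space keeps the product of many small probabilities numerically stable when scores for several envelopes are combined.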

Supplementary Figure 9 Multiple interpretations of an observed isotopomer envelope.

(a) A cluster of observed peaks; (b–e) different theoretical isotopomer envelopes matched to the peaks in (a). Besides the true match in (b), theoretical envelopes with a ±1 Da monoisotopic mass error (c), or with a multiple of the true charge and correspondingly different monoisotopic masses (d, e), also match the observed peaks well.
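These competing readings follow from simple m/z arithmetic, which the sketch below enumerates for one cluster. Shifting the assumed monoisotopic peak by one isotope position changes the mass by about ±1 Da. Assuming k times the true charge multiplies the inferred mass by k, and the observed peaks then coincide with every k-th peak of the higher-charge envelope. The function names are hypothetical, and 1.00235 Da is used as an approximate average isotope spacing for proteins.

```python
PROTON = 1.007276          # proton mass (Da)
ISOTOPE_SPACING = 1.00235  # approximate mass gap between adjacent protein isotopes (Da)


def neutral_mass(mz, charge):
    """Neutral mass of an ion observed at m/z with the given positive charge."""
    return charge * (mz - PROTON)


def alternative_interpretations(mono_mz, charge, max_multiple=3):
    """(label, monoisotopic mass, charge) readings of one observed isotope cluster."""
    spacing = ISOTOPE_SPACING / charge  # m/z gap between neighbouring peaks in the cluster
    out = [("true", neutral_mass(mono_mz, charge), charge)]
    # Off-by-one: the monoisotopic peak is placed one isotope peak too low or too high.
    for shift in (-1, +1):
        out.append((f"{shift:+d} Da", neutral_mass(mono_mz + shift * spacing, charge), charge))
    # Charge multiples: observed peaks match every k-th peak of a k-fold charged envelope,
    # which predicts k-1 additional peaks between each observed pair.
    for k in range(2, max_multiple + 1):
        out.append((f"{k}x charge", neutral_mass(mono_mz, k * charge), k * charge))
    return out
```

For a 10,000 Da proteoform at charge 10 (monoisotopic m/z ≈ 1001.0073), the same cluster also reads as roughly 9,999 or 10,001 Da at charge 10, or as 20,000 Da at charge 20 and 30,000 Da at charge 30. Scoring must therefore weigh evidence beyond a single envelope, such as missing intermediate peaks and consistency across charge states, to pick the true match.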

About this article

Cite this article

Park, J., Piehowski, P., Wilkins, C. et al. Informed-Proteomics: open-source software package for top-down proteomics. Nat Methods 14, 909–914 (2017). https://doi.org/10.1038/nmeth.4388

Received: 25 October 2016
Accepted: 21 June 2017
Published: 07 August 2017
Issue Date: 01 September 2017
DOI: https://doi.org/10.1038/nmeth.4388
