Informed-Proteomics: open-source software package for top-down proteomics

An Author Correction to this article was published on 13 June 2018

This article has been updated

Abstract

Top-down proteomics, the analysis of intact proteins in their endogenous form, preserves valuable information about post-translational modifications, isoforms and proteolytic processing. The quality of top-down liquid chromatography–tandem MS (LC-MS/MS) data sets is rapidly increasing on account of advances in instrumentation and sample-processing protocols. However, top-down mass spectra are substantially more complex than conventional bottom-up data, so new algorithms and software tools for confident proteoform identification and quantification are needed. Here we present Informed-Proteomics, an open-source software suite for top-down proteomics analysis that consists of an LC-MS feature-finding algorithm, a database search algorithm, and an interactive results viewer. We compare our tool with several other popular tools using human-in-mouse xenograft luminal and basal breast tumor samples that are known to have significant differences in protein abundance based on bottom-up analysis.

Figure 1: LC-MS feature finding in ProMex.
Figure 2: Illustration of the sequence graph for 'KRATQKTRAM'.
Figure 3: Consistency of LC-MS feature detection in ten technical replicate analyses of an ovarian tumor sample.
Figure 4: Quantitative reproducibility for LC-MS features detected in ten technical replicate LC-MS analyses.
Figure 5: Protein identification and characterization results for a human ovarian tumor.
Figure 6: Differentially expressed proteoforms in CompRef breast tumor sample.

Change history

  • 13 June 2018

    In the version of this article initially published, the authors erroneously reported the search mode that was used for ProSightPC 3.0 in the Online Methods and in Supplementary Table 3. The results presented in Fig. 5 were obtained with 'absolute mass' search mode, not 'biomarker discovery' search mode. The 'biomarker discovery' search mode of ProSightPC 3.0 looks for subsequences of those contained in the annotated proteoform database (e.g., truncated forms from degradation and/or cleavage). This search mode is expected to generate similar numbers of identifications as Informed-Proteomics, but is also expected to take dramatically longer (~480 CPU hours). Unfortunately, because of these heavy computational requirements, the authors were unable to complete an analysis using this search mode. They chose to use 'absolute mass' mode to illustrate the effect of search mode and database choice on the results. 'Absolute mass' mode is the most restrictive of the search modes illustrated in Fig. 5, as it searches only for proteoforms explicitly listed in the proteoform database within a user-defined mass tolerance. In addition, in the supplementary information originally published online, Supplementary Table 3 incorrectly stated that ProSightPC v3.0 was used in 'biomarker discovery' mode. 'Absolute mass' mode was the mode actually used in this comparison. These errors have been corrected in the HTML and PDF versions of this article and in the associated supplementary information.

References

  1. Garcia, B.A. What does the future hold for top down mass spectrometry? J. Am. Soc. Mass Spectrom. 21, 193–202 (2010).

  2. Siuti, N. & Kelleher, N.L. Decoding protein modifications using top-down mass spectrometry. Nat. Methods 4, 817–821 (2007).

  3. Smith, L.M. & Kelleher, N.L. Proteoform: a single term describing protein complexity. Nat. Methods 10, 186–187 (2013).

  4. Zhang, Z., Wu, S., Stenoien, D.L. & Paša-Tolić, L. High-throughput proteomics. Annu. Rev. Anal. Chem. (Palo Alto Calif.) 7, 427–454 (2014).

  5. Tran, J.C. et al. Mapping intact protein isoforms in discovery mode using top-down proteomics. Nature 480, 254–258 (2011).

  6. Chait, B.T. Chemistry. Mass spectrometry: bottom-up or top-down? Science 314, 65–66 (2006).

  7. Lanucara, F. & Eyers, C.E. Top-down mass spectrometry for the analysis of combinatorial post-translational modifications. Mass Spectrom. Rev. 32, 27–42 (2013).

  8. Horn, D.M., Zubarev, R.A. & McLafferty, F.W. Automated reduction and interpretation of high resolution electrospray mass spectra of large molecules. J. Am. Soc. Mass Spectrom. 11, 320–332 (2000).

  9. Zabrouskov, V., Senko, M.W., Du, Y., Leduc, R.D. & Kelleher, N.L. New and automated MSn approaches for top-down identification of modified proteins. J. Am. Soc. Mass Spectrom. 16, 2027–2038 (2005).

  10. Liu, X. et al. Deconvolution and database search of complex tandem mass spectra of intact proteins: a combinatorial approach. Mol. Cell. Proteomics 9, 2772–2782 (2010).

  11. Kou, Q., Wu, S. & Liu, X. A new scoring function for top-down spectral deconvolution. BMC Genomics 15, 1140 (2014).

  12. LeDuc, R.D. et al. ProSight PTM: an integrated environment for protein identification and characterization by top-down mass spectrometry. Nucleic Acids Res. 32, W340–W345 (2004).

  13. Zamdborg, L. et al. ProSight PTM 2.0: improved protein identification and characterization for top down mass spectrometry. Nucleic Acids Res. 35, W701–W706 (2007).

  14. Liu, X. et al. Protein identification using top-down. Mol. Cell. Proteomics 11, M111.008524 (2012).

  15. Kou, Q., Xun, L. & Liu, X. TopPIC: a software tool for top-down mass spectrometry-based proteoform identification and characterization. Bioinformatics 32, 3495–3497 (2016).

  16. Sun, R.-X. et al. pTop 1.0: a high-accuracy and high-efficiency search engine for intact protein identification. Anal. Chem. 88, 3082–3090 (2016).

  17. Cai, W. et al. MASH Suite Pro: a comprehensive software tool for top-down proteomics. Mol. Cell. Proteomics 15, 703–714 (2016).

  18. Guner, H. et al. MASH Suite: a user-friendly and versatile software interface for high-resolution mass spectrometry data interpretation and visualization. J. Am. Soc. Mass Spectrom. 25, 464–470 (2014).

  19. Kim, S., Gupta, N. & Pevzner, P.A. Spectral probabilities and generating functions of tandem mass spectra: a strike against decoy databases. J. Proteome Res. 7, 3354–3363 (2008).

  20. Kim, S. & Pevzner, P.A. MS-GF+ makes progress towards a universal database search tool for proteomics. Nat. Commun. 5, 5277 (2014).

  21. Elias, J.E. & Gygi, S.P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 4, 207–214 (2007).

  22. Pevzner, P.A., Dančík, V. & Tang, C.L. Mutation-tolerant protein identification by mass spectrometry. J. Comput. Biol. 7, 777–787 (2000).

  23. Liu, X. et al. Identification of ultramodified proteins using top-down tandem mass spectra. J. Proteome Res. 12, 5830–5838 (2013).

  24. Frank, A.M., Pesavento, J.J., Mizzen, C.A., Kelleher, N.L. & Pevzner, P.A. Interpreting top-down mass spectra using spectral alignment. Anal. Chem. 80, 2499–2505 (2008).

  25. Tsur, D., Tanner, S., Zandi, E., Bafna, V. & Pevzner, P.A. Identification of post-translational modifications by blind search of mass spectra. Nat. Biotechnol. 23, 1562–1567 (2005).

  26. Frank, A., Tanner, S., Bafna, V. & Pevzner, P. Peptide sequence tags for fast database search in mass-spectrometry. J. Proteome Res. 4, 1287–1295 (2005).

  27. Domon, B. & Aebersold, R. Options and considerations when selecting a quantitative proteomics strategy. Nat. Biotechnol. 28, 710–721 (2010).

  28. Nagaraj, N. & Mann, M. Quantitative analysis of the intra- and inter-individual variability of the normal urinary proteome. J. Proteome Res. 10, 637–645 (2011).

  29. Zhu, W., Smith, J.W. & Huang, C.-M. Mass spectrometry-based label-free quantitative proteomics. J. Biomed. Biotechnol. 2010, 840518 (2010).

  30. Qian, W.-J., Jacobs, J.M., Liu, T., Camp, D.G. II & Smith, R.D. Advances and challenges in liquid chromatography-mass spectrometry-based proteomics profiling for clinical applications. Mol. Cell. Proteomics 5, 1727–1744 (2006).

  31. Li, S. et al. Endocrine-therapy-resistant ESR1 variants revealed by genomic characterization of breast-cancer-derived xenografts. Cell Rep. 4, 1116–1130 (2013).

  32. Tabb, D.L. et al. Reproducibility of differential proteomic technologies in CPTAC fractionated xenografts. J. Proteome Res. 15, 691–706 (2016).

  33. Ntai, I. et al. Integrated bottom-up and top-down proteomics of patient-derived breast tumor xenografts. Mol. Cell. Proteomics 15, 45–56 (2016).

  34. Senko, M.W., Beu, S.C. & McLafferty, F.W. Determination of monoisotopic masses and ion populations for large biomolecules from resolved isotopic distributions. J. Am. Soc. Mass Spectrom. 6, 229–233 (1995).

  35. Wang, X. et al. JUMP: a tag-based database search tool for peptide identification with high sensitivity and accuracy. Mol. Cell. Proteomics 13, 3663–3673 (2014).

  36. Martens, L. et al. mzML--a community standard for mass spectrometry data. Mol. Cell. Proteomics 10, R110.000133 (2011).

  37. Jones, A.R. et al. The mzIdentML data standard for mass spectrometry-based proteomics results. Mol. Cell. Proteomics 11, M111.014381 (2012).

  38. Chambers, M.C. et al. A cross-platform toolkit for mass spectrometry and proteomics. Nat. Biotechnol. 30, 918–920 (2012).

Acknowledgements

Portions of this work were supported by the NIH National Institute of General Medical Sciences grant GM103493 (R.D.S.), the National Cancer Institute Clinical Proteomic Tumor Analysis Consortium (CPTAC) grant U24CA160019 (R.D.S.), the National Institute of Allergy and Infectious Diseases NIH/DHHS through interagency agreement Y1-A1-8401-01 (J. Adkins, PNNL), and the U.S. Department of Energy (DOE) Office of Science and Office of Biological and Environmental Research, under the Pan-omics program (R.D.S.). L.P.T., N.T., M.Z., and J.B.S. were supported as part of the “High Resolution and Mass Accuracy Capability” development project at the Environmental Molecular Science Laboratory (EMSL), a U.S. DOE national scientific user facility at Pacific Northwest National Laboratory (PNNL) in Richland, Washington. Battelle operates PNNL for the DOE under contract DE-AC05-76RLO01830.

Author information

Contributions

J.P., P.D.P., S.H.P., and S.K. designed and executed the study. J.P., C.W., J.M., G.M.F., B.C.G., and S.K. implemented algorithms in software. T.L. contributed samples. P.D.P., Y.S., A.K.S., and R.J.M. performed LC-MS/MS experiments. J.P., P.D.P., J.B.S., V.A.P., M.Z., T.L., and N.T. analyzed data. L.P.-T. and R.D.S. provided technical leadership and oversight. J.P., P.D.P., and S.K. contributed to writing the manuscript with input from all authors.

Corresponding authors

Correspondence to Samuel H Payne or Sangtae Kim.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Top-down proteomics data analysis workflow in Informed-Proteomics.

Given an LC-MS data set, ProMex detects LC-MS features, each of which represents a group of isotopomer envelopes corresponding to the same putative proteoform ion. Detected LC-MS features are fed into a database search tool, MSPathFinder, to characterize proteoforms from MS/MS spectra. In LcMsSpectator, users can visualize top-down proteomics data and further refine analysis results reported by ProMex and MSPathFinder. LC-MS features or identified proteoforms from multiple data sets can be aligned and grouped to find differentially expressed features or proteoforms.

Supplementary Figure 2 Averaged Pearson correlation coefficients between observed and theoretical isotopomer envelopes of LC-MS features in Shewanella oneidensis samples.

Because ions are dispersed widely across LC elution time, charge states and isotope species, a single isotopomer envelope typically has a poor shape and correlates poorly with the expected profile. Aggregating isotopomer envelopes across adjacent charge states and elution times increases the similarity to expected profiles. This improvement is more pronounced for high-molecular-weight proteins (e.g., > 15,000 Da).
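The effect of aggregation can be illustrated with a minimal sketch. The envelope intensities below are hypothetical values chosen for illustration, not data from the study, and summing envelopes peak by peak is an assumed, simplified stand-in for how ProMex aggregates across charge states and elution times.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical theoretical isotopomer envelope (relative abundances).
theoretical = [0.2, 0.6, 1.0, 0.8, 0.4, 0.1]

# Hypothetical observed envelopes of the same feature, e.g. from adjacent
# charge states or scans. Each one is a distorted copy of the theoretical shape.
observed = [
    [0.5, 0.3, 1.2, 0.5, 0.7, 0.0],
    [0.0, 0.9, 0.7, 1.1, 0.2, 0.3],
    [0.3, 0.5, 1.1, 0.9, 0.1, 0.2],
]

# Score each envelope on its own.
single_scores = [pearson(env, theoretical) for env in observed]

# Aggregate by summing intensities peak by peak, then score once.
aggregated = [sum(peaks) for peaks in zip(*observed)]
aggregated_score = pearson(aggregated, theoretical)

print("single envelopes:", [round(s, 3) for s in single_scores])
print("aggregated:", round(aggregated_score, 3))
```

In this toy case the distortions partly cancel when summed, so the aggregated envelope correlates better with the theoretical profile than any single envelope does.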

Supplementary Figure 3 Sequence tag based search for multiply cleaved protein sequences.

Once a protein sequence matches a sequence tag, two sequence graphs originating from both ends of the sequence tag are generated and explored in opposite directions. The flanking mass of the sequence tag and the mass of the LC-MS feature are used to constrain the candidate proteoforms to be searched.
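A simplified sketch of this constraint is shown below. It is not the MSPathFinder implementation: it ignores modifications and walks plain sequences instead of sequence graphs, and the function names, the tolerance, and the example tag are assumptions made for illustration. The protein string is the 'KRATQKTRAM' example from Figure 2.

```python
# Monoisotopic residue masses (Da) for the residues used in this example.
RESIDUE_MASS = {
    "A": 71.03711, "K": 128.09496, "M": 131.04049,
    "Q": 128.05858, "R": 156.10111, "T": 101.04768,
}
WATER = 18.01056  # mass of H2O added to the residue sum of an intact chain


def residues_mass(seq):
    """Sum of residue masses for an unmodified sequence."""
    return sum(RESIDUE_MASS[aa] for aa in seq)


def candidates_from_tag(protein, tag, n_flank_mass, feature_mass, tol=0.02):
    """Return (start, end) spans of `protein` that contain `tag`, whose
    residues before the tag sum to `n_flank_mass`, and whose total mass
    matches `feature_mass` (all within `tol` Da)."""
    hits = []
    pos = protein.find(tag)
    while pos != -1:
        tag_end = pos + len(tag)
        # Explore toward the N terminus until the flanking mass is reached.
        for start in range(pos, -1, -1):
            n_mass = residues_mass(protein[start:pos])
            if n_mass > n_flank_mass + tol:
                break
            if abs(n_mass - n_flank_mass) > tol:
                continue
            # Explore toward the C terminus until the feature mass is reached.
            for end in range(tag_end, len(protein) + 1):
                total = residues_mass(protein[start:end]) + WATER
                if total > feature_mass + tol:
                    break
                if abs(total - feature_mass) <= tol:
                    hits.append((start, end))
        pos = protein.find(tag, pos + 1)
    return hits


protein = "KRATQKTRAM"
# Hypothetical truncated proteoform, cleaved at both termini.
true_form = "RATQKTR"
n_flank = residues_mass("RA")                # mass preceding the tag "TQK"
feature = residues_mass(true_form) + WATER   # intact mass of the LC-MS feature

spans = candidates_from_tag(protein, "TQK", n_flank, feature)
print([protein[s:e] for s, e in spans])
```

Because both masses must agree, only one span of the protein survives, even though the tag alone does not determine either cleavage site.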

Supplementary Figure 4 Examples of data plots in LcMsSpectator.

(a) Matched fragment ion peaks in the MS2 spectrum view (left) and precursor ion peaks in the previous and next MS1 spectra (right); (b) peak error heat map for the fragment ions on the MS2 spectrum plot; and (c) extracted ion chromatogram (XIC) view showing neighboring charge states of the precursor ion and the total area of all XIC points displayed.

Supplementary Figure 5 Example of PTM refining from acetylation to tri-methylation at K36 in histone H3 protein.

Peak error heat maps for the fragment ions. The magnitude of the mass error of each fragment ion is represented by the color of its grid cell. High mass errors are shown as red or green cells, as indicated by the color bar on the right side. (a) Initially identified proteoform with acetylation at the 27th Lys residue, and (b) refined proteoform after changing the acetylation to tri-methylation. LcMsSpectator allows users to visually check and refine MSPathFinder results.
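The two modifications are nearly isobaric, which is why fragment mass errors are informative here. The sketch below uses the standard monoisotopic mass shifts for acetylation and tri-methylation; the fragment mass is a hypothetical value, and the calculation is only an illustration of the kind of error such a heat map displays, not the LcMsSpectator code.

```python
ACETYL = 42.010565     # C2H2O, monoisotopic mass shift in Da
TRIMETHYL = 42.046950  # C3H6, monoisotopic mass shift in Da


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


# Hypothetical unmodified fragment mass (Da); assume the fragment truly
# carries a tri-methylation and is measured without error.
fragment = 3000.0
observed = fragment + TRIMETHYL

err_acetyl = ppm_error(observed, fragment + ACETYL)
err_trimethyl = ppm_error(observed, fragment + TRIMETHYL)

print(f"assigned acetylation:     {err_acetyl:.2f} ppm")
print(f"assigned tri-methylation: {err_trimethyl:.2f} ppm")
```

The shifts differ by about 0.036 Da, so every fragment that contains the modified residue shows a systematic error under the wrong assignment, and that error disappears once the modification is changed.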

Supplementary Figure 6 LC-MS feature-based analysis on CompRef Sample.

(a) PCA plot and (b) volcano plot. At 1% FDR and a fold change > 2, 3,604 features are up-regulated in P32 (WHIM2) and 3,696 features are up-regulated in P33 (WHIM16).
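The two thresholds in the volcano plot can be applied as a simple filter. The sketch below assumes that an FDR-adjusted q-value has already been computed for each feature by some upstream test, which the caption does not specify; the function name and the abundance values are hypothetical.

```python
from math import log2


def classify(mean_a, mean_b, q_value, max_q=0.01, min_fold=2.0):
    """Label a feature by comparing mean abundances in samples A and B.

    Returns 'up_in_a', 'up_in_b', or 'unchanged'. A feature is called
    differential only if it passes both the FDR and fold-change cutoffs.
    """
    log_fc = log2(mean_a / mean_b)
    if q_value > max_q or abs(log_fc) <= log2(min_fold):
        return "unchanged"
    return "up_in_a" if log_fc > 0 else "up_in_b"


# Hypothetical features: (mean abundance in A, mean abundance in B, q-value).
features = [
    (400.0, 100.0, 0.001),  # 4-fold higher in A, significant
    (100.0, 400.0, 0.001),  # 4-fold higher in B, significant
    (150.0, 100.0, 0.001),  # significant but only 1.5-fold
    (400.0, 100.0, 0.050),  # 4-fold but fails the 1% FDR cutoff
]
labels = [classify(a, b, q) for a, b, q in features]
print(labels)
```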

Supplementary Figure 7 Comparison of top-down data analysis pipelines.

(a) Differentially expressed proteoforms reported in Ntai et al., Mol. Cell. Proteomics (2016), and (b) those found by the Informed-Proteomics software suite. The same data set created for Study 3 in the article was used. The Informed-Proteomics analysis pipeline found 2.7 and 2.4 times more differentially expressed proteoforms and proteins, respectively.

Supplementary Figure 8 Bayesian network modelling LC-MS features to determine the probability of observing aggregated isotopomer envelopes Ei given mass M.

Each Ei is specified by four parameters: ratio abundance (Ai), isotopomer envelope similarity score (Si), normalized intensity (Ii), and elution profile score (Xi). These four parameters are assumed to be independent of each other given the mass (M) and charge state (Ci).
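Under this independence assumption, the probability of one aggregated envelope factorizes into four terms. The expression below is a direct consequence of the stated assumption rather than a formula quoted from the paper, and it does not cover how envelopes from different charge states are combined.

```latex
P(E_i \mid M, C_i)
  = P(A_i \mid M, C_i)\,
    P(S_i \mid M, C_i)\,
    P(I_i \mid M, C_i)\,
    P(X_i \mid M, C_i)
```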

Supplementary Figure 9 Multiple interpretations of an observed isotopomer envelope.

(a) A cluster of observed peaks, and (b-e) different theoretical isotopomer envelopes matched to the envelope in (a). Besides the true match in (b), theoretical envelopes involving ±1 Da monoisotopic mass errors in (c) or different monoisotopic masses with multiples of the true charge in (d) or (e) also match the observed peaks well.
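The ambiguity follows from the peak positions alone, as the sketch below shows. It considers only m/z positions, not intensities; the mass, charge and number of peaks are hypothetical, and the isotope spacing is an approximate average value.

```python
PROTON = 1.007276          # Da
ISOTOPE_SPACING = 1.00235  # approximate average spacing between isotopomers, Da


def envelope_mz(mono_mass, charge, n_peaks):
    """m/z positions of the first n_peaks isotopomers of an ion."""
    return [
        (mono_mass + k * ISOTOPE_SPACING + charge * PROTON) / charge
        for k in range(n_peaks)
    ]


def matched_fraction(observed, theoretical, tol=1e-6):
    """Fraction of observed peaks with a theoretical peak within tol (m/z)."""
    hits = sum(any(abs(o - t) <= tol for t in theoretical) for o in observed)
    return hits / len(observed)


# Hypothetical true ion: 10 kDa monoisotopic mass, charge 10, eight peaks.
observed = envelope_mz(10000.0, 10, 8)

# Same charge, monoisotopic mass off by one isotope spacing (about +1 Da).
shifted = envelope_mz(10000.0 + ISOTOPE_SPACING, 10, 8)

# Twice the charge and twice the mass: peaks at half the spacing.
doubled = envelope_mz(20000.0, 20, 16)

print("+1 Da interpretation:  ", matched_fraction(observed, shifted))
print("2x charge interpretation:", matched_fraction(observed, doubled))
```

The shifted interpretation explains all but one observed peak, and the doubled-charge interpretation explains every observed peak, so peak positions cannot separate these candidates. Distinguishing them requires additional evidence such as the envelope shape.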

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–9 and Supplementary Tables 1–3. (PDF 1529 kb)

Reporting Summary

Life Sciences Reporting Summary. (PDF 158 kb)

Supplementary Protocol

MSPathFinder Tutorial. (PDF 408 kb)

Supplementary Software

Informed-Proteomics software suite. (ZIP 8537 kb)

About this article

Cite this article

Park, J., Piehowski, P., Wilkins, C. et al. Informed-Proteomics: open-source software package for top-down proteomics. Nat Methods 14, 909–914 (2017). https://doi.org/10.1038/nmeth.4388

  • DOI: https://doi.org/10.1038/nmeth.4388
