This article has been updated


Top-down proteomics, the analysis of intact proteins in their endogenous form, preserves valuable information about post-translational modifications, isoforms and proteolytic processing. The quality of top-down liquid chromatography–tandem MS (LC-MS/MS) data sets is rapidly increasing on account of advances in instrumentation and sample-processing protocols. However, top-down mass spectra are substantially more complex than conventional bottom-up data, so new algorithms and software tools for confident proteoform identification and quantification are needed. Here we present Informed-Proteomics, an open-source software suite for top-down proteomics analysis that consists of an LC-MS feature-finding algorithm, a database search algorithm, and an interactive results viewer. We compare our tool with several other popular tools using human-in-mouse xenograft luminal and basal breast tumor samples that are known, from bottom-up analysis, to differ significantly in protein abundance.


Change history

  • 13 June 2018

    In the version of this article initially published, the authors erroneously reported the search mode that was used for ProSightPC 3.0 in the Online Methods and in Supplementary Table 3.  The results presented in Fig. 5 were obtained with 'absolute mass' search mode, not 'biomarker discovery' search mode. The 'biomarker discovery' search mode of ProSightPC 3.0 looks for subsequences of those contained in the annotated proteoform database (e.g., truncated forms from degradation and/or cleavage). This search mode is expected to generate similar numbers of identifications as Informed-Proteomics, but is also expected to take dramatically longer (~480 CPU hours). Unfortunately, because of these heavy computational requirements, the authors were unable to complete an analysis using this search mode. They chose to use 'absolute mass' mode to illustrate the effect of search mode and database choice on the results. 'Absolute mass' mode is the most restrictive of the search modes illustrated in Fig. 5, as it searches only for proteoforms explicitly listed in the proteoform database within a user-defined mass tolerance.  In addition, in the supplementary information originally published online, Supplementary Table 3 incorrectly stated that ProSightPC v3.0 was used in 'biomarker discovery' mode. 'Absolute mass' mode was the mode actually used in this comparison. These errors have been corrected in the HTML and PDF versions of this article and in the associated supplementary information.




Portions of this work were supported by the NIH National Institute of General Medical Sciences grant GM103493 (R.D.S.), the National Cancer Institute Clinical Proteomic Tumor Analysis Consortium (CPTAC) grant U24CA160019 (R.D.S.), the National Institute of Allergy and Infectious Diseases NIH/DHHS through interagency agreement Y1-A1-8401-01 (J. Adkins, PNNL), and the U.S. Department of Energy (DOE) Office of Science and Office of Biological and Environmental Research, under the Pan-omics program (R.D.S.). L.P.T., N.T., M.Z., and J.B.S. were supported as part of the “High Resolution and Mass Accuracy Capability” development project at the Environmental Molecular Science Laboratory (EMSL), a U.S. DOE national scientific user facility at Pacific Northwest National Laboratory (PNNL) in Richland, Washington. Battelle operates PNNL for the DOE under contract DE-AC05-76RLO01830.

Author information

Author notes

    • Sangtae Kim

    Present address: Illumina Inc., San Diego, California, USA.


  1. Biological Sciences Division, Pacific Northwest National Laboratory, Richland, Washington, USA.

    • Jungkap Park
    • Paul D Piehowski
    • Christopher Wilkins
    • Joshua Mendoza
    • Grant M Fujimoto
    • Bryson C Gibbons
    • Yufeng Shen
    • Anil K Shukla
    • Ronald J Moore
    • Tao Liu
    • Vladislav A Petyuk
    • Richard D Smith
    • Samuel H Payne
    • Sangtae Kim

  2. Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, Washington, USA.

    • Mowei Zhou
    • Jared B Shaw
    • Nikola Tolić
    • Ljiljana Paša-Tolić




J.P., P.D.P., S.H.P., and S.K. designed and executed the study. J.P., C.W., J.M., G.M.F., B.C.G., and S.K. implemented algorithms in software. T.L. contributed samples. P.D.P., Y.S., A.K.S., and R.J.M. performed LC-MS/MS experiments. J.P., P.D.P., J.B.S., V.A.P., M.Z., T.L., and N.T. analyzed data. L.P.-T. and R.D.S. provided technical leadership and oversight. J.P., P.D.P., and S.K. contributed to writing the manuscript with input from all authors.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Samuel H Payne or Sangtae Kim.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–9 and Supplementary Tables 1–3.

  2. Reporting Summary

    Life Sciences Reporting Summary.

  3. Supplementary Protocol

    MSPathFinder Tutorial.

Zip files

  1. Supplementary Software

    Informed-Proteomics software suite.
