The Perseus computational platform for comprehensive analysis of (prote)omics data

Abstract

A main bottleneck in proteomics is the downstream biological analysis of highly multivariate quantitative protein abundance data generated using mass-spectrometry-based analysis. We developed the Perseus software platform (http://www.perseus-framework.org) to support biological and biomedical researchers in interpreting protein quantification, interaction and post-translational modification data. Perseus contains a comprehensive portfolio of statistical tools for high-dimensional omics data analysis covering normalization, pattern recognition, time-series analysis, cross-omics comparisons and multiple-hypothesis testing. A machine learning module supports the classification and validation of patient groups for diagnosis and prognosis, and it also detects predictive protein signatures. Central to Perseus is a user-friendly, interactive workflow environment that provides complete documentation of computational methods used in a publication. All activities in Perseus are realized as plugins, and users can extend the software by programming their own, which can be shared through a plugin store. We anticipate that Perseus's arsenal of algorithms and its intuitive usability will empower interdisciplinary analysis of complex large data sets.

Figure 1: The Perseus data analysis platform.
Figure 2: Post-translational modifications.
Figure 3: Interaction proteomics.
Figure 4: Time-series analysis.
Figure 5: Cross-omics data comparison by 2D annotation enrichment analysis.
Figure 6: Machine learning for clinical proteomics and biomarker discovery.

References

  1. Altelaar, A.F., Munoz, J. & Heck, A.J. Next-generation proteomics: towards an integrative view of proteome dynamics. Nat. Rev. Genet. 14, 35–48 (2013).

  2. Cox, J. & Mann, M. Quantitative, high-resolution proteomics for data-driven systems biology. Annu. Rev. Biochem. 80, 273–299 (2011).

  3. Eng, J.K., McCormack, A.L. & Yates, J.R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5, 976–989 (1994). This publication describes the earliest approach to correlating tandem mass spectra of peptides to theoretical fragment-ion series calculated from in silico digests of known protein sequences with the aim of identifying peptides and proteins.

  4. Perkins, D.N., Pappin, D.J., Creasy, D.M. & Cottrell, J.S. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551–3567 (1999).

  5. Geer, L.Y. et al. Open mass spectrometry search algorithm. J. Proteome Res. 3, 958–964 (2004).

  6. Craig, R. & Beavis, R.C. TANDEM: matching proteins with tandem mass spectra. Bioinformatics 20, 1466–1467 (2004).

  7. Bern, M., Cai, Y. & Goldberg, D. Lookup peaks: a hybrid of de novo sequencing and database search for protein identification by tandem mass spectrometry. Anal. Chem. 79, 1393–1400 (2007).

  8. Craig, R., Cortens, J.P. & Beavis, R.C. Open source system for analyzing, validating, and storing protein identification data. J. Proteome Res. 3, 1234–1242 (2004).

  9. Nesvizhskii, A.I., Vitek, O. & Aebersold, R. Analysis and validation of proteomic data generated by tandem mass spectrometry. Nat. Methods 4, 787–797 (2007).

  10. Deutsch, E.W. et al. Trans-Proteomic Pipeline, a standardized data processing pipeline for large-scale reproducible proteomics informatics. Proteomics Clin. Appl. 9, 745–754 (2015).

  11. Cox, J. & Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372 (2008). Perseus has been developed in conjunction with MaxQuant, which comprises a complete quantitative workflow for the analysis of shotgun proteomics data, including support for a large variety of experimental techniques.

  12. Vizcaíno, J.A. et al. The PRoteomics IDEntifications (PRIDE) database and associated tools: status in 2013. Nucleic Acids Res. 41, D1063–D1069 (2013).

  13. Vizcaíno, J.A. et al. ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat. Biotechnol. 32, 223–226 (2014).

  14. de Godoy, L.M. et al. Comprehensive mass-spectrometry-based proteome quantification of haploid versus diploid yeast. Nature 455, 1251–1254 (2008).

  15. Hebert, A.S. et al. The one hour yeast proteome. Mol. Cell. Proteomics 13, 339–347 (2014). In this paper the authors demonstrate that the yeast proteome can be analyzed within a 1-h measurement time, recovering nearly all expressed cellular proteins.

  16. Nagaraj, N. et al. Deep proteome and transcriptome mapping of a human cancer cell line. Mol. Syst. Biol. 7, 548 (2011).

  17. Beck, M. et al. The quantitative proteome of a human cell line. Mol. Syst. Biol. 7, 549 (2011).

  18. Munoz, J. et al. The quantitative proteomes of human-induced pluripotent stem cells and embryonic stem cells. Mol. Syst. Biol. 7, 550 (2011).

  19. Mann, M., Kulak, N.A., Nagaraj, N. & Cox, J. The coming age of complete, accurate, and ubiquitous proteomes. Mol. Cell 49, 583–590 (2013).

  20. Wiśniewski, J.R., Hein, M.Y., Cox, J. & Mann, M. A 'proteomic ruler' for protein copy number and concentration estimation without spike-in standards. Mol. Cell. Proteomics 13, 3497–3506 (2014).

  21. Cox, J. et al. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol. Cell. Proteomics 13, 2513–2526 (2014). Here the MaxLFQ algorithm for relative label-free protein quantification is described. It enabled many researchers to conduct large proteomics studies with complex experimental designs without the need for labeling their samples.

  22. Geiger, T., Cox, J., Ostasiewicz, P., Wisniewski, J.R. & Mann, M. Super-SILAC mix for quantitative proteomics of human tumor tissue. Nat. Methods 7, 383–385 (2010).

  23. Tusher, V.G., Tibshirani, R. & Chu, G. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. USA 98, 5116–5121 (2001). A pioneering method is described for the robust detection of significantly changing biomolecules in large omics data sets. It uses repeated permutations of the data to determine FDRs.

  24. Alter, O., Brown, P.O. & Botstein, D. Singular value decomposition for genome-wide expression data processing and modeling. Proc. Natl. Acad. Sci. USA 97, 10101–10106 (2000).

  25. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545–15550 (2005). GSEA is the forerunner of many methods for analyzing molecular profiling data to determine which sets of genes or proteins are correlated with a phenotypic class distinction.

  26. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995). In this seminal paper a simple yet powerful procedure is shown to control the FDR for multiple testing of many independent hypotheses.

  27. Huber, W. et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat. Methods 12, 115–121 (2015).

  28. Beausoleil, S.A., Villén, J., Gerber, S.A., Rush, J. & Gygi, S.P. A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat. Biotechnol. 24, 1285–1292 (2006).

  29. Cox, J. et al. Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res. 10, 1794–1805 (2011).

  30. Olsen, J.V. et al. Quantitative phosphoproteomics reveals widespread full phosphorylation site occupancy during mitosis. Sci. Signal. 3, ra3 (2010).

  31. Sharma, K. et al. Ultradeep human phosphoproteome reveals a distinct regulatory nature of Tyr and Ser/Thr-based signaling. Cell Rep. 8, 1583–1594 (2014).

  32. Gene Ontology Consortium. Gene Ontology Consortium: going forward. Nucleic Acids Res. 43, D1049–D1056 (2015).

  33. Hornbeck, P.V. et al. PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Res. 43, D512–D520 (2015).

  34. Tyanova, S., Cox, J., Olsen, J., Mann, M. & Frishman, D. Phosphorylation variation during the cell cycle scales with structural propensities of proteins. PLoS Comput. Biol. 9, e1002842 (2013).

  35. Hein, M.Y. et al. A human interactome in three quantitative dimensions organized by stoichiometries and abundances. Cell 163, 712–723 (2015).

  36. Huttlin, E.L. et al. The BioPlex network: a systematic exploration of the human interactome. Cell 162, 425–440 (2015).

  37. Hubner, N.C. et al. Quantitative proteomics combined with BAC TransgeneOmics reveals in vivo protein interactions. J. Cell Biol. 189, 739–754 (2010).

  38. Selbach, M. & Mann, M. Protein interaction screening by quantitative immunoprecipitation combined with knockdown (QUICK). Nat. Methods 3, 981–983 (2006).

  39. Keilhauer, E.C., Hein, M.Y. & Mann, M. Accurate protein complex retrieval by affinity enrichment mass spectrometry (AE-MS) rather than affinity purification mass spectrometry (AP-MS). Mol. Cell. Proteomics 14, 120–135 (2015).

  40. Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).

  41. Räschle, M. et al. DNA repair. Proteomics reveals dynamic assembly of repair complexes during bypass of DNA cross-links. Science 348, 1253671 (2015).

  42. Spellman, P.T. et al. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol. Biol. Cell 9, 3273–3297 (1998).

  43. Gauthier, N.P. et al. Cyclebase.org—a comprehensive multi-organism online database of cell-cycle experiments. Nucleic Acids Res. 36, D854–D859 (2008).

  44. Eser, P. et al. Periodic mRNA synthesis and degradation co-operate during cell cycle gene expression. Mol. Syst. Biol. 10, 717 (2014).

  45. Partch, C.L., Green, C.B. & Takahashi, J.S. Molecular architecture of the mammalian circadian clock. Trends Cell Biol. 24, 90–99 (2014).

  46. Robles, M.S., Cox, J. & Mann, M. In vivo quantitative proteomics reveals a key contribution of post-transcriptional mechanisms to the circadian regulation of liver metabolism. PLoS Genet. 10, e1004047 (2014).

  47. Wang, Z., Gerstein, M. & Snyder, M. RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10, 57–63 (2009).

  48. Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013).

  49. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).

  50. Ingolia, N.T., Ghaemmaghami, S., Newman, J.R. & Weissman, J.S. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324, 218–223 (2009).

  51. Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 5, 621–628 (2008).

  52. Schwanhäusser, B. et al. Global quantification of mammalian gene expression control. Nature 473, 337–342 (2011). In this publication a large-scale quantitative analysis of transcription and translation rates is performed, introducing the iBAQ technique for estimating protein abundances from mass-spectrometry data.

  53. Aviner, R., Shenoy, A., Elroy-Stein, O. & Geiger, T. Uncovering hidden layers of cell cycle regulation through integrative multi-omic analysis. PLoS Genet. 11, e1005554 (2015).

  54. Yates, A. et al. Ensembl 2016. Nucleic Acids Res. 44, D710–D716 (2016).

  55. Cox, J. & Mann, M. 1D and 2D annotation enrichment: a statistical method integrating quantitative proteomics with complementary high-throughput data. BMC Bioinformatics 13, S12 (2012).

  56. Deeb, S.J. et al. Machine learning-based classification of diffuse large B-cell lymphoma patients by their protein expression profiles. Mol. Cell. Proteomics 14, 2947–2960 (2015).

  57. Iglesias-Gato, D. et al. The proteome of primary prostate cancer. Eur. Urol. 69, 942–952 (2016).

  58. Tyanova, S. et al. Proteomic maps of breast cancer subtypes. Nat. Commun. 7, 10259 (2016).

  59. Vapnik, V.N. The Nature of Statistical Learning Theory (Springer, 1995).

  60. Chang, C.-C. & Lin, C.-J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 1–27 (2011).

  61. Hastie, T., Tibshirani, R. & Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer, 2001).

  62. Zhang, B. & Horvath, S. A general framework for weighted gene co-expression network analysis. Stat. Appl. Genet. Mol. Biol. 4, Article17 (2005).

  63. Ideker, T. & Krogan, N.J. Differential network biology. Mol. Syst. Biol. 8, 565 (2012).

  64. Creixell, P. et al. Pathway and network analysis of cancer genomes. Nat. Methods 12, 615–621 (2015).

  65. Hoops, S. et al. COPASI–a COmplex PAthway SImulator. Bioinformatics 22, 3067–3074 (2006).

  66. Angermann, B.R. et al. Computational modeling of cellular signaling processes embedded into dynamic spatial contexts. Nat. Methods 9, 283–289 (2012).

  67. Cowan, A.E., Moraru, I.I., Schaff, J.C., Slepchenko, B.M. & Loew, L.M. Spatial modeling of cell signaling networks. Methods Cell Biol. 110, 195–221 (2012).

  68. Croft, D. et al. The Reactome pathway knowledgebase. Nucleic Acids Res. 42, D472–D477 (2014).

  69. Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462 (2016).

  70. Tyanova, S. et al. Visualization of LC-MS/MS proteomics data in MaxQuant. Proteomics 15, 1453–1456 (2015).

  71. Ihaka, R. & Gentleman, R. R: a language for data analysis and graphics. J. Comput. Graph. Stat. 5, 299–314 (1996).

  72. Troyanskaya, O. et al. Missing value estimation methods for DNA microarrays. Bioinformatics 17, 520–525 (2001).

  73. Liew, A.W., Law, N.F. & Yan, H. Missing value imputation for gene expression data: computational techniques to recover missing data from available information. Brief. Bioinform. 12, 498–513 (2011).

  74. UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 43, D204–D212 (2015).

  75. Hosp, F. et al. A double-barrel liquid chromatography-tandem mass spectrometry (LC-MS/MS) system to quantify 96 interactomes per day. Mol. Cell. Proteomics 14, 2030–2041 (2015).

  76. Stingele, S. et al. Global analysis of genome, transcriptome and proteome reveals the response to aneuploidy in human cells. Mol. Syst. Biol. 8, 608 (2012).

Acknowledgements

This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement no. 686547 (J.C.) and from the FP7 grant agreement GA ERC-2012-SyG_318987ToPAG (J.C.).

Author information

Corresponding author

Correspondence to Jürgen Cox.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Schematic representation of the Workflow window in Perseus

All data matrices uploaded in the running session of Perseus and all processing steps are displayed in the order of execution. The workflow lets users keep track of every step in the analysis and navigate through data matrices and visualization components by clicking on the respective node in the diagram. Nodes can be annotated with a description and additional information for clarity. If a data matrix node is selected, information about the number of samples and data points is displayed in the rightmost panel of Perseus. If an analysis node is selected, all parameters that were used in that step can be reviewed. Each data matrix, as well as every visualization window, can be exported in publication-ready formats. The workflow scheme can be saved as a PDF file and used as documentation of all steps of the analysis.

Supplementary Figure 2 Plug-in architecture of Perseus

The current structure of Perseus relies on a data matrix type, for which various functions for accessing and transforming the matrix have been developed. The base code implementing these operations is open source and can be downloaded from GitHub (github.com/JurgenCox/perseus-plugins). The rest of the functionality is organized in two main interfaces, ‘Processing’ and ‘Analysis’, and the resulting modules are added to the software core as plug-ins. Developers wishing to extend the software can build upon the main source code and contribute new plug-ins to our online plug-in store.

Supplementary Figure 3 Missing value imputation

Perseus offers several imputation techniques, including a method that draws random values from a distribution meant to simulate expression below the detection limit. The width and the down-shift of the distribution can be set to closely represent the missing population. When missing values occur randomly, a distribution similar to that of the measured data is normally used for imputation. In contrast, a frequent assumption in proteomics experiments is that low-expression proteins give rise to missing values; a Gaussian distribution whose median is shifted from the median of the measured data towards low expression should therefore impute such values accurately. The mode parameter defines which measured data distribution is used to calculate the random distribution. When the samples do not differ greatly in their overall distribution, use of the complete dataset is recommended. The measured distribution is shown in blue and the imputed values in orange. (a) No down-shift and a distribution width of 0.5 do not simulate low-abundance missing values. (b) A down-shift of 1.8 and a distribution width of 0.5 simulate the assumption that low-abundance proteins give rise to missing values. (c) A down-shift of 3.6 and a width of 0.5 result in an undesirable bimodal distribution.
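
The down-shifted imputation described in this legend can be illustrated with a minimal Python sketch. It is not the Perseus implementation. It assumes that the width and the down-shift are expressed in units of the standard deviation of the measured values and that the distribution is centred on their mean; the function name and the `whole_matrix` flag (standing in for the mode parameter) are hypothetical.

```python
import numpy as np


def impute_from_shifted_normal(data, width=0.5, down_shift=1.8,
                               whole_matrix=True, seed=0):
    """Fill NaNs with draws from a narrowed, down-shifted normal distribution.

    Assumption: width and down_shift are multiples of the standard deviation
    of the measured (non-missing) values, and the centre is their mean.
    Returns the imputed matrix and a boolean mask of the imputed positions.
    """
    rng = np.random.default_rng(seed)
    out = np.array(data, dtype=float)      # copy; the input is left untouched
    missing = np.isnan(out)

    def draw(valid, n):
        sd = valid.std()
        return rng.normal(valid.mean() - down_shift * sd, width * sd, size=n)

    if whole_matrix:
        # One distribution estimated from all measured values in the matrix.
        out[missing] = draw(out[~missing], int(missing.sum()))
    else:
        # A separate distribution for every column (sample).
        for j in range(out.shape[1]):
            rows = missing[:, j]
            valid = out[~rows, j]
            if valid.size:                 # skip columns with no measurements
                out[rows, j] = draw(valid, int(rows.sum()))
    return out, missing
```

Under these assumptions, `down_shift=0` places the imputed values in the middle of the measured distribution, as in panel (a), whereas larger values move them towards the low-intensity tail.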

Supplementary Figure 4 Density-enhanced scatterplots between proteome, transcriptome and translatome levels produced by the upload plug-in

Short-read NGS data, as produced for instance by the Illumina platform, can be imported for further analysis in the Perseus workflow. In the example we calculate RPKM values for each gene (Ingolia N. T. et al., Science, 2009) and compare these with iBAQ values calculated by MaxQuant from proteomics data derived from yeast (Kulak N. A. et al., Nature Methods, 2014).
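
The legend does not spell out how the plug-in computes RPKM. As a hedged illustration, the sketch below uses the standard definition (reads per kilobase of transcript per million mapped reads) and assumes that the total number of mapped reads equals the sum of the per-gene counts supplied. The function name and the gene identifiers are hypothetical.

```python
def rpkm(read_counts, gene_lengths_bp):
    """Compute RPKM per gene from raw read counts and gene lengths in base pairs.

    RPKM = 1e9 * reads_on_gene / (total_mapped_reads * gene_length_bp)
    Assumption: total mapped reads is the sum over the genes passed in.
    """
    total_reads = sum(read_counts.values())
    return {
        gene: 1e9 * count / (total_reads * gene_lengths_bp[gene])
        for gene, count in read_counts.items()
    }


# Hypothetical two-gene example.
counts = {"geneA": 100, "geneB": 300}
lengths = {"geneA": 1000, "geneB": 2000}
values = rpkm(counts, lengths)
```

Because RPKM normalizes for both sequencing depth and gene length, the resulting values can be placed on the same scatterplot as length-normalized protein abundance estimates such as iBAQ.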

Supplementary Figure 5 Augmented data matrix

In addition to the main data matrix, Perseus can make use of background information complementary to the expression columns. (a) Often one of the first processing steps in data analysis is filtering for a minimum number of valid values. As some statistical methods (e.g. PCA) require all values to be present, data imputation may be necessary. Upon imputation, a second matrix is created in the background that records which values were measured and which were imputed; it can later be used to highlight or remove the imputed values. (b) In a more advanced filtering option, a ‘Quality matrix’ is first created, which contains additional information about each expression value in the main matrix and is used for filtering. For example, the number of peptides used for protein quantification can be used to filter out proteins that were identified with fewer than two peptides.
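
The two filtering ideas in this legend can be sketched with NumPy arrays. This is only an illustration under stated assumptions, not the Perseus data model: rows are taken to be proteins and columns samples, missing values are encoded as NaN, and the function names are hypothetical. For the quality matrix, the sketch assumes that values failing the threshold are set to missing, which is one possible way to apply such a filter.

```python
import numpy as np


def filter_min_valid(expr, min_valid):
    """Keep rows (proteins) with at least min_valid non-missing values."""
    keep = (~np.isnan(expr)).sum(axis=1) >= min_valid
    return expr[keep], keep


def mask_by_quality(expr, quality, min_quality):
    """Set expression values to NaN where the matching quality entry is too low.

    quality has the same shape as expr, e.g. the number of peptides
    used to quantify each value (assumed layout).
    """
    out = expr.copy()
    out[quality < min_quality] = np.nan
    return out


expr = np.array([[1.0, 2.0, np.nan, 4.0],
                 [np.nan, np.nan, 3.0, np.nan],
                 [5.0, 6.0, 7.0, 8.0]])
filtered, kept_rows = filter_min_valid(expr, min_valid=3)

# Background matrix recording which entries would need imputation.
to_impute = np.isnan(filtered)

peptides = np.array([[2, 1, 0, 3],
                     [2, 2, 2, 2]])
high_quality = mask_by_quality(filtered, peptides, min_quality=2)
```

Keeping the boolean matrix `to_impute` alongside the data is what allows imputed entries to be highlighted or removed again later.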

Supplementary information

Supplementary Text and Figures

Supplementary figures 1–5 and Supplementary Table 1 (PDF 1033 kb)

About this article

Cite this article

Tyanova, S., Temu, T., Sinitcyn, P. et al. The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat Methods 13, 731–740 (2016). https://doi.org/10.1038/nmeth.3901

  • DOI: https://doi.org/10.1038/nmeth.3901
