The proteome landscape of the kingdoms of life


Proteins carry out the vast majority of functions in all biological domains, but for technological reasons their large-scale investigation has lagged behind the study of genomes. Since the first essentially complete eukaryotic proteome was reported [1], advances in mass-spectrometry-based proteomics [2] have enabled increasingly comprehensive identification and quantification of the human proteome [3,4,5,6]. However, there have been few comparisons across species [7,8], in stark contrast with genomics initiatives [9]. Here we use an advanced proteomics workflow, in which the peptide separation step is performed by a microstructured and extremely reproducible chromatographic system, for the in-depth study of 100 taxonomically diverse organisms. With two million peptide and 340,000 stringent protein identifications obtained in a standardized manner, we double the number of proteins with solid experimental evidence known to the scientific community. The data also provide a large-scale case study for sequence-based machine learning, as we demonstrate by experimentally confirming the predicted properties of peptides from Bacteroides uniformis. Our results offer a comparative view of the functional organization of organisms across the entire evolutionary range. A remarkably high fraction of the total proteome mass in all kingdoms is dedicated to protein homeostasis and folding, highlighting the biological challenge of maintaining protein structure in all branches of life. Likewise, a universally high fraction is involved in supplying energy resources, although these pathways range from photosynthesis through iron sulfur metabolism to carbohydrate metabolism. Generally, however, proteins and proteomes are remarkably diverse between organisms, and they can readily be explored and functionally compared at

Fig. 1: Collection of organism samples across the tree of life, and integration of the proteomic workflow.
Fig. 2: Application of a deep learning model to predict peptide retention times for liquid chromatography with tandem mass spectrometry (LC-MS/MS) measurements.
Fig. 3: Organism-resolved integration of proteome data into a global analysis.
Fig. 4: Global view of the expression levels of functional groups across the 100 organisms.

Data availability

The MS-based proteomics data have been deposited in the ProteomeXchange Consortium via the PRIDE partner repository and are available via ProteomeXchange with identifiers PXD014877 and PXD019483.

Code availability

Custom computer code is available at


References

  1. de Godoy, L. M. F. et al. Comprehensive mass-spectrometry-based proteome quantification of haploid versus diploid yeast. Nature 455, 1251–1254 (2008).
  2. Aebersold, R. & Mann, M. Mass-spectrometric exploration of proteome structure and function. Nature 537, 347–355 (2016).
  3. Nagaraj, N. et al. System-wide perturbation analysis with nearly complete coverage of the yeast proteome by single-shot ultra HPLC runs on a bench top Orbitrap. Mol. Cell. Proteomics 11, M111.013722 (2012).
  4. Kim, M.-S. et al. A draft map of the human proteome. Nature 509, 575–581 (2014).
  5. Wilhelm, M. et al. Mass-spectrometry-based draft of the human proteome. Nature 509, 582–587 (2014).
  6. Bekker-Jensen, D. B. et al. An optimized shotgun strategy for the rapid generation of comprehensive human proteomes. Cell Syst. 4, 587–599 (2017).
  7. Weiss, M., Schrimpf, S., Hengartner, M. O., Lercher, M. J. & von Mering, C. Shotgun proteomics data from multiple organisms reveals remarkable quantitative conservation of the eukaryotic core proteome. Proteomics 10, 1297–1306 (2010).
  8. Marx, H. et al. A proteomic atlas of the legume Medicago truncatula and its nitrogen-fixing endosymbiont Sinorhizobium meliloti. Nat. Biotechnol. 34, 1198–1205 (2016).
  9. Shendure, J. et al. DNA sequencing at 40: past, present and future. Nature 550, 345–353 (2017); correction Nature 568, E11 (2019).
  10. Kulak, N. A., Pichler, G., Paron, I., Nagaraj, N. & Mann, M. Minimal, encapsulated proteomic-sample processing applied to copy-number estimation in eukaryotic cells. Nat. Methods 11, 319–324 (2014).
  11. Geyer, P. E. et al. Plasma proteome profiling to assess human health and disease. Cell Syst. 2, 185–195 (2016).
  12. De Beeck, J. O. et al. Digging deeper into the human proteome: a novel nanoflow LCMS setup using micro pillar array columns (μPAC™). Preprint at bioRxiv (2018).
  13. Kulak, N. A., Geyer, P. E. & Mann, M. Loss-less nano-fractionator for high sensitivity, high coverage proteomics. Mol. Cell. Proteomics 16, 694–705 (2017).
  14. Zhou, X.-X. et al. pDeep: predicting MS/MS spectra of peptides with deep learning. Anal. Chem. 89, 12690–12697 (2017).
  15. Tiwary, S. et al. High-quality MS/MS spectrum prediction for data-dependent and data-independent acquisition data analysis. Nat. Methods 16, 519–525 (2019).
  16. Gessulat, S. et al. Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning. Nat. Methods 16, 509–518 (2019).
  17. UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47 (D1), D506–D515 (2019).
  18. Muñoz, J. & Heck, A. J. R. From the human genome to the human proteome. Angew. Chem. Int. Edn 53, 10864–10866 (2014).
  19. Cox, J. et al. Accurate proteome-wide label-free quantification by delayed normalization and maximal peptide ratio extraction, termed MaxLFQ. Mol. Cell. Proteomics 13, 2513–2526 (2014).
  20. Altenhoff, A. M. et al. Standardized benchmarking in the quest for orthologs. Nat. Methods 13, 425–430 (2016).
  21. Huerta-Cepas, J. et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 47 (D1), D309–D314 (2019).
  22. The Gene Ontology Consortium. The Gene Ontology Resource: 20 years and still GOing strong. Nucleic Acids Res. 47 (D1), D330–D338 (2019).
  23. Geer, L. Y. et al. The NCBI BioSystems database. Nucleic Acids Res. 38, D492–D496 (2010).
  24. El-Gebali, S. et al. The Pfam protein families database in 2019. Nucleic Acids Res. 47 (D1), D427–D432 (2019).
  25. Santos, A. et al. Clinical knowledge graph integrates proteomics data into clinical decision-making. Preprint at bioRxiv (2020).
  26. Cox, J. & Mann, M. 1D and 2D annotation enrichment: a statistical method integrating quantitative proteomics with complementary high-throughput data. BMC Bioinformatics 13 (Suppl 16), S12 (2012).
  27. Chi, H. et al. Comprehensive identification of peptides in tandem mass spectra using an efficient open search engine. Nat. Biotechnol. 36, 1059–1061 (2018).
  28. Zielinska, D. F., Gnad, F., Schropp, K., Wiśniewski, J. R. & Mann, M. Mapping N-glycosylation sites across seven evolutionarily distant species reveals a divergent substrate proteome despite a common core machinery. Mol. Cell 46, 542–548 (2012).
  29. Wiśniewski, J. R., Wegler, C. & Artursson, P. Multiple-enzyme-digestion strategy improves accuracy and sensitivity of label- and standard-free absolute quantification to a level that is achievable by analysis with stable isotope-labeled standard spiking. J. Proteome Res. 18, 217–224 (2019).
  30. Kelstrup, C. D. et al. Performance evaluation of the Q Exactive HF-X for shotgun proteomics. J. Proteome Res. 17, 727–738 (2018).
  31. Scheltema, R. A. & Mann, M. SprayQc: a real-time LC-MS/MS quality monitoring system to maximize uptime using off the shelf components. J. Proteome Res. 11, 3458–3466 (2012).
  32. Cox, J. & Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372 (2008).
  33. Cox, J. et al. Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res. 10, 1794–1805 (2011).
  34. Tyanova, S. et al. The Perseus computational platform for comprehensive analysis of (prote)omics data. Nat. Methods 13, 731–740 (2016).
  35. Wichmann, C. et al. MaxQuant.Live enables global targeting of more than 25,000 peptides. Mol. Cell. Proteomics 18, 982–994 (2019).
  36. Perez-Riverol, Y. et al. The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucleic Acids Res. 47 (D1), D442–D450 (2019).
  37. Perkel, J. M. Why Jupyter is data scientists’ computational notebook of choice. Nature 563, 145–146 (2018).


We thank all members of the Proteomics and Signal Transduction Group and the Clinical Proteomics Group at the Max Planck Institute of Biochemistry, Martinsried, for help and discussions, and in particular I. Paron, C. Deiml, A. Strasser and B. Splettstoesser for technical assistance. We further thank the P. Bork group for supplying bacteria, the A. Pichlmair group for virus samples, F. Hosp for A. thaliana, I. Sinning for Neurospora crassa and the K.-P. Janssen group for cell line samples. Our work was partially supported by the Max Planck Society for the Advancement of Science, by the European Union’s Horizon 2020 research and innovation program with the Microb-Predict project (grant 825694), by grants from the Novo Nordisk Foundation (NNF15CC0001 and NNF15OC0016692), and by the Deutsche Forschungsgemeinschaft (DFG) project ‘Chemical proteomics inside us’ (grant 412136960).

Author information




J.B.M. and P.E.G. designed the experiments, performed and interpreted the MS-based proteomic analyses, carried out bioinformatics analyses and generated text and figures for the manuscript. P.V.T., S.D., S.V.W. and J.M.B. designed experiments and performed MS-based proteomics analyses. A.R.C. and A.S. integrated annotation data with proteomics data and implemented the Python code as well as graph-based structures. A.S. and M.O. implemented the web-accessible analyses. N.K., F.T. and M.T.S. carried out the machine learning analysis. M.M. supervised and guided the project, designed the experiments, interpreted MS-based proteomics data and wrote the manuscript.

Corresponding author

Correspondence to Matthias Mann.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature thanks Joshua Coon, Vera van Noort and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Comparison of the peptide retention times obtained by a μPAC and a fused silica capillary column.

a, The histograms illustrate the distribution of coefficients of variation (CVs) calculated from peptide retention times obtained by a μPAC and a fused silica capillary column. The CVs were calculated for peptides from 12 measurements of a HeLa cell digest on each column. b, All components, including lines, connectors, the column and the emitter, are displayed together with grounding and spray voltage connections. The pico tip emitter is from New Objective (catalogue number FS360-20-10-N-5-105CT).
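As an illustration of the quantity plotted in panel a, the following minimal sketch computes a coefficient of variation per peptide from replicate retention times. The peptide names and values are hypothetical, and the use of the sample standard deviation is an assumption; the legend does not state which estimator was used.

```python
from statistics import mean, stdev


def retention_time_cv(times):
    """Return the coefficient of variation (percent) of one peptide's retention times.

    Assumption: sample standard deviation (n - 1 denominator) divided by the mean.
    """
    if len(times) < 2:
        raise ValueError("need at least two replicate measurements")
    return 100.0 * stdev(times) / mean(times)


# Hypothetical retention times (minutes) for two peptides across replicate runs.
runs = {
    "PEPTIDEA": [30.0, 30.2, 29.8, 30.0],
    "PEPTIDEB": [55.0, 56.0, 54.0, 55.0],
}

# One CV per peptide; a histogram of these values gives a panel like the one described.
cvs = {peptide: retention_time_cv(times) for peptide, times in runs.items()}
```

A narrower CV distribution for one column than for the other would indicate more reproducible retention times on that column.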

Extended Data Fig. 2 Interlaboratory reproducibility and prediction of peptide retention time on the μPAC column.

a, The ability to produce chip-based columns in a reproducible manner, coupled with the statically fixed micrometre-sized pillars, results in highly reproducible performance and interlaboratory transferability of the μPAC-based approach. Shown are the corrected retention times of an excerpt of 5,000 peptides from the 43,000 overlapping peptides measured in two different HeLa cell digests by our Munich and Copenhagen laboratories, resulting in a Pearson correlation coefficient of peptide retention times of 0.995. b, To validate our model for predicting peptide retention times, we plot an excerpt of 1,000 peptides from the complete test-set of 54,490 peptides, with experimentally determined values on the x axis and predicted values on the y axis. The Pearson’s R² correlation value for the complete predicted peptide set is 0.99.
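The interlaboratory comparison in panel a rests on a Pearson correlation between paired retention times. A minimal sketch of that calculation is shown below; the retention times are hypothetical values for peptides measured in both laboratories, and the function is a textbook implementation rather than the authors' code.

```python
from math import sqrt


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical corrected retention times (minutes) of the same peptides in two labs.
lab_a = [10.0, 20.0, 30.0, 40.0]
lab_b = [10.5, 19.5, 30.5, 39.5]
r = pearson_r(lab_a, lab_b)
```

Values of r close to 1 indicate that peptides elute in nearly the same relative positions in both laboratories.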

Extended Data Fig. 3 Total numbers of identified peptides from 100 organisms across the tree of life.

Peptides identified uniquely in one organism are distinguished by colour from peptides identified in multiple species. Orange, archaea; blue, eukaryotes; green, bacteria.

Extended Data Fig. 4 Comparison and characterization of the LSTM model for predicting peptide retention times.

a, Box plots comparing R² scores obtained from different models of peptide retention time, calculated from the linear regressions of correlations between the predicted test set to the measured peptide retention times. Sample sizes are shown in b. b, Table comparing the different models of peptide retention time. The training set was reduced in size (number of peptides included) in order to account for the exponentially growing calculation time of certain models. Statistics represent the linear regression of correlation from the predicted test set retention times to the measured retention times. c, Characterization of the LSTM model applied here for different sizes of training peptide set.
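The score used to compare models can be illustrated with an ordinary least-squares fit of predicted against measured retention times, followed by the coefficient of determination of that fit. This is a sketch under that assumption; the legend does not give the exact procedure, and the numbers below are hypothetical.

```python
def linear_fit_r2(measured, predicted):
    """Fit predicted = slope * measured + intercept by least squares.

    Returns (slope, intercept, r_squared), where r_squared is
    1 - SS_residual / SS_total for the fitted line.
    """
    n = len(measured)
    if n != len(predicted) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(measured) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in measured)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(measured, predicted))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in predicted)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical measured and model-predicted retention times (arbitrary units).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.1, 3.9, 6.2, 7.8]
slope, intercept, r2 = linear_fit_r2(measured, predicted)
```

For a simple linear fit like this one, the resulting value equals the square of the Pearson correlation coefficient between the two series.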

Extended Data Fig. 5 Overview of our data set of 100 organisms across the tree of life.

a, Illustration of all direct taxonomic levels below the superkingdom level that are covered by our data set. DPANN, Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota and Nanohaloarchaeota; FCB, Fibrobacteres, Chlorobi and Bacteroidetes; PCV, Planctomycetes, Chlamydiae and Verrucomicrobia; TACK, Thaumarchaeota, Crenarchaeota and Korarchaeota. b, Number of protein identification codes (IDs) in this study and their relation to TrEMBL IDs found in the PRIDE archive. c, Comparison of the Swiss-Prot database to the data set in this study with regards to organism and protein numbers. d, Numbers of identified protein groups and UniProt protein entries for all 100 organisms in our data set. The UniProt protein-entry identifications are colour-coded into Swiss-Prot (reviewed) and TrEMBL (predicted) entries.

Extended Data Fig. 6 Dynamic range curves for all organisms analysed here.

Protein intensities are log10-scaled and plotted against the abundance rank of each protein.
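A dynamic range curve of this kind can be sketched as follows: proteins are sorted by intensity, ranked from most to least abundant, and the intensities are log10-transformed. The protein identifiers and intensities are hypothetical, and dropping non-positive intensities before the transform is an assumption made here so that the logarithm is defined.

```python
from math import log10


def dynamic_range_curve(intensities):
    """Return (rank, log10 intensity) pairs, most abundant protein first.

    Assumption: proteins with non-positive intensity are skipped.
    """
    positive = [value for value in intensities.values() if value > 0]
    ranked = sorted(positive, reverse=True)
    return [(rank, log10(value)) for rank, value in enumerate(ranked, start=1)]


def orders_of_magnitude(curve):
    """Return the span between the most and least abundant protein in log10 units."""
    return curve[0][1] - curve[-1][1]


# Hypothetical protein intensities for one organism.
example = {"P1": 1e9, "P2": 1e6, "P3": 1e7, "P4": 0.0}
curve = dynamic_range_curve(example)
span = orders_of_magnitude(curve)
```

Plotting the second element of each pair against the first gives the rank-abundance curve for one organism.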

Extended Data Fig. 7 Cumulative protein intensities for all organisms analysed here.

On the x axis, proteins are ranked according to their abundance; the y axis shows the cumulative protein intensity. Proteins missing biological-process annotation are highlighted by grey lines in the background.
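The cumulative curve described here can be sketched as a running sum over proteins sorted by abundance, normalized to the total intensity. The helper that counts how many proteins are needed to reach a given fraction of the total is an illustrative addition, and all values are hypothetical.

```python
from itertools import accumulate


def cumulative_fraction(intensities):
    """Return the cumulative share of total intensity, most abundant protein first."""
    ranked = sorted(intensities, reverse=True)
    total = sum(ranked)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return [running / total for running in accumulate(ranked)]


def proteins_to_reach(intensities, fraction):
    """Return how many of the most abundant proteins cover `fraction` of the total."""
    for count, share in enumerate(cumulative_fraction(intensities), start=1):
        if share >= fraction:
            return count
    return len(intensities)


# Hypothetical protein intensities for one organism.
example = [15.0, 50.0, 5.0, 30.0]
shares = cumulative_fraction(example)
needed = proteins_to_reach(example, 0.9)
```

A steep initial rise of the curve means that a small number of proteins accounts for most of the measured protein mass.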

Extended Data Fig. 8 Quantitative analysis of different enzyme classes and functional protein domains across the tree of life.

a, We classified the contribution of peptides to the top 90% of protein mass within all 100 organisms according to the enzyme commission (EC) number, using the Unipept web-tool. The alluvial plot illustrates the proportions of each enzyme class across all organisms in our study. b, Comparison of the three domains of life with respect to their normalized contribution of peptides to each enzyme class. c, Proteins that contribute to the top 90% of the protein mass within all 100 organisms studied herein were annotated according to their known functional protein domains, and the intensities for different functional domains of an organism were summed to display the most abundant functional protein domains across the tree of life. The intensity is displayed on a log10 scale.

Extended Data Fig. 9 Quantitative analysis of specific biological processes across the tree of life.

a, Linear display showing a global view of the expression levels of functional groups across the 100 organisms from Fig. 4. Summed intensities for functional terms are shown as grey lines, with the top ten most abundant terms in all organisms colour-coded according to the top key. b, Quantitative analysis of specific biological processes from the superkingdom of eukaryotes. Proteins were annotated with biological processes, and the intensities for each annotation term within an organism were summed. Those biological processes that display differential expression across the superkingdom as well as photosynthetic processes are highlighted according to the bottom key.
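The summation of intensities per annotation term can be sketched as below for a single organism. The protein identifiers, intensities and term assignments are hypothetical. Counting a protein's full intensity towards every term it carries is an assumption made for this sketch, since the legend does not say how multiply annotated proteins were handled.

```python
from collections import defaultdict


def summed_intensity_per_term(protein_intensity, protein_terms):
    """Sum protein intensities for each annotation term within one organism.

    Assumption: a protein with several terms contributes its full intensity to each.
    Proteins without annotation are ignored.
    """
    totals = defaultdict(float)
    for protein, intensity in protein_intensity.items():
        for term in protein_terms.get(protein, ()):
            totals[term] += intensity
    return dict(totals)


def top_terms(totals, n=10):
    """Return the n terms with the highest summed intensity, highest first."""
    return sorted(totals, key=totals.get, reverse=True)[:n]


# Hypothetical intensities and biological-process annotations.
intensity = {"P1": 100.0, "P2": 50.0, "P3": 25.0, "P4": 10.0}
terms = {
    "P1": ["protein folding"],
    "P2": ["protein folding", "glycolysis"],
    "P3": ["photosynthesis"],
}
totals = summed_intensity_per_term(intensity, terms)
leading = top_terms(totals, n=2)
```

Repeating this per organism and comparing the totals for one term across organisms gives the kind of cross-species view shown in the figure.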

Extended Data Fig. 10 Modified peptides.

Sum of modified peptides per organism, identified with pFind and colour-coded for archaea (red), eukaryotes (blue) and bacteria (green).

Supplementary information

Reporting Summary

Supplementary Table

Supplementary Table 1: Organisms analyzed in the study. All organisms analyzed in the study are listed with source and taxonomy.

Supplementary Table 2: Identified and quantified protein groups. All identified protein groups for the 100 organisms are listed and quantitative information is added for quantified protein groups.

Supplementary Table 3: Reported modified peptides. Peptides with biologically relevant modifications as found by the pFind tool are listed.

Supplementary Table 4: Identified and quantified protein groups for 14 human cell lines. The deep human proteome derived from 14 human cell lines is listed with all identified and quantified protein groups.

Supplementary Table 5: Detailed summary information for technical and biological proteomics data. Technically relevant information on the mass spectrometry data for the 100 organism proteomes is listed.

Supplementary Table 6: Annotation data for the 100 most abundant proteins of the 100 organisms. The 100 most abundant protein groups per organism are listed with annotation data.

Cite this article

Müller, J.B., Geyer, P.E., Colaço, A.R. et al. The proteome landscape of the kingdoms of life. Nature 582, 592–596 (2020).
