Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning

Abstract

In mass-spectrometry-based proteomics, the identification and quantification of peptides and proteins heavily rely on sequence database searching or spectral library matching. The lack of accurate predictive models for fragment ion intensities impairs the realization of the full potential of these approaches. Here, we extended the ProteomeTools synthetic peptide library to 550,000 tryptic peptides and 21 million high-quality tandem mass spectra. We trained a deep neural network, termed Prosit, resulting in chromatographic retention time and fragment ion intensity predictions that exceed the quality of the experimental data. Integrating Prosit into database search pipelines led to more identifications at >10× lower false discovery rates. We show the general applicability of Prosit by predicting spectra for proteases other than trypsin, generating spectral libraries for data-independent acquisition and improving the analysis of metaproteomes. Prosit is integrated into ProteomicsDB, allowing search result re-scoring and custom spectral library generation for any organism on the basis of peptide sequence alone.

Fig. 1: Accurate retention time and fragment ion intensity prediction by deep learning.
Fig. 2: Collision energy calibration yields fragment intensity predictions with near-synthetic peptide spectrum quality.
Fig. 3: Evaluation of fragment ion intensity and iRT prediction for non-tryptic peptides.
Fig. 4: Prosit enables generation of in silico spectral libraries.
Fig. 5: Intensity prediction greatly improves database search quality.
Fig. 6: Prosit enables confident identification in large metaproteomic search spaces.

Data availability

Reference spectra are available at https://www.proteomicsdb.org, and updates to the resource are available at http://www.proteometools.org. The mass spectrometric raw data of ProteomeTools have been deposited with the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD010595. The MaxQuant and Spectronaut search data including intermediate results underlying the presented analysis have been deposited with the dataset identifier PXD010871. Learned Prosit and Elude models are deposited at https://figshare.com/projects/prosit/35582.

Code availability

Source code and scripts are available on GitHub at https://github.com/kusterlab/prosit.

References

  1. Aebersold, R. & Mann, M. Mass-spectrometric exploration of proteome structure and function. Nature 537, 347–355 (2016).

  2. Zhang, Y., Fonslow, B. R., Shan, B., Baek, M.-C. & Yates, J. R. Protein analysis by shotgun/bottom-up proteomics. Chem. Rev. 113, 2343–2394 (2013).

  3. Mallick, P. & Kuster, B. Proteomics: a pragmatic perspective. Nat. Biotechnol. 28, 695 (2010).

  4. Sinitcyn, P., Rudolph, J. D. & Cox, J. Computational methods for understanding mass spectrometry-based shotgun proteomics data. Annu. Rev. Biomed. Data Sci. 1, 207–234 (2018).

  5. Cox, J. et al. Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res. 10, 1794–1805 (2011).

  6. Perkins, D. N., Pappin, D. J. C., Creasy, D. M. & Cottrell, J. S. Probability‐based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551–3567 (1999).

  7. Eng, J. K., McCormack, A. L. & Yates, J. R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 5, 976–989 (1994).

  8. Stein, S. E. & Scott, D. R. Optimization and testing of mass spectral library search algorithms for compound identification. J. Am. Soc. Mass Spectrom. 5, 859–866 (1994).

  9. Lam, H. et al. Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics 7, 655–667 (2007).

  10. Schubert, O. T. et al. Building high-quality assay libraries for targeted analysis of SWATH MS data. Nat. Protoc. 10, 426–441 (2015).

  11. Deutsch, E. W. et al. Expanding the use of spectral libraries in proteomics. J. Proteome Res. 17, 4051–4060 (2018).

  12. Gillet, L. C. et al. Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol. Cell. Proteomics 11, O111.016717 (2012).

  13. Lange, V., Picotti, P., Domon, B. & Aebersold, R. Selected reaction monitoring for quantitative proteomics: a tutorial. Mol. Syst. Biol. 4, 222 (2008).

  14. Bruderer, R., Bernhardt, O. M., Gandhi, T. & Reiter, L. High‐precision iRT prediction in the targeted analysis of data‐independent acquisition and its impact on identification and quantitation. Proteomics 16, 2246–2256 (2016).

  15. Krokhin, O. V. & Spicer, V. Generation of accurate peptide retention data for targeted and data independent quantitative LC–MS analysis: chromatographic lessons in proteomics. Proteomics 16, 2931–2936 (2016).

  16. Moruz, L. et al. Chromatographic retention time prediction for posttranslationally modified peptides. Proteomics 12, 1151–1159 (2012).

  17. Elias, J. E., Gibbons, F. D., King, O. D., Roth, F. P. & Gygi, S. P. Intensity-based protein identification by machine learning from a library of tandem mass spectra. Nat. Biotechnol. 22, 214–219 (2004).

  18. Arnold, R. J., Jayasankar, N., Aggarwal, D., Tang, H. & Radivojac, P. A machine learning approach to predicting peptide fragmentation spectra. Pac. Symp. Biocomput. 2006, 219–230 (2006).

  19. Frank, A. M. Predicting intensity ranks of peptide fragment ions. J. Proteome Res. 8, 2226–2240 (2009).

  20. Degroeve, S., Maddelein, D. & Martens, L. MS2PIP prediction server: compute and visualize MS2 peak intensity predictions for CID and HCD fragmentation. Nucleic Acids Res. 43, W326–W330 (2015).

  21. Zhou, X.-X. et al. pDeep: predicting MS/MS spectra of peptides with deep learning. Anal. Chem. 89, 12690–12697 (2017).

  22. Zolg, D. et al. PROCAL: a set of 40 peptide standards for retention time indexing, column performance monitoring, and collision energy calibration. Proteomics 17, 1700263 (2017).

  23. Zolg, D. P. et al. Building ProteomeTools based on a complete synthetic human proteome. Nat. Methods 14, 259–262 (2017).

  24. Wu, Y. et al. Google’s neural machine translation system: bridging the gap between human and machine translation. Preprint at https://arxiv.org/abs/1609.08144 (2016).

  25. Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).

  26. Xu, K. et al. Show, attend and tell: neural image caption generation with visual attention. In Proc. International Conference on Machine Learning (eds. Bach, F. & Blei, D.) 2048–2057 (JMLR, 2015).

  27. Krokhin, O. V. Sequence-specific retention calculator. Algorithm for peptide retention prediction in ion-pair RP-HPLC: application to 300- and 100-A pore size C18 sorbents. Anal. Chem. 78, 7785–7795 (2006).

  28. Toprak, U. H. et al. Conserved peptide fragmentation as a benchmarking tool for mass spectrometers and a discriminating feature for targeted proteomics. Mol. Cell. Proteomics 13, 2056–2071 (2014).

  29. Diedrich, J. K., Pinto, A. F. M. & Yates, J. R. Energy dependence of HCD on peptide fragmentation: stepped collisional energy finds the sweet spot. J. Am. Soc. Mass Spectrom. 24, 1690–1699 (2013).

  30. Bekker-Jensen, D. B. et al. An optimized shotgun strategy for the rapid generation of comprehensive human proteomes. Cell Syst. 4, 587–599 (2017).

  31. Bruderer, R. et al. Optimization of experimental parameters in data-independent mass spectrometry significantly increases depth and reproducibility of results. Mol. Cell. Proteomics 16, 2296–2309 (2017).

  32. Fabre, B. et al. Spectral libraries for SWATH-MS assays for Drosophila melanogaster and Solanum lycopersicum. Proteomics 17, 1700216 (2017).

  33. Schmidt, T. et al. ProteomicsDB. Nucleic Acids Res. 46, D1271–D1281 (2017).

  34. Wilhelm, M. et al. Mass-spectrometry-based draft of the human proteome. Nature 509, 582–587 (2014).

  35. Elias, J. E. & Gygi, S. P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 4, 207–214 (2007).

  36. The, M., MacCoss, M. J., Noble, W. S. & Käll, L. Fast and accurate protein false discovery rates on large-scale proteomics data sets with Percolator 3.0. J. Am. Soc. Mass Spectrom. 27, 1719–1727 (2016).

  37. Shanmugam, A. K. & Nesvizhskii, A. I. Effective leveraging of targeted search spaces for improving peptide identification in tandem mass spectrometry based proteomics. J. Proteome Res. 14, 5169–5178 (2015).

  38. Muth, T., Benndorf, D., Reichl, U., Rapp, E. & Martens, L. Searching for a needle in a stack of needles: challenges in metaproteomics data analysis. Mol. Biosyst. 9, 578–585 (2012).

  39. Rechenberger, J. et al. Challenges in clinical metaproteomics highlighted by the analysis of acute leukemia patients with gut colonization by multidrug-resistant enterobacteriaceae. Proteomes 7, 2 (2019).

  40. Li, J. et al. An integrated catalog of reference genes in the human gut microbiome. Nat. Biotechnol. 32, 834 (2014).

  41. Muth, T. R. et al. Navigating through metaproteomics data: a logbook of database searching. Proteomics 15, 3439–3453 (2017).

  42. Nesvizhskii, A. I. Proteogenomics: concepts, applications and computational strategies. Nat. Methods 11, 1114 (2014).

  43. Schumacher, F. R. et al. Building proteomic tool boxes to monitor MHC class I and class II peptides. Proteomics 17, 1600061 (2017).

  44. Zolg, D. et al. ProteomeTools: systematic characterization of 21 post-translational protein modifications by LC-MS/MS using synthetic peptides. Mol. Cell. Proteomics 17, 1850–1863 (2018).

  45. Wang, D. et al. A deep proteome and transcriptome abundance atlas of 29 healthy human tissues. Mol. Syst. Biol. 15, e8503 (2019).

  46. Dorfer, V., Maltsev, S., Winkler, S. & Mechtler, K. CharmeRT: boosting peptide identifications by chimeric spectra identification and retention time prediction. J. Proteome Res. 17, 2581–2589 (2018).

  47. Wenschuh, H. et al. Coherent membrane supports for parallel microsynthesis and screening of bioactive peptides. Pept. Sci. 55, 188–206 (2000).

  48. Chung, J., Gulcehre, C., Cho, K. & Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. Preprint at https://arxiv.org/abs/1412.3555 (2014).

  49. Bahdanau, D., Cho, K. & Bengio, Y. Neural machine translation by jointly learning to align and translate. Preprint at https://arxiv.org/abs/1409.0473 (2014).

  50. Moruz, L., Tomazela, D. & Käll, L. Training, selection, and robust calibration of retention time models for targeted proteomics. J. Proteome Res. 9, 5209–5216 (2010).

  51. Davis, S. et al. Expanding proteome coverage with CHarge Ordered Parallel Ion aNalysis (CHOPIN) combined with broad specificity proteolysis. J. Proteome Res. 16, 1288–1299 (2017).

Acknowledgements

This work was funded in part by the German Federal Ministry of Education and Research (BMBF, grant nos. 031L0008A and 031L0168). The Titan Xp GPUs used in this research were donated by the NVIDIA Corporation. The authors thank R. Bruderer (Biognosys) for sharing spectral libraries in textual and editable format, and R. Bruderer and members of the Kuster lab for fruitful discussions.

Author information

Contributions

H.-C.E., S.A., B.K. and M.W. conceived the study. S.G., T.S., D.P.Z., J.R., K.S., J.Z., T.K., U.R., B.D., A.H., B.K. and M.W. designed experiments. S.G., T.S., D.P.Z., P.S., T.K., K.S., J.R., J.Z., B.D. and M.W. performed experiments. S.G., T.S., D.P.Z. and P.S. analyzed data. S.G., T.S., P.S. and M.W. extended the web resource. S.G., T.S., D.P.Z., B.K. and M.W. wrote the manuscript.

Corresponding authors

Correspondence to Bernhard Kuster or Mathias Wilhelm.

Ethics declarations

Competing interests

M.W. and B.K. are founders and shareholders of OmicScouts. They have no operational role in the company. K.S., J.Z., T.K., H.W. and U.R. are employees of JPT. B.D. and A.H. are employees of Thermo Fisher Scientific. S.G., H.-C.E. and S.A. are employees of SAP SE.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Overview of identified peptides in the ProteomeTools project.

(a) Recovery of synthesized peptide sequences across all four new datasets. Bars display the percentage of peptides identified relative to the peptides synthesized per pool of ~1000 peptides. Only identifications with an Andromeda score of at least 50 are considered. (b) Identified peptides as a function of the Andromeda score cutoff for both the newly released dataset and the complete ProteomeTools peptide library. Numbers at the arbitrary cutoff of 100 are displayed for both datasets, and the median Andromeda score is indicated.

Supplementary Figure 2 The Prosit deep learning model and its training.

(a) Overview of the neural network architecture of the fragment ion intensity prediction model. The model takes precursor charge, normalized collision energy and the peptide sequence as input. First, a specific encoder is trained for every input, consisting of one dense layer for precursor charge and normalized collision energy. The encoder for the peptide sequence consists of an embedding layer connected to 2 bi-directional recurrent neural networks (BDN) with gated recurrent unit (GRU) cells and an attention layer. Both encoder representations are multiplied element-wise to give a fixed-size latent space representation. The decoder for fragment ion intensity prediction consists of one bidirectional GRU resulting in 6 predictions for up to 29 fragmentation positions. The indexed retention time (iRT) model uses the same encoder but dense layers as decoder. (b) Model performance for 5 random splits of the ProteomeTools data into Training, Test and Holdout. The main panel shows the best performing models from the 5 random splits of the data. The inset details the median model error with intervals (shaded regions) ranging from the best performing model to the worst performing model over the 5 splits for Training, Test and Holdout. (c) Comparison of Pearson correlation and normalized spectral contrast angle (spectral angle for short) as measures of spectral similarity between predicted and measured spectra contained in the holdout set for fragment ion intensity prediction.
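As a rough illustration of the two similarity measures compared in panel (c), the sketch below computes a Pearson correlation and a normalized spectral contrast angle for two intensity vectors. It assumes the commonly used definition SA = 1 - 2*arccos(cos θ)/π on L2-normalized vectors, and that both vectors list the same fragment ions in the same order. The function names are illustrative and are not taken from the Prosit code base.

```python
import math


def spectral_angle(predicted, measured):
    """Normalized spectral contrast angle in [0, 1]; 1 means identical shape."""
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_m = math.sqrt(sum(m * m for m in measured))
    if norm_p == 0.0 or norm_m == 0.0:
        return 0.0  # assumption: an all-zero spectrum matches nothing
    cosine = sum(p * m for p, m in zip(predicted, measured)) / (norm_p * norm_m)
    cosine = max(-1.0, min(1.0, cosine))  # keep rounding errors inside acos domain
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return float("nan")  # correlation is undefined for constant input
    return cov / math.sqrt(var_x * var_y)


# Hypothetical intensities for four fragment ions
predicted = [0.0, 0.5, 1.0, 0.2]
measured = [0.1, 0.4, 1.0, 0.3]
print(spectral_angle(predicted, measured), pearson(predicted, measured))
```

Both measures ignore overall scale, but the spectral angle is not affected by a constant offset being subtracted, whereas Pearson correlation centers each vector first; this is one reason the two can rank spectrum pairs differently.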

Supplementary Figure 3 iRT prediction using SSRCalc and Prosit on the ProteomeTools holdout set.

Benchmark of the indexed retention time (iRT) prediction model of Prosit (a) in comparison to SSRCalc (b). Plotted are the predicted and measured iRT values of peptides (dots) in the holdout set. The required iRT window that would encompass 95% of all peptides is indicated by the red dashed lines.
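One plausible way to obtain the window marked by the dashed lines is to take the central 95% interval of the prediction errors. The sketch below does this with the standard library; the exact procedure used for the figure is not stated in the caption, so the percentile choice (2.5th to 97.5th) and the function name are assumptions.

```python
import statistics


def irt_window_95(predicted, measured):
    """Width of the central interval holding 95% of (measured - predicted) errors."""
    errors = [m - p for p, m in zip(predicted, measured)]
    # n=40 yields cut points every 2.5%; the first and last ones bound the
    # central 95% of the error distribution (linear interpolation).
    cuts = statistics.quantiles(errors, n=40, method="inclusive")
    return cuts[-1] - cuts[0]


# Hypothetical iRT values for a handful of peptides
predicted_irt = [10.0, 25.0, 40.0, 55.0, 70.0, 85.0]
measured_irt = [11.2, 24.1, 40.5, 53.8, 71.9, 84.6]
print(irt_window_95(predicted_irt, measured_irt))
```

A narrower window means that a targeted or DIA method can use tighter retention time extraction windows around each predicted value.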

Supplementary Figure 4 Comparison of Prosit and MS2PIP.

Subsets of the ProteomeTools holdout are used in this figure. None of the peptides and spectra in this dataset were used to train or test Prosit’s fragment intensity model. In boxplots, outliers are not shown, whiskers extend to 1.5 times the interquartile range (IQR), and black horizontal lines indicate median values. For reference, spectral angles of 0.9 and 0.7 are indicated. (a) Benchmark of Prosit’s (green) and MS2PIP’s (orange) fragment ion intensity predictions, each compared to the experimental ProteomeTools spectrum. Data are split by peptide length on a random subset of the ProteomeTools holdout dataset. (b) Same as (a) but split by precursor charge. (c) Same as (a) but split by collision energy. (d) Comparison of Prosit’s and MS2PIP’s fragment ion intensity predictions limited to spectra acquired at NCE 35 of the holdout set. Orange dots denote peptides that were (likely) part of MS2PIP’s training data.

Supplementary Figure 5 Collision energy dependency of experimental and predicted spectra.

Heatmap of the median spectral angle when comparing experimental vs experimental (a), experimental vs predicted (b) and predicted vs predicted (c) spectra across 15 different normalized collision energies (NCEs) of ~40 synthetic peptides used for retention time and NCE calibration (Zolg et al. 2017).

Supplementary Figure 6 Evaluation of model overfitting on internal and external datasets.

(a) Comparison of precursor charges of peptides from the Bekker-Jensen tryptic dataset. Peptides that were also part of the ProteomeTools Holdout dataset exhibit a different precursor charge distribution than those that were not. (b) Spectral angle distributions by precursor charge for peptides in the Bekker-Jensen tryptic dataset split by whether they were also part of the ProteomeTools Holdout dataset. (c) Benchmark of Prosit’s (green) and MS2PIP’s (orange) fragment ion intensity prediction on tryptic peptides from the Bekker-Jensen dataset. The top histogram shows spectral angles for peptides that were also synthesized in the ProteomeTools project, but not used for training Prosit. The bottom histogram shows the distribution of spectral angles for peptides not part of ProteomeTools.

Supplementary Figure 7 Effect of the iRT model refinement for external datasets.

Predicted vs experimental retention times using the general Prosit indexed retention time (iRT) prediction model (a), Elude (b), or the refined Prosit model (c) on representative LC-MS/MS measurements for 4 proteases (left to right: Trypsin, Chymotrypsin, Glu-C and Lys-C) from the Bekker-Jensen et al. dataset. Model refinement for Prosit was performed using the tryptic data from the same dataset. The required retention time window that would encompass 95% of all peptides is indicated by the red dashed lines. Sample number n and Pearson correlation are indicated.

Supplementary Figure 8 Effect of the refined Prosit iRT model on DIA spectral libraries.

Evaluation of the general (top) and refined (bottom) indexed retention time (iRT) prediction model of Prosit on C. elegans (a), E. coli (b) and S. cerevisiae (c) data from the Bruderer et al. dataset. For refinement, the project-specific HEK library was used. The required iRT window that would encompass 95% of all peptides is indicated by the red dashed lines. Sample number n and Pearson correlation are indicated.

Supplementary Figure 9 DIA analysis using predicted spectral libraries.

(a) Impact of the retention time refinement using Prosit on the number of identified peptides using either the general (indicated by “-”) or refined (indicated by “+”) Prosit indexed retention time (iRT) prediction model. The number of shared (blue), gained (green) and lost (orange) identified peptide sequences is plotted with respect to the original filtered library. iRT refinement was performed using the experimental retention time of the filtered HEK-293 data. See Supplementary Figure 8 for iRT model refinement analysis. (b) Identical analysis as Figure 4 for S. cerevisiae and E. coli. (c) Re-analysis of Orbitrap/TOF-based data-independent acquisition (DIA)/SWATH datasets using predicted spectral libraries. Data and project-specific spectral libraries were obtained from public repositories. To facilitate comparisons, the original library was filtered to remove entries that Prosit is not yet able to predict (modifications other than oxidized methionine, neutral losses and peptides >30 amino acids). The original and filtered spectral libraries were queried against the DIA data using Spectronaut, and the bar charts depict the number of shared (blue), gained (green) and lost (orange) identified peptide sequences when using the original filtered library compared to the original unfiltered library. (d) Identical analysis as Figure 4, but with protein groups instead of peptides displayed.
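The library filtering and the shared/gained/lost counts described in this caption can be sketched as follows. This is a minimal illustration only: the `LibraryEntry` record, the modification label "Oxidation (M)" and the per-entry neutral-loss flag are hypothetical stand-ins for whatever the real spectral library format provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryEntry:
    sequence: str                   # unmodified amino acid sequence
    modifications: tuple = ()       # hypothetical modification labels
    has_neutral_loss: bool = False  # hypothetical flag for neutral-loss fragments


ALLOWED_MODIFICATIONS = {"Oxidation (M)"}  # assumption: only oxidized methionine
MAX_PEPTIDE_LENGTH = 30                    # peptides >30 amino acids are removed


def is_predictable(entry):
    """True if the entry passes the three filters named in the caption."""
    return (
        len(entry.sequence) <= MAX_PEPTIDE_LENGTH
        and not entry.has_neutral_loss
        and all(mod in ALLOWED_MODIFICATIONS for mod in entry.modifications)
    )


def compare_identifications(reference, other):
    """Count peptide sequences shared, gained and lost relative to `reference`."""
    reference, other = set(reference), set(other)
    return {
        "shared": len(reference & other),
        "gained": len(other - reference),
        "lost": len(reference - other),
    }


library = [
    LibraryEntry("PEPTIDEK"),
    LibraryEntry("MPEPTIDER", ("Oxidation (M)",)),
    LibraryEntry("SPEPTIDEK", ("Phospho (STY)",)),
]
filtered = [entry for entry in library if is_predictable(entry)]
print(len(filtered), compare_identifications({"AAK", "CCR"}, {"CCR", "DDK"}))
```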

Supplementary Figure 10 Comparison of predicted spectra with QTOF originated spectra.

(a) Density distribution of normalized spectral contrast angles between predicted spectra and QTOF-originated spectral libraries (Rosenberger et al., Schubert et al., Fabre et al.). The spectral angle is calculated based on annotated fragment ions, excluding fragments with a neutral loss, fragments with fewer than 3 amino acids and fragments with m/z <300. (b) Representative mirror spectrum of one predicted spectrum at normalized collision energy (NCE) 30 (top) vs one experimental spectrum contained in the D. melanogaster QTOF library. (c) Number of annotated fragment ions with more than 3 amino acids and m/z >300 per spectrum in the S. cerevisiae library and after prediction. (d) Density distribution of normalized spectral contrast angles between predicted spectra and DDA QTOF spectra for S. cerevisiae (Schubert et al.). All ions except neutral loss fragments were included. (e) Density distribution of normalized spectral contrast angles between predicted spectra and DDA QTOF spectra for S. cerevisiae (Schubert et al.) as a function of the most intense peak in the QTOF DDA spectrum.

Supplementary Figure 11 Prediction performance analysis of Prosit.

(a) Barplot of predicted spectra per second using Prosit’s fragment ion intensity prediction across several datasets investigated in this study, excluding data transformation as well as read and write operations. Numbers in each bar indicate the total number of predicted spectra. (b) Total prediction time including transformation, read and write operations plotted against the number of predicted spectra using Prosit’s fragment ion intensity prediction model for differently sized datasets.

Supplementary Figure 12 Percolator feature weights.

Barplots of final feature weights assigned by Percolator for four different proteases when using the Prosit feature set (see Supplementary Table 5 for a description of the features). The evaluated Percolator models were trained on Bekker-Jensen datasets with proteases (top to bottom): Trypsin, Chymotrypsin, Lys-C and Glu-C.

Supplementary Figure 13 FDR analysis Bekker-Jensen Trypsin.

(a) Percent of shared (blue), gained (green) and lost (red) peptide identifications when using the Prosit score set at different peptide-level FDR cutoffs in comparison to the number of identifications when using the Andromeda score set at 1% peptide-level FDR. (b) Spectral angle distributions of decoy (orange) and false negative classified target (green) peptide spectrum matches (PSMs). The top panels are filtered at 1% peptide-level FDR and the bottom panels are filtered at 0.1% peptide-level FDR. The left panels show the distributions for the Andromeda score set and the right panels for the Prosit score set.

Supplementary Figure 14 FDR comparison of Prosit and MS2PIP on Bekker-Jensen Trypsin and Bekker-Jensen Chymotrypsin.

Number of estimated true positive (#targets - #decoys at the respective false discovery rate (FDR) cutoff) peptide spectrum matches using Percolator at different peptide-level FDR cutoffs when using the Andromeda score set (blue) or the full score set based on intensity predictions (orange) (see Supplementary Table 5 for feature set description). The dashed line indicates the number of true positive identifications when using the Andromeda feature set at 1% peptide-level FDR. Top figures show the analysis for trypsin and bottom figures for chymotrypsin. MS2PIP predictions were used on the left and Prosit predictions on the right.
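The "#targets - #decoys" estimate named in this caption can be illustrated with a simplified target-decoy count. The sketch below is not Percolator's procedure: it assumes one score per match, higher scores being better, FDR estimated as decoys/targets, and it ignores score ties; the function name is illustrative.

```python
def estimated_true_positives(matches, fdr_cutoff):
    """Return #targets - #decoys at the deepest rank where FDR <= fdr_cutoff.

    `matches` is an iterable of (score, is_decoy) pairs.
    """
    ranked = sorted(matches, key=lambda match: match[0], reverse=True)
    targets = decoys = 0
    best_estimate = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among everything accepted down to this rank
        if targets > 0 and decoys / targets <= fdr_cutoff:
            best_estimate = targets - decoys
    return best_estimate


# Hypothetical scored matches: (score, is_decoy)
matches = [(10.0, False), (9.0, False), (8.0, False),
           (7.0, True), (6.0, False), (5.0, False)]
print(estimated_true_positives(matches, 0.25))
```

Scanning the whole ranked list and keeping the deepest qualifying rank mirrors the usual q-value convention, in which a cutoff is accepted if any lower score threshold reaches the requested FDR.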

Supplementary Figure 15 Target-decoy analysis of Prosit and MS2PIP.

(a) Comparison of the spectral angles of MS2PIP predictions to spectral angles of Prosit predictions for target (green) and decoy (orange) peptide spectrum matches (PSMs) generated by Andromeda. A random subset of 10,000 PSMs from the Bekker-Jensen tryptic dataset is shown. The PSM for peptide LVDCLSR is analyzed in (c). (b) As in (a), but on a random subset of 10,000 PSMs from the Bekker-Jensen chymotrypsin dataset. (c) Analysis of a PSM for which Prosit’s prediction has a lower spectral angle than MS2PIP’s prediction. The PSM is highlighted in (a). Top: Prosit prediction compared to 12 experimental ProteomeTools spectra. Middle: Prosit prediction compared to the experimental spectrum from Bekker-Jensen. Bottom: MS2PIP prediction compared to the experimental spectrum from Bekker-Jensen. SA and R denote spectral angle and Pearson correlation, respectively.

Supplementary information

Supplementary Information

Supplementary Figs. 1–15 and Supplementary Notes

Reporting Summary

Supplementary Table 1

Peptide mapping and identifications in ProteomeTools

Supplementary Table 2

NCE effect on fragmentation

Supplementary Table 3

Comparison of Prosit to external data

Supplementary Table 4

DIA results

Supplementary Table 5

DDA results

Supplementary Table 6

Metaproteomics results

Cite this article

Gessulat, S., Schmidt, T., Zolg, D.P. et al. Prosit: proteome-wide prediction of peptide tandem mass spectra by deep learning. Nat Methods 16, 509–518 (2019). https://doi.org/10.1038/s41592-019-0426-7

  • DOI: https://doi.org/10.1038/s41592-019-0426-7
