SignalP 5.0 improves signal peptide predictions using deep neural networks

Abstract

Signal peptides (SPs) are short amino acid sequences at the amino terminus of many newly synthesized proteins that target these proteins into, or across, membranes. Bioinformatic tools can predict SPs from amino acid sequences, but most cannot distinguish between the various types of SPs. We present a deep neural network-based approach that improves SP prediction across all domains of life and distinguishes between three types of prokaryotic SPs.


Code availability

SignalP 5.0 is available at http://www.cbs.dtu.dk/services/SignalP/. The web version of SignalP 5.0 is free for all users, while the standalone package is free for academic users (and can be provided upon request) but is licensed for a fee to commercial users.

Data availability

The data sets used for training and testing SignalP 5.0 can be downloaded from http://www.cbs.dtu.dk/services/SignalP/data.php.


Acknowledgements

S.B. acknowledges support from the Novo Nordisk Foundation (grant NNF14CC0001).

Author information

Contributions

J.J.A.A. designed the model architecture and trained the SignalP5 method with help from C.K.S. K.D.T. collected the training and test data sets, performed the benchmarks and analyzed the results. C.K.S., T.N.P., O.W., S.B. and G.v.H. provided suggestions during the design of SignalP5. K.D.T. and H.N. wrote the paper with input from J.J.A.A., C.K.S. and O.W. H.N. supervised and guided the project. All authors edited and approved the manuscript.

Corresponding author

Correspondence to Henrik Nielsen.

Ethics declarations

Competing interests

The downloadable version of SignalP 5.0 has been commercialized by the Technical University of Denmark (it is licensed for a fee to commercial users). The revenue from these commercial sales is divided between the program developers (J.J.A.A., K.D.T., C.K.S., T.N.P., O.W., S.B., G.v.H. and H.N.) and the Technical University of Denmark.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Integrated supplementary information

Supplementary Figure 1 Box plot of the probability of the predicted class for correct and incorrect predictions.

A probability close to 1 indicates a highly reliable prediction. For Archaea, Gram-positive bacteria and Gram-negative bacteria, the minimum probability of the predicted class (the threshold) is 0.25, as there are four possible classes (Sec/SPI, Tat/SPI, Sec/SPII and Other). For Eukarya, this threshold is 0.5, as there are only two classes (Sec/SPI and Other). A probability close to the threshold indicates a very unreliable prediction. All classes (Sec/SPI, Tat/SPI, Sec/SPII and Other) are combined in this plot.
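The thresholds quoted in this caption follow from simple arithmetic: when the probabilities of K classes sum to 1, the most probable class cannot have a probability below 1/K. The short Python sketch below illustrates this. The function names and the example probabilities are hypothetical and are not taken from the SignalP 5.0 code or output.

```python
from typing import Dict, Tuple

# Class labels as listed in the caption.
PROKARYOTIC_CLASSES = ("Sec/SPI", "Tat/SPI", "Sec/SPII", "Other")
EUKARYOTIC_CLASSES = ("Sec/SPI", "Other")


def predicted_class(probs: Dict[str, float]) -> Tuple[str, float]:
    """Return the most probable class and its probability."""
    total = sum(probs.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("class probabilities must sum to 1")
    label = max(probs, key=probs.get)
    return label, probs[label]


def lowest_possible_confidence(n_classes: int) -> float:
    """Smallest probability the winning class can have: 1 / n_classes."""
    return 1.0 / n_classes


# Hypothetical probabilities for one prokaryotic sequence.
example = {"Sec/SPI": 0.55, "Tat/SPI": 0.05, "Sec/SPII": 0.30, "Other": 0.10}
label, prob = predicted_class(example)
print(label, prob)
print(lowest_possible_confidence(len(PROKARYOTIC_CLASSES)))  # four classes
print(lowest_possible_confidence(len(EUKARYOTIC_CLASSES)))   # two classes
```

A predicted-class probability only slightly above the 1/K bound means the alternatives are almost equally likely, which is why the caption describes such predictions as very unreliable.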

Supplementary Figure 2 Performance of SignalP 5.0 on cleavage site detection when considering a window of 0, 1, 2 and 3 amino acids centered on the real cleavage site.
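One way to read the windowed evaluation in this caption is sketched below in Python. It assumes that a window of w counts a prediction as correct when the predicted cleavage site lies within w residues of the annotated site in either direction. The caption does not spell out the exact definition, so this interpretation, the function names and the example positions are all illustrative assumptions.

```python
from typing import Sequence


def within_window(predicted: int, true: int, window: int) -> bool:
    """True if the predicted site is at most `window` residues from the true site."""
    return abs(predicted - true) <= window


def cleavage_site_accuracy(
    predicted: Sequence[int], true: Sequence[int], window: int
) -> float:
    """Fraction of sequences whose predicted site falls inside the window."""
    if len(predicted) != len(true):
        raise ValueError("predicted and true must have the same length")
    if not true:
        raise ValueError("at least one sequence is required")
    hits = sum(within_window(p, t, window) for p, t in zip(predicted, true))
    return hits / len(true)


# Hypothetical cleavage site positions for four proteins.
predicted_sites = [22, 19, 30, 25]
true_sites = [22, 20, 27, 25]

for w in (0, 1, 2, 3):
    print(w, cleavage_site_accuracy(predicted_sites, true_sites, w))
```

With a window of 0 only exact matches count, so the score can only stay the same or rise as the window widens.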

Supplementary Figure 3 The SignalP 5.0 neural network architecture.

Supplementary information

Supplementary Figures 1–3, Supplementary Tables 1–12 and Supplementary Notes 1–3

About this article

Cite this article

Almagro Armenteros, J.J., Tsirigos, K.D., Sønderby, C.K. et al. SignalP 5.0 improves signal peptide predictions using deep neural networks. Nat Biotechnol 37, 420–423 (2019). https://doi.org/10.1038/s41587-019-0036-z
