Brief Communication | Published:

SignalP 5.0 improves signal peptide predictions using deep neural networks

Nature Biotechnology (2019)

Abstract

Signal peptides (SPs) are short amino acid sequences in the amino terminus of many newly synthesized proteins that target proteins into, or across, membranes. Bioinformatic tools can predict SPs from amino acid sequences, but most cannot distinguish between various types of signal peptides. We present a deep neural network-based approach that improves SP prediction across all domains of life and distinguishes between three types of prokaryotic SPs.
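The output side of this classification task can be pictured with a short sketch. It is not SignalP 5.0's code or interface. It assumes only the class sets given in the supplementary figure legends: four classes (Sec/SPI, Tat/SPI, Sec/SPII and Other) for Archaea, Gram-positive and Gram-negative bacteria, and two classes (Sec/SPI and Other) for Eukarya. The function names and probability values are hypothetical. The sketch also shows why the predicted class can never have a probability below 1/K when K class probabilities sum to 1.

```python
# Hypothetical sketch: class sets per organism group and the predicted class.
# Class lists follow the supplementary figure legend; all names are illustrative.

PROKARYOTE_CLASSES = ("Sec/SPI", "Tat/SPI", "Sec/SPII", "Other")
EUKARYOTE_CLASSES = ("Sec/SPI", "Other")

CLASSES_BY_GROUP = {
    "Archaea": PROKARYOTE_CLASSES,
    "Gram-positive": PROKARYOTE_CLASSES,
    "Gram-negative": PROKARYOTE_CLASSES,
    "Eukarya": EUKARYOTE_CLASSES,
}


def predicted_class(group, probs):
    """Return (label, probability) of the most probable class.

    `probs` holds one probability per class, in the order of
    CLASSES_BY_GROUP[group], and must sum to 1.
    """
    classes = CLASSES_BY_GROUP[group]
    if len(probs) != len(classes):
        raise ValueError(f"{group} expects {len(classes)} probabilities")
    if abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to 1")
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best], probs[best]


def probability_floor(group):
    """Lowest possible probability of the predicted class: 1/K for K classes.

    If all K probabilities were below 1/K, they could not sum to 1.
    """
    return 1.0 / len(CLASSES_BY_GROUP[group])


if __name__ == "__main__":
    # Made-up probability vectors, for illustration only.
    print(predicted_class("Gram-negative", [0.05, 0.10, 0.80, 0.05]))
    print(predicted_class("Eukarya", [0.52, 0.48]))
    print(probability_floor("Archaea"), probability_floor("Eukarya"))
```

A Eukarya prediction with probability 0.52 sits just above its 0.5 floor and is therefore weakly supported, although the same value would be comfortably above the 0.25 floor of a prokaryotic group.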

Code availability

SignalP 5.0 is available at http://www.cbs.dtu.dk/services/SignalP/. The web version is free for all users. The standalone package is free for academic users (provided upon request) and is licensed for a fee to commercial users.

Data availability

The data sets used for training and testing SignalP 5.0 can be downloaded from http://www.cbs.dtu.dk/services/SignalP/data.php.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Acknowledgements

S.B. acknowledges support from the Novo Nordisk Foundation (grant NNF14CC0001).

Author information

Author notes

  1. These authors contributed equally: José Juan Almagro Armenteros, Konstantinos D. Tsirigos.

Affiliations

  1. Department of Bio and Health Informatics, Technical University of Denmark, Kgs Lyngby, Denmark

    • José Juan Almagro Armenteros
    • Konstantinos D. Tsirigos
    • Søren Brunak
    • Henrik Nielsen
  2. Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden

    • Konstantinos D. Tsirigos
    • Gunnar von Heijne
  3. Science for Life Laboratory, Stockholm University, Solna, Sweden

    • Konstantinos D. Tsirigos
    • Gunnar von Heijne
  4. Department of Genome Regulation, Max Planck Institute for Molecular Genetics, Berlin, Germany

    • Konstantinos D. Tsirigos
  5. Bioinformatics Centre, Department of Biology, University of Copenhagen, Copenhagen, Denmark

    • Casper Kaae Sønderby
    • Ole Winther
  6. National Food Institute, Technical University of Denmark, Kgs Lyngby, Denmark

    • Thomas Nordahl Petersen
  7. Department of Applied Mathematics and Computer Science, Technical University of Denmark, Kgs Lyngby, Denmark

    • Ole Winther
  8. Disease Systems Biology Program, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark

    • Søren Brunak

Authors

  1. José Juan Almagro Armenteros

  2. Konstantinos D. Tsirigos

  3. Casper Kaae Sønderby

  4. Thomas Nordahl Petersen

  5. Ole Winther

  6. Søren Brunak

  7. Gunnar von Heijne

  8. Henrik Nielsen

Contributions

J.J.A.A. designed the model architecture and trained SignalP 5.0 with help from C.K.S. K.D.T. collected the training and test data sets, performed the benchmarks and analyzed the results. C.K.S., T.N.P., O.W., S.B. and G.v.H. provided suggestions during the design of SignalP 5.0. K.D.T. and H.N. wrote the paper with input from J.J.A.A., C.K.S. and O.W. H.N. supervised and guided the project. All authors edited and approved the manuscript.

Competing interests

The downloadable version of SignalP 5.0 has been commercialized by the Technical University of Denmark (it is licensed for a fee to commercial users). The revenue from these commercial sales is divided between the program developers (J.J.A.A., K.D.T., C.K.S., T.N.P., O.W., S.B., G.v.H. and H.N.) and the Technical University of Denmark.

Corresponding author

Correspondence to Henrik Nielsen.

Integrated supplementary information

  1. Supplementary Figure 1

    Box plot of the probability of the predicted class for correct and incorrect predictions. A probability close to 1 indicates a highly reliable prediction. For Archaea, Gram-positive bacteria and Gram-negative bacteria, the lowest possible probability of the predicted class is 0.25, because there are four possible classes (Sec/SPI, Tat/SPI, Sec/SPII and Other). For Eukarya this minimum is 0.5, because there are only two classes (Sec/SPI and Other). A probability close to this minimum indicates a very unreliable prediction. All classes (Sec/SPI, Tat/SPI, Sec/SPII and Other) are combined in this plot.

  2. Supplementary Figure 2

    Performance of SignalP 5.0 on cleavage site detection when considering a window of 0, 1, 2 and 3 amino acids centered on the real cleavage site.

  3. Supplementary Figure 3

    The SignalP 5.0 neural network architecture.
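Supplementary Figure 2 reports cleavage-site performance at several window sizes. The sketch below shows one plausible way to score such a tolerance. It assumes, as an interpretation not stated in the legend, that a predicted cleavage site counts as correct when it lies within w residues of the annotated site, so w = 0 is an exact match. The function name and the positions are hypothetical, and the score computed here (fraction of annotated sites recovered) may differ from the metrics used in the paper.

```python
# Hypothetical sketch: cleavage-site (CS) accuracy at several tolerances.
# Assumption: a predicted CS is correct if it lies within w residues of the
# annotated CS (w = 0 means an exact match).


def cs_accuracy(true_sites, predicted_sites, w):
    """Fraction of annotated cleavage sites recovered within +/- w residues.

    Both arguments are parallel lists of residue positions; None in
    `predicted_sites` means that no cleavage site was predicted.
    """
    if len(true_sites) != len(predicted_sites):
        raise ValueError("lists must have the same length")
    if not true_sites:
        return 0.0
    hits = sum(
        1
        for site, pred in zip(true_sites, predicted_sites)
        if pred is not None and abs(pred - site) <= w
    )
    return hits / len(true_sites)


if __name__ == "__main__":
    # Toy positions, invented for illustration only.
    true_sites = [22, 30, 25, 19]
    predicted = [22, 31, 28, None]
    for w in range(4):
        print(f"window {w}: {cs_accuracy(true_sites, predicted, w):.2f}")
```

With these toy positions the score rises from 0.25 at w = 0 to 0.75 at w = 3, while the sequence with no predicted site is never counted as correct. Widening the window in this way separates near misses from outright failures.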

Supplementary information

  1. Supplementary Information

    Supplementary Figures 1–3, Supplementary Tables 1–12 and Supplementary Notes 1–3

  2. Reporting Summary

About this article

Publication history

Received

Accepted

Published

DOI

https://doi.org/10.1038/s41587-019-0036-z