Machine learning overcomes human bias in the discovery of self-assembling peptides

Abstract

Peptide materials have a wide array of functions, from tissue engineering and surface coatings to catalysis and sensing. Tuning the sequence of amino acids that make up the peptide modulates its functionality, but even a small increase in sequence length leads to a dramatic increase in the number of peptide candidates. Traditionally, peptide design is guided by human expertise and intuition and typically yields fewer than ten peptides per study; such approaches are not easily scalable and are susceptible to human bias. Here we introduce a machine learning workflow, AI-expert, that combines Monte Carlo tree search and random forest with molecular dynamics simulations to form a fully autonomous computational search engine for discovering peptide sequences with high potential for self-assembly. We demonstrate that AI-expert efficiently searches large spaces of tripeptides and pentapeptides. Its predictions perform on par with or better than those of our human experts, and it suggests several non-intuitive sequences with high self-assembly propensity, outlining its potential to overcome human bias and accelerate peptide discovery.
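To make the growth of the candidate space concrete, the following minimal Python sketch counts and enumerates linear peptides of a given length. It assumes an alphabet of the 20 canonical amino acids, which is consistent with the 8000 tripeptide candidates mentioned in the Extended Data; the function names are illustrative and are not taken from the AI-expert code.

```python
from itertools import product

# Assumption: the 20 canonical amino acids, written as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_space_size(length: int, alphabet: str = AMINO_ACIDS) -> int:
    """Return the number of distinct linear peptides of the given length."""
    return len(alphabet) ** length


def enumerate_peptides(length: int, alphabet: str = AMINO_ACIDS):
    """Lazily yield every peptide of the given length as a string."""
    for residues in product(alphabet, repeat=length):
        yield "".join(residues)


tripeptide_count = sequence_space_size(3)    # 20**3 = 8000
pentapeptide_count = sequence_space_size(5)  # 20**5 = 3200000
```

Going from three to five residues multiplies the number of candidates by 400, which is why exhaustive evaluation quickly becomes impractical and a guided search is attractive.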

Fig. 1: Workflow adopted to discover self-assembling pentapeptides using inputs from human experts and the developed AI-expert.
Fig. 2: Performance comparison of the different search strategies for the space of tripeptides.
Fig. 3: Screening of pentapeptides from AI-expert and human experts.
Fig. 4: Experimental measurements of self-assembly in pentapeptides.
Fig. 5: Performance comparison of AI-expert and human experts.

Data availability

The data that support the findings of this study are available in the Extended Data figures (for synthesized pentapeptides), the Supplementary Information (for AI-expert-proposed pentapeptides) and the accompanying code repository at https://doi.org/10.5281/zenodo.6564202 (for tripeptides). Source data are provided with this paper.

Code availability

The codes underlying the AI-expert framework are freely available for general use under a Creative Commons Attribution 4.0 International license and are deposited at https://doi.org/10.5281/zenodo.6564202.

Acknowledgements

Work performed at the Center for Nanoscale Materials, a US Department of Energy (DOE) Office of Science User Facility, was supported by the US DOE, Office of Basic Energy Sciences, under contract no. DE-AC02-06CH11357, and additionally supported by the University of Chicago and the DOE under DOE contract no. DE-AC02-06CH11357 awarded to UChicago Argonne, LLC, operator of the Argonne National Laboratory. This material is based on work supported by the DOE, Office of Science, BES Data, Artificial Intelligence and Machine Learning at DOE Scientific User Facilities programme (Digital Twins). We gratefully acknowledge the computing resources provided on Bebop, the high-performance computing clusters operated by the Laboratory Computing Resource Center (LCRC) at Argonne National Laboratory. S.K.R.S.S. acknowledges support from the UIC faculty start-up fund. We acknowledge T. Tuttle for sharing computational data on tripeptides.

Author information

Contributions

R.B., S.K.R.S.S. and H.C.F. designed the study. R.B., T.D.L., H. Chan, S.S. and S.K.R.S.S. designed the AI-expert algorithm. R.B. organized and analysed the AI-expert output. H. Cui, I.V.K., V.N., L.C.P., L.A.S. and H.C.F. designed the human-expert peptides. H.C.F. performed the experimental work (peptide synthesis, LCMS, UV–vis plate reader and FTIR). R.B., S.K.R.S.S. and H.C.F. wrote the manuscript.

Corresponding authors

Correspondence to H. Christopher Fry or Subramanian K. R. S. Sankaranarayanan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Chemistry thanks Shuguang Zhang, Jin Kim Montclare, Fabien Plisson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Top scoring tripeptides.

Top-ranked tripeptides identified using the brute-force computational search on 8000 candidates. The score is based on the reward function rtri. Abbreviations: AP, aggregation propensity; logP, hydrophobicity.

Extended Data Fig. 2 Overall results for the synthesized pentapeptides.

Computational (AP, logP) and experimental (LC(RT), OD800nm) measurements, along with the associated reward scores (rpenta, rtri) and experimental score (ExpScore) are provided. β-sheet scale corrected rpenta and rtri scores, respectively titled rpentawB and rtriwB, are also included. Cases where aggregation (Agg.) was observed are marked 1 with a bold font.

Extended Data Fig. 3 Diversity of pentapeptides proposed by our human experts.

Frequency of occurrence of amino acids in the 29 sequences proposed by the human experts (left panel) and the overall charge distribution of those sequences (right panel). The human experts evidently preferred to include the amino acids V, K and F and to propose pentapeptide sequences that are charge neutral overall. The complete list of the pentapeptides proposed by the human experts, and the rationale for choosing or rejecting a sequence for synthesis, is provided in Supplementary Information Table S2.
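A diversity analysis of this kind can be sketched in a few lines of Python. The snippet below is an illustration only: the example sequences are hypothetical and not taken from the paper, and the charge model is a simplifying assumption (K and R count as +1, D and E as -1, all other residues and both termini as neutral).

```python
from collections import Counter

# Assumed side-chain charges near neutral pH; termini are ignored.
RESIDUE_CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def residue_frequencies(sequences):
    """Count how often each amino acid occurs across all sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq)
    return counts


def net_charge(sequence):
    """Sum the assumed side-chain charges of a single peptide."""
    return sum(RESIDUE_CHARGE.get(res, 0) for res in sequence)


def charge_distribution(sequences):
    """Count how many sequences carry each overall net charge."""
    return Counter(net_charge(seq) for seq in sequences)


# Hypothetical pentapeptides used only to exercise the functions.
example_sequences = ["KFVFE", "VKVFF", "KVKVK"]
frequencies = residue_frequencies(example_sequences)
charges = charge_distribution(example_sequences)
```

Plotting `frequencies` and `charges` as bar charts would give panels analogous to the left and right panels described in the caption.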

Extended Data Fig. 4 MCTS hyperparameter study.

Effect of the exploration constant c in Eq. 1 on the search efficiency of AI-expert for the case of tripeptides with (a) the MCTS scheme alone and (b) the MCTS+RF scheme. The boxplots show the number of runs needed to find the top-scoring tripeptide. The lower and upper bounds of each box represent the 25th and 75th percentiles and the middle line the median; the upper whisker extends to the last datum less than the 75th percentile + 1.5(IQR), the lower whisker extends to the first datum greater than the 25th percentile - 1.5(IQR), and data beyond the whiskers are plotted as individual points. Here, IQR signifies the interquartile range, given by the 75th - 25th percentile. The results are based on n=10 statistically independent runs. The numbers of trials needed (on average) using a brute-force or random search are also shown as dotted lines. The MCTS+RF scheme performs best: it is less sensitive to the choice of the c parameter and it finds the top-scoring tripeptide more efficiently. The MCTS+RF scheme with c = 10 was found to be the most efficient and was therefore selected for the pentapeptide search.
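Eq. 1 is not reproduced on this page, so the sketch below does not claim to be the authors' formula. It uses one common form of the upper confidence bound (UCB1) rule from the MCTS literature to illustrate how an exploration constant c trades off exploiting well-scoring branches against exploring rarely visited ones. The function names, the child statistics and the reward values are all hypothetical.

```python
import math


def ucb_score(total_reward, visits, parent_visits, c):
    """UCB1-style score: mean reward plus an exploration bonus scaled by c."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    exploitation = total_reward / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration


def select_child(children, parent_visits, c):
    """Pick the child with the highest score.

    children maps a child label to a (total_reward, visits) tuple.
    """
    return max(
        children,
        key=lambda name: ucb_score(*children[name], parent_visits, c),
    )


# Hypothetical statistics: "A" has the higher mean reward, "F" is less explored.
children = {"A": (9.0, 10), "F": (1.6, 2)}
greedy_choice = select_child(children, parent_visits=12, c=0.0)
exploratory_choice = select_child(children, parent_visits=12, c=10.0)
```

With c = 0 the selection is purely greedy and picks "A"; with a large c the bonus for the rarely visited child dominates and "F" is selected instead. The caption's hyperparameter study examines how this kind of trade-off affects the number of runs needed.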

Extended Data Fig. 5 Machine learning aggregation propensity.

Performance of the random forest (RF) model in predicting the computed aggregation propensity (AP) in (a) tripeptides and (b) pentapeptides. In both cases, the improvement in RF model performance with increasing size of training data (left panel) is shown, along with an example parity plot of the test data when it constitutes 20% of the total dataset. In (a), n=10 statistically independent runs with a random test-train split of the data (from 8000 total cases) were performed. Here, data are presented as mean values ± 1.5 SD. In (b), the test-train split (from ~6600 total cases using rpenta) was performed in a special manner to capture the progressive improvement of the RF model during the MCTS run. Because the training data were generated in an online fashion within the MCTS+RF scheme, the RF model training set consists of AP values evaluated in the early stages of the MCTS run, while the test set contains AP values evaluated in the later stages of the run. Abbreviations: MAE, mean absolute error; SD, standard deviation.
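The chronological split described for panel (b), and the MAE used to score the model, can be sketched as follows. This is a minimal illustration under stated assumptions: the records are assumed to be stored in the order in which they were evaluated during the search, the helper names are hypothetical, and the RF model itself is omitted.

```python
def chronological_split(records, test_fraction=0.2):
    """Split time-ordered records into an early training set and a late test set."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must lie strictly between 0 and 1")
    n_test = max(1, round(len(records) * test_fraction))
    cut = len(records) - n_test
    return records[:cut], records[cut:]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Ten placeholder records, listed in the order they were generated.
records = list(range(10))
train, test = chronological_split(records, test_fraction=0.2)
```

Unlike a random split, this keeps every test point later in the run than every training point, so the error reflects how well a model trained on early evaluations anticipates later ones.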

Supplementary information

Supplementary Information

Supplementary Figs. 1–4, Discussion and Tables 1–4.

Reporting Summary

Source data

Source Data Figure 3

Source data for all AI-proposed pentapeptides, top AI-proposed pentapeptides, top human-proposed pentapeptides and synthesized pentapeptides.

Source Data Figure 4

Source data for pentapeptide characterization.

Source Data Figure 5

Source data for pentapeptide characterization with beta-sheet factor.

Source Data Extended Data Figure 1

Source data for top-scoring tripeptides.

Source Data Extended Data Figure 2

Source data for overall results for the synthesized pentapeptides.

Source Data Extended Data Figure 3

Source data for diversity analysis of human expert proposed candidates.

Source Data Extended Data Figure 5

Source data for RF surrogate models of aggregation propensity.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Batra, R., Loeffler, T.D., Chan, H. et al. Machine learning overcomes human bias in the discovery of self-assembling peptides. Nat. Chem. (2022). https://doi.org/10.1038/s41557-022-01055-3

  • DOI: https://doi.org/10.1038/s41557-022-01055-3
