Abstract
There are more amino acid permutations within a 40-residue sequence than atoms on Earth. This vast chemical search space hinders the use of human learning to design functional polymers. Here we show how machine learning enables the de novo design of abiotic nuclear-targeting miniproteins to traffic antisense oligomers to the nucleus of cells. We combined high-throughput experimentation with a directed evolution-inspired deep-learning approach in which the molecular structures of natural and unnatural residues are represented as topological fingerprints. The model is able to predict activities beyond the training dataset, and simultaneously deciphers and visualizes sequence–activity predictions. The predicted miniproteins, termed ‘Mach’, reach an average mass of 10 kDa, are more effective than any previously known variant in cells and can also deliver proteins into the cytosol. The Mach miniproteins are non-toxic and efficiently deliver antisense cargo in mice. These results demonstrate that deep learning can decipher design principles to generate highly active biomolecules that are unlikely to be discovered by empirical approaches.
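The opening claim can be checked with back-of-the-envelope arithmetic. This sketch is an illustration, not the authors' calculation. It makes two assumptions. First, it counts only the 20 canonical amino acids; the unnatural residues used in this work would make the space larger still. Second, it takes about 1.33 × 10^50 as the number of atoms on Earth. That is a commonly quoted order-of-magnitude estimate and does not come from the paper.

```python
import math

# Assumption: count only the 20 canonical amino acids; adding unnatural
# residues, as this work does, only enlarges the space.
N_MONOMERS = 20
LENGTH = 40

# Assumption: commonly quoted order-of-magnitude estimate, not from the paper.
ATOMS_ON_EARTH = 1.33e50

# Each position can independently hold any monomer (sequences with
# repetition), so the count is N_MONOMERS ** LENGTH, an exact Python integer.
n_sequences = N_MONOMERS ** LENGTH
log10_sequences = LENGTH * math.log10(N_MONOMERS)

print(f"20^40 ~ 10^{log10_sequences:.1f} sequences")
print(f"ratio to atoms on Earth ~ {n_sequences / ATOMS_ON_EARTH:.0f}x")
```

Under these assumptions, 20^40 is about 1.1 × 10^52 sequences, roughly 80 times the atom estimate.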
Data availability
The main data supporting the findings of the current study are available within the paper and its Supplementary Information, which provides additional methods information, supplementary figures and data. Supplementary Table 1 includes the sequences and activities of the modular library. Data used to train the model are available at https://github.com/learningmatter-mit/peptimizer and archived in the Zenodo repository (ref. 53). Source data are provided with this paper.
Code availability
All the code used for model training and analysis is available at https://github.com/learningmatter-mit/peptimizer and archived in the Zenodo repository at https://zenodo.org/record/4815385#.YK_VCjZKhhE. Tutorial Jupyter notebooks are also in the repository, and demo Google Colab notebooks can be found at github.com/pikulsomesh/tutorials.
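The repository above holds the authors' actual implementation. As a reading aid only, the following hypothetical sketch illustrates three ideas named in the abstract:

- each residue, natural or unnatural, is encoded as a fingerprint bit vector, so a sequence becomes a matrix;
- a directed-evolution-style mutate-and-select loop is guided by an activity predictor;
- per-position attributions show which residues drive a prediction.

Everything here is an assumption:

- The 16-bit random vectors stand in for real Morgan (extended-connectivity) fingerprints.
- `predict_activity` is a fixed linear scorer, not a trained deep network.
- The names `FINGERPRINTS`, `encode`, `mutate`, `evolve` and `position_contributions` are illustrative and are not functions from peptimizer.

```python
import random

import numpy as np

N_BITS = 16  # assumption: real fingerprints typically use far more bits
_rng = np.random.default_rng(0)

# HYPOTHETICAL monomer alphabet; "X" stands for an unnatural residue.
ALPHABET = ["R", "K", "W", "F", "L", "A", "G", "X"]

# HYPOTHETICAL placeholder fingerprints. In a real pipeline each monomer's
# chemical structure would be hashed into a Morgan/ECFP bit vector, so that
# chemically similar natural and unnatural residues get similar vectors.
FINGERPRINTS = {aa: _rng.integers(0, 2, N_BITS) for aa in ALPHABET}

# HYPOTHETICAL stand-in for a trained predictor: a fixed linear scorer.
WEIGHTS = _rng.normal(size=N_BITS)


def encode(seq):
    """Stack per-residue fingerprints into a (len(seq), N_BITS) matrix."""
    return np.stack([FINGERPRINTS[aa] for aa in seq]).astype(np.float32)


def position_contributions(seq):
    """Per-position scores; for this linear model, gradient-times-input
    attributions sum exactly to the prediction."""
    return encode(seq) @ WEIGHTS


def predict_activity(seq):
    """Predicted activity = sum of per-position contributions."""
    return float(position_contributions(seq).sum())


def mutate(seq, rate, rnd):
    """Replace each residue with a random monomer with probability `rate`."""
    return "".join(rnd.choice(ALPHABET) if rnd.random() < rate else aa for aa in seq)


def evolve(seed, generations=20, pool=50, keep=5, rate=0.1, rnd_seed=0):
    """Mutate-and-select loop with elitism: parents compete with offspring."""
    rnd = random.Random(rnd_seed)
    parents = [seed]
    for _ in range(generations):
        children = [mutate(rnd.choice(parents), rate, rnd) for _ in range(pool)]
        # dict.fromkeys drops duplicates while keeping a deterministic order.
        candidates = dict.fromkeys(parents + children)
        parents = sorted(candidates, key=predict_activity, reverse=True)[:keep]
    return parents[0]


seed = "RKWFLAG" * 3
best = evolve(seed)
print(seed, round(predict_activity(seed), 2))
print(best, round(predict_activity(best), 2))
print(np.round(position_contributions(best), 2))
```

Letting parents compete with their offspring (elitism) guarantees that the best predicted score never decreases between generations. Swapping the linear scorer for a model trained on the modular library, and the placeholder vectors for real fingerprints, is what the repository code does in full.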
References
Lemonick, S. Exploring chemical space: can AI take us where no human has gone before? Chemical & Engineering News (6 April 2020); https://cen.acs.org/physical-chemistry/computational-chemistry/Exploring-chemical-space-AI-take/98/i13
Zhavoronkov, A. et al. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat. Biotechnol. 37, 1038–1040 (2019).
Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 180, 688–702 (2020).
Spänig, S. & Heider, D. Encodings and models for antimicrobial peptide classification for multi-resistant pathogens. BioData Min. 12, 7 (2019).
Witten, J. & Witten, Z. Deep learning regression model for antimicrobial peptide design. Preprint at bioRxiv https://doi.org/10.1101/692681 (2019).
Liu, G. et al. Antibody complementarity determining region design using high-capacity machine learning. Bioinformatics 36, 2126–2133 (2020).
Wolfe, J. M. et al. Machine learning to predict cell-penetrating peptides for antisense delivery. ACS Cent. Sci. 4, 512–520 (2018).
Su, R., Hu, J., Zou, Q., Manavalan, B. & Wei, L. Empirical comparison and analysis of web-based cell-penetrating peptide prediction tools. Brief. Bioinform. 21, 408–420 (2020).
Sanders, W. S., Johnston, C. I., Bridges, S. M., Burgess, S. C. & Willeford, K. O. Prediction of cell penetrating peptides by support vector machines. PLoS Comput. Biol. 7, e1002101 (2011).
Manavalan, B., Subramaniyam, S., Shin, T. H., Kim, M. O. & Lee, G. Machine-learning-based prediction of cell-penetrating peptides and their uptake efficiency with improved accuracy. J. Proteome Res. 17, 2715–2726 (2018).
Crook, Z. R., Nairn, N. W. & Olson, J. M. Miniproteins as a powerful modality in drug development. Trends Biochem. Sci. 45, 332–346 (2020).
Beaulieu, M.-E. et al. Intrinsic cell-penetrating activity propels Omomyc from proof of concept to viable anti-MYC therapy. Sci. Transl. Med. 11, eaar5012 (2019).
Juliano, R. L. The delivery of therapeutic oligonucleotides. Nucleic Acids Res. 44, 6518–6548 (2016).
Slastnikova, T. A., Ulasov, A. V., Rosenkranz, A. A. & Sobolev, A. S. Targeted intracellular delivery of antibodies: the state of the art. Front. Pharmacol. 9, 1208 (2018).
Miersch, S. & Sidhu, S. S. Intracellular targeting with engineered proteins. F1000Research 5, 1947 (2016).
Trenevska, I., Li, D. & Banham, A. H. Therapeutic antibodies against intracellular tumor antigens. Front. Immunol. 8, 1001 (2017).
Fu, A., Tang, R., Hardie, J., Farkas, M. E. & Rotello, V. M. Promises and pitfalls of intracellular delivery of proteins. Bioconjug. Chem. 25, 1602–1608 (2014).
Illien, F. et al. Quantitative fluorescence spectroscopy and flow cytometry analyses of cell-penetrating peptides internalization pathways: optimization, pitfalls, comparison with mass spectrometry quantification. Sci. Rep. 6, 36938 (2016).
Wolfe, J. M. et al. Perfluoroaryl bicyclic cell‐penetrating peptides for delivery of antisense oligonucleotides. Angew. Chem. 130, 4846–4849 (2018).
Betts, C. et al. Pip6-PMO, a new generation of peptide–oligonucleotide conjugates with improved cardiac exon skipping activity for DMD treatment. Mol. Ther. Nucleic Acids 1, e38 (2012).
Boisguérin, P. et al. Delivery of therapeutic oligonucleotides with cell penetrating peptides. Adv. Drug Deliv. Rev. 87, 52–67 (2015).
Chery, J. RNA therapeutics: RNAi and antisense mechanisms and clinical applications. Postdoc J. 4, 35–50 (2016).
Mendell, J. R. et al. Eteplirsen for the treatment of Duchenne muscular dystrophy. Ann. Neurol. 74, 637–647 (2013).
Moulton, J. & Jiang, S. Gene knockdowns in adult animals: PPMOs and vivo-morpholinos. Molecules 14, 1304–1323 (2009).
McClorey, G. & Banerjee, S. Cell-penetrating peptides to enhance delivery of oligonucleotide-based therapeutics. Biomedicines 6, 51 (2018).
Sarepta Therapeutics announces positive clinical results from MOMENTUM, a Phase 2 clinical trial of SRP-5051 in patients with Duchenne muscular dystrophy amenable to skipping exon 51. GlobeNewswire News Room http://www.globenewswire.com/news-release/2020/12/07/2140613/0/en/Sarepta-Therapeutics-Announces-Positive-Clinical-Results-from-MOMENTUM-a-Phase-2-Clinical-Trial-of-SRP-5051-in-Patients-with-Duchenne-Muscular-Dystrophy-Amenable-to-Skipping-Exon-5.html (2020)
Cardozo, A. K. et al. Cell-permeable peptides induce dose- and length-dependent cytotoxic effects. Biochim. Biophys. Acta 1768, 2222–2234 (2007).
Fadzen, C. M. et al. Chimeras of cell-penetrating peptides demonstrate synergistic improvement in antisense efficacy. Biochemistry 58, 3980–3989 (2019).
Wolfe, J. Peptide Conjugation to Enhance Oligonucleotide Delivery PhD thesis (MIT, 2018).
Wei, L., Tang, J. & Zou, Q. SkipCPP-Pred: an improved and promising sequence-based predictor for predicting cell-penetrating peptides. BMC Genomics 18, 742 (2017).
Pandey, P., Patel, V., George, N. V. & Mallajosyula, S. S. KELM-CPPpred: kernel extreme learning machine based prediction model for cell-penetrating peptides. J. Proteome Res. 17, 3214–3222 (2018).
Chen, B. et al. Predicting HLA class II antigen presentation through integrated deep learning. Nat. Biotechnol. 37, 1332–1343 (2019).
Lee, E. Y., Wong, G. C. L. & Ferguson, A. L. Machine learning-enabled discovery and design of membrane-active peptides. Bioorg. Med. Chem. 26, 2708–2718 (2018).
Dobchev, D. A. et al. Prediction of cell-penetrating peptides using artificial neural networks. Curr. Comput. Aided Drug Des. 6, 79–89 (2010).
Jearawiriyapaisarn, N. et al. Sustained dystrophin expression induced by peptide-conjugated morpholino oligomers in the muscles of mdx mice. Mol. Ther. 16, 1624–1629 (2008).
Morgan, H. L. The generation of a unique machine description for chemical structures—a technique developed at Chemical Abstracts Service. J. Chem. Doc. 5, 107–113 (1965).
Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).
Moniz, J. R. A. & Krueger, D. Nested LSTMs. Proc. Mach. Learn. Res. 77, 530–544 (2017).
Agrawal, P. et al. CPPsite 2.0: a repository of experimentally validated cell-penetrating peptides. Nucleic Acids Res. 44, D1098–D1103 (2015).
Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. Proc. IEEE Int. Conf. Comput. Vis. 618–626 (IEEE, 2017); https://doi.org/10.1109/ICCV.2017.74
McCloskey, K., Taly, A., Monti, F., Brenner, M. P. & Colwell, L. J. Using attribution to decode binding mechanism in neural network models for chemistry. Proc. Natl Acad. Sci. USA 116, 11624–11629 (2019).
Sanchez-Lengeling, B. et al. Machine learning for scent: learning generalizable perceptual representations of small molecules. Preprint at https://arxiv.org/abs/1910.10685 (2019).
Hartrampf, N. et al. Synthesis of proteins by automated flow chemistry. Science 368, 980–987 (2020).
Hanvey, J. C. et al. Antisense and antigene properties of peptide nucleic acids. Science 258, 1481–1485 (1992).
Choe, S. et al. The crystal structure of diphtheria toxin. Nature 357, 216–222 (1992).
Wilson, B. A., Reich, K. A., Weinstein, B. R. & Collier, R. J. Active-site mutations of diphtheria toxin: effects of replacing glutamic acid-148 with aspartic acid, glutamine, or serine. Biochemistry 29, 8643–8651 (1990).
Abes, S. et al. Vectorization of morpholino oligomers by the (R–Ahx–R)4 peptide allows efficient splicing correction in the absence of endosomolytic agents. J. Control. Release 116, 304–313 (2006).
Cerrato, C. P., Künnapuu, K. & Langel, Ü. Cell-penetrating peptides with intracellular organelle targeting. Expert Opin. Drug Deliv. 14, 245–255 (2017).
Nischan, N. et al. Covalent attachment of cyclic TAT peptides to GFP results in protein delivery into live cells with immediate bioavailability. Angew. Chem. Int. Ed. 54, 1950–1953 (2015).
Mijalis, A. J. et al. A fully automated flow-based approach for accelerated peptide synthesis. Nat. Chem. Biol. 13, 464–466 (2017).
Wolfe, J. M. Peptide Conjugation to Enhance Oligonucleotide Delivery PhD thesis (MIT, 2018).
Sazani, P. et al. Systemically delivered antisense oligomers upregulate gene expression in mouse tissues. Nat. Biotechnol. 20, 1228–1233 (2002).
Mohapatra, S. learningmatter-mit/peptimizer: initial release. Zenodo https://doi.org/10.5281/zenodo.4815385 (2021).
Acknowledgements
We thank A. R. Loftis and J. Rodriguez for assistance with recombinant protein expression, C. Backlund for assistance with immunoassays, W. C. Salmon at the W. M. Keck Microscopy Facility at the Whitehead Institute for help with imaging, the Swanson Biotechnology Center Flow Cytometry Facility at the Koch Institute for the use of their flow cytometers and B. Mastis and S. Foley for help with the in vivo studies. We also thank Z.-N. Choo for igniting our interest in machine learning. This research was funded by Sarepta Therapeutics, by the MIT-SenseTime Alliance on Artificial Intelligence and by an award from the Abdul Latif Jameel Clinic for Machine Learning in Health (J-Clinic). C.K.S. (NSF Award no. 4000057398) acknowledges the National Science Foundation Graduate Research Fellowship (NSF grant no. 1122374) for research support.
Author information
Contributions
C.K.S., S.M., J.M.W., B.L.P. and R.G.-B. conceptualized the research. J.M.W. and C.M.F. synthesized and tested the modular library. S.M. and R.G.-B. developed the machine learning model with input from C.K.S. and B.L.P. C.K.S. synthesized the Mach peptides and constructs, performed experiments and analysed the results. K.B., C.-L.W. and J.A.W. performed the in vivo study with input from A.B.M. C.K.S., S.M., A.L., B.L.P. and R.G.-B. wrote the manuscript with input from all the authors.
Ethics declarations
Competing interests
B.L.P. is a co-founder of Amide Technologies and of Resolute Bio. Both companies focus on the development of protein and peptide therapeutics. The following authors are inventors on patents and patent applications related to the technology described: J.M.W., C.M.F. and B.L.P. are co-inventors on patents WO2020028254A1 (6 February 2020), WO2019178479A1 (19 September 2019), WO2019079386A1 (25 April 2019) and WO2019079367A1 (24 April 2019), which describe trimeric peptides for antisense delivery, chimeric peptides for antisense delivery, CPPs for antisense delivery and bicyclic peptide oligonucleotide conjugates, respectively. A.B.M., K.B., C.-L.W. and J.A.W. are employees of Sarepta Therapeutics, and Sarepta Therapeutics provided a portion of the funding for the work.
Additional information
Peer review information Nature Chemistry thanks Dominik Heider, Ülo Langel, Zigang Li and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Supplementary information
Supplementary Information
Supplementary Methods, Discussion, Figs. 1–26, Tables 1–13 and Data.
Supplementary Data 1
Sequence and activity information for peptides from the combinatorial library.
Source data
Source Data Fig. 1
Source Data and peptide sequences for Figure 1C.
Source Data Fig. 2
Source Data for scatterplots showing sequence vs activity in Figure 2C-E.
Source Data Fig. 3
Source Data for bar graph in Figure 3D.
Source Data Fig. 4
Source Data for graphs in Figure 4A-E and 4G-I.
About this article
Cite this article
Schissel, C.K., Mohapatra, S., Wolfe, J.M. et al. Deep learning to design nuclear-targeting abiotic miniproteins. Nat. Chem. 13, 992–1000 (2021). https://doi.org/10.1038/s41557-021-00766-3
DOI: https://doi.org/10.1038/s41557-021-00766-3