Deep learning to design nuclear-targeting abiotic miniproteins


There are more amino acid permutations within a 40-residue sequence than atoms on Earth. This vast chemical search space hinders the use of human learning to design functional polymers. Here we show how machine learning enables the de novo design of abiotic nuclear-targeting miniproteins to traffic antisense oligomers to the nucleus of cells. We combined high-throughput experimentation with a directed evolution-inspired deep-learning approach in which the molecular structures of natural and unnatural residues are represented as topological fingerprints. The model is able to predict activities beyond the training dataset, and simultaneously deciphers and visualizes sequence–activity predictions. The predicted miniproteins, termed ‘Mach’, reach an average mass of 10 kDa, are more effective than any previously known variant in cells and can also deliver proteins into the cytosol. The Mach miniproteins are non-toxic and efficiently deliver antisense cargo in mice. These results demonstrate that deep learning can decipher design principles to generate highly active biomolecules that are unlikely to be discovered by empirical approaches.
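The abstract says that natural and unnatural residues are represented as topological fingerprints. As a rough illustration of the idea only, the sketch below turns a sequence into a padded matrix with one bit-vector row per residue, the kind of 2D input a convolutional model could consume. The fragment table, the hash-based bit assignment, `N_BITS` and `MAX_LEN` are all hypothetical stand-ins; a real pipeline would compute fingerprints from each residue's molecular graph with a cheminformatics toolkit, and nothing here reproduces the authors' implementation.

```python
import hashlib

N_BITS = 64   # assumed fingerprint length (illustrative only)
MAX_LEN = 10  # assumed maximum sequence length used for zero padding

# Hypothetical substructure labels per residue. A real fingerprint would be
# derived from the residue's molecular graph, not from a hand-written table.
RESIDUE_FRAGMENTS = {
    "R": ["backbone", "alkyl", "guanidinium"],
    "K": ["backbone", "alkyl", "amine"],
    "G": ["backbone"],
    "X": ["backbone", "long_alkyl"],  # stand-in for an unnatural residue
}


def residue_fingerprint(residue, n_bits=N_BITS):
    """Hash each fragment label of a residue to one position in a bit vector."""
    bits = [0] * n_bits
    for fragment in RESIDUE_FRAGMENTS[residue]:
        digest = hashlib.sha256(fragment.encode("utf-8")).digest()
        bits[int.from_bytes(digest[:4], "big") % n_bits] = 1
    return bits


def encode_sequence(sequence, max_len=MAX_LEN, n_bits=N_BITS):
    """Stack per-residue bit vectors into a (max_len x n_bits) matrix."""
    if len(sequence) > max_len:
        raise ValueError("sequence is longer than max_len")
    matrix = [residue_fingerprint(res, n_bits) for res in sequence]
    # Pad with all-zero rows so every sequence has the same shape.
    matrix += [[0] * n_bits for _ in range(max_len - len(sequence))]
    return matrix


matrix = encode_sequence("RXRG")
```

Because residues are encoded by their substructures rather than by a one-letter identity, an unnatural residue such as the placeholder `X` shares bits with natural residues that contain the same fragments, which is the property that lets a model relate the two.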

Fig. 1: Machine-learning model based on directed evolution predicts highly active abiotic miniproteins for macromolecule delivery.
Fig. 2: Machine-learning-based generator–predictor–optimizer loop predicts nuclear-targeting abiotic miniproteins.
Fig. 3: Interpretation of predictor CNN unveils activated substructures.
Fig. 4: Mach miniproteins are highly active in vitro and in vivo and deliver other biomacromolecules into the cytosol.
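The generator–predictor–optimizer loop named in the Fig. 2 caption can be pictured with a toy sketch: a generator proposes mutated sequences, a predictor scores them, and an optimizer keeps the best candidate for the next round, in the spirit of directed evolution. Everything below is an assumption made for illustration: `ALPHABET`, the hidden `TARGET` string and the `predict` function are hypothetical placeholders for a trained model, and the greedy selection rule is not taken from the paper.

```python
import random

ALPHABET = "RKGXAL"   # hypothetical residue alphabet
TARGET = "KRXGLAKR"   # hidden toy optimum; stands in for "high activity"


def generate(parent, rng):
    """Generator: propose a variant with one random point substitution."""
    pos = rng.randrange(len(parent))
    return parent[:pos] + rng.choice(ALPHABET) + parent[pos + 1:]


def predict(sequence):
    """Toy predictor: fraction of positions matching TARGET (not a real model)."""
    matches = sum(a == b for a, b in zip(sequence, TARGET))
    return matches / len(TARGET)


def optimize(seed, n_rounds=50, n_children=20, rng_seed=0):
    """Optimizer: greedily keep the best-scoring variant from each round."""
    rng = random.Random(rng_seed)
    best, best_score = seed, predict(seed)
    for _ in range(n_rounds):
        children = [generate(best, rng) for _ in range(n_children)]
        top = max(children, key=predict)
        top_score = predict(top)
        if top_score > best_score:  # accept only strict improvements
            best, best_score = top, top_score
    return best, best_score


seed = "GGGGGGGG"
best, best_score = optimize(seed)
```

In a real setting the predictor would be a model trained on experimental activity data, and the objective would typically balance predicted activity against other constraints rather than maximizing a single score.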

Data availability

The main data supporting the findings of the current study are available within the paper and its Supplementary Information, which provides additional methods information, supplementary figures and data. Supplementary Table 1 includes sequences and activity of the modular library. Data used for training of the model are available online and archived in the Zenodo repository (ref. 53). Source data are provided with this paper.

Code availability

All the code used for model training and analysis is available online and archived in a Zenodo repository. Tutorial Jupyter notebooks are also in the repository, and demo Google Colab notebooks are available as well.

References

  1. Lemonick, S. Exploring chemical space: can AI take us where no human has gone before? Chemical & Engineering News (6 April 2020).

  2. Zhavoronkov, A. et al. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat. Biotechnol. 37, 1038–1040 (2019).

  3. Stokes, J. M. et al. A deep learning approach to antibiotic discovery. Cell 180, 688–702 (2020).

  4. Spänig, S. & Heider, D. Encodings and models for antimicrobial peptide classification for multi-resistant pathogens. BioData Min. 12, 7 (2019).

  5. Witten, J. & Witten, Z. Deep learning regression model for antimicrobial peptide design. Preprint at bioRxiv (2019).

  6. Liu, G. et al. Antibody complementarity determining region design using high-capacity machine learning. Bioinformatics 36, 2126–2133 (2020).

  7. Wolfe, J. M. et al. Machine learning to predict cell-penetrating peptides for antisense delivery. ACS Cent. Sci. 4, 512–520 (2018).

  8. Su, R., Hu, J., Zou, Q., Manavalan, B. & Wei, L. Empirical comparison and analysis of web-based cell-penetrating peptide prediction tools. Brief Bioinform. 21, 408–420 (2020).

  9. Sanders, W. S., Johnston, C. I., Bridges, S. M., Burgess, S. C. & Willeford, K. O. Prediction of cell penetrating peptides by support vector machines. PLoS Comput. Biol. 7, e1002101 (2011).

  10. Manavalan, B., Subramaniyam, S., Shin, T. H., Kim, M. O. & Lee, G. Machine-learning-based prediction of cell-penetrating peptides and their uptake efficiency with improved accuracy. J. Proteome Res. 17, 2715–2726 (2018).

  11. Crook, Z. R., Nairn, N. W. & Olson, J. M. Miniproteins as a powerful modality in drug development. Trends Biochem. Sci. 45, 332–346 (2020).

  12. Beaulieu, M.-E. et al. Intrinsic cell-penetrating activity propels Omomyc from proof of concept to viable anti-MYC therapy. Sci. Transl. Med. 11, eaar5012 (2019).

  13. Juliano, R. L. The delivery of therapeutic oligonucleotides. Nucleic Acids Res. 44, 6518–6548 (2016).

  14. Slastnikova, T. A., Ulasov, A. V., Rosenkranz, A. A. & Sobolev, A. S. Targeted intracellular delivery of antibodies: the state of the art. Front. Pharmacol. 9, 1208 (2018).

  15. Miersch, S. & Sidhu, S. S. Intracellular targeting with engineered proteins. F1000Research 5, 1947 (2016).

  16. Trenevska, I., Li, D. & Banham, A. H. Therapeutic antibodies against intracellular tumor antigens. Front. Immunol. 8, 1001 (2017).

  17. Fu, A., Tang, R., Hardie, J., Farkas, M. E. & Rotello, V. M. Promises and pitfalls of intracellular delivery of proteins. Bioconjug. Chem. 25, 1602–1608 (2014).

  18. Illien, F. et al. Quantitative fluorescence spectroscopy and flow cytometry analyses of cell-penetrating peptides internalization pathways: optimization, pitfalls, comparison with mass spectrometry quantification. Sci. Rep. 6, 36938 (2016).

  19. Wolfe, J. M. et al. Perfluoroaryl bicyclic cell‐penetrating peptides for delivery of antisense oligonucleotides. Angew. Chem. 130, 4846–4849 (2018).

  20. Betts, C. et al. Pip6-PMO, a new generation of peptide–oligonucleotide conjugates with improved cardiac exon skipping activity for DMD treatment. Mol. Ther. Nucleic Acids 1, e38 (2012).

  21. Boisguérin, P. et al. Delivery of therapeutic oligonucleotides with cell penetrating peptides. Adv. Drug Deliv. Rev. 87, 52–67 (2015).

  22. Chery, J. RNA therapeutics: RNAi and antisense mechanisms and clinical applications. Postdoc J. 4, 35–50 (2016).

  23. Mendell, J. R. et al. Eteplirsen for the treatment of Duchenne muscular dystrophy. Ann. Neurol. 74, 637–647 (2013).

  24. Moulton, J. & Jiang, S. Gene knockdowns in adult animals: PPMOs and vivo-morpholinos. Molecules 14, 1304–1323 (2009).

  25. McClorey, G. & Banerjee, S. Cell-penetrating peptides to enhance delivery of oligonucleotide-based therapeutics. Biomedicines 6, 51 (2018).

  26. Sarepta Therapeutics announces positive clinical results from MOMENTUM, a Phase 2 clinical trial of SRP-5051 in patients with Duchenne muscular dystrophy amenable to skipping exon 51. GlobeNewswire News Room (2020).

  27. Cardozo, A. K. et al. Cell-permeable peptides induce dose- and length-dependent cytotoxic effects. Biochim. Biophys. Acta 1768, 2222–2234 (2007).

  28. Fadzen, C. M. et al. Chimeras of cell-penetrating peptides demonstrate synergistic improvement in antisense efficacy. Biochemistry 58, 3980–3989 (2019).

  29. Wolfe, J. Peptide Conjugation to Enhance Oligonucleotide Delivery PhD thesis (MIT, 2018).

  30. Wei, L., Tang, J. & Zou, Q. SkipCPP-Pred: an improved and promising sequence-based predictor for predicting cell-penetrating peptides. BMC Genomics 18, 742 (2017).

  31. Pandey, P., Patel, V., George, N. V. & Mallajosyula, S. S. KELM-CPPpred: kernel extreme learning machine based prediction model for cell-penetrating peptides. J. Proteome Res. 17, 3214–3222 (2018).

  32. Chen, B. et al. Predicting HLA class II antigen presentation through integrated deep learning. Nat. Biotechnol. 37, 1332–1343 (2019).

  33. Lee, E. Y., Wong, G. C. L. & Ferguson, A. L. Machine learning-enabled discovery and design of membrane-active peptides. Bioorg. Med. Chem. 26, 2708–2718 (2018).

  34. Dobchev, D. A. et al. Prediction of cell-penetrating peptides using artificial neural networks. Curr. Comput. Aided Drug Des. 6, 79–89 (2010).

  35. Jearawiriyapaisarn, N. et al. Sustained dystrophin expression induced by peptide-conjugated morpholino oligomers in the muscles of mdx mice. Mol. Ther. 16, 1624–1629 (2008).

  36. Morgan, H. L. The generation of a unique machine description for chemical structures—a technique developed at Chemical Abstracts Service. J. Chem. Doc. 5, 107–113 (1965).

  37. Rogers, D. & Hahn, M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50, 742–754 (2010).

  38. Moniz, J. R. A. & Krueger, D. Nested LSTMs. Proc. Mach. Learn. Res. 77, 530–544 (2017).

  39. Agrawal, P. et al. CPPsite 2.0: a repository of experimentally validated cell-penetrating peptides. Nucleic Acids Res. 44, D1098–D1103 (2015).

  40. Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. Proc. IEEE Int. Conf. Comput. Vis. 618–626 (IEEE, 2017).

  41. McCloskey, K., Taly, A., Monti, F., Brenner, M. P. & Colwell, L. J. Using attribution to decode binding mechanism in neural network models for chemistry. Proc. Natl Acad. Sci. USA 116, 11624–11629 (2019).

  42. Sanchez-Lengeling, B. et al. Machine learning for scent: learning generalizable perceptual representations of small molecules. Preprint at (2019).

  43. Hartrampf, N. et al. Synthesis of proteins by automated flow chemistry. Science 368, 980–987 (2020).

  44. Hanvey, J. C. et al. Antisense and antigene properties of peptide nucleic acids. Science 258, 1481–1485 (1992).

  45. Choe, S. et al. The crystal structure of diphtheria toxin. Nature 357, 216–222 (1992).

  46. Wilson, B. A., Reich, K. A., Weinstein, B. R. & Collier, R. J. Active-site mutations of diphtheria toxin: effects of replacing glutamic acid-148 with aspartic acid, glutamine, or serine. Biochemistry 29, 8643–8651 (1990).

  47. Abes, S. et al. Vectorization of morpholino oligomers by the (R–Ahx–R)4 peptide allows efficient splicing correction in the absence of endosomolytic agents. J. Control. Release 116, 304–313 (2006).

  48. Cerrato, C. P., Künnapuu, K. & Langel, Ü. Cell-penetrating peptides with intracellular organelle targeting. Expert Opin. Drug Deliv. 14, 245–255 (2017).

  49. Nischan, N. et al. Covalent attachment of cyclic TAT peptides to GFP results in protein delivery into live cells with immediate bioavailability. Angew. Chem. Int. Ed. 54, 1950–1953 (2015).

  50. Mijalis, A. J. et al. A fully automated flow-based approach for accelerated peptide synthesis. Nat. Chem. Biol. 13, 464–466 (2017).

  51. Wolfe, J. M. Peptide Conjugation to Enhance Oligonucleotide Delivery (Massachusetts Institute of Technology, 2018).

  52. Sazani, P. et al. Systemically delivered antisense oligomers upregulate gene expression in mouse tissues. Nat. Biotechnol. 20, 1228–1233 (2002).

  53. Mohapatra, S. learningmatter-mit/peptimizer: initial release. Zenodo (2021).

Acknowledgements


We thank A. R. Loftis and J. Rodriguez for assistance with recombinant protein expression, C. Backlund for assistance with immunoassays, W. C. Salmon at the W. M. Keck Microscopy Facility at the Whitehead Institute for help with imaging, the Swanson Biotechnology Center Flow Cytometry Facility at the Koch Institute for the use of their flow cytometers and B. Mastis and S. Foley for help with the in vivo studies. We also thank Z.-N. Choo for igniting our interest in machine learning. This research was funded by Sarepta Therapeutics, by the MIT-SenseTime Alliance on Artificial Intelligence and by an award from the Abdul Latif Jameel Clinic for Machine Learning in Health (J-Clinic). C.K.S. (NSF Award no. 4000057398) acknowledges the National Science Foundation Graduate Research Fellowship (NSF grant no. 1122374) for research support.

Author information

Contributions

C.K.S., S.M., J.M.W., B.L.P. and R.G.-B. conceptualized the research. J.M.W. and C.M.F. synthesized and tested the modular library. S.M. and R.G.-B. developed the machine learning model with input from C.K.S. and B.L.P. C.K.S. synthesized the Mach peptides and constructs, performed experiments and analysed the results. K.B., C.-L.W. and J.A.W. performed the in vivo study with input from A.B.M. C.K.S., S.M., A.L., B.L.P. and R.G.-B. wrote the manuscript with input from all the authors.

Corresponding authors

Correspondence to Rafael Gómez-Bombarelli or Bradley L. Pentelute.

Ethics declarations

Competing interests

B.L.P. is a co-founder of Amide Technologies and of Resolute Bio. Both companies focus on the development of protein and peptide therapeutics. The following authors are inventors on patents and patent applications related to the technology described: J.M.W., C.M.F. and B.L.P. are co-inventors on patents WO2020028254A1 (6 February 2020), WO2019178479A1 (19 September 2019), WO2019079386A1 (25 April 2019) and WO2019079367A1 (24 April 2019), which describe trimeric peptides for antisense delivery, chimeric peptides for antisense delivery, CPPs for antisense delivery and bicyclic peptide oligonucleotide conjugates, respectively. A.B.M., K.B., C.-L.W. and J.A.W. are employees of Sarepta Therapeutics, and Sarepta Therapeutics provided a portion of the funding for the work.

Additional information

Peer review information Nature Chemistry thanks Dominik Heider, Ülo Langel, Zigang Li and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Methods, Discussion, Figs. 1–26, Tables 1–13 and Data.

Reporting Summary

Supplementary Data 1

Sequence and Activity information for peptides from the combinatorial library.

Source data

Source Data Fig. 1

Source Data and peptide sequences for Figure 1C.

Source Data Fig. 2

Source Data for scatterplots showing sequence vs activity in Figure 2C-E.

Source Data Fig. 3

Source Data for bar graph in Figure 3D.

Source Data Fig. 4

Source Data for graphs in Figure 4A-E and 4G-I.

About this article

Cite this article

Schissel, C.K., Mohapatra, S., Wolfe, J.M. et al. Deep learning to design nuclear-targeting abiotic miniproteins. Nat. Chem. 13, 992–1000 (2021).
