Sequence-specific prediction of the efficiencies of adenine and cytosine base editors

Abstract

Base editors, including adenine base editors (ABEs)1 and cytosine base editors (CBEs)2,3, are widely used to induce point mutations. However, determining whether a specific nucleotide in its genomic context can be edited requires time-consuming experiments. Furthermore, when the editable window contains multiple target nucleotides, various genotypic products can be generated. To develop computational tools to predict base-editing efficiency and outcome product frequencies, we first evaluated the efficiencies of an ABE and a CBE and the outcome product frequencies at 13,504 and 14,157 target sequences, respectively, in human cells. We found that there were only modest asymmetric correlations between the activities of the base editors and Cas9 at the same targets. Using deep-learning-based computational modeling, we built tools to predict the efficiencies and outcome frequencies of ABE- and CBE-directed editing at any target sequence, with Pearson correlations ranging from 0.50 to 0.95. These tools and results will facilitate modeling and therapeutic correction of genetic diseases by base editing.
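When the editing window contains several target bases, any non-empty subset of them may be converted. This means n editable adenines (or cytosines) can give up to 2^n - 1 distinct edited products, and that is why outcome frequencies have to be predicted alongside overall efficiency. The sketch below enumerates those products for one protospacer. The function name `possible_products` and the default window of protospacer positions 4 to 8 are illustrative assumptions, not values taken from this paper. It also ignores indels and conversions outside the window.

```python
from itertools import combinations


def possible_products(protospacer, window=(4, 8), ref="A", alt="G"):
    """Hypothetical helper: list every edited sequence reachable by converting
    a non-empty subset of `ref` bases inside `window` (1-based, inclusive
    protospacer positions). Use ref="C", alt="T" for a cytosine base editor."""
    start, end = window
    if not (1 <= start <= end <= len(protospacer)):
        raise ValueError("window must lie within the protospacer")
    # 0-based indices of editable bases inside the assumed window
    sites = [i for i in range(start - 1, end) if protospacer[i] == ref]
    products = []
    # every non-empty subset of editable sites yields one distinct product
    for k in range(1, len(sites) + 1):
        for subset in combinations(sites, k):
            seq = list(protospacer)
            for i in subset:
                seq[i] = alt
            products.append("".join(seq))
    return products


if __name__ == "__main__":
    # Illustrative protospacer with adenines at positions 4 and 6
    for product in possible_products("CTCAGAGTTCCTGCTGCTGC"):
        print(product)
```

With two adenines in the window, the example yields three edited products: each single conversion and the double conversion.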

Fig. 1: Characterization of adenine and cytosine base editors based on large-scale activity data.
Fig. 2: Development and evaluation of computational models predicting both efficiencies and outcomes of ABE- and CBE-induced base conversion at given target sequences.
Fig. 3: Predicted results for ABE- and CBE-induced modeling and correction of disease-relevant human point mutations.
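Fig. 2 evaluates the models against measured data, and the abstract summarizes their performance as Pearson correlations of 0.50 to 0.95 between predicted and observed values. The sketch below shows how such a coefficient is computed for paired predicted and measured efficiencies. The function name and the example numbers are hypothetical and not data from this study. In Python 3.10 or later, `statistics.correlation` gives the same result.

```python
import math


def pearson_r(predicted, measured):
    """Pearson correlation coefficient between two equal-length sequences,
    e.g. model-predicted and experimentally measured editing efficiencies."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / len(predicted)
    mean_m = sum(measured) / len(measured)
    # covariance numerator and the two standard-deviation terms
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    sd_m = math.sqrt(sum((m - mean_m) ** 2 for m in measured))
    if sd_p == 0 or sd_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_p * sd_m)


if __name__ == "__main__":
    # Hypothetical efficiencies (%) at four target sequences
    predicted = [12.0, 35.5, 60.2, 8.1]
    measured = [10.4, 40.1, 55.0, 12.3]
    print(round(pearson_r(predicted, measured), 3))
```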

Data availability

The deep sequencing data from this study have been submitted to the National Center for Biotechnology Information (NCBI) Sequence Read Archive under accession number SRP150719 (PRJNA476544). The data sets used in this study are provided as Supplementary Tables 2, 3 and 4.

Code availability

Source code for DeepABE, DeepCBE and DeepCBE-CA, along with the custom Python scripts used to calculate base-editing frequencies and outcomes, is available on GitHub at https://github.com/MyungjaeSong/Paired-Library and https://github.com/CRISPRJWCHOI/BaseEditing_tool.
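As a rough illustration of the kind of frequency and outcome calculation mentioned above, the hypothetical helper below summarizes editing from sequencing reads. It is not the authors' code, which lives in the linked repositories. It assumes reads are already aligned to the target without insertions or deletions, counts a read as edited if any reference adenine in an assumed window of positions 4 to 8 reads as guanine, and reports each edited window sequence as a share of the edited reads. Normalizing by all reads would be an equally reasonable choice.

```python
from collections import Counter


def editing_summary(reference, reads, window=(4, 8), ref="A", alt="G"):
    """Hypothetical helper: return (efficiency, outcome_frequencies).

    efficiency: fraction of usable reads with at least one ref->alt
    conversion inside the window.
    outcome_frequencies: each edited window sequence mapped to its share of
    edited reads. Reads whose length differs from `reference` are skipped,
    because this sketch assumes indel-free, pre-aligned reads.
    """
    start, end = window[0] - 1, window[1]  # 1-based inclusive -> slice bounds
    ref_window = reference[start:end]
    outcomes = Counter()
    usable = edited = 0
    for read in reads:
        if len(read) != len(reference):
            continue
        usable += 1
        read_window = read[start:end]
        if any(r == ref and b == alt for r, b in zip(ref_window, read_window)):
            edited += 1
            outcomes[read_window] += 1
    if usable == 0:
        return 0.0, {}
    freqs = {seq: n / edited for seq, n in outcomes.items()} if edited else {}
    return edited / usable, freqs


if __name__ == "__main__":
    target = "CTCAGAGTTCCTGCTGCTGC"
    reads = [target, target, "CTCGGAGTTCCTGCTGCTGC", "CTCGGGGTTCCTGCTGCTGC"]
    print(editing_summary(target, reads))
```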

References

  1. Gaudelli, N. M. et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464–471 (2017).

  2. Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424 (2016).

  3. Nishida, K. et al. Targeted nucleotide editing using hybrid prokaryotic and vertebrate adaptive immune systems. Science 353, aaf8729 (2016).

  4. Landrum, M. J. et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 44, D862–D868 (2016).

  5. Rees, H. A. & Liu, D. R. Base editing: precision chemistry on the genome and transcriptome of living cells. Nat. Rev. Genet. 19, 770–788 (2018).

  6. Doench, J. G. et al. Rational design of highly active sgRNAs for CRISPR–Cas9-mediated gene inactivation. Nat. Biotechnol. 32, 1262–1267 (2014).

  7. Wong, N., Liu, W. & Wang, X. WU-CRISPR: characteristics of functional guide RNAs for the CRISPR/Cas9 system. Genome Biol. 16, 218 (2015).

  8. Xu, H. et al. Sequence determinants of improved CRISPR sgRNA design. Genome Res. 25, 1147–1157 (2015).

  9. Moreno-Mateos, M. A. et al. CRISPRscan: designing highly efficient sgRNAs for CRISPR–Cas9 targeting in vivo. Nat. Methods 12, 982–988 (2015).

  10. Chari, R., Mali, P., Moosburner, M. & Church, G. M. Unraveling CRISPR–Cas9 genome engineering parameters via a library-on-library approach. Nat. Methods 12, 823–826 (2015).

  11. Doench, J. G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR–Cas9. Nat. Biotechnol. 34, 184–191 (2016).

  12. Kim, H. K. et al. In vivo high-throughput profiling of CRISPR-Cpf1 activity. Nat. Methods 14, 153–159 (2017).

  13. Kuan, P. F. et al. A systematic evaluation of nucleotide properties for CRISPR sgRNA design. BMC Bioinformatics 18, 297 (2017).

  14. Chari, R., Yeo, N. C., Chavez, A. & Church, G. M. sgRNA Scorer 2.0: a species-independent model to predict CRISPR/Cas9 activity. ACS Synth. Biol. 6, 902–904 (2017).

  15. Labuhn, M. et al. Refined sgRNA efficacy prediction improves large- and small-scale CRISPR-Cas9 applications. Nucleic Acids Res. 46, 1375–1385 (2018).

  16. Peng, H., Zheng, Y., Blumenstein, M., Tao, D. & Li, J. CRISPR/Cas9 cleavage efficiency regression through boosting algorithms and Markov sequence profiling. Bioinformatics 34, 3069–3077 (2018).

  17. Wilson, L. O. W., Reti, D., O’Brien, A. R., Dunne, R. A. & Bauer, D. C. High activity target-site identification using phenotypic independent CRISPR-Cas9 core functionality. CRISPR J. 1, 182–190 (2018).

  18. Chuai, G. et al. DeepCRISPR: optimized CRISPR guide RNA design by deep learning. Genome Biol. 19, 80 (2018).

  19. Kim, H. K. et al. Deep learning improves prediction of CRISPR-Cpf1 guide RNA activity. Nat. Biotechnol. 36, 239–241 (2018).

  20. Kim, H. K. et al. SpCas9 activity prediction by DeepSpCas9, a deep learning-based model with high generalization performance. Sci. Adv. 5, eaax9249 (2019).

  21. Bae, S., Kweon, J., Kim, H. S. & Kim, J. S. Microhomology-based choice of Cas9 nuclease target sites. Nat. Methods 11, 705–706 (2014).

  22. Shen, M. W. et al. Predictable and precise template-free CRISPR editing of pathogenic variants. Nature 563, 646–651 (2018).

  23. Chakrabarti, A. M. et al. Target-specific precision of CRISPR-mediated genome editing. Mol. Cell 73, 699–713.e6 (2019).

  24. Allen, F. et al. Predicting the mutations generated by repair of Cas9-induced double-strand breaks. Nat. Biotechnol. 37, 64–72 (2018).

  25. Komor, A. C. et al. Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A base editors with higher efficiency and product purity. Sci. Adv. 3, eaao4774 (2017).

  26. Truong, D. J. et al. Development of an intein-mediated split-Cas9 system for gene therapy. Nucleic Acids Res. 43, 6450–6458 (2015).

  27. Lai, Y. et al. Efficient in vivo gene expression by trans-splicing adeno-associated viral vectors. Nat. Biotechnol. 23, 1435–1439 (2005).

  28. Ryu, S. M. et al. Adenine base editing in mouse embryos and an adult mouse model of Duchenne muscular dystrophy. Nat. Biotechnol. 36, 536–539 (2018).

  29. Villiger, L. et al. Treatment of a metabolic liver disease by in vivo genome base editing in adult mice. Nat. Med. 24, 1519–1525 (2018).

  30. Zhou, C. et al. Off-target RNA mutation induced by DNA base editing and its elimination by mutagenesis. Nature 571, 275–278 (2019).

  31. Koblan, L. W. et al. Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction. Nat. Biotechnol. 36, 843–846 (2018).

  32. Gehrke, J. M. et al. An APOBEC3A-Cas9 base editor with minimized bystander and off-target activities. Nat. Biotechnol. 36, 977–982 (2018).

  33. Thuronyi, B. W. et al. Continuous evolution of base editors with expanded target compatibility and improved activity. Nat. Biotechnol. 37, 1070–1079 (2019).

  34. Grunewald, J. et al. CRISPR DNA base editors with reduced RNA off-target and self-editing activities. Nat. Biotechnol. 37, 1041–1048 (2019).

  35. Doman, J. L., Raguram, A., Newby, G. A. & Liu, D. R. Evaluation and minimization of Cas9-independent off-target DNA editing by cytosine base editors. Nat. Biotechnol. 38, 620–628 (2020).

  36. Zuo, E. et al. A rationally engineered cytosine base editor retains high on-target activity while reducing both DNA and RNA off-target effects. Nat. Methods 17, 600–604 (2020).

  37. Gaudelli, N. M. et al. Directed evolution of adenine base editors with increased activity and therapeutic application. Nat. Biotechnol. https://doi.org/10.1038/s41587-020-0491-6 (2020).

  38. Richter, M. F. et al. Phage-assisted evolution of an adenine base editor with improved Cas domain compatibility and activity. Nat. Biotechnol. https://doi.org/10.1038/s41587-020-0453-z (2020).

  39. Shalem, O. et al. Genome-scale CRISPR-Cas9 knockout screening in human cells. Science 343, 84–87 (2014).

  40. Shen, J. P. et al. Combinatorial CRISPR–Cas9 screens for de novo mapping of genetic interactions. Nat. Methods 14, 573–576 (2017).

  41. Kim, H. W. et al. Differential effects on sodium current impairments by distinct SCN1A mutations in GABAergic neurons derived from Dravet syndrome patients. Brain Dev. 40, 287–298 (2018).

  42. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  43. Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).

  44. LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).

  45. Alipanahi, B., Delong, A., Weirauch, M. T. & Frey, B. J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831–838 (2015).

  46. Kelley, D. R., Snoek, J. & Rinn, J. L. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 26, 990–999 (2016).

  47. Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).

  48. Caruana, R., Lawrence, S. & Giles, C. L. Overfitting in neural nets: backpropagation, conjugate gradient, and early stopping. In Proc. Advances in Neural Information Processing Systems Vol. 13 (eds Leen, T. K. et al.) 402 (MIT Press, 2000); https://dl.acm.org/doi/proceedings/10.5555/3008751

  49. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Machine Learn. Res. 15, 1929–1958 (2014).

  50. Kingma, D. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980 (2014).

  51. Wolfe, J., Jin, X., Bahr, T. & Holzer, N. Application of Softmax regression and its validation for spectral-based land cover mapping. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci. XLII-1/W1, 455–459 (2017).

Acknowledgements

We are grateful to W. Wurst and O. Ortiz for providing the split-Cas9 plasmids. We thank J. Lee, J. Kweon, Y. Kim and G. Yu for technical assistance. We are also grateful to S. Park and Y. Kim for assisting with the experiments. This work was supported in part by the National Research Foundation of Korea (grant nos. 2017R1A2B3004198 (H.K.), 2017M3A9B4062403 (H.K.) and 2018R1A5A2025079 (H.K.)), Brain Korea 21 Plus Project (Yonsei University College of Medicine), Yonsei University Future-leading Research Initiative of 2015 (Challenge; grant no. RMS2 2015-22-0092) and the Korean Health Technology R&D Project, Ministry of Health and Welfare, Republic of Korea (grant nos. HI17C0676 (H.K.) and HI16C1012 (H.K.)).

Author information

Contributions

M.S. performed experiments to build data sets of efficiencies and outcomes of base editing at integrated and endogenous sites. Y.K. and S.-Y.S. critically assisted the base-editing data generation by M.S. S.L., S.M., S.Y., M.S. and H.K.K. developed the framework and carried out the model training, computational validation and development of the web tools. J.P. and J.W.C. contributed substantially to bioinformatics analyses. Z.Q., J.H.K. and H.C.K. contributed to experiments using human iPSCs. H.H.K. conceived and designed the study. H.K.K., M.S., S.L. and H.H.K. analyzed the data and wrote the manuscript.

Corresponding author

Correspondence to Hyongbum Henry Kim.

Ethics declarations

Competing interests

Yonsei University has filed a patent application based on this work, in which M.S., H.K.K., S.L. and H.H.K. are coinventors (patent no. PCT/KR2019/011166).

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Text, Figs. 1–12 and Table 1.

Reporting Summary

Supplementary Table 2

Data sets of ABE- and CBE-directed target conversion efficiencies and edited sequence outcomes at integrated sequences.

Supplementary Table 3

Data sets of ABE- and CBE-directed target conversion efficiencies and edited sequence outcomes at endogenous sites.

Supplementary Table 4

Data subsets of Endo_ABE and Endo_CBE obtained by stratified random sampling and the exact P value for Fig. 2e.

Supplementary Table 5

The results of computational prediction of the base-editing efficiencies and outcomes for the ABE- and CBE-directed generation and correction of pathogenic/likely pathogenic point mutations.

Supplementary Table 6

Oligonucleotides used in this study.

About this article

Cite this article

Song, M., Kim, H.K., Lee, S. et al. Sequence-specific prediction of the efficiencies of adenine and cytosine base editors. Nat Biotechnol 38, 1037–1043 (2020). https://doi.org/10.1038/s41587-020-0573-5

  • DOI: https://doi.org/10.1038/s41587-020-0573-5