Deep learning models to predict the editing efficiencies and outcomes of diverse base editors

Kim, Nahye; Choi, Sungchul; Kim, Sungjae; Song, Myungjae; Seo, Jung Hwa; Min, Seonwoo; Park, Jinman; Cho, Sung-Rae; Kim, Hyongbum Henry

doi:10.1038/s41587-023-01792-x

Article
Published: 15 May 2023

Nahye Kim^1,2,na1,
Sungchul Choi^1,na1,
Sungjae Kim^3,
Myungjae Song^1,2,
Jung Hwa Seo^4 (ORCID: orcid.org/0000-0002-8489-7972),
Seonwoo Min^5,
Jinman Park^1,2,
Sung-Rae Cho^2,4,6,7 (ORCID: orcid.org/0000-0003-1429-2684) &
Hyongbum Henry Kim^1,2,8,9,10,11,12,13 (ORCID: orcid.org/0000-0002-4693-738X)

Nature Biotechnology volume 42, pages 484–497 (2024)

Abstract

Applications of base editing are frequently restricted by the requirement for a protospacer adjacent motif (PAM), and selecting the optimal base editor (BE) and single-guide RNA (sgRNA) pair for a given target can be difficult. To select BEs and sgRNAs without extensive experimental work, we systematically compared the editing windows, outcomes and preferred motifs for seven BEs, including two cytosine BEs, two adenine BEs and three C•G to G•C BEs, at thousands of target sequences. We also evaluated nine Cas9 variants that recognize different PAM sequences and developed a deep learning model, DeepCas9variants, for predicting which variants function most efficiently at sites with a given target sequence. We then developed a computational model, DeepBE, that predicts editing efficiencies and outcomes of 63 BEs that were generated by incorporating nine Cas9 variants as nickase domains into the seven BE variants. The predicted median efficiencies of BEs with DeepBE-based design were 2.9- to 20-fold higher than those of rationally designed SpCas9-containing BEs.

**Fig. 1: The editing window, product purity and preferred motifs for CBE and ABE variants.**

**Fig. 2: Editing window, product purity and preferred motifs for CGBE variants.**

**Fig. 3: Comparison of Cas9 variants with different PAM compatibilities and incorporation of the variants as the nickase domain for BEs.**

**Fig. 4: Specificity of SpCas9-YE1-BE4max, SpCas9-ABE8e(V106W) and SpCas9 variants with different PAM compatibilities.**

**Fig. 5: Deep-learning-based computational prediction of activities of Cas9 nuclease variants and BE variants.**

**Fig. 6: Correction of pathogenic or likely pathogenic single nucleotide polymorphisms using BEs.**

Data availability

We have submitted the deep sequencing data from this study to the NCBI Sequence Read Archive under accession number PRJNA821929 (ref. ⁵²). We provide the datasets used in this study as Supplementary Tables 1–9. Source data are provided with this paper.

Code availability

We have made the source code for DeepBE available on GitHub at https://github.com/NahyeKim/DeepBE (ref. ⁵³).

References

1. Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424 (2016).
2. Gaudelli, N. M. et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464–471 (2017).
3. Rees, H. A. & Liu, D. R. Base editing: precision chemistry on the genome and transcriptome of living cells. Nat. Rev. Genet. 19, 770–788 (2018).
4. Kurt, I. C. et al. CRISPR C-to-G base editors for inducing targeted DNA transversions in human cells. Nat. Biotechnol. 39, 41–46 (2021).
5. Zhao, D. et al. Glycosylase base editors enable C-to-A and C-to-G base changes. Nat. Biotechnol. 39, 35–40 (2021).
6. Chen, L. et al. Programmable C:G to G:C genome editing with CRISPR-Cas9-directed base excision repair proteins. Nat. Commun. 12, 1384 (2021).
7. Koblan, L. W. et al. Efficient C•G-to-G•C base editors developed using CRISPRi screens, target-library analysis, and machine learning. Nat. Biotechnol. 39, 1414–1425 (2021).
8. Komor, A. C. et al. Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A base editors with higher efficiency and product purity. Sci. Adv. 3, eaao4774 (2017).
9. Doman, J. L., Raguram, A., Newby, G. A. & Liu, D. R. Evaluation and minimization of Cas9-independent off-target DNA editing by cytosine base editors. Nat. Biotechnol. 38, 620–628 (2020).
10. Yu, Y. et al. Cytosine base editors with minimized unguided DNA and RNA off-target events and high on-target activity. Nat. Commun. 11, 2052 (2020).
11. Richter, M. F. et al. Author correction: Phage-assisted evolution of an adenine base editor with improved Cas domain compatibility and activity. Nat. Biotechnol. 38, 901 (2020).
12. Gaudelli, N. M. et al. Directed evolution of adenine base editors with increased activity and therapeutic application. Nat. Biotechnol. 38, 892–900 (2020).
13. Kleinstiver, B. P. et al. Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature 523, 481–485 (2015).
14. Kleinstiver, B. P. et al. High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects. Nature 529, 490–495 (2016).
15. Anders, C., Bargsten, K. & Jinek, M. Structural plasticity of PAM recognition by engineered variants of the RNA-guided endonuclease Cas9. Mol. Cell 61, 895–902 (2016).
16. Hu, J. H. et al. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature 556, 57–63 (2018).
17. Nishimasu, H. et al. Engineered CRISPR-Cas9 nuclease with expanded targeting space. Science 361, 1259–1262 (2018).
18. Miller, S. M. et al. Continuous evolution of SpCas9 variants compatible with non-G PAMs. Nat. Biotechnol. 38, 471–481 (2020).
19. Walton, R. T., Christie, K. A., Whittaker, M. N. & Kleinstiver, B. P. Unconstrained genome targeting with near-PAMless engineered CRISPR-Cas9 variants. Science 368, 290–296 (2020).
20. Chatterjee, P. et al. An engineered ScCas9 with broad PAM range and high specificity and activity. Nat. Biotechnol. 38, 1154–1158 (2020).
21. Chatterjee, P. et al. A Cas9 with PAM recognition for adenine dinucleotides. Nat. Commun. 11, 2474 (2020).
22. Kim, N. et al. Prediction of the sequence-specific cleavage activity of Cas9 variants. Nat. Biotechnol. 38, 1328–1336 (2020).
23. Koblan, L. W. et al. Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction. Nat. Biotechnol. 36, 843–846 (2018).
24. Zafra, M. P. et al. Optimized base editors enable efficient editing in cells, organoids and mice. Nat. Biotechnol. 36, 888–893 (2018).
25. Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823 (2013).
26. Kim, H. K. et al. SpCas9 activity prediction by DeepSpCas9, a deep learning-based model with high generalization performance. Sci. Adv. 5, eaax9249 (2019).
27. Song, M. et al. Sequence-specific prediction of the efficiencies of adenine and cytosine base editors. Nat. Biotechnol. 38, 1037–1043 (2020).
28. Kim, H. K. et al. High-throughput analysis of the activities of xCas9, SpCas9-NG and SpCas9 at matched and mismatched target sequences in human cells. Nat. Biomed. Eng. 4, 111–124 (2020).
29. Hill, A. J. et al. On the design of CRISPR-based single-cell molecular screens. Nat. Methods 15, 271–274 (2018).
30. Kim, H. K. et al. Predicting the efficiency of prime editing guide RNAs in human cells. Nat. Biotechnol. 39, 198–206 (2021).
31. Jin, S. et al. Cytosine, but not adenine, base editors induce genome-wide off-target mutations in rice. Science 364, 292–295 (2019).
32. Zuo, E. et al. Cytosine base editor generates substantial off-target single-nucleotide variants in mouse embryos. Science 364, 289–292 (2019).
33. Grunewald, J. et al. Transcriptome-wide off-target RNA editing induced by CRISPR-guided DNA base editors. Nature 569, 433–437 (2019).
34. Kim, Y. B. et al. Increasing the genome-targeting scope and precision of base editing with engineered Cas9-cytidine deaminase fusions. Nat. Biotechnol. 35, 371–376 (2017).
35. Arbab, M. et al. Determinants of base editing outcomes from target library analysis and machine learning. Cell 182, 463–480 e430 (2020).
36. Anzalone, A. V., Koblan, L. W. & Liu, D. R. Genome editing with CRISPR-Cas nucleases, base editors, transposases and prime editors. Nat. Biotechnol. 38, 824–844 (2020).
37. Kim, H. K. et al. In vivo high-throughput profiling of CRISPR-Cpf1 activity. Nat. Methods 14, 153–159 (2017).
38. Beale, R. C. et al. Comparison of the differential context-dependence of DNA deamination by APOBEC enzymes: correlation with mutation spectra in vivo. J. Mol. Biol. 337, 585–596 (2004).
39. Liu, L. D. et al. Intrinsic nucleotide preference of diversifying base editors guides antibody ex vivo affinity maturation. Cell Rep. 25, 884–892 e883 (2018).
40. Kim, H. K. et al. Deep learning improves prediction of CRISPR-Cpf1 guide RNA activity. Nat. Biotechnol. 36, 239–241 (2018).
41. Landrum, M. J. et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 42, D980–D985 (2014).
42. Landrum, M. J. et al. ClinVar: improvements to accessing data. Nucleic Acids Res. 48, D835–D844 (2020).
43. Ding, Q. et al. Permanent alteration of PCSK9 with in vivo CRISPR-Cas9 genome editing. Circ. Res. 115, 488–492 (2014).
44. Ran, F. A. et al. In vivo genome editing using Staphylococcus aureus Cas9. Nature 520, 186–191 (2015).
45. Chadwick, A. C., Wang, X. & Musunuru, K. In vivo base editing of PCSK9 (proprotein convertase subtilisin/kexin type 9) as a therapeutic alternative to genome editing. Arterioscler. Thromb. Vasc. Biol. 37, 1741–1747 (2017).
46. Dang, Y. et al. Optimizing sgRNA structure to improve CRISPR-Cas9 knockout efficiency. Genome Biol. 16, 280 (2015).
47. Sastry, L., Xu, Y., Cooper, R., Pollok, K. & Cornetta, K. Evaluation of plasmid DNA removal from lentiviral vectors by benzonase treatment. Hum. Gene Ther. 15, 221–226 (2004).
48. Sack, L. M., Davoli, T., Xu, Q., Li, M. Z. & Elledge, S. J. Sources of error in mammalian genetic screens. G3 (Bethesda) 6, 2781–2790 (2016).
49. Clement, K. et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat. Biotechnol. 37, 224–226 (2019).
50. Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
51. Abadi, M. et al. TensorFlow: a system for large-scale machine learning. In Proc. 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) 265–283 (USENIX Association, 2016).
52. Kim, N. et al. Evaluation of Cas9 and base editor variants Datasets. Sequence Read Archive. https://www.ncbi.nlm.nih.gov/bioproject/PRJNA821929/ (2023).
53. Kim, N. et al. Deep learning models to predict the editing efficiencies and outcomes of diverse base editors. Source Code. GitHub https://github.com/NahyeKim/DeepBE (2023).

Acknowledgements

We would like to thank Y. Kim and S. Park for assisting with the experiments and S. H. Kwon and J. Yu for providing technical advice. This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (2022R1A3B1078084 (H.H.K.), 2018R1A5A2025079 (H.H.K.) and 2022R1C1C2004229 (N.K.)), the Bio and Medical Technology Development Program of the NRF funded by the Korean government (MSIT) (2022M3A9E4017127 (H.H.K.) and 2022M3A9F3017506 (H.H.K.)), the Korea Drug Development Fund funded by the Ministry of Science and ICT, the Ministry of Trade, Industry and Energy, and the Ministry of Health and Welfare, Republic of Korea (HN21C0917 (H.H.K.)), the Yonsei Signature Research Cluster Program of 2021-22-0014 (H.H.K.), the Brain Korea 21 FOUR Project for Medical Science (Yonsei University College of Medicine), the SNUH Kun-hee Lee Child Cancer and Rare Disease Project, Republic of Korea (22B-000-0101 (H.H.K.)), the Yonsei Fellow Program, funded by Lee Youn Jae, and the Korea Health Technology R&D Project funded by the Ministry of Health and Welfare, Republic of Korea (HI21C1314 (H.H.K.)).

Author information

These authors contributed equally: Nahye Kim, Sungchul Choi.

Authors and Affiliations

Department of Pharmacology, Yonsei University College of Medicine, Seoul, Republic of Korea
Nahye Kim, Sungchul Choi, Myungjae Song, Jinman Park & Hyongbum Henry Kim
Graduate School of Medical Science, Brain Korea 21 Project, Yonsei University College of Medicine, Seoul, Republic of Korea
Nahye Kim, Myungjae Song, Jinman Park, Sung-Rae Cho & Hyongbum Henry Kim
Precision Medicine Institute, Macrogen, Seoul, Republic of Korea
Sungjae Kim
Department and Research Institute of Rehabilitation Medicine, Yonsei University College of Medicine, Seoul, Republic of Korea
Jung Hwa Seo & Sung-Rae Cho
LG AI Research, Seoul, Republic of Korea
Seonwoo Min
Graduate Program of Biomedical Engineering, Yonsei University College of Medicine, Seoul, Republic of Korea
Sung-Rae Cho
Rehabilitation Institute of Neuromuscular Disease, Yonsei University College of Medicine, Seoul, Republic of Korea
Sung-Rae Cho
Graduate Program of NanoScience and Technology, Yonsei University, Seoul, Republic of Korea
Hyongbum Henry Kim
Center for Nanomedicine, Institute for Basic Science (IBS), Seoul, Republic of Korea
Hyongbum Henry Kim
Yonsei-IBS Institute, Yonsei University, Seoul, Republic of Korea
Hyongbum Henry Kim
Severance Biomedical Science Institute, Yonsei University College of Medicine, Seoul, Republic of Korea
Hyongbum Henry Kim
Institute for Immunology and Immunological Diseases, Yonsei University College of Medicine, Seoul, Republic of Korea
Hyongbum Henry Kim
Department of Otolaryngology, University of California, San Francisco, CA, USA
Hyongbum Henry Kim

Contributions

N.K. performed all wet experiments, including the high-throughput evaluations. S.C. and N.K. developed deep-learning-based prediction models. M.S. and N.K. determined base editing efficiencies at endogenous sites. S.M. developed related web tools. S.K. contributed substantially to bioinformatic analyses. J.P. contributed to bioinformatic analyses. J.H.S. and S.-R.C. performed western blotting to measure Cas9 variant and BE variant protein levels. H.H.K. and N.K. conceived and designed the study. N.K. and H.H.K. analyzed the data and wrote the paper.

Corresponding author

Correspondence to Hyongbum Henry Kim.

Ethics declarations

Competing interests

Yonsei University has filed a patent (10-2022-0053742) based on this work, in which N.K. and H.H.K. are listed as coinventors. H.H.K. is a consultant for EcoR1 Capital.

Peer review

Peer review information

Nature Biotechnology thanks Hui Yang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Generation and evaluation of variant-expressing cell lines.

a, Schematic overview of the library experiment. b, Western blot analysis of Cas9 protein levels in cells expressing Cas9 variants, base editor variants with different deaminases, and base editor variants containing Cas9 variants. The results from three independent western blot analyses are presented (n = 3). Error bars show mean ± s.e.m. Subsets of variants without statistically significant differences (P > 0.05, analysis of variance (ANOVA) followed by Tukey’s post hoc test (two-sided)) in their protein levels are indicated by letters, such as a and b.

Extended Data Fig. 2 Comparison of base editing efficiencies induced by different CBEs (a) and ABEs (b).

The red triangles indicate target sequences at which the editing efficiency of one base editor is at least 30% higher than that of the other base editor. Heatmaps show the relative distribution of nucleotides neighboring the target nucleotide in the target sequences within the red triangles. Results at position 6 are also shown in Fig. 1d, h. The number of target sequences (n) = 1,027 and 2,903 at position 5, 917 and 3,119 at position 6, and 701 and 3,255 at position 7 for NG-YE1-BE4max and NG-SsAPOBEC3B, respectively (a), and 4,026 and 701 at position 4, 3,448 and 698 at position 5, 3,258 and 674 at position 6, 3,254 and 847 at position 7, and 3,255 and 1,081 at position 8 for NG-ABE8e(V106W) and NG-ABE8.17-m + V106W, respectively (b).

Extended Data Fig. 3 Comparison of base editing efficiencies induced by different CGBEs.

The red triangles indicate target sequences at which the editing efficiency of one base editor is at least 30% higher than that of the other base editor. Heatmaps show the relative distribution of nucleotides neighboring the target nucleotide in target sequences within the red triangles. Results at position 6 are also shown in Fig. 2d–f. The number of target sequences (n) = 1,807 and 830 at position 5, 1,413 and 957 at position 6, and 1,249 and 1,227 at position 7 for NG-CGBE1 and NG-miniCGBE1, respectively (a), 2,643 and 972 at position 5, 2,894 and 817 at position 6, 2,706 and 838 at position 7 for NG-miniCGBE1 and NG-APOBEC-nCas9-Ung, respectively (b), and 2,885 and 808 at position 5, 2,995 and 766 at position 6, and 2,715 and 891 at position 7 for NG-CGBE1 and NG-APOBEC-nCas9-Ung, respectively (c).

Extended Data Fig. 4 Average Cas9 variant-induced indel frequencies at target sequences containing each 4-nt PAM sequence.

If the average indel frequencies were lower than 10%, the corresponding candidate PAMs are shown in white. The red boxes indicate PAM sequences that were associated with the highest average indel frequencies among the examined Cas9 variants with different PAM compatibilities; these PAM sequences, at which the corresponding Cas9 variants showed the highest average indel frequencies, are also shown in Fig. 3b. Results are shown for SpCas9 (a), the VRQR variant (b), xCas9 (c), SpCas9-NG (d), SpCas9-NRRH (e), SpCas9-NRTH (f), SpCas9-NRCH (g), SpG (h), SpRY (i), and Sc++ (j).

Extended Data Fig. 5 Comparison of Cas9 variants with different PAM compatibilities.

Maximum average indel frequencies generated by any of the ten Cas9 variants (left heatmaps) and the corresponding Cas9 variants that showed the highest average activities (right heatmaps) at target sequences containing each 4-nt PAM sequence. If the maximum average indel frequencies were lower than 5% (a) and 20% (b), the corresponding candidate PAMs were shown in white.
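
The selection summarized in this legend (the maximum average indel frequency per PAM, the variant that produced it, and masking below a threshold) can be sketched roughly as follows. This is only an illustration and not the authors' code: the function name `best_variant_per_pam`, the nested-dictionary layout and all indel values are hypothetical; only the 5% threshold is taken from the legend.

```python
from typing import Dict, Optional, Tuple


def best_variant_per_pam(
    avg_indel: Dict[str, Dict[str, float]],
    threshold: float,
) -> Dict[str, Optional[Tuple[str, float]]]:
    """For each PAM, return (variant, maximum average indel %).

    PAMs whose maximum average indel frequency is lower than `threshold`
    are masked (None), mirroring the white cells in the heatmaps.
    """
    result: Dict[str, Optional[Tuple[str, float]]] = {}
    for pam, by_variant in avg_indel.items():
        # Pick the variant with the highest average indel frequency.
        variant, freq = max(by_variant.items(), key=lambda kv: kv[1])
        result[pam] = (variant, freq) if freq >= threshold else None
    return result


# Hypothetical average indel frequencies (%) for three 4-nt PAMs.
example = {
    "AGGA": {"SpCas9": 42.0, "SpCas9-NG": 30.5, "SpRY": 18.0},
    "TGTC": {"SpCas9": 1.2, "SpCas9-NG": 25.0, "SpRY": 21.3},
    "ATCC": {"SpCas9": 0.4, "SpCas9-NG": 1.1, "SpRY": 3.9},
}
best = best_variant_per_pam(example, threshold=5.0)
```

With these made-up values, the third PAM is masked because no variant reaches the 5% threshold.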

Extended Data Fig. 6 Representative dot plots showing correlations between indel frequencies induced by Cas9 variants at targets with 12 example PAMs.

The Pearson correlation coefficients (r) are shown. The number of target sequences (n) = 29 for SpCas9 and VRQR, 30 for SpCas9 and xCas9, 30 for SpCas9 and SpCas9-NG, 30 for SpCas9 and SpCas9-NRRH, 28 for SpCas9 and SpCas9-NRTH, 30 for SpCas9 and SpCas9-NRCH, 30 for SpCas9 and SpRY, 30 for VRQR and SpCas9-NRRH, 30 for xCas9 and SpCas9-NRRH, 29 for SpCas9-NRCH and SpG, and 28 for SpCas9-NRCH and SpRY.

Extended Data Fig. 7 Correlations between predicted DeepNG-BE_efficiency and DeepNG-BE_proportion_30bp scores and measured base editing efficiencies (a) and proportions (b).

The Pearson correlation coefficient (r) and the Spearman correlation coefficient (R) are shown. The number of target sequences (n) = 1,753, 1,998, 2,218, 2,214, 1,754, 1,752, and 1,750 (a) and 2,633, 6,446, 3,212, 2,320, 1,060, 859, and 748 (b) for NG-YE1-BE4max, NG-SsAPOBEC3B, NG-ABE8e(V106W), NG-ABE8.17-m + V106W, NG-CGBE1, NG-miniCGBE1, and NG-APOBEC-nCas9-Ung, respectively.
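
For readers who want to reproduce this kind of comparison on their own predicted and measured values, the two correlation measures can be computed as in the sketch below. It uses only the Python standard library; the function names and the example numbers are illustrative assumptions, not part of the published code. Spearman's coefficient is computed here as Pearson's coefficient on average ranks, which is the usual tie-aware definition.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical predicted scores and measured efficiencies (%).
predicted = [10.0, 20.0, 30.0, 40.0, 50.0]
measured = [1.0, 4.0, 9.0, 16.0, 25.0]
r = pearson(predicted, measured)
R = spearman(predicted, measured)
```

In this toy example the relationship is monotonic but not linear, so the Spearman coefficient is 1 while the Pearson coefficient is slightly below 1.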

Extended Data Fig. 8 Architecture of DeepBE generation.

To predict overall base editing efficiency (DeepBE-efficiency), we used the prediction scores of DeepCas9variants along with input sequence information for the base editing window ± 1 base pair. Additionally, we obtained DeepBE-proportion scores based on the 20-base pair target sequence information. The DeepBE-proportion scores and DeepBE-efficiency scores were multiplied together to generate DeepBE prediction scores.
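
The final multiplication step described in this legend can be illustrated with a short sketch. The function name `combine_scores`, the outcome labels and the numbers below are hypothetical, and the sketch assumes that the efficiency score is expressed as a percentage and the proportion scores as fractions; the legend itself does not state the units.

```python
from typing import Dict


def combine_scores(
    efficiency: float, proportions: Dict[str, float]
) -> Dict[str, float]:
    """Multiply an overall efficiency score by each outcome's proportion score."""
    return {outcome: efficiency * p for outcome, p in proportions.items()}


# Hypothetical scores for one target sequence.
efficiency_score = 40.0  # assumed: overall editing efficiency, in percent
proportion_scores = {    # assumed: fraction of edited reads per outcome
    "C5>T": 0.5,
    "C6>T": 0.25,
    "C5>T,C6>T": 0.25,
}
combined = combine_scores(efficiency_score, proportion_scores)
```

Under these assumptions, each combined value is the predicted frequency of that specific outcome among all reads, and the values sum to the overall efficiency when the proportions sum to 1.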

Extended Data Fig. 9 Performance of conventional machine learning- or shallow learning-based models and deep-learning-based models in predicting the efficiencies of Cas9 nuclease (a) and base editor variants (b).

Heatmaps show Pearson’s (left panel) and Spearman’s (right panel) correlation coefficients. In b, the deamination and nuclease domains are shown to indicate base editor variants.

Extended Data Fig. 10 Evaluation of DeepBE in predicting base editor activities at endogenous targets in three different cell lines.

a–c, Correlations between DeepBE-predicted base editing outcome frequencies and measured outcome frequencies at endogenous targets in HEK293T (a), HCT116 (b), and U2OS (c) cells. The number of targets (n) = 240, 312, and 174 (a), 219, 293, and 158 (b), and 219, 330, and 181 (c) for SpCas9-NRCH-YE1-BE4max, SpCas9-NRCH-ABE8e(V106W), and SpCas9-NRCH-CGBE1, respectively. d,e, Comparison of measured base editing outcome frequencies at targets with different chromatin accessibilities (d) and different functional regions (that is, coding versus non-coding) (e). The boxes represent the 25th, 50th, and 75th percentiles; the whiskers show the 10th and 90th percentiles. A two-sided Wilcoxon rank-sum test was used. The number of targets (n) = 114 (Non-DNase I hypersensitive sites (DHS)) and 33 (DHS; HEK293T), 102 and 40 (HCT116), and 98 and 28 (U2OS), 267 and 44 (HEK293T), 246 and 47 (HCT116), and 269 and 61 (U2OS), and 144 and 30 (HEK293T), 71 and 23 (HCT116), and 151 (Non-DHS) and 30 (DHS; U2OS) (d) and 79 (coding) and 68 (non-coding; HEK293T), 80 and 62 (HCT116), and 71 and 55 (U2OS), 150 and 161 (HEK293T), 149 and 144 (HCT116), and 143 and 187 (U2OS), and 78 and 96 (HEK293T), 79 and 40 (HCT116), and 89 (coding) and 92 (non-coding; U2OS) (e) for NRCH-YE1-BE4max, NRCH-ABE8e(V106W), and NRCH-CGBE1, respectively.

Supplementary information

Supplementary Information

Supplementary Figs. 1–24.

Reporting Summary

Supplementary Tables 1–9

Supplementary Tables 1–9 (titles and descriptions of each table are included in each spreadsheet file).

Source data

Source Data Extended Data Fig. 1

Unprocessed original images of gels and western blots.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kim, N., Choi, S., Kim, S. et al. Deep learning models to predict the editing efficiencies and outcomes of diverse base editors. Nat Biotechnol 42, 484–497 (2024). https://doi.org/10.1038/s41587-023-01792-x

Received: 29 September 2022
Accepted: 13 April 2023
Published: 15 May 2023
Issue Date: March 2024
DOI: https://doi.org/10.1038/s41587-023-01792-x
