Abstract
Applications of base editing are frequently restricted by the requirement for a protospacer adjacent motif (PAM), and selecting the optimal base editor (BE) and single-guide RNA (sgRNA) pair for a given target can be difficult. To select for BEs and sgRNAs without extensive experimental work, we systematically compared the editing windows, outcomes and preferred motifs for seven BEs, including two cytosine BEs, two adenine BEs and three C•G to G•C BEs, at thousands of target sequences. We also evaluated nine Cas9 variants that recognize different PAM sequences and developed a deep learning model, DeepCas9variants, for predicting which variants function most efficiently at sites with a given target sequence. We then developed a computational model, DeepBE, that predicts editing efficiencies and outcomes of 63 BEs that were generated by incorporating nine Cas9 variants as nickase domains into the seven BE variants. The predicted median efficiencies of BEs with DeepBE-based design were 2.9- to 20-fold higher than those of rationally designed SpCas9-containing BEs.
Data availability
We have submitted the deep sequencing data from this study to the NCBI Sequence Read Archive under accession number PRJNA821929 (ref. 52). We provide the datasets used in this study as Supplementary Tables 1–9. Source data are provided with this paper.
Code availability
We have made the source code for DeepBE available on Github at https://github.com/NahyeKim/DeepBE (ref. 53).
References
Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A. & Liu, D. R. Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature 533, 420–424 (2016).
Gaudelli, N. M. et al. Programmable base editing of A•T to G•C in genomic DNA without DNA cleavage. Nature 551, 464–471 (2017).
Rees, H. A. & Liu, D. R. Base editing: precision chemistry on the genome and transcriptome of living cells. Nat. Rev. Genet. 19, 770–788 (2018).
Kurt, I. C. et al. CRISPR C-to-G base editors for inducing targeted DNA transversions in human cells. Nat. Biotechnol. 39, 41–46 (2021).
Zhao, D. et al. Glycosylase base editors enable C-to-A and C-to-G base changes. Nat. Biotechnol. 39, 35–40 (2021).
Chen, L. et al. Programmable C:G to G:C genome editing with CRISPR-Cas9-directed base excision repair proteins. Nat. Commun. 12, 1384 (2021).
Koblan, L. W. et al. Efficient C•G-to-G•C base editors developed using CRISPRi screens, target-library analysis, and machine learning. Nat. Biotechnol. 39, 1414–1425 (2021).
Komor, A. C. et al. Improved base excision repair inhibition and bacteriophage Mu Gam protein yields C:G-to-T:A base editors with higher efficiency and product purity. Sci. Adv. 3, eaao4774 (2017).
Doman, J. L., Raguram, A., Newby, G. A. & Liu, D. R. Evaluation and minimization of Cas9-independent off-target DNA editing by cytosine base editors. Nat. Biotechnol. 38, 620–628 (2020).
Yu, Y. et al. Cytosine base editors with minimized unguided DNA and RNA off-target events and high on-target activity. Nat. Commun. 11, 2052 (2020).
Richter, M. F. et al. Author correction: Phage-assisted evolution of an adenine base editor with improved Cas domain compatibility and activity. Nat. Biotechnol. 38, 901 (2020).
Gaudelli, N. M. et al. Directed evolution of adenine base editors with increased activity and therapeutic application. Nat. Biotechnol. 38, 892–900 (2020).
Kleinstiver, B. P. et al. Engineered CRISPR-Cas9 nucleases with altered PAM specificities. Nature 523, 481–485 (2015).
Kleinstiver, B. P. et al. High-fidelity CRISPR-Cas9 nucleases with no detectable genome-wide off-target effects. Nature 529, 490–495 (2016).
Anders, C., Bargsten, K. & Jinek, M. Structural plasticity of PAM recognition by engineered variants of the RNA-guided endonuclease Cas9. Mol. Cell 61, 895–902 (2016).
Hu, J. H. et al. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature 556, 57–63 (2018).
Nishimasu, H. et al. Engineered CRISPR-Cas9 nuclease with expanded targeting space. Science 361, 1259–1262 (2018).
Miller, S. M. et al. Continuous evolution of SpCas9 variants compatible with non-G PAMs. Nat. Biotechnol. 38, 471–481 (2020).
Walton, R. T., Christie, K. A., Whittaker, M. N. & Kleinstiver, B. P. Unconstrained genome targeting with near-PAMless engineered CRISPR-Cas9 variants. Science 368, 290–296 (2020).
Chatterjee, P. et al. An engineered ScCas9 with broad PAM range and high specificity and activity. Nat. Biotechnol. 38, 1154–1158 (2020).
Chatterjee, P. et al. A Cas9 with PAM recognition for adenine dinucleotides. Nat. Commun. 11, 2474 (2020).
Kim, N. et al. Prediction of the sequence-specific cleavage activity of Cas9 variants. Nat. Biotechnol. 38, 1328–1336 (2020).
Koblan, L. W. et al. Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction. Nat. Biotechnol. 36, 843–846 (2018).
Zafra, M. P. et al. Optimized base editors enable efficient editing in cells, organoids and mice. Nat. Biotechnol. 36, 888–893 (2018).
Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823 (2013).
Kim, H. K. et al. SpCas9 activity prediction by DeepSpCas9, a deep learning-based model with high generalization performance. Sci. Adv. 5, eaax9249 (2019).
Song, M. et al. Sequence-specific prediction of the efficiencies of adenine and cytosine base editors. Nat. Biotechnol. 38, 1037–1043 (2020).
Kim, H. K. et al. High-throughput analysis of the activities of xCas9, SpCas9-NG and SpCas9 at matched and mismatched target sequences in human cells. Nat. Biomed. Eng. 4, 111–124 (2020).
Hill, A. J. et al. On the design of CRISPR-based single-cell molecular screens. Nat. Methods 15, 271–274 (2018).
Kim, H. K. et al. Predicting the efficiency of prime editing guide RNAs in human cells. Nat. Biotechnol. 39, 198–206 (2021).
Jin, S. et al. Cytosine, but not adenine, base editors induce genome-wide off-target mutations in rice. Science 364, 292–295 (2019).
Zuo, E. et al. Cytosine base editor generates substantial off-target single-nucleotide variants in mouse embryos. Science 364, 289–292 (2019).
Grünewald, J. et al. Transcriptome-wide off-target RNA editing induced by CRISPR-guided DNA base editors. Nature 569, 433–437 (2019).
Kim, Y. B. et al. Increasing the genome-targeting scope and precision of base editing with engineered Cas9-cytidine deaminase fusions. Nat. Biotechnol. 35, 371–376 (2017).
Arbab, M. et al. Determinants of base editing outcomes from target library analysis and machine learning. Cell 182, 463–480.e30 (2020).
Anzalone, A. V., Koblan, L. W. & Liu, D. R. Genome editing with CRISPR-Cas nucleases, base editors, transposases and prime editors. Nat. Biotechnol. 38, 824–844 (2020).
Kim, H. K. et al. In vivo high-throughput profiling of CRISPR-Cpf1 activity. Nat. Methods 14, 153–159 (2017).
Beale, R. C. et al. Comparison of the differential context-dependence of DNA deamination by APOBEC enzymes: correlation with mutation spectra in vivo. J. Mol. Biol. 337, 585–596 (2004).
Liu, L. D. et al. Intrinsic nucleotide preference of diversifying base editors guides antibody ex vivo affinity maturation. Cell Rep. 25, 884–892.e3 (2018).
Kim, H. K. et al. Deep learning improves prediction of CRISPR-Cpf1 guide RNA activity. Nat. Biotechnol. 36, 239–241 (2018).
Landrum, M. J. et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 42, D980–D985 (2014).
Landrum, M. J. et al. ClinVar: improvements to accessing data. Nucleic Acids Res. 48, D835–D844 (2020).
Ding, Q. et al. Permanent alteration of PCSK9 with in vivo CRISPR-Cas9 genome editing. Circ. Res. 115, 488–492 (2014).
Ran, F. A. et al. In vivo genome editing using Staphylococcus aureus Cas9. Nature 520, 186–191 (2015).
Chadwick, A. C., Wang, X. & Musunuru, K. In vivo base editing of PCSK9 (proprotein convertase subtilisin/kexin type 9) as a therapeutic alternative to genome editing. Arterioscler. Thromb. Vasc. Biol. 37, 1741–1747 (2017).
Dang, Y. et al. Optimizing sgRNA structure to improve CRISPR-Cas9 knockout efficiency. Genome Biol. 16, 280 (2015).
Sastry, L., Xu, Y., Cooper, R., Pollok, K. & Cornetta, K. Evaluation of plasmid DNA removal from lentiviral vectors by benzonase treatment. Hum. Gene Ther. 15, 221–226 (2004).
Sack, L. M., Davoli, T., Xu, Q., Li, M. Z. & Elledge, S. J. Sources of error in mammalian genetic screens. G3 (Bethesda) 6, 2781–2790 (2016).
Clement, K. et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat. Biotechnol. 37, 224–226 (2019).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Abadi, M. et al. TensorFlow: a system for large-scale machine learning. In Proc. 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) 265–283 (USENIX Association, 2016).
Kim, N. et al. Evaluation of Cas9 and base editor variants Datasets. Sequence Read Archive. https://www.ncbi.nlm.nih.gov/bioproject/PRJNA821929/ (2023).
Kim, N. et al. Deep learning models to predict the editing efficiencies and outcomes of diverse base editors. Source Code. GitHub https://github.com/NahyeKim/DeepBE (2023).
Acknowledgements
We would like to thank Y. Kim and S. Park for assisting with the experiments and S. H. Kwon and J. Yu for providing technical advice. This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (2022R1A3B1078084 (H.H.K.), 2018R1A5A2025079 (H.H.K.) and 2022R1C1C2004229 (N.K.)), the Bio and Medical Technology Development Program of the NRF funded by the Korean government (MSIT) (2022M3A9E4017127 (H.H.K.) and 2022M3A9F3017506 (H.H.K.)), the Korea Drug Development Fund funded by the Ministry of Science and ICT, the Ministry of Trade, Industry and Energy, and the Ministry of Health and Welfare, Republic of Korea (HN21C0917 (H.H.K.)), the Yonsei Signature Research Cluster Program of 2021-22-0014 (H.H.K.), the Brain Korea 21 FOUR Project for Medical Science (Yonsei University College of Medicine), the SNUH Kun-hee Lee Child Cancer and Rare Disease Project, Republic of Korea (22B-000-0101 (H.H.K.)), the Yonsei Fellow Program, funded by Lee Youn Jae, and the Korea Health Technology R&D Project funded by the Ministry of Health and Welfare, Republic of Korea (HI21C1314 (H.H.K.)).
Author information
Contributions
N.K. performed all wet experiments, including the high-throughput evaluations. S.C. and N.K. developed deep-learning-based prediction models. M.S. and N.K. determined base editing efficiencies at endogenous sites. S.M. developed related web tools. S.K. contributed substantially to bioinformatics analyses. J.P. contributed to bioinformatic analyses. J.H.S. and S.-R.C. performed western blotting to measure Cas9 variant and BE variant protein levels. H.K.K. and N.K. conceived and designed the study. N.K. and H.H.K. analyzed the data and wrote the paper.
Ethics declarations
Competing interests
Yonsei University has filed a patent (10-2022-0053742) based on this work, in which N.K. and H.H.K. are listed as coinventors. H.H.K. is a consultant for EcoR1 Capital.
Peer review
Peer review information
Nature Biotechnology thanks Hui Yang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Generation and evaluation of variant-expressing cell lines.
a, Schematic overview of the library experiment. b, Western blot analysis of Cas9 protein levels in cells expressing Cas9 variants, base editor variants with different deaminases, and base editor variants containing Cas9 variants. The results from three independent western blot analyses are presented (n = 3). Error bars show mean ± s.e.m. Subsets of variants without statistically significant differences (P > 0.05, analysis of variance (ANOVA) followed by Tukey’s post hoc test (two-sided)) in their protein levels are indicated by letters, such as a and b.
Extended Data Fig. 2 Comparison of base editing efficiencies induced by different CBEs (a) and ABEs (b).
The red triangles indicate target sequences at which the editing efficiency of one base editor is at least 30% higher than that of the other base editor. Heatmaps show the relative distribution of nucleotides neighboring the target nucleotide in the target sequences within the red triangles. Results at position 6 are also shown in Fig. 1d, h. The number of target sequences (n) = 1,027 and 2,903 at position 5, 917 and 3,119 at position 6, and 701 and 3,255 at position 7 for NG-YE1-BE4max and NG-SsAPOBEC3B, respectively (a) and 4,026 and 701 at position 4, 3,448 and 698 at position 5, 3,258 and 674 at position 6, 3,254 and 847 at position 7, and 3,255 and 1,081 at position 8 for NG-ABE8e(V106W) and NG-ABE8.17-m + V106W, respectively (b).
Extended Data Fig. 3 Comparison of base editing efficiencies induced by different CGBEs.
The red triangles indicate target sequences at which the editing efficiency of one base editor is at least 30% higher than that of the other base editor. Heatmaps show the relative distribution of nucleotides neighboring the target nucleotide in target sequences within the red triangles. Results at position 6 are also shown in Fig. 2d–f. The number of target sequences (n) = 1,807 and 830 at position 5, 1,413 and 957 at position 6, and 1,249 and 1,227 at position 7 for NG-CGBE1 and NG-miniCGBE1, respectively (a), 2,643 and 972 at position 5, 2,894 and 817 at position 6, 2,706 and 838 at position 7 for NG-miniCGBE1 and NG-APOBEC-nCas9-Ung, respectively (b), and 2,885 and 808 at position 5, 2,995 and 766 at position 6, and 2,715 and 891 at position 7 for NG-CGBE1 and NG-APOBEC-nCas9-Ung, respectively (c).
Extended Data Fig. 4 Average Cas9 variant-induced indel frequencies at target sequences containing each 4-nt PAM sequence.
If the average indel frequencies were lower than 10%, the corresponding candidate PAMs are shown in white. The red boxes indicate PAM sequences that were associated with the highest average indel frequencies among the examined Cas9 variants with different PAM compatibilities; these PAM sequences, at which the corresponding Cas9 variants showed the highest average indel frequencies, are also shown in Fig. 3b. The variants shown are SpCas9 (a), VRQR variant (b), xCas9 (c), SpCas9-NG (d), SpCas9-NRRH (e), SpCas9-NRTH (f), SpCas9-NRCH (g), SpG (h), SpRY (i), and Sc++ (j).
Extended Data Fig. 5 Comparison of Cas9 variants with different PAM compatibilities.
Maximum average indel frequencies generated by any of the ten Cas9 variants (left heatmaps) and the corresponding Cas9 variants that showed the highest average activities (right heatmaps) at target sequences containing each 4-nt PAM sequence. If the maximum average indel frequencies were lower than 5% (a) or 20% (b), the corresponding candidate PAMs are shown in white.
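The selection described in this caption can be sketched in Python. This is a minimal illustration and not the authors' code: the function name best_variant_per_pam, the input layout and the example numbers are all hypothetical. Only the logic follows the caption: take the maximum average indel frequency per PAM, record the variant that produced it, and blank out PAMs whose maximum falls below a cutoff.

```python
from typing import Dict, Optional, Tuple


def best_variant_per_pam(
    avg_indel: Dict[str, Dict[str, float]],
    cutoff: float,
) -> Dict[str, Optional[Tuple[str, float]]]:
    """Pick the most active variant for each PAM.

    avg_indel maps PAM -> {variant name -> average indel frequency (%)}.
    A PAM whose best frequency is lower than `cutoff` maps to None,
    mirroring the white cells in the heatmaps.
    """
    result: Dict[str, Optional[Tuple[str, float]]] = {}
    for pam, by_variant in avg_indel.items():
        variant, freq = max(by_variant.items(), key=lambda kv: kv[1])
        result[pam] = (variant, freq) if freq >= cutoff else None
    return result


# Made-up frequencies for three 4-nt PAMs and three variants (illustration only).
example = {
    "AGGA": {"SpCas9": 62.0, "SpCas9-NG": 48.5, "SpRY": 30.1},
    "AGTC": {"SpCas9": 1.2, "SpCas9-NG": 12.4, "SpRY": 9.8},
    "ATTC": {"SpCas9": 0.3, "SpCas9-NG": 0.9, "SpRY": 3.7},
}

panel_a = best_variant_per_pam(example, cutoff=5.0)   # 5% cutoff, as in panel a
panel_b = best_variant_per_pam(example, cutoff=20.0)  # 20% cutoff, as in panel b
```

With these invented numbers, the second PAM is retained under the 5% cutoff but blanked under the 20% cutoff, which is the only difference between the two panels in this sketch.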
Extended Data Fig. 6 Representative dot plots showing correlations between indel frequencies induced by Cas9 variants at targets with 12 example PAMs.
The Pearson correlation coefficients (r) are shown. The number of target sequences (n) = 29 for SpCas9 and VRQR, 30 for SpCas9 and xCas9, 30 for SpCas9 and SpCas9-NG, 30 for SpCas9 and SpCas9-NRRH, 28 for SpCas9 and SpCas9-NRTH, 30 for SpCas9 and SpCas9-NRCH, 30 for SpCas9 and SpRY, 30 for VRQR and SpCas9-NRRH, 30 for xCas9 and SpCas9-NRRH, 29 for SpCas9-NRCH and SpG, and 28 for SpCas9-NRCH and SpRY.
Extended Data Fig. 7 Correlations between predicted DeepNG-BE_efficiency and DeepNG-BE_proportion_30bp scores and measured base editing efficiencies (a) and proportions (b).
The Pearson correlation coefficient (r) and the Spearman correlation coefficient (R) are shown. The number of target sequences (n) = 1,753, 1,998, 2,218, 2,214, 1,754, 1,752, and 1,750 (a) and 2,633, 6,446, 3,212, 2,320, 1,060, 859, and 748 (b) for NG-YE1-BE4max, NG-SsAPOBEC3B, NG-ABE8e(V106W), NG-ABE8.17-m + V106W, NG-CGBE1, NG-miniCGBE1, and NG-APOBEC-nCas9-Ung, respectively.
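For readers unfamiliar with the two statistics reported in this caption, the following Python sketch computes both from paired predicted and measured values. It is a generic textbook implementation using only the standard library, not code from the DeepBE repository; the function names and the toy data are assumptions made for illustration.

```python
from math import sqrt
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (r) of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation (R): Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Toy data: measured values rise monotonically but non-linearly with prediction.
predicted = [0.10, 0.20, 0.30, 0.40, 0.50]
measured = [1.0, 4.0, 9.0, 16.0, 25.0]

r = pearson_r(predicted, measured)
R = spearman_r(predicted, measured)
```

In this toy case the Spearman coefficient is 1 because the ordering is perfectly preserved, while the Pearson coefficient is slightly below 1 because the relationship is not linear. That difference is why reporting both can be informative.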
Extended Data Fig. 8 Architecture of DeepBE generation.
To predict overall base editing efficiency (DeepBE-efficiency), we used the prediction scores of DeepCas9variants along with input sequence information for the base editing window ± 1 base pair. Additionally, we obtained DeepBE-proportion scores based on the 20-base pair target sequence information. The DeepBE-proportion scores and DeepBE-efficiency scores were multiplied together to generate DeepBE prediction scores.
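The final multiplication step in this caption can be sketched in Python. This is a hedged illustration only: the function name combine_deepbe_scores, the outcome labels, the units (efficiency as a percentage, proportions as fractions summing to 1) and the numbers are assumptions, and the two input scores would in practice come from the trained models rather than being typed in by hand.

```python
from typing import Dict


def combine_deepbe_scores(
    efficiency: float,
    outcome_proportions: Dict[str, float],
) -> Dict[str, float]:
    """Multiply one overall efficiency score by each outcome's proportion score.

    efficiency: predicted overall editing efficiency for a target (assumed %).
    outcome_proportions: predicted share of each edited outcome among all
        edited reads for that target (assumed fractions that sum to 1).
    Returns the predicted frequency of each individual outcome.
    """
    return {
        outcome: efficiency * proportion
        for outcome, proportion in outcome_proportions.items()
    }


# Hypothetical scores for a single target sequence.
efficiency_score = 40.0
proportion_scores = {"outcome_1": 0.5, "outcome_2": 0.25, "outcome_3": 0.25}

outcome_frequencies = combine_deepbe_scores(efficiency_score, proportion_scores)
```

Under these assumptions the per-outcome frequencies sum back to the overall efficiency, which is a convenient sanity check when the proportions are normalized.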
Extended Data Fig. 9 Performance of conventional machine learning- or shallow learning-based models and deep-learning-based models in predicting the efficiencies of Cas9 nuclease (a) and base editor variants (b).
Heatmaps show Pearson’s (left panel) and Spearman’s (right panel) correlation coefficients. In b, the deamination and nuclease domains are shown to indicate base editor variants.
Extended Data Fig. 10 Evaluation of DeepBE in predicting base editor activities at endogenous targets in three different cell lines.
a–c, Correlations between DeepBE-predicted base editing outcome frequencies and measured outcome frequencies at endogenous targets in HEK293T (a), HCT116 (b), and U2OS (c) cells. The number of targets (n) = 240, 312, and 174 (a), 219, 293, and 158 (b), and 219, 330, and 181 (c) for SpCas9-NRCH-YE1-BE4max, SpCas9-NRCH-ABE8e(V106W), and SpCas9-NRCH-CGBE1, respectively. d,e, Comparison of measured base editing outcome frequencies at targets with different chromatin accessibilities (d) and different functional regions (that is, coding versus non-coding) (e). The boxes represent the 25th, 50th, and 75th percentiles; the whiskers show the 10th and 90th percentiles. Two-sided Wilcoxon rank-sum test. The number of targets (n) = 114 (Non-DNase I hypersensitive sites (DHS)) and 33 (DHS; HEK293T), 102 and 40 (HCT116), and 98 and 28 (U2OS), 267 and 44 (HEK293T), 246 and 47 (HCT116), and 269 and 61 (U2OS), and 144 and 30 (HEK293T), 71 and 23 (HCT116), and 151 (Non-DHS) and 30 (DHS; U2OS) (d) and 79 (coding) and 68 (non-coding; HEK293T), 80 and 62 (HCT116), and 71 and 55 (U2OS), 150 and 161 (HEK293T), 149 and 144 (HCT116), and 143 and 187 (U2OS), and 78 and 96 (HEK293T), 79 and 40 (HCT116), and 89 (coding) and 92 (non-coding; U2OS) (e) for NRCH-YE1-BE4max, NRCH-ABE8e(V106W), and NRCH-CGBE1, respectively.
Supplementary information
Supplementary Information
Supplementary Figs. 1–24.
Supplementary Tables 1–9
Supplementary Tables 1–9 (titles and descriptions of each table are included in each spreadsheet file).
Source data
Source Data Extended Data Fig. 1
Unprocessed original images of gels and western blots.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kim, N., Choi, S., Kim, S. et al. Deep learning models to predict the editing efficiencies and outcomes of diverse base editors. Nat Biotechnol 42, 484–497 (2024). https://doi.org/10.1038/s41587-023-01792-x
DOI: https://doi.org/10.1038/s41587-023-01792-x