Abstract
Several Streptococcus pyogenes Cas9 (SpCas9) variants have been developed to improve the enzyme’s specificity or to alter or broaden its protospacer-adjacent motif (PAM) compatibility, but selecting the optimal variant for a given target sequence and application remains difficult. To build computational models to predict the sequence-specific activity of 13 SpCas9 variants, we first assessed their cleavage efficiency at 26,891 target sequences. We found that, of the 256 possible four-nucleotide NNNN sequences, 156 can be used as a PAM by at least one of the SpCas9 variants. For the high-fidelity variants, overall activity could be ranked as SpCas9 ≥ Sniper-Cas9 > eSpCas9(1.1) > SpCas9-HF1 > HypaCas9 ≈ xCas9 >> evoCas9, whereas their overall specificities could be ranked as evoCas9 >> HypaCas9 ≥ SpCas9-HF1 ≈ eSpCas9(1.1) > xCas9 > Sniper-Cas9 > SpCas9. Using these data, we developed 16 deep-learning-based computational models that accurately predict the activity of these variants at any target sequence.
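The PAM count in the abstract can be pictured with a minimal sketch: enumerate all 4^4 = 256 four-nucleotide sequences, then collect those that meet an activity cutoff for at least one variant. The function name `usable_pams`, the data layout, the variant labels, the toy frequencies and the 5% cutoff are all illustrative assumptions; the abstract does not state the criterion the authors used, and this is not their code.

```python
from itertools import product

# All four-nucleotide PAM candidates (NNNN): 4**4 = 256 sequences.
ALL_PAMS = ["".join(p) for p in product("ACGT", repeat=4)]


def usable_pams(avg_indel_by_variant, threshold):
    """Return PAMs whose average indel frequency reaches `threshold`
    for at least one variant.

    avg_indel_by_variant: dict of variant name -> dict of 4-nt PAM ->
    average indel frequency in percent (hypothetical layout).
    threshold: cutoff in percent (an assumption, not from the paper).
    """
    usable = set()
    for pam_to_freq in avg_indel_by_variant.values():
        for pam, freq in pam_to_freq.items():
            if freq >= threshold:
                usable.add(pam)
    return usable


# Toy numbers for illustration only; not measurements from the study.
toy = {
    "variant_a": {"AGGA": 40.0, "TGAT": 2.0},
    "variant_b": {"TGAT": 12.0, "CCCC": 0.5},
}
result = usable_pams(toy, threshold=5.0)
print(len(ALL_PAMS), sorted(result))
```

In this toy input, TGAT is counted because one variant reaches the cutoff even though the other does not, which is the "at least one variant" logic behind the 156-of-256 figure.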
Data availability
We have submitted the deep sequencing data from this study to the NCBI Sequence Read Archive under accession number SRR10215483. We have provided the data sets used in this study as Supplementary Tables 1–3.
Code availability
We have made the source code for DeepSpCas9variants and the custom Python scripts used for the indel frequency calculations available on GitHub at https://github.com/NahyeKim/DeepSpCas9variants and https://github.com/CRISPRJWCHOI/CRISPR_toolkit/tree/master/Indel_searcher_2.
Acknowledgements
We would like to thank Younggwang Kim, S. Park and Younghye Kim for assisting with the experiments. This work was supported in part by the National Research Foundation of Korea (grants 2017R1A2B3004198 (to H.H.K.), 2017M3A9B4062403 (to H.H.K.) and 2018R1A5A2025079 (to H.H.K.)), Brain Korea 21 Plus Project (Yonsei University College of Medicine), the Institute for Basic Science (IBS-R026-D1) and the Korean Health Technology R&D Project, Ministry of Health and Welfare, Republic of Korea (grants HI17C0676 (to H.H.K.) and HI16C1012 (to S.-R.C. and H.H.K.)).
Author information
Contributions
N.K. performed most wet experiments, including high-throughput evaluation of SpCas9 variant activities. H.K.K. helped substantially with N.K.’s experiments and provided critical technical advice for high-throughput experiments. S.L., S.M., S.Y. and H.K.K. developed DeepSpCas9variants and the related web tools. J.P. and J.W.C. contributed substantially to bioinformatics analyses. J.H.S. and S.-R.C. performed western blotting to measure SpCas9 variant protein levels. Together with H.K.K. and N.K., H.H.K. conceived and designed the study. N.K., H.K.K. and H.H.K. analyzed the data and wrote the manuscript.
Ethics declarations
Competing interests
Yonsei University has filed a patent based on this work, in which N.K., H.K.K. and H.H.K. are the co-inventors (patent no. 10-2019-0127304).
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Text, Supplementary Discussions 1 and 2, Supplementary Figs. 1–17 and Supplementary Notes 1–3.
Supplementary Table 1
Library A design and indel frequencies from the library.
Supplementary Table 2
Library B design and indel frequencies from the library.
Supplementary Table 3
Library C design and indel frequencies from the library.
Supplementary Table 4
Frequency of shuffling between sgRNA-encoding and barcode-target sequences.
Supplementary Table 5
Average indel frequencies associated with all possible 4-nt PAM sequences.
Supplementary Table 6
PAM compatibilities determined using the 30 fixed protospacers from library A and a wide range of protospacers from library B.
Supplementary Table 7
Indel frequencies at 30 perfectly matched target sequences used for analyzing specificity. Of 30 guide RNAs, 8 were selected.
Supplementary Table 8
Data sets used for the development and evaluation of DeepSpCas9variants.
Supplementary Table 9
Primer sequences used for experiments.
Source data
Source Data Fig. 1
Unprocessed western blots for Fig. 1b
About this article
Cite this article
Kim, N., Kim, H.K., Lee, S. et al. Prediction of the sequence-specific cleavage activity of Cas9 variants. Nat Biotechnol 38, 1328–1336 (2020). https://doi.org/10.1038/s41587-020-0537-9
DOI: https://doi.org/10.1038/s41587-020-0537-9