Several Streptococcus pyogenes Cas9 (SpCas9) variants have been developed to improve an enzyme’s specificity or to alter or broaden its protospacer-adjacent motif (PAM) compatibility, but selecting the optimal variant for a given target sequence and application remains difficult. To build computational models to predict the sequence-specific activity of 13 SpCas9 variants, we first assessed their cleavage efficiency at 26,891 target sequences. We found that, of the 256 possible four-nucleotide NNNN sequences, 156 can be used as a PAM by at least one of the SpCas9 variants. For the high-fidelity variants, overall activity could be ranked as SpCas9 ≥ Sniper-Cas9 > eSpCas9(1.1) > SpCas9-HF1 > HypaCas9 ≈ xCas9 >> evoCas9, whereas their overall specificities could be ranked as evoCas9 >> HypaCas9 ≥ SpCas9-HF1 ≈ eSpCas9(1.1) > xCas9 > Sniper-Cas9 > SpCas9. Using these data, we developed 16 deep-learning-based computational models that accurately predict the activity of these variants at any target sequence.
Subscribe to Journal
Get full journal access for 1 year
only $21.58 per issue
All prices are NET prices.
VAT will be added later in the checkout.
Rent or Buy article
Get time limited or full article access on ReadCube.
All prices are NET prices.
We have made the source code for DeepSpCas9variants and the custom Python scripts used for the indel frequency calculations available on Github at https://github.com/NahyeKim/DeepSpCas9variants and https://github.com/CRISPRJWCHOI/CRISPR_toolkit/tree/master/Indel_searcher_2.
Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823 (2013).
Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823–826 (2013).
Cho, S. W., Kim, S., Kim, J. M. & Kim, J. S. Targeted genome engineering in human cells with the Cas9 RNA-guided endonuclease. Nat. Biotechnol. 31, 230–232 (2013).
Jinek, M. et al. RNA-programmed genome editing in human cells. eLife 2, e00471 (2013).
Hwang, W. Y. et al. Efficient genome editing in zebrafish using a CRISPR–Cas system. Nat. Biotechnol. 31, 227–229 (2013).
Jiang, W., Bikard, D., Cox, D., Zhang, F. & Marraffini, L. A. RNA-guided editing of bacterial genomes using CRISPR–Cas systems. Nat. Biotechnol. 31, 233–239 (2013).
Slaymaker, I. M. et al. Rationally engineered Cas9 nucleases with improved specificity. Science 351, 84–88 (2016).
Kleinstiver, B. P. et al. High-fidelity CRISPR–Cas9 nucleases with no detectable genome-wide off-target effects. Nature 529, 490–495 (2016).
Chen, J. S. et al. Enhanced proofreading governs CRISPR-Cas9 targeting accuracy. Nature 550, 407–410 (2017).
Casini, A. et al. A highly specific SpCas9 variant is identified by in vivo screening in yeast. Nat. Biotechnol. 36, 265–271 (2018).
Lee, J. K. et al. Directed evolution of CRISPR–Cas9 to increase its specificity. Nat. Commun. 9, 3048 (2018).
Kleinstiver, B. P. et al. Engineered CRISPR–Cas9 nucleases with altered PAM specificities. Nature 523, 481–485 (2015).
Anders, C., Bargsten, K. & Jinek, M. Structural plasticity of PAM recognition by engineered variants of the RNA-guided endonuclease Cas9. Mol. Cell 61, 895–902 (2016).
Hu, J. H. et al. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature 556, 57–63 (2018).
Nishimasu, H. et al. Engineered CRISPR–Cas9 nuclease with expanded targeting space. Science 361, 1259–1262 (2018).
Koblan, L. W. et al. Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction. Nat. Biotechnol. 36, 843–846 (2018).
Zafra, M. P. et al. Optimized base editors enable efficient editing in cells, organoids and mice. Nat. Biotechnol. 36, 888–893 (2018).
Tzelepis, K. et al. A CRISPR dropout screen identifies genetic vulnerabilities and therapeutic targets in acute myeloid leukemia. Cell Rep. 17, 1193–1205 (2016).
Kim, H. K. et al. In vivo high-throughput profiling of CRISPR–Cpf1 activity. Nat. Methods 14, 153–159 (2017).
Kim, H. K. et al. Deep learning improves prediction of CRISPR–Cpf1 guide RNA activity. Nat. Biotechnol. 36, 239–241 (2018).
Kim, H. K. et al. SpCas9 activity prediction by DeepSpCas9, a deep learning-based model with high generalization performance. Sci. Adv. 5, eaax9249 (2019).
Dang, Y. et al. Optimizing sgRNA structure to improve CRISPR–Cas9 knockout efficiency. Genome Biol. 16, 280 (2015).
He, X. et al. Boosting activity of high-fidelity CRISPR/Cas9 variants using a tRNAGln-processing system in human cells. J. Biol. Chem. 294, 9308–9315 (2019).
Zhang, D. et al. Perfectly matched 20-nucleotide guide RNA sequences enable robust genome editing using high-fidelity SpCas9 nucleases. Genome Biol. 18, 191 (2017).
Xie, K., Minkenberg, B. & Yang, Y. Boosting CRISPR/Cas9 multiplex editing capability with the endogenous tRNA-processing system. Proc. Natl Acad. Sci. USA 112, 3570–3575 (2015).
Schlub, T. E., Smyth, R. P., Grimm, A. J., Mak, J. & Davenport, M. P. Accurately measuring recombination between closely related HIV-1 genomes. PLoS Comput. Biol. 6, e1000766 (2010).
Sack, L. M., Davoli, T., Xu, Q., Li, M. Z. & Elledge, S. J. Sources of error in mammalian genetic screens. G3 6, 2781–2790 (2016).
Feldman, D., Singh, A., Garrity, A. J. & Blainey, P. C. Lentiviral co-packaging mitigates the effects of intermolecular recombination and multiple integrations in pooled genetic screens. Preprint at bioRxiv https://doi.org/10.1101/262121 (2018).
Hill, A. J. et al. On the design of CRISPR-based single-cell molecular screens. Nat. Methods 15, 271–274 (2018).
Kim, H. K. et al. High-throughput analysis of the activities of xCas9, SpCas9-NG and SpCas9 at matched and mismatched target sequences in human cells. Nat. Biomed. Eng. 4, 111–124 (2020).
Xu, L., Zhao, L., Gao, Y., Xu, J. & Han, R. Empower multiplex cell and tissue-specific CRISPR-mediated gene manipulation with self-cleaving ribozymes and tRNA. Nucleic Acids Res. 45, e28 (2017).
Wang, D. et al. Optimized CRISPR guide RNA design for two high-fidelity Cas9 variants by deep learning. Nat. Commun. 10, 4284 (2019).
Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2, 56–67 (2020).
Doench, J. G. et al. Rational design of highly active sgRNAs for CRISPR–Cas9-mediated gene inactivation. Nat. Biotechnol. 32, 1262–1267 (2014).
Doench, J. G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR–Cas9. Nat. Biotechnol. 34, 184–191 (2016).
Xu, H. et al. Sequence determinants of improved CRISPR sgRNA design. Genome Res. 25, 1147–1157 (2015).
Wang, T., Wei, J. J., Sabatini, D. M. & Lander, E. S. Genetic screens in human cells using the CRISPR–Cas9 system. Science 343, 80–84 (2014).
Tsai, S. Q. et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR–Cas nucleases. Nat. Biotechnol. 33, 187–197 (2015).
Du, D. et al. Genetic interaction mapping in mammalian cells using CRISPR interference. Nat. Methods 14, 577–580 (2017).
Kim, S., Bae, T., Hwang, J. & Kim, J. S. Rescue of high-specificity Cas9 variants using sgRNAs with matched 5′ nucleotides. Genome Biol. 18, 218 (2017).
Shen, J. P. et al. Combinatorial CRISPR–Cas9 screens for de novo mapping of genetic interactions. Nat. Methods 14, 573–576 (2017).
Sastry, L., Xu, Y., Cooper, R., Pollok, K. & Cornetta, K. Evaluation of plasmid DNA removal from lentiviral vectors by benzonase treatment. Hum. Gene Ther. 15, 221–226 (2004).
Shalem, O. et al. Genome-scale CRISPR–Cas9 knockout screening in human cells. Science 343, 84–87 (2014).
Clement, K. et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat. Biotechnol. 37, 224–226 (2019).
Lecun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).
Alipanahi, B., Delong, A., Weirauch, M. T. & Frey, B. J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831–838 (2015).
Kelley, D. R., Snoek, J. & Rinn, J. L. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 26, 990–999 (2016).
Caruana, R., Lawrence, S. & Giles, C. L. Overfitting in neural nets: backpropagation, conjugate gradient, and early stopping. In Advances in Neural Information Processing Systems 13: Proceedings of the 2000 Conference (MIT Press, 2000).
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014).
Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
Abadi, M. et al. TensorFlow: a system for large-scale machine learning. In Proc. 12th USENIX Conference on Operating Systems Design and Implementation 265–283 (USENIX Association, 2016); https://www.usenix.org/conference/osdi16/technical-sessions/presentation/abadi
We would like to thank Younggwang Kim, S. Park and Younghye Kim for assisting with the experiments. This work was supported in part by the National Research Foundation of Korea (grants 2017R1A2B3004198 (to H.H.K.), 2017M3A9B4062403 (to H.H.K.) and 2018R1A5A2025079 (to H.H.K.)), Brain Korea 21 Plus Project (Yonsei University College of Medicine), the Institute for Basic Science (IBS-R026-D1) and the Korean Health Technology R&D Project, Ministry of Health and Welfare, Republic of Korea (grants HI17C0676 (to H.H.K.) and HI16C1012 (to S.-R.C. and H.H.K.)).
Yonsei University has filed a patent based on this work, in which N.K., H.K.K. and H.H.K. are the co-inventors (patent no. 10-2019-0127304).
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Text, Supplementary Discussions 1 and 2, Supplementary Figs. 1–17 and Supplementary Notes 1–3.
Library A design and indel frequencies from the library.
Library B design and indel frequencies from the library.
Library C design and indel frequencies from the library.
Frequency of shuffling between sgRNA-encoding and barcode-target sequences.
Average indel frequencies associated with all possible 4-nt PAM sequences.
PAM compatibilities determined using the 30 fixed protospacers from library A and a wide range of protospacers from library B.
Indel frequencies at 30 perfectly matched target sequences used for analyzing specificity. Of 30 guide RNAs, 8 were selected.
Data sets used for the development and evaluation of DeepSpCas9variants.
Primer sequences used for experiments.
About this article
Cite this article
Kim, N., Kim, H.K., Lee, S. et al. Prediction of the sequence-specific cleavage activity of Cas9 variants. Nat Biotechnol (2020). https://doi.org/10.1038/s41587-020-0537-9
Nature Biotechnology (2020)