Prediction of the sequence-specific cleavage activity of Cas9 variants

Abstract

Several Streptococcus pyogenes Cas9 (SpCas9) variants have been developed to improve the enzyme’s specificity or to alter or broaden its protospacer-adjacent motif (PAM) compatibility, but selecting the optimal variant for a given target sequence and application remains difficult. To build computational models that predict the sequence-specific activity of 13 SpCas9 variants, we first assessed their cleavage efficiency at 26,891 target sequences. We found that, of the 256 possible four-nucleotide NNNN sequences, 156 can be used as a PAM by at least one of the SpCas9 variants. When wild-type SpCas9 and the high-fidelity variants were compared, their overall activities could be ranked as SpCas9 ≥ Sniper-Cas9 > eSpCas9(1.1) > SpCas9-HF1 > HypaCas9 ≈ xCas9 >> evoCas9, whereas their overall specificities could be ranked as evoCas9 >> HypaCas9 ≥ SpCas9-HF1 ≈ eSpCas9(1.1) > xCas9 > Sniper-Cas9 > SpCas9. Using these data, we developed 16 deep-learning-based computational models that accurately predict the activity of these variants at any target sequence.
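The figure of 256 candidate PAMs is simply 4^4, the number of ways to fill four positions with A, C, G or T. The short Python sketch below enumerates those sequences and shows one way to collect the PAMs usable by at least one variant. The per-PAM indel values and the compatibility threshold are hypothetical placeholders, not the data or criterion used in this study.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def all_nnnn_pams():
    """Return all four-nucleotide PAM sequences (4**4 = 256), in lexicographic order."""
    return ["".join(p) for p in product(NUCLEOTIDES, repeat=4)]


def compatible_pams(mean_indel_by_pam, threshold):
    """Return, sorted, the PAMs whose mean indel frequency (%) is at or above threshold.

    mean_indel_by_pam: hypothetical dict mapping PAM -> mean indel frequency (%).
    threshold: hypothetical cut-off (%) for calling a PAM 'compatible'.
    """
    return sorted(pam for pam, freq in mean_indel_by_pam.items() if freq >= threshold)


def union_of_compatible(per_variant, threshold):
    """Return the set of PAMs that at least one variant can use (union across variants)."""
    usable = set()
    for table in per_variant.values():
        usable.update(compatible_pams(table, threshold))
    return usable


# Toy values for illustration only.
toy = {
    "SpCas9": {"AGGA": 40.0, "AGAG": 5.0, "ATTT": 0.1},
    "xCas9": {"AGGA": 20.0, "AGAG": 12.0, "ATTT": 0.2},
}
print(len(all_nnnn_pams()))                  # 256
print(sorted(union_of_compatible(toy, 10.0)))  # ['AGAG', 'AGGA']
```

Taking the union across variants mirrors the phrasing "can be used as a PAM by at least one of the SpCas9 variants"; the threshold choice would dominate the resulting count.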

Fig. 1: High-throughput evaluation of the activities of SpCas9 variants using a lentiviral library of sgRNA-target sequence pairs.
Fig. 2: PAM compatibilities and general activities of SpCas9 variants.
Fig. 3: Specificity of SpCas9 variants when there are mismatches between the sgRNA guide sequence and the target sequence.
Fig. 4: Development and evaluation of DeepSpCas9variants, computational models predicting the activities of SpCas9 variants.
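Sequence-based deep-learning models such as those in Fig. 4 typically receive each target sequence as a one-hot matrix, with one indicator row per nucleotide, before any convolutional layers are applied. The sketch below shows that encoding step in plain Python. It is a generic illustration, not code from the DeepSpCas9variants repository; the function name `one_hot`, the A/C/G/T column order and the optional 30-nt length check are assumptions.

```python
# Hypothetical column order for the one-hot rows: A, C, G, T.
BASE_INDEX = {"A": 0, "C": 1, "G": 2, "T": 3}


def one_hot(seq, expected_length=None):
    """One-hot encode a DNA sequence as a list of length-4 indicator rows.

    Raises ValueError for characters other than A/C/G/T (case-insensitive)
    or when the sequence length differs from expected_length (if given).
    """
    seq = seq.upper()
    if expected_length is not None and len(seq) != expected_length:
        raise ValueError(f"expected {expected_length} nt, got {len(seq)}")
    matrix = []
    for base in seq:
        if base not in BASE_INDEX:
            raise ValueError(f"unsupported base: {base!r}")
        row = [0, 0, 0, 0]
        row[BASE_INDEX[base]] = 1  # set the indicator for this nucleotide
        matrix.append(row)
    return matrix


print(one_hot("ACGT"))  # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
```

Rejecting ambiguous bases such as N, rather than silently encoding them as all zeros, is a design choice here; a real pipeline might handle them differently.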

Data availability

We have submitted the deep sequencing data from this study to the NCBI Sequence Read Archive under accession number SRR10215483. We have provided the data sets used in this study as Supplementary Tables 1–3.

Code availability

We have made the source code for DeepSpCas9variants and the custom Python scripts used for the indel frequency calculations available on GitHub at https://github.com/NahyeKim/DeepSpCas9variants and https://github.com/CRISPRJWCHOI/CRISPR_toolkit/tree/master/Indel_searcher_2.
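The repositories above contain the authors' actual scripts. As a hedged illustration of the quantity such scripts report, the sketch below computes an indel frequency as the percentage of sequencing reads that carry an indel at the target site, with an optional subtraction of the frequency measured in an untreated control. Both function names and the background-subtraction step are assumptions for illustration, not a description of Indel_searcher_2.

```python
def indel_frequency(indel_reads, total_reads):
    """Return the percentage of reads that carry an indel at the target site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= indel_reads <= total_reads:
        raise ValueError("indel_reads must be between 0 and total_reads")
    return 100.0 * indel_reads / total_reads


def background_corrected(treated, control):
    """Subtract a control (no-nuclease) indel frequency, flooring the result at zero.

    This subtraction is an assumed, commonly used step, not taken from the
    Indel_searcher_2 scripts.
    """
    return max(0.0, treated - control)


treated = indel_frequency(250, 1000)        # 25.0
print(background_corrected(treated, 1.5))   # 23.5
```

Flooring at zero keeps targets with less editing than the control's background from reporting negative activity.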

Acknowledgements

We would like to thank Younggwang Kim, S. Park and Younghye Kim for assisting with the experiments. This work was supported in part by the National Research Foundation of Korea (grants 2017R1A2B3004198 (to H.H.K.), 2017M3A9B4062403 (to H.H.K.) and 2018R1A5A2025079 (to H.H.K.)), Brain Korea 21 Plus Project (Yonsei University College of Medicine), the Institute for Basic Science (IBS-R026-D1) and the Korean Health Technology R&D Project, Ministry of Health and Welfare, Republic of Korea (grants HI17C0676 (to H.H.K.) and HI16C1012 (to S.-R.C. and H.H.K.)).

Author information

Contributions

N.K. performed most of the wet-lab experiments, including the high-throughput evaluation of SpCas9 variant activities. H.K.K. helped substantially with N.K.’s experiments and provided critical technical advice for the high-throughput experiments. S.L., S.M., S.Y. and H.K.K. developed DeepSpCas9variants and the related web tools. J.P. and J.W.C. contributed substantially to the bioinformatics analyses. J.H.S. and S.-R.C. performed western blotting to measure SpCas9 variant protein levels. Together with H.K.K. and N.K., H.H.K. conceived and designed the study. N.K., H.K.K. and H.H.K. analyzed the data and wrote the manuscript.

Corresponding author

Correspondence to Hyongbum Henry Kim.

Ethics declarations

Competing interests

Yonsei University has filed a patent based on this work, in which N.K., H.K.K. and H.H.K. are the co-inventors (patent no. 10-2019-0127304).

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Text, Supplementary Discussions 1 and 2, Supplementary Figs. 1–17 and Supplementary Notes 1–3.

Reporting Summary

Supplementary Table 1

Library A design and indel frequencies from the library.

Supplementary Table 2

Library B design and indel frequencies from the library.

Supplementary Table 3

Library C design and indel frequencies from the library.

Supplementary Table 4

Frequency of shuffling between sgRNA-encoding and barcode-target sequences.

Supplementary Table 5

Average indel frequencies associated with all possible 4-nt PAM sequences.

Supplementary Table 6

PAM compatibilities determined using the 30 fixed protospacers from library A and a wide range of protospacers from library B.

Supplementary Table 7

Indel frequencies at 30 perfectly matched target sequences used for analyzing specificity. Of 30 guide RNAs, 8 were selected.

Supplementary Table 8

Data sets used for the development and evaluation of DeepSpCas9variants.

Supplementary Table 9

Primer sequences used for experiments.

Source data

Source Data Fig. 1

Unprocessed western blots for Fig. 1b

About this article

Cite this article

Kim, N., Kim, H.K., Lee, S. et al. Prediction of the sequence-specific cleavage activity of Cas9 variants. Nat Biotechnol 38, 1328–1336 (2020). https://doi.org/10.1038/s41587-020-0537-9
