  • Analysis

Massively parallel evaluation and computational prediction of the activities and specificities of 17 small Cas9s

Abstract

Recently, various small Cas9 orthologs and variants have been reported for use in in vivo delivery applications. Although small Cas9s are particularly suited for this purpose, selecting the optimal small Cas9 for a specific target sequence remains challenging. To address this, we systematically compared the activities of 17 small Cas9s at thousands of target sequences. For each small Cas9, we characterized the protospacer adjacent motif and determined the optimal single guide RNA expression formats and scaffold sequences. High-throughput comparative analyses revealed distinct high- and low-activity groups of small Cas9s. We also developed DeepSmallCas9, a set of computational models predicting the activities of the small Cas9s at matched and mismatched target sequences. Together, this analysis and these computational models provide a useful guide for researchers to select the most suitable small Cas9 for specific applications.

Fig. 1: Massively parallel evaluation of the activities of the small Cas9s.
Fig. 2: PAM compatibilities of the small Cas9s in human cells.
Fig. 3: Effects of sgRNA expression formats and scaffold sequences on the activities of the small Cas9s.
Fig. 4: Activities of the small Cas9s and SpCas9 at mismatched target sequences.
Fig. 5: Evaluation of computational models predicting the activities of the small Cas9s.
Fig. 6: Allele-specific gene editing using the small Cas9s.

Data availability

The deep sequencing data used in this study are available at the NCBI Sequence Read Archive under BioProject accession number PRJNA807878. The indel frequency datasets used in this study are provided as Supplementary Tables 2–6. The training and test datasets for DeepSmallCas9 and DeepSpCas9-v2 are provided as Supplementary Table 11. The human genetic variations analyzed in this study are available at https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/archive_2.0/2020/. The reference genomes for human (GRCh38.p13 v.104) and mouse (GRCm39 v.104) are accessible at https://ftp.ensembl.org/pub/release-104/, and protein-coding annotations for human (MANE Select v.0.95) and mouse (RefSeq Select v.109) are accessible at https://ftp.ncbi.nlm.nih.gov/refseq/MANE/MANE_human/release_0.95/ and https://www.ncbi.nlm.nih.gov/nuccore/?term=%22Mus+musculus%22%5BOrganism%5D+AND+Refseq_select%5Bfilter%5D%E2%80%9D+AND+srcdb_refseq%5BPROP%5D, respectively. Source data are provided with this paper.

Code availability

The source code for DeepSmallCas9 and the custom Python scripts used for the indel frequency calculations are available on GitHub at https://github.com/SangyeonSeo/DeepSmallCas9 and https://github.com/CRISPRJWCHOI/CRISPR_toolkit/tree/master/Indel_searcher_2, respectively.

References

  1. Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823 (2013).

  2. Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823–826 (2013).

  3. Cho, S. W., Kim, S., Kim, J. M. & Kim, J. S. Targeted genome engineering in human cells with the Cas9 RNA-guided endonuclease. Nat. Biotechnol. 31, 230–232 (2013).

  4. Jinek, M. et al. RNA-programmed genome editing in human cells. eLife 2, e00471 (2013).

  5. Hwang, W. Y. et al. Efficient genome editing in zebrafish using a CRISPR–Cas system. Nat. Biotechnol. 31, 227–229 (2013).

  6. Jiang, W., Bikard, D., Cox, D., Zhang, F. & Marraffini, L. A. RNA-guided editing of bacterial genomes using CRISPR–Cas systems. Nat. Biotechnol. 31, 233–239 (2013).

  7. Slaymaker, I. M. et al. Rationally engineered Cas9 nucleases with improved specificity. Science 351, 84–88 (2016).

  8. Kleinstiver, B. P. et al. High-fidelity CRISPR–Cas9 nucleases with no detectable genome-wide off-target effects. Nature 529, 490–495 (2016).

  9. Chen, J. S. et al. Enhanced proofreading governs CRISPR–Cas9 targeting accuracy. Nature 550, 407–410 (2017).

  10. Casini, A. et al. A highly specific SpCas9 variant is identified by in vivo screening in yeast. Nat. Biotechnol. 36, 265–271 (2018).

  11. Lee, J. K. et al. Directed evolution of CRISPR–Cas9 to increase its specificity. Nat. Commun. 9, 3048 (2018).

  12. Kleinstiver, B. P. et al. Engineered CRISPR–Cas9 nucleases with altered PAM specificities. Nature 523, 481–485 (2015).

  13. Anders, C., Bargsten, K. & Jinek, M. Structural plasticity of PAM recognition by engineered variants of the RNA-guided endonuclease Cas9. Mol. Cell 61, 895–902 (2016).

  14. Hu, J. H. et al. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature 556, 57–63 (2018).

  15. Nishimasu, H. et al. Engineered CRISPR–Cas9 nuclease with expanded targeting space. Science 361, 1259–1262 (2018).

  16. Swiech, L. et al. In vivo interrogation of gene function in the mammalian brain using CRISPR–Cas9. Nat. Biotechnol. 33, 102–106 (2015).

  17. Chew, W. L. et al. A multifunctional AAV–CRISPR–Cas9 and its host response. Nat. Methods 13, 868–874 (2016).

  18. Long, C. et al. Postnatal genome editing partially restores dystrophin expression in a mouse model of muscular dystrophy. Science 351, 400–403 (2016).

  19. Ran, F. A. et al. In vivo genome editing using Staphylococcus aureus Cas9. Nature 520, 186–191 (2015).

  20. Pardi, N., Hogan, M. J., Porter, F. W. & Weissman, D. mRNA vaccines—a new era in vaccinology. Nat. Rev. Drug Discov. 17, 261–279 (2018).

  21. Schmidt, M. J. et al. Improved CRISPR genome editing using small highly active and specific engineered RNA-guided nucleases. Nat. Commun. 12, 4219 (2021).

  22. Esvelt, K. M. et al. Orthogonal Cas9 proteins for RNA-guided gene regulation and editing. Nat. Methods 10, 1116–1121 (2013).

  23. Muller, M. et al. Streptococcus thermophilus CRISPR–Cas9 systems enable specific editing of the human genome. Mol. Ther. 24, 636–644 (2016).

  24. Agudelo, D. et al. Versatile and robust genome editing with Streptococcus thermophilus CRISPR1–Cas9. Genome Res. 30, 107–117 (2020).

  25. Hou, Z. et al. Efficient genome engineering in human pluripotent stem cells using Cas9 from Neisseria meningitidis. Proc. Natl Acad. Sci. USA 110, 15644–15649 (2013).

  26. Lee, C. M., Cradick, T. J. & Bao, G. The Neisseria meningitidis CRISPR–Cas9 system enables specific genome editing in mammalian cells. Mol. Ther. 24, 645–654 (2016).

  27. Amrani, N. et al. NmeCas9 is an intrinsically high-fidelity genome-editing platform. Genome Biol. 19, 214 (2018).

  28. Friedland, A. E. et al. Characterization of Staphylococcus aureus Cas9: a smaller Cas9 for all-in-one adeno-associated virus delivery and paired nickase applications. Genome Biol. 16, 257 (2015).

  29. Najm, F. J. et al. Orthologous CRISPR–Cas9 enzymes for combinatorial genetic screens. Nat. Biotechnol. 36, 179–189 (2018).

  30. Tycko, J. et al. Pairwise library screen systematically interrogates Staphylococcus aureus Cas9 specificity in human cells. Nat. Commun. 9, 2962 (2018).

  31. Kim, E. et al. In vivo genome editing with a small Cas9 orthologue derived from Campylobacter jejuni. Nat. Commun. 8, 14500 (2017).

  32. Yamada, M. et al. Crystal structure of the minimal Cas9 from Campylobacter jejuni reveals the molecular diversity in the CRISPR–Cas9 systems. Mol. Cell 65, 1109–1121.e3 (2017).

  33. Edraki, A. et al. A compact, high-accuracy Cas9 with a dinucleotide PAM for in vivo genome editing. Mol. Cell 73, 714–726.e4 (2018).

  34. Hu, Z. et al. A compact Cas9 ortholog from Staphylococcus auricularis (SauriCas9) expands the DNA targeting scope. PLoS Biol. 18, e3000686 (2020).

  35. Hu, Z. et al. Discovery and engineering of small SlugCas9 with broad targeting range and high specificity and activity. Nucleic Acids Res. 49, 4008–4019 (2021).

  36. Kleinstiver, B. P. et al. Broadening the targeting range of Staphylococcus aureus CRISPR–Cas9 by modifying PAM recognition. Nat. Biotechnol. 33, 1293–1298 (2015).

  37. Tan, Y. et al. Rationally engineered Staphylococcus aureus Cas9 nucleases with high genome-wide specificity. Proc. Natl Acad. Sci. USA 116, 20969–20976 (2019).

  38. Xie, H. et al. High-fidelity SaCas9 identified by directional screening in human cells. PLoS Biol. 18, e3000747 (2020).

  39. Nakagawa, R. et al. Engineered Campylobacter jejuni Cas9 variant with enhanced activity and broader targeting range. Commun. Biol. 5, 211 (2022).

  40. Koblan, L. W. et al. Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction. Nat. Biotechnol. 36, 843–846 (2018).

  41. Zafra, M. P. et al. Optimized base editors enable efficient editing in cells, organoids and mice. Nat. Biotechnol. 36, 888–893 (2018).

  42. Kim, N. et al. Prediction of the sequence-specific cleavage activity of Cas9 variants. Nat. Biotechnol. 38, 1328–1336 (2020).

  43. Kim, H. K. et al. In vivo high-throughput profiling of CRISPR–Cpf1 activity. Nat. Methods 14, 153–159 (2017).

  44. Kim, H. K. et al. Deep learning improves prediction of CRISPR–Cpf1 guide RNA activity. Nat. Biotechnol. 36, 239–241 (2018).

  45. Kim, H. K. et al. SpCas9 activity prediction by DeepSpCas9, a deep learning-based model with high generalization performance. Sci. Adv. 5, eaax9249 (2019).

  46. Wang, D. et al. Optimized CRISPR guide RNA design for two high-fidelity Cas9 variants by deep learning. Nat. Commun. 10, 4284 (2019).

  47. Kim, H. K. et al. High-throughput analysis of the activities of xCas9, SpCas9-NG and SpCas9 at matched and mismatched target sequences in human cells. Nat. Biomed. Eng. 4, 111–124 (2020).

  48. Shen, M. W. et al. Predictable and precise template-free CRISPR editing of pathogenic variants. Nature 563, 646–651 (2018).

  49. Allen, F. et al. Predicting the mutations generated by repair of Cas9-induced double-strand breaks. Nat. Biotechnol. 37, 64–72 (2018).

  50. Chen, W. et al. Massively parallel profiling and predictive modeling of the outcomes of CRISPR/Cas9-mediated double-strand break repair. Nucleic Acids Res. 47, 7989–8003 (2019).

  51. Song, M. et al. Sequence-specific prediction of the efficiencies of adenine and cytosine base editors. Nat. Biotechnol. 38, 1037–1043 (2020).

  52. Arbab, M. et al. Determinants of base editing outcomes from target library analysis and machine learning. Cell 182, 463–480 (2020).

  53. Kim, H. K. et al. Predicting the efficiency of prime editing guide RNAs in human cells. Nat. Biotechnol. 39, 198–206 (2021).

  54. Schlub, T. E., Smyth, R. P., Grimm, A. J., Mak, J. & Davenport, M. P. Accurately measuring recombination between closely related HIV-1 genomes. PLoS Comput. Biol. 6, e1000766 (2010).

  55. Sack, L. M., Davoli, T., Xu, Q., Li, M. Z. & Elledge, S. J. Sources of error in mammalian genetic screens. G3 6, 2781–2790 (2016).

  56. Feldman, D., Singh, A., Garrity, A. J. & Blainey, P. C. Lentiviral co-packaging mitigates the effects of intermolecular recombination and multiple integrations in pooled genetic screens. Preprint at bioRxiv https://doi.org/10.1101/262121 (2018).

  57. Hill, A. J. et al. On the design of CRISPR-based single-cell molecular screens. Nat. Methods 15, 271–274 (2018).

  58. Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2, 56–67 (2020).

  59. Doench, J. G. et al. Rational design of highly active sgRNAs for CRISPR–Cas9-mediated gene inactivation. Nat. Biotechnol. 32, 1262–1267 (2014).

  60. Doench, J. G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR–Cas9. Nat. Biotechnol. 34, 184–191 (2016).

  61. Fu, Y., Sander, J. D., Reyon, D., Cascio, V. M. & Joung, J. K. Improving CRISPR–Cas nuclease specificity using truncated guide RNAs. Nat. Biotechnol. 32, 279–284 (2014).

  62. Kim, S., Bae, T., Hwang, J. & Kim, J. S. Rescue of high-specificity Cas9 variants using sgRNAs with matched 5′ nucleotides. Genome Biol. 18, 218 (2017).

  63. Zhang, D. et al. Perfectly matched 20-nucleotide guide RNA sequences enable robust genome editing using high-fidelity SpCas9 nucleases. Genome Biol. 18, 191 (2017).

  64. Xie, K., Minkenberg, B. & Yang, Y. Boosting CRISPR/Cas9 multiplex editing capability with the endogenous tRNA-processing system. Proc. Natl Acad. Sci. USA 112, 3570–3575 (2015).

  65. He, X. et al. Boosting activity of high-fidelity CRISPR/Cas9 variants using a tRNA(Gln)-processing system in human cells. J. Biol. Chem. 294, 9308–9315 (2019).

  66. Dang, Y. et al. Optimizing sgRNA structure to improve CRISPR–Cas9 knockout efficiency. Genome Biol. 16, 280 (2015).

  67. Riesenberg, S., Helmbrecht, N., Kanis, P., Maricic, T. & Paabo, S. Improved gRNA secondary structures allow editing of target sites resistant to CRISPR–Cas9 cleavage. Nat. Commun. 13, 489 (2022).

  68. Tsai, S. Q. et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR–Cas nucleases. Nat. Biotechnol. 33, 187–197 (2015).

  69. Lin, Y. et al. CRISPR/Cas9 systems have off-target activity with insertions or deletions between target DNA and guide RNA sequences. Nucleic Acids Res. 42, 7473–7485 (2014).

  70. Jones, S. K. Jr et al. Massively parallel kinetic profiling of natural and engineered CRISPR nucleases. Nat. Biotechnol. 39, 84–93 (2021).

  71. Courtney, D. G. et al. CRISPR/Cas9 DNA cleavage at SNP-derived PAM enables both in vitro and in vivo KRT12 mutation-specific targeting. Gene Ther. 23, 108–112 (2015).

  72. Christie, K. A. et al. Towards personalised allele-specific CRISPR gene editing to treat autosomal dominant disorders. Sci. Rep. 7, 16174 (2017).

  73. Bakondi, B. et al. In vivo CRISPR/Cas9 gene editing corrects retinal dystrophy in the S334ter-3 rat model of autosomal dominant retinitis pigmentosa. Mol. Ther. 24, 556–563 (2016).

  74. Gao, X. et al. Treatment of autosomal dominant hearing loss by in vivo delivery of genome editing agents. Nature 553, 217–221 (2018).

  75. Gyorgy, B. et al. Allele-specific gene editing prevents deafness in a model of dominant progressive hearing loss. Nat. Med. 25, 1123–1130 (2019).

  76. Koo, T. et al. Selective disruption of an oncogenic mutant allele by CRISPR/Cas9 induces efficient tumor regression. Nucleic Acids Res. 45, 7897–7908 (2017).

  77. Li, Y. et al. Exploiting the CRISPR/Cas9 PAM constraint for single-nucleotide resolution interventions. PLoS ONE 11, e0144970 (2016).

  78. Kim, W. et al. Targeting mutant KRAS with CRISPR-Cas9 controls tumor growth. Genome Res. 28, 374–382 (2018).

  79. Cruz, L. et al. Mutant allele-specific CRISPR disruption in DYT1 dystonia fibroblasts restores cell function. Mol. Ther. Nucleic Acids 21, 1–12 (2020).

  80. Xie, C. et al. Genome editing with CRISPR/Cas9 in postnatal mice corrects PRKAG2 cardiac syndrome. Cell Res. 26, 1099–1111 (2016).

  81. Trochet, D. et al. Allele-specific silencing therapy for Dynamin 2-related dominant centronuclear myopathy. EMBO Mol. Med. 10, 239–253 (2018).

  82. Rabai, A. et al. Allele-specific CRISPR/Cas9 correction of a heterozygous DNM2 mutation rescues centronuclear myopathy cell phenotypes. Mol. Ther. Nucleic Acids 16, 246–256 (2019).

  83. Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018).

  84. Landrum, M. J. et al. ClinVar: improvements to accessing data. Nucleic Acids Res. 48, D835–D844 (2020).

  85. Liu, Z. et al. Versatile and efficient in vivo genome editing with compact Streptococcus pasteurianus Cas9. Mol. Ther. 30, 256–267 (2022).

  86. Harrington, L. B. et al. A thermostable Cas9 with increased lifetime in human plasma. Nat. Commun. 8, 1424 (2017).

  87. Hirano, S. et al. Structural basis for the promiscuous PAM recognition by Corynebacterium diphtheriae Cas9. Nat. Commun. 10, 1968 (2019).

  88. Fedorova, I. et al. PpCas9 from Pasteurella pneumotropica—a compact Type II-C Cas9 ortholog active in human cells. Nucleic Acids Res. 48, 12297–12309 (2020).

  89. Organick, L. et al. Random access in large-scale DNA data storage. Nat. Biotechnol. 36, 242–248 (2018).

  90. Shen, J. P. et al. Combinatorial CRISPR–Cas9 screens for de novo mapping of genetic interactions. Nat. Methods 14, 573–576 (2017).

  91. Joung, J. et al. Genome-scale CRISPR–Cas9 knockout and transcriptional activation screening. Nat. Protoc. 12, 828–863 (2017).

  92. Shalem, O. et al. Genome-scale CRISPR–Cas9 knockout screening in human cells. Science 343, 84–87 (2014).

  93. Clement, K. et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat. Biotechnol. 37, 224–226 (2019).

  94. Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (eds Krishnapuram, B. et al.) 785–794 (ACM, 2016).

  95. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

  96. Cock, P. J. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).

  97. Lorenz, R. et al. ViennaRNA Package 2.0. Algorithms Mol. Biol. 6, 26 (2011).

  98. Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).

  99. Abadi, M. et al. TensorFlow: a system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (eds Keeton, K. & Roscoe, T.) 265–283 (USENIX Association, 2016).

  100. Bae, S., Park, J. & Kim, J. S. Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics 30, 1473–1475 (2014).

  101. Howe, K. L. et al. Ensembl 2021. Nucleic Acids Res. 49, D884–D891 (2021).

  102. O'Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).

  103. Diedenhofen, B. & Musch, J. cocor: a comprehensive solution for the statistical comparison of correlations. PLoS ONE 10, e0121945 (2015).

Acknowledgements

We thank J. Park, S. Park and Y. Kim for assisting with the experiments. This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (2022R1A3B1078084 (H.H.K.) and 2018R1A5A2025079 (H.H.K.)); the Bio & Medical Technology Development Program of the National Research Foundation (NRF) funded by the Korean government (MSIT) (2022M3A9E4017127 (H.H.K.) and 2022M3A9F3017506 (H.H.K.)); the Korea Drug Development Fund funded by the Ministry of Science and ICT, the Ministry of Trade, Industry, and Energy, and the Ministry of Health and Welfare, Republic of Korea (HN21C0917 (H.H.K.)); the Yonsei Signature Research Cluster Program of 2021-22-0014 (H.H.K.); the Brain Korea 21 FOUR Project for Medical Science (Yonsei University College of Medicine); the SNUH Kun-hee Lee Child Cancer & Rare Disease Project, Republic of Korea (22B-000-0101 (H.H.K.)); the Korea Research Institute of Bioscience and Biotechnology (KRIBB) Research Initiative Program (KGM5162221 (H.H.K.)); and the Korea Health Technology R&D Project funded by the Ministry of Health and Welfare, Republic of Korea (HI21C1314 (H.H.K.)).

Author information

Contributions

S.-Y.S. performed the majority of wet experiments, including the high-throughput evaluation of the activities of the small Cas9s. S.-Y.S., S.M. and S.L. developed DeepSmallCas9, DeepSpCas9-v2 and the related web tool. J.H.S., D.B. and S.-R.C. performed western blotting to measure the protein levels of the small Cas9s. J.P. contributed substantially to bioinformatics analyses. H.K.K. and M.S. contributed to the design of the study and provided technical assistance to S.-Y.S. in conducting the experiments and analyzing the data. Together with S.-Y.S., H.H.K. conceived of and designed the study. S.-Y.S. and H.H.K. wrote the manuscript.

Corresponding author

Correspondence to Hyongbum Henry Kim.

Ethics declarations

Competing interests

Yonsei University has filed a patent based on this work, in which S.-Y.S., S.L. and H.H.K. are co-inventors (patent no. 10-2022-0060290). H.H.K. is a consultant for EcoR1 Capital. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editors: Lei Tang and Madhura Mukhopadhyay, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Expression of the small Cas9s in HEK293T cells.

a, Schematic of the small Cas9-expressing cassette. LTR, long terminal repeat; psi, psi packaging signal; RRE, Rev response element; CMV, cytomegalovirus promoter; NLS, nuclear localization signal from SV40 T antigen; FLAG, FLAG-tag; P2A, self-cleaving 2A peptide from porcine teschovirus-1; BlastR, blasticidin selection marker; WPRE, woodchuck hepatitis virus post-transcriptional regulatory element. b, Representative images of Western blotting used to measure the amount of the small Cas9 proteins in HEK293T cells transduced with the lentiviral vectors encoding the small Cas9s. The levels of the small Cas9 proteins were determined using a FLAG-tag; β-actin was used as a loading control. Unavoidably, images from two Western blotting experiments conducted in parallel are shown because two gels were required to accommodate the 18 evaluated Cas9 proteins. Unprocessed images are available in Source Data Extended Data Fig. 1. c, Relative levels of the small Cas9 proteins. Data represent mean ± SD. The numbers of replicates (n) are as follows: SaCas9, n = 12; SaCas9-KKH, n = 12; SaCas9*, n = 8; St1Cas9, n = 8; Nm1Cas9, n = 8; Nm2Cas9, n = 8; CjCas9, n = 8; SauriCas9, n = 5; SauriCas9-KKH, n = 5; sRGN3.1, n = 5; SlugCas9, n = 5; SlugCas9-HF, n = 5; Sa-SlugCas9, n = 5; SaCas9-HF, n = 5; efSaCas9, n = 5; eSaCas9, n = 5; SaCas9-KKH-HF, n = 5; enCjCas9, n = 5. Subsets of the Cas9 protein levels normalized to the β-actin protein levels without statistically significant differences (one-way analysis of variance followed by Bonferroni post-hoc test) are represented with the letters a, b, c, and d.

Extended Data Fig. 2 PAM compatibilities of small Cas9s in human cells.

a, Heatmaps showing the average indel frequencies in the target sequences with the indicated PAM sequences. Indel frequencies were measured four days after transduction of the paired libraries in SaCas9-, SaCas9-KKH-, eSaCas9-, and efSaCas9-expressing cells; in the cells expressing the other three small Cas9s, indel frequencies were measured seven days after transduction. Protospacers for which the highest indel frequencies were < 5% across candidate PAM sequences were excluded from the analyses. Fixed positions and nucleotides are indicated above each heatmap. For instance, to evaluate the preferences for the 3rd, 4th, and 5th nucleotides of the PAM for SaCas9, the 6th and 7th nucleotides of the PAM were fixed as TN. The numbers of analyzed protospacers (n) are as follows: SaCas9, n = 29; SaCas9-KKH, n = 29; eSaCas9, n = 29; efSaCas9, n = 29; SaCas9-HF, n = 29; SaCas9-KKH-HF, n = 29; St1Cas9, n = 27. b, Summary of the analyzed PAM compatibilities.
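
The exclusion and averaging steps described for panel a can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the data layout (a mapping from protospacer ID to per-PAM indel frequencies), the function name `average_indel_by_pam` and the toy values are assumptions made for the example; only the 5% exclusion threshold comes from the caption.

```python
from collections import defaultdict


def average_indel_by_pam(indels, min_peak=5.0):
    """Average indel frequency (%) per PAM, skipping inactive protospacers.

    `indels` maps a protospacer ID to {PAM sequence: indel frequency in %}
    (hypothetical layout). A protospacer is excluded when its highest indel
    frequency across all candidate PAMs is below `min_peak`.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for per_pam in indels.values():
        if not per_pam or max(per_pam.values()) < min_peak:
            continue  # never reaches the activity threshold at any PAM
        for pam, freq in per_pam.items():
            sums[pam] += freq
            counts[pam] += 1
    return {pam: sums[pam] / counts[pam] for pam in sums}


# Hypothetical toy data: ps3 is excluded because its peak is below 5%.
toy = {
    "ps1": {"GGGT": 40.0, "GAAT": 20.0},
    "ps2": {"GGGT": 60.0, "GAAT": 10.0},
    "ps3": {"GGGT": 2.0, "GAAT": 1.0},
}
heatmap = average_indel_by_pam(toy)
```

Each value in `heatmap` would correspond to one cell of a PAM heatmap; fixing some PAM positions, as in the caption, would amount to filtering the PAM keys before averaging.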

Extended Data Fig. 3 Activities of sRGN3.1 and SlugCas9 at diverse potential off-target sequences.

a,b, Comparison of the activities of sRGN3.1 and SlugCas9 at different potential off-target sequences. The average relative indel frequencies are indicated using the red plus symbols. The numbers of analyzed target sequences (n) are as follows: sRGN3.1, n = 3,715 (1-bp mismatch), 1,195 (2-bp mismatch), 588 (3-bp mismatch), 1,263 (1-nt deletion), and 1,131 (1-nt insertion); SlugCas9, n = 3,715 (1-bp mismatch), 1,195 (2-bp mismatch), 588 (3-bp mismatch), 1,263 (1-nt deletion), and 1,131 (1-nt insertion). Subsets of the small Cas9-induced relative indel frequencies without statistically significant differences (one-way analysis of variance followed by Bonferroni post-hoc test) are represented with the letters a, b, c, and d. c–h, Heatmaps showing the average specificities of sRGN3.1 (c,e,g) and SlugCas9 (d,f,h) when there were 1-bp mismatches (c,d), 1-nt RNA bulges (e,f), or 1-nt DNA bulges (g,h) between sgRNAs and target sequences with a primary or secondary PAM. The specificity was calculated as 1 − (indel frequency at mismatched target sequences divided by that at perfectly matched targets). c,d, To distinguish mismatch types, wobble, nonwobble, and transversion mismatches are shown in red, green, and blue, respectively. i–l, Box plots showing the effects of deleted (i,j) or inserted (k,l) bases on the activities of sRGN3.1 (i,k) and SlugCas9 (j,l). The numbers of analyzed target sequences (n) are as follows: sRGN3.1 RNA bulge, n = 271 (A), 325 (T), 329 (C), and 338 (G); SlugCas9 RNA bulge, n = 271 (A), 325 (T), 329 (C), and 338 (G); sRGN3.1 DNA bulge, n = 279 (C), 308 (G), 289 (T), and 255 (A); SlugCas9 DNA bulge, n = 308 (G), 279 (C), 289 (T), and 255 (A). Subsets of the small Cas9-induced relative indel frequencies without statistically significant differences (one-way analysis of variance followed by Bonferroni post-hoc test) are represented with the letters a and b. a,b,i–l, Indel frequencies were normalized to those at perfectly matched target sequences. Boxes represent the 25th, 50th, and 75th percentiles and whiskers show the 10th and 90th percentiles.
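
The two quantities used in this caption, the relative indel frequency and the specificity, follow directly from the definitions given above. A minimal sketch, assuming indel frequencies are supplied as percentages; the function names and example values are illustrative and not taken from the authors' code:

```python
def relative_indel(mismatched, matched):
    """Indel frequency at a mismatched target, normalized to the matched target."""
    if matched <= 0:
        raise ValueError("matched-target indel frequency must be positive")
    return mismatched / matched


def specificity(mismatched, matched):
    """Specificity as defined in the caption: 1 - (mismatched / matched)."""
    return 1.0 - relative_indel(mismatched, matched)


# Hypothetical example: 10% indels at a mismatched site, 40% at the matched site.
rel = relative_indel(10.0, 40.0)
spec = specificity(10.0, 40.0)
```

With this definition, a specificity of 1 means no detectable activity at the mismatched target, and 0 means activity equal to that at the perfectly matched target.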

Extended Data Fig. 4 Development of DeepSmallCas9.

DeepSmallCas9 is a set of deep learning-based models that predict the activities of the small Cas9s at matched and mismatched target sequences. Additional features include the melting temperature (Tm), the number of G or C nucleotides (GC count), the minimum free energy (MFE), and the mismatch position and type between guide and protospacer sequences (mismatch profile). See also Methods.
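
Two of the additional features named here, the GC count and the mismatch profile, are simple enough to sketch in plain Python. The encoding below is an assumption made for illustration (1-based positions counted from the 5' end, mismatches reported as guide base and protospacer base) and may differ from the representation DeepSmallCas9 actually uses. Tm and MFE are omitted because they need a thermodynamic model from dedicated packages.

```python
def gc_count(seq):
    """Number of G or C nucleotides in a sequence."""
    seq = seq.upper()
    return seq.count("G") + seq.count("C")


def mismatch_profile(guide, protospacer):
    """List (position, guide base, protospacer base) for each mismatch.

    Positions are 1-based from the 5' end (an assumed convention).
    Both sequences must be the same length, written in the same orientation.
    """
    if len(guide) != len(protospacer):
        raise ValueError("guide and protospacer must have equal length")
    pairs = zip(guide.upper(), protospacer.upper())
    return [(i + 1, g, p) for i, (g, p) in enumerate(pairs) if g != p]


# Hypothetical 10-nt sequences, shortened for readability.
guide = "ACGTACGTAC"
target = "ACGTTCGTAC"
gc = gc_count(guide)
profile = mismatch_profile(guide, target)
```

A perfectly matched target yields an empty profile, so matched and mismatched targets can share one feature layout.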

Extended Data Fig. 5 Performance comparison of algorithms used to develop computational models that predict the activities of the small Cas9s.

Heatmaps showing correlations between the measured and computationally predicted indel frequencies. Average Pearson (top) and Spearman (bottom) correlation coefficients were calculated from five-fold cross-validation. The algorithms that showed the highest average correlation coefficients are shown in bold. XGBoost, extreme gradient boosting; Boosted RT, gradient-boosted regression trees; Lasso, L1-regularized linear regression; Ridge, L2-regularized linear regression; Elastic Net, L1 and L2-regularized linear regression; RF, random forest; SVM, support vector machine.
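
As a rough illustration of the metric described in this legend, the sketch below averages Pearson and Spearman coefficients over held-out folds. It is a pure-Python sketch under stated assumptions: the interleaved fold assignment, the tie handling by average ranks, and all function names are illustrative choices, and the models that produce the predictions are not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient r of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def fold_indices(n, k=5):
    """Assign n sample indices to k interleaved folds (an assumed scheme)."""
    return [list(range(i, n, k)) for i in range(k)]


def mean_cv_correlations(folds):
    """Average r and rho over folds of (measured, predicted) held-out values."""
    pearsons = [pearson(m, p) for m, p in folds]
    spearmans = [spearman(m, p) for m, p in folds]
    return sum(pearsons) / len(folds), sum(spearmans) / len(folds)
```

In use, each entry of `folds` would hold the measured indel frequencies of one held-out fold together with the predictions of a model trained on the other four.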

Source data

Extended Data Fig. 6 Comparison of the performance of DeepSmallCas9 with those of existing computational models predicting SaCas9 activity.

a,b, Evaluation of DeepSmallCas9 and ‘SaCas9 on-target rules’ (ref. 29), an existing computational model predicting SaCas9 activities at matched target sequences, using the fraction of the hold-out test dataset including matched targets with NNGRRN PAM; n = 3,975. c,d, Evaluation of DeepSmallCas9 and ‘Model of SaCas9 specificity’ (ref. 30), an existing computational model predicting SaCas9 activities at mismatched target sequences, using the fraction of the hold-out test dataset including mismatched targets with NNGRRT PAM; n = 217. Predicted activities at mismatched targets were normalized to those at perfectly matched targets. a,c, The Spearman correlation coefficient (Rho) and the Pearson correlation coefficient (r) are shown. Dashed line represents y = x. b,d, Data indicate correlation coefficient ± 95% confidence interval. Statistically significant differences between two correlations were determined by two-tailed Steiger’s z-test. The P-values from left to right are < 2.2 × 10⁻¹⁶, < 2.2 × 10⁻¹⁶, 3.1 × 10⁻⁴, and 4.7 × 10⁻⁸.

Source data

Extended Data Fig. 7 Evaluation and prediction of the activities of four small Cas9s in three different cell lines.

Cell lines expressing sRGN3.1, efSaCas9, SauriCas9-KKH, or Nm2Cas9 were transduced with lentiviral pairwise libraries of sgRNA-encoding sequences and target sequences. Four days after transduction, the indel frequencies were measured. In addition, the indel frequencies were predicted using DeepSmallCas9. a, Measured activities of four small Cas9s in three cell lines. Boxes represent the 25th, 50th, and 75th percentiles and whiskers show the 10th and 90th percentiles. Guide formats, PAM sequences, and the numbers of target sequences (n) analyzed for the small Cas9s are as follows: sRGN3.1, G/gN21, NNGGRT, and n = 197 (DLD-1), 197 (HCT116), and 4,809 (HEK293T); efSaCas9, G/gN21, NNGRRT, and n = 394, 394, and 9,514; SauriCas9-KKH, G/gN21, NNGGRT, and n = 197, 197, and 4,855; Nm2Cas9, G/gN22, NNNNCCA, and n = 95, 95, and 2,304. Subsets of the small Cas9-induced indel frequencies without statistically significant differences (one-way analysis of variance followed by Bonferroni post-hoc test) are represented with the letters a, b, c, and d. b, Correlations between predicted and measured activities of four small Cas9s. Results of four Cas9s in each cell line were combined to generate one dataset per cell line. The Spearman correlation coefficient (Rho) and the Pearson correlation coefficient (r) are shown. Red dashed line represents y = x. Guide formats, PAM sequences, and the numbers of target sequences (n) analyzed for the small Cas9s are as follows: sRGN3.1, G/gN21, NNGRRT, and n = 394 (DLD-1), 394 (HCT116), and 951 (HEK293T); efSaCas9, G/gN21, NNGRRT, and n = 394, 394, and 946; SauriCas9-KKH, G/gN21, NNGRRT, and n = 394, 394, and 988; Nm2Cas9, G/gN22, NNNNCCN, and n = 362, 362, and 962.
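
The PAM strings in this legend (for example NNGRRT and NNNNCCA) use IUPAC degenerate-base codes, in which N stands for any base and R for A or G. A minimal Python sketch of checking a concrete PAM against such a pattern follows; the function name is hypothetical and only the codes needed here, plus Y, are included.

```python
# Subset of the IUPAC nucleotide codes: each code maps to the bases it allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "N": "ACGT",  # any base
}


def pam_matches(pattern, pam):
    """Return True if a concrete PAM sequence satisfies a degenerate pattern."""
    if len(pattern) != len(pam):
        return False
    return all(
        base in IUPAC[code]
        for code, base in zip(pattern.upper(), pam.upper())
    )


print(pam_matches("NNGRRT", "ACGAGT"))
print(pam_matches("NNGRRT", "ACGCGT"))
```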

Source data

Extended Data Fig. 8 Computational prediction of preferred small Cas9s at targets with diverse PAM sequences.

a, Heatmap showing the most efficient Cas9 out of eight highly active small Cas9s, which include sRGN3.1, SlugCas9, SaCas9, SauriCas9, Sa-SlugCas9, SaCas9-KKH, eSaCas9, and efSaCas9, at target sequences with a given PAM sequence. To compare the activities of the small Cas9s at sites with 4,096 (= 4⁶) PAMs (all possible NNNNNN sequences for the 1st–6th nucleotides of the PAM), 204,800 target sequences were generated by combining 50 randomly designed protospacer sequences and 4,096 PAM sequences and used as input data for the prediction of the activities (i.e., the induced indel frequencies) using DeepSmallCas9. The color-coded squares represent the small Cas9 that is predicted to be the most efficient, in cases in which the average indel frequency is higher than 10%, at a given PAM sequence. When the predicted average indel frequencies of the most efficient small Cas9s at given target sequences are lower than 10%, the squares representing those PAM sequences are shown in white. The color-code for each Cas9 is shown in b. b, Pie chart showing the number of PAM sequences that could be most efficiently targeted with each Cas9 with an average activity higher than 10%. c, Bar graph showing the number of efficiently targetable PAM sequences out of 4,096 (= 4⁶) PAMs for each Cas9 with an average activity higher than 10%.
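
The input construction and the selection rule described in a can be sketched as follows. This is an illustration only: the protospacer length of 21 nt, the random seed, the function names, and the example activity values are assumptions, and the DeepSmallCas9 prediction step itself is not reproduced.

```python
import random
from itertools import product


def all_pams(length=6):
    """Every possible PAM of the given length (4**6 = 4,096 for length 6)."""
    return ["".join(bases) for bases in product("ACGT", repeat=length)]


def random_protospacers(count=50, length=21, seed=0):
    """Randomly designed protospacers; length and seed are assumed values."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGT") for _ in range(length))
            for _ in range(count)]


def best_cas9_per_pam(mean_activity, threshold=10.0):
    """Pick the most active Cas9 per PAM; None if it does not exceed threshold.

    mean_activity maps PAM -> {Cas9 name: mean predicted indel frequency (%)}.
    """
    best = {}
    for pam, by_cas9 in mean_activity.items():
        name, value = max(by_cas9.items(), key=lambda item: item[1])
        best[pam] = name if value > threshold else None
    return best


pams = all_pams()
targets = [(spacer, pam) for spacer in random_protospacers() for pam in pams]
print(len(pams), len(targets))  # 4096 PAMs, 50 x 4096 = 204800 targets
```

A PAM mapped to `None` corresponds to a white square in the heatmap.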

Source data

Extended Data Fig. 9 SlugCas9-, SaCas9-KKH-, SlugCas9-HF-, Sa-SlugCas9-, or efSaCas9-directed targeting of dominant single-nucleotide variants with or without using DeepSmallCas9 to select sgRNAs.

Pie charts showing the fraction of the dominant single-nucleotide variants in protein-coding sequences in the ClinVar database (refs. 83,94) that can be edited using SlugCas9 (a), SaCas9-KKH (b), SlugCas9-HF (c), Sa-SlugCas9 (d), or efSaCas9 (e) in an efficient and allele-specific manner (on-target activity higher than 10% and off-target activity lower than 2%). Mutations for which no designed sgRNAs met these criteria were classified as either inefficient or nonspecific and those for which no mutant allele-targeting sgRNAs could be designed due to the lack of a nearby PAM were classified as untargetable. (Left pie charts) The specified small Cas9s were chosen and the most appropriate sgRNAs were designed using DeepSmallCas9 such that both the activity at the mutant allele and the allele-specificity are high. (Right pie charts) The specified small Cas9s were chosen and sgRNAs were designed to target given mutations such that the mutations were located in regions in the target sequence with the following order of preference: i) the PAM, ii) the highly selective protospacer region (within 10 bp from the PAM), and iii) the remaining region in the protospacer. The activities at the mutant and corresponding wild-type alleles were predicted afterwards using DeepSmallCas9. (Box plots) The predicted activities of selected Cas9-sgRNA combinations at mutant and wild-type alleles for the indicated SNVs. Boxes represent the 25th, 50th, and 75th percentiles and whiskers show the 10th and 90th percentiles. The fold differences between the average activities at mutant and wild-type alleles are shown (e.g., 34x).
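
The three pie-chart categories follow from the thresholds stated in this legend. A minimal Python sketch of that classification is shown below; the function names, the category labels, and the representation of each candidate sgRNA as an (on-target, off-target) pair of predicted indel frequencies in percent are assumptions made for illustration.

```python
def is_efficient_and_specific(on_target, off_target, on_min=10.0, off_max=2.0):
    """On-target activity above 10% and off-target activity below 2%."""
    return on_target > on_min and off_target < off_max


def classify_snv(candidates):
    """Classify one SNV from its candidate sgRNAs.

    candidates is a list of (on_target, off_target) predicted indel
    frequencies; an empty list means no sgRNA could be designed because
    no PAM lies nearby.
    """
    if not candidates:
        return "untargetable"
    if any(is_efficient_and_specific(on, off) for on, off in candidates):
        return "efficient and allele-specific"
    return "inefficient or nonspecific"


# Hypothetical predictions for three SNVs.
print(classify_snv([(35.0, 1.0), (8.0, 0.5)]))
print(classify_snv([(35.0, 6.0), (8.0, 0.5)]))
print(classify_snv([]))
```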

Source data

Extended Data Fig. 10 Allele-specific gene editing using the small Cas9s and SpCas9.

Of the 13,145 dominant SNVs in protein-coding sequences from the ClinVar database, pie charts show the numbers of dominant SNVs that could be most efficiently and allele-specifically targeted with the indicated Cas9s. (Top pie chart) DeepSmallCas9- and DeepSpCas9-v2-assisted selection of Cas9-sgRNA combinations allowed efficient (expected indel frequency at the mutant allele (on-target) > 10%) and allele-specific (expected indel frequency at the wild-type allele (off-target) < 2%) targeting of 10,925 of the 13,145 SNVs. (Bottom pie chart) Random selection of Cas9 and sgRNA pairs resulted in efficient and allele-specific targeting for only 678 SNVs. (Box plots) The predicted activities of selected Cas9-sgRNA combinations at mutant and wild-type alleles for the indicated SNVs. Boxes represent the 25th, 50th, and 75th percentiles and whiskers show the 10th and 90th percentiles. The fold differences between the average activities at mutant and wild-type alleles are shown (e.g., 37x).

Source data

Supplementary information

Supplementary Information

Supplementary Texts 1–5, Figs. 1–15, Tables 1–13, Notes 1–3 and references.

Reporting Summary

Supplementary Tables

Supplementary Tables 1–13.

Source data

Source Data Fig. 1

Statistical source data.

Source Data Fig. 2

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Fig. 5

Statistical source data.

Source Data Fig. 6

Statistical source data.

Source Data Extended Data Fig. 1

Statistical source data and unprocessed blots.

Source Data Extended Data Fig. 2

Statistical source data.

Source Data Extended Data Fig. 3

Statistical source data.

Source Data Extended Data Fig. 5

Statistical source data.

Source Data Extended Data Fig. 6

Statistical source data.

Source Data Extended Data Fig. 7

Statistical source data.

Source Data Extended Data Fig. 8

Statistical source data.

Source Data Extended Data Fig. 9

Statistical source data.

Source Data Extended Data Fig. 10

Statistical source data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Seo, SY., Min, S., Lee, S. et al. Massively parallel evaluation and computational prediction of the activities and specificities of 17 small Cas9s. Nat Methods 20, 999–1009 (2023). https://doi.org/10.1038/s41592-023-01875-2

  • DOI: https://doi.org/10.1038/s41592-023-01875-2
