Massively parallel evaluation and computational prediction of the activities and specificities of 17 small Cas9s

Seo, Sang-Yeon; Min, Seonwoo; Lee, Sungtae; Seo, Jung Hwa; Park, Jinman; Kim, Hui Kwon; Song, Myungjae; Baek, Dawoon; Cho, Sung-Rae; Kim, Hyongbum Henry

doi:10.1038/s41592-023-01875-2

Analysis
Published: 15 May 2023


Nature Methods volume 20, pages 999–1009 (2023)


Abstract

Recently, various small Cas9 orthologs and variants have been reported for use in in vivo delivery applications. Although small Cas9s are particularly suited for this purpose, selecting the optimal small Cas9 for a specific target sequence remains challenging. To address this, we systematically compared the activities of 17 small Cas9s at thousands of target sequences. For each small Cas9, we characterized the protospacer adjacent motif (PAM) and determined the optimal single guide RNA (sgRNA) expression format and scaffold sequence. High-throughput comparative analyses revealed distinct high- and low-activity groups of small Cas9s. We also developed DeepSmallCas9, a set of computational models that predict the activities of the small Cas9s at matched and mismatched target sequences. Together, these analyses and computational models provide a useful guide for selecting the most suitable small Cas9 for a given application.


**Fig. 1: Massively parallel evaluation of the activities of the small Cas9s.**

**Fig. 2: PAM compatibilities of the small Cas9s in human cells.**

**Fig. 3: Effects of sgRNA expression formats and scaffold sequences on the activities of the small Cas9s.**

**Fig. 4: Activities of the small Cas9s and SpCas9 at mismatched target sequences.**

**Fig. 5: Evaluation of computational models predicting the activities of the small Cas9s.**

**Fig. 6: Allele-specific gene editing using the small Cas9s.**


Data availability

The deep sequencing data used in this study are available at the NCBI Sequence Read Archive under BioProject accession number PRJNA807878. The indel frequency datasets used in this study are provided as Supplementary Tables 2–6. The training and test datasets for DeepSmallCas9 and DeepSpCas9-v2 are provided as Supplementary Table 11. The human genetic variations analyzed in this study are available at https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/archive_2.0/2020/. The reference genomes for human (GRCh38.p13 v.104) and mouse (GRCm39 v.104) are accessible at https://ftp.ensembl.org/pub/release-104/, and protein-coding annotations for human (MANE Select v.0.95) and mouse (RefSeq Select v.109) are accessible at https://ftp.ncbi.nlm.nih.gov/refseq/MANE/MANE_human/release_0.95/ and https://www.ncbi.nlm.nih.gov/nuccore/?term=%22Mus+musculus%22%5BOrganism%5D+AND+Refseq_select%5Bfilter%5D%E2%80%9D+AND+srcdb_refseq%5BPROP%5D, respectively. Source data are provided with this paper.

Code availability

The source code for DeepSmallCas9 and the custom Python scripts used to calculate indel frequencies are available on GitHub at https://github.com/SangyeonSeo/DeepSmallCas9 and https://github.com/CRISPRJWCHOI/CRISPR_toolkit/tree/master/Indel_searcher_2, respectively.

References

1. Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823 (2013).
2. Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823–826 (2013).
3. Cho, S. W., Kim, S., Kim, J. M. & Kim, J. S. Targeted genome engineering in human cells with the Cas9 RNA-guided endonuclease. Nat. Biotechnol. 31, 230–232 (2013).
4. Jinek, M. et al. RNA-programmed genome editing in human cells. eLife 2, e00471 (2013).
5. Hwang, W. Y. et al. Efficient genome editing in zebrafish using a CRISPR–Cas system. Nat. Biotechnol. 31, 227–229 (2013).
6. Jiang, W., Bikard, D., Cox, D., Zhang, F. & Marraffini, L. A. RNA-guided editing of bacterial genomes using CRISPR–Cas systems. Nat. Biotechnol. 31, 233–239 (2013).
7. Slaymaker, I. M. et al. Rationally engineered Cas9 nucleases with improved specificity. Science 351, 84–88 (2016).
8. Kleinstiver, B. P. et al. High-fidelity CRISPR–Cas9 nucleases with no detectable genome-wide off-target effects. Nature 529, 490–495 (2016).
9. Chen, J. S. et al. Enhanced proofreading governs CRISPR–Cas9 targeting accuracy. Nature 550, 407–410 (2017).
10. Casini, A. et al. A highly specific SpCas9 variant is identified by in vivo screening in yeast. Nat. Biotechnol. 36, 265–271 (2018).
11. Lee, J. K. et al. Directed evolution of CRISPR–Cas9 to increase its specificity. Nat. Commun. 9, 3048 (2018).
12. Kleinstiver, B. P. et al. Engineered CRISPR–Cas9 nucleases with altered PAM specificities. Nature 523, 481–485 (2015).
13. Anders, C., Bargsten, K. & Jinek, M. Structural plasticity of PAM recognition by engineered variants of the RNA-guided endonuclease Cas9. Mol. Cell 61, 895–902 (2016).
14. Hu, J. H. et al. Evolved Cas9 variants with broad PAM compatibility and high DNA specificity. Nature 556, 57–63 (2018).
15. Nishimasu, H. et al. Engineered CRISPR–Cas9 nuclease with expanded targeting space. Science 361, 1259–1262 (2018).
16. Swiech, L. et al. In vivo interrogation of gene function in the mammalian brain using CRISPR–Cas9. Nat. Biotechnol. 33, 102–106 (2015).
17. Chew, W. L. et al. A multifunctional AAV–CRISPR–Cas9 and its host response. Nat. Methods 13, 868–874 (2016).
18. Long, C. et al. Postnatal genome editing partially restores dystrophin expression in a mouse model of muscular dystrophy. Science 351, 400–403 (2016).
19. Ran, F. A. et al. In vivo genome editing using Staphylococcus aureus Cas9. Nature 520, 186–191 (2015).
20. Pardi, N., Hogan, M. J., Porter, F. W. & Weissman, D. mRNA vaccines—a new era in vaccinology. Nat. Rev. Drug Discov. 17, 261–279 (2018).
21. Schmidt, M. J. et al. Improved CRISPR genome editing using small highly active and specific engineered RNA-guided nucleases. Nat. Commun. 12, 4219 (2021).
22. Esvelt, K. M. et al. Orthogonal Cas9 proteins for RNA-guided gene regulation and editing. Nat. Methods 10, 1116–1121 (2013).
23. Müller, M. et al. Streptococcus thermophilus CRISPR–Cas9 systems enable specific editing of the human genome. Mol. Ther. 24, 636–644 (2016).
24. Agudelo, D. et al. Versatile and robust genome editing with Streptococcus thermophilus CRISPR1–Cas9. Genome Res. 30, 107–117 (2020).
25. Hou, Z. et al. Efficient genome engineering in human pluripotent stem cells using Cas9 from Neisseria meningitidis. Proc. Natl Acad. Sci. USA 110, 15644–15649 (2013).
26. Lee, C. M., Cradick, T. J. & Bao, G. The Neisseria meningitidis CRISPR–Cas9 system enables specific genome editing in mammalian cells. Mol. Ther. 24, 645–654 (2016).
27. Amrani, N. et al. NmeCas9 is an intrinsically high-fidelity genome-editing platform. Genome Biol. 19, 214 (2018).
28. Friedland, A. E. et al. Characterization of Staphylococcus aureus Cas9: a smaller Cas9 for all-in-one adeno-associated virus delivery and paired nickase applications. Genome Biol. 16, 257 (2015).
29. Najm, F. J. et al. Orthologous CRISPR–Cas9 enzymes for combinatorial genetic screens. Nat. Biotechnol. 36, 179–189 (2018).
30. Tycko, J. et al. Pairwise library screen systematically interrogates Staphylococcus aureus Cas9 specificity in human cells. Nat. Commun. 9, 2962 (2018).
31. Kim, E. et al. In vivo genome editing with a small Cas9 orthologue derived from Campylobacter jejuni. Nat. Commun. 8, 14500 (2017).
32. Yamada, M. et al. Crystal structure of the minimal Cas9 from Campylobacter jejuni reveals the molecular diversity in the CRISPR–Cas9 systems. Mol. Cell 65, 1109–1121.e3 (2017).
33. Edraki, A. et al. A compact, high-accuracy Cas9 with a dinucleotide PAM for in vivo genome editing. Mol. Cell 73, 714–726.e4 (2019).
34. Hu, Z. et al. A compact Cas9 ortholog from Staphylococcus auricularis (SauriCas9) expands the DNA targeting scope. PLoS Biol. 18, e3000686 (2020).
35. Hu, Z. et al. Discovery and engineering of small SlugCas9 with broad targeting range and high specificity and activity. Nucleic Acids Res. 49, 4008–4019 (2021).
36. Kleinstiver, B. P. et al. Broadening the targeting range of Staphylococcus aureus CRISPR–Cas9 by modifying PAM recognition. Nat. Biotechnol. 33, 1293–1298 (2015).
37. Tan, Y. et al. Rationally engineered Staphylococcus aureus Cas9 nucleases with high genome-wide specificity. Proc. Natl Acad. Sci. USA 116, 20969–20976 (2019).
38. Xie, H. et al. High-fidelity SaCas9 identified by directional screening in human cells. PLoS Biol. 18, e3000747 (2020).
39. Nakagawa, R. et al. Engineered Campylobacter jejuni Cas9 variant with enhanced activity and broader targeting range. Commun. Biol. 5, 211 (2022).
40. Koblan, L. W. et al. Improving cytidine and adenine base editors by expression optimization and ancestral reconstruction. Nat. Biotechnol. 36, 843–846 (2018).
41. Zafra, M. P. et al. Optimized base editors enable efficient editing in cells, organoids and mice. Nat. Biotechnol. 36, 888–893 (2018).
42. Kim, N. et al. Prediction of the sequence-specific cleavage activity of Cas9 variants. Nat. Biotechnol. 38, 1328–1336 (2020).
43. Kim, H. K. et al. In vivo high-throughput profiling of CRISPR–Cpf1 activity. Nat. Methods 14, 153–159 (2017).
44. Kim, H. K. et al. Deep learning improves prediction of CRISPR–Cpf1 guide RNA activity. Nat. Biotechnol. 36, 239–241 (2018).
45. Kim, H. K. et al. SpCas9 activity prediction by DeepSpCas9, a deep learning-based model with high generalization performance. Sci. Adv. 5, eaax9249 (2019).
46. Wang, D. et al. Optimized CRISPR guide RNA design for two high-fidelity Cas9 variants by deep learning. Nat. Commun. 10, 4284 (2019).
47. Kim, H. K. et al. High-throughput analysis of the activities of xCas9, SpCas9-NG and SpCas9 at matched and mismatched target sequences in human cells. Nat. Biomed. Eng. 4, 111–124 (2020).
48. Shen, M. W. et al. Predictable and precise template-free CRISPR editing of pathogenic variants. Nature 563, 646–651 (2018).
49. Allen, F. et al. Predicting the mutations generated by repair of Cas9-induced double-strand breaks. Nat. Biotechnol. 37, 64–72 (2019).
50. Chen, W. et al. Massively parallel profiling and predictive modeling of the outcomes of CRISPR/Cas9-mediated double-strand break repair. Nucleic Acids Res. 47, 7989–8003 (2019).
51. Song, M. et al. Sequence-specific prediction of the efficiencies of adenine and cytosine base editors. Nat. Biotechnol. 38, 1037–1043 (2020).
52. Arbab, M. et al. Determinants of base editing outcomes from target library analysis and machine learning. Cell 182, 463–480 (2020).
53. Kim, H. K. et al. Predicting the efficiency of prime editing guide RNAs in human cells. Nat. Biotechnol. 39, 198–206 (2021).
54. Schlub, T. E., Smyth, R. P., Grimm, A. J., Mak, J. & Davenport, M. P. Accurately measuring recombination between closely related HIV-1 genomes. PLoS Comput. Biol. 6, e1000766 (2010).
55. Sack, L. M., Davoli, T., Xu, Q., Li, M. Z. & Elledge, S. J. Sources of error in mammalian genetic screens. G3 6, 2781–2790 (2016).
56. Feldman, D., Singh, A., Garrity, A. J. & Blainey, P. C. Lentiviral co-packaging mitigates the effects of intermolecular recombination and multiple integrations in pooled genetic screens. Preprint at bioRxiv https://doi.org/10.1101/262121 (2018).
57. Hill, A. J. et al. On the design of CRISPR-based single-cell molecular screens. Nat. Methods 15, 271–274 (2018).
58. Lundberg, S. M. et al. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2, 56–67 (2020).
59. Doench, J. G. et al. Rational design of highly active sgRNAs for CRISPR–Cas9-mediated gene inactivation. Nat. Biotechnol. 32, 1262–1267 (2014).
60. Doench, J. G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR–Cas9. Nat. Biotechnol. 34, 184–191 (2016).
61. Fu, Y., Sander, J. D., Reyon, D., Cascio, V. M. & Joung, J. K. Improving CRISPR–Cas nuclease specificity using truncated guide RNAs. Nat. Biotechnol. 32, 279–284 (2014).
62. Kim, S., Bae, T., Hwang, J. & Kim, J. S. Rescue of high-specificity Cas9 variants using sgRNAs with matched 5′ nucleotides. Genome Biol. 18, 218 (2017).
63. Zhang, D. et al. Perfectly matched 20-nucleotide guide RNA sequences enable robust genome editing using high-fidelity SpCas9 nucleases. Genome Biol. 18, 191 (2017).
64. Xie, K., Minkenberg, B. & Yang, Y. Boosting CRISPR/Cas9 multiplex editing capability with the endogenous tRNA-processing system. Proc. Natl Acad. Sci. USA 112, 3570–3575 (2015).
65. He, X. et al. Boosting activity of high-fidelity CRISPR/Cas9 variants using a tRNA(Gln)-processing system in human cells. J. Biol. Chem. 294, 9308–9315 (2019).
66. Dang, Y. et al. Optimizing sgRNA structure to improve CRISPR–Cas9 knockout efficiency. Genome Biol. 16, 280 (2015).
67. Riesenberg, S., Helmbrecht, N., Kanis, P., Maricic, T. & Pääbo, S. Improved gRNA secondary structures allow editing of target sites resistant to CRISPR–Cas9 cleavage. Nat. Commun. 13, 489 (2022).
68. Tsai, S. Q. et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR–Cas nucleases. Nat. Biotechnol. 33, 187–197 (2015).
69. Lin, Y. et al. CRISPR/Cas9 systems have off-target activity with insertions or deletions between target DNA and guide RNA sequences. Nucleic Acids Res. 42, 7473–7485 (2014).
70. Jones, S. K. Jr et al. Massively parallel kinetic profiling of natural and engineered CRISPR nucleases. Nat. Biotechnol. 39, 84–93 (2021).
71. Courtney, D. G. et al. CRISPR/Cas9 DNA cleavage at SNP-derived PAM enables both in vitro and in vivo KRT12 mutation-specific targeting. Gene Ther. 23, 108–112 (2016).
72. Christie, K. A. et al. Towards personalised allele-specific CRISPR gene editing to treat autosomal dominant disorders. Sci. Rep. 7, 16174 (2017).
73. Bakondi, B. et al. In vivo CRISPR/Cas9 gene editing corrects retinal dystrophy in the S334ter-3 rat model of autosomal dominant retinitis pigmentosa. Mol. Ther. 24, 556–563 (2016).
74. Gao, X. et al. Treatment of autosomal dominant hearing loss by in vivo delivery of genome editing agents. Nature 553, 217–221 (2018).
75. György, B. et al. Allele-specific gene editing prevents deafness in a model of dominant progressive hearing loss. Nat. Med. 25, 1123–1130 (2019).
76. Koo, T. et al. Selective disruption of an oncogenic mutant allele by CRISPR/Cas9 induces efficient tumor regression. Nucleic Acids Res. 45, 7897–7908 (2017).
77. Li, Y. et al. Exploiting the CRISPR/Cas9 PAM constraint for single-nucleotide resolution interventions. PLoS ONE 11, e0144970 (2016).
78. Kim, W. et al. Targeting mutant KRAS with CRISPR-Cas9 controls tumor growth. Genome Res. 28, 374–382 (2018).
79. Cruz, L. et al. Mutant allele-specific CRISPR disruption in DYT1 dystonia fibroblasts restores cell function. Mol. Ther. Nucleic Acids 21, 1–12 (2020).
80. Xie, C. et al. Genome editing with CRISPR/Cas9 in postnatal mice corrects PRKAG2 cardiac syndrome. Cell Res. 26, 1099–1111 (2016).
81. Trochet, D. et al. Allele-specific silencing therapy for Dynamin 2-related dominant centronuclear myopathy. EMBO Mol. Med. 10, 239–253 (2018).
82. Rabai, A. et al. Allele-specific CRISPR/Cas9 correction of a heterozygous DNM2 mutation rescues centronuclear myopathy cell phenotypes. Mol. Ther. Nucleic Acids 16, 246–256 (2019).
83. Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018).
84. Landrum, M. J. et al. ClinVar: improvements to accessing data. Nucleic Acids Res. 48, D835–D844 (2020).
85. Liu, Z. et al. Versatile and efficient in vivo genome editing with compact Streptococcus pasteurianus Cas9. Mol. Ther. 30, 256–267 (2022).
86. Harrington, L. B. et al. A thermostable Cas9 with increased lifetime in human plasma. Nat. Commun. 8, 1424 (2017).
87. Hirano, S. et al. Structural basis for the promiscuous PAM recognition by Corynebacterium diphtheriae Cas9. Nat. Commun. 10, 1968 (2019).
88. Fedorova, I. et al. PpCas9 from Pasteurella pneumotropica—a compact Type II-C Cas9 ortholog active in human cells. Nucleic Acids Res. 48, 12297–12309 (2020).
89. Organick, L. et al. Random access in large-scale DNA data storage. Nat. Biotechnol. 36, 242–248 (2018).
90. Shen, J. P. et al. Combinatorial CRISPR–Cas9 screens for de novo mapping of genetic interactions. Nat. Methods 14, 573–576 (2017).
91. Joung, J. et al. Genome-scale CRISPR–Cas9 knockout and transcriptional activation screening. Nat. Protoc. 12, 828–863 (2017).
92. Shalem, O. et al. Genome-scale CRISPR–Cas9 knockout screening in human cells. Science 343, 84–87 (2014).
93. Clement, K. et al. CRISPResso2 provides accurate and rapid genome editing sequence analysis. Nat. Biotechnol. 37, 224–226 (2019).
94. Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (eds Krishnapuram, B. et al.) 785–794 (ACM, 2016).
95. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
96. Cock, P. J. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
97. Lorenz, R. et al. ViennaRNA Package 2.0. Algorithms Mol. Biol. 6, 26 (2011).
98. Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).
99. Abadi, M. et al. TensorFlow: a system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (eds Keeton, K. & Roscoe, T.) 265–283 (USENIX Association, 2016).
100. Bae, S., Park, J. & Kim, J. S. Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics 30, 1473–1475 (2014).
101. Howe, K. L. et al. Ensembl 2021. Nucleic Acids Res. 49, D884–D891 (2021).
102. O'Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
103. Diedenhofen, B. & Musch, J. cocor: a comprehensive solution for the statistical comparison of correlations. PLoS ONE 10, e0121945 (2015).


Acknowledgements

We thank J. Park, S. Park and Y. Kim for assisting with the experiments. This work was supported in part by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (2022R1A3B1078084 (H.H.K.) and 2018R1A5A2025079 (H.H.K.)); the Bio & Medical Technology Development Program of the National Research Foundation (NRF) funded by the Korean government (MSIT) (2022M3A9E4017127 (H.H.K.) and 2022M3A9F3017506 (H.H.K.)); the Korea Drug Development Fund funded by the Ministry of Science and ICT, the Ministry of Trade, Industry, and Energy, and the Ministry of Health and Welfare, Republic of Korea (HN21C0917 (H.H.K.)); the Yonsei Signature Research Cluster Program of 2021-22-0014 (H.H.K.); the Brain Korea 21 FOUR Project for Medical Science (Yonsei University College of Medicine); the SNUH Kun-hee Lee Child Cancer & Rare Disease Project, Republic of Korea (22B-000-0101 (H.H.K.)); the Korea Research Institute of Bioscience and Biotechnology (KRIBB) Research Initiative Program (KGM5162221 (H.H.K.)); and the Korea Health Technology R&D Project funded by the Ministry of Health and Welfare, Republic of Korea (HI21C1314 (H.H.K.)).

Author information

Authors and Affiliations

Department of Pharmacology, Yonsei University College of Medicine, Seoul, Republic of Korea
Sang-Yeon Seo, Sungtae Lee, Jinman Park, Hui Kwon Kim, Myungjae Song & Hyongbum Henry Kim
Graduate School of Medical Science, Brain Korea 21 Project, Yonsei University College of Medicine, Seoul, Republic of Korea
Sang-Yeon Seo, Jung Hwa Seo, Jinman Park, Hui Kwon Kim, Myungjae Song, Sung-Rae Cho & Hyongbum Henry Kim
LG AI Research, Seoul, Republic of Korea
Seonwoo Min
Department and Research Institute of Rehabilitation Medicine, Yonsei University College of Medicine, Seoul, Republic of Korea
Jung Hwa Seo, Dawoon Baek & Sung-Rae Cho
Center for Nanomedicine, Institute for Basic Science (IBS), Seoul, Republic of Korea
Hui Kwon Kim & Hyongbum Henry Kim
Graduate Program of Nano Biomedical Engineering (NanoBME), Advanced Science Institute, Yonsei University, Seoul, Republic of Korea
Hui Kwon Kim & Hyongbum Henry Kim
Department of Integrative Biotechnology, Sungkyunkwan University, Suwon, Republic of Korea
Hui Kwon Kim
Department of Rehabilitation Medicine, Yonsei University Wonju College of Medicine, Wonju, Republic of Korea
Dawoon Baek
Graduate Program of Biomedical Engineering, Yonsei University College of Medicine, Seoul, Republic of Korea
Sung-Rae Cho
Severance Biomedical Science Institute, Yonsei University College of Medicine, Seoul, Republic of Korea
Hyongbum Henry Kim
Institute for Immunology and Immunological Diseases, Yonsei University College of Medicine, Seoul, Republic of Korea
Hyongbum Henry Kim


Contributions

S.-Y.S. performed the majority of wet experiments, including the high-throughput evaluation of the activities of the small Cas9s. S.-Y.S., S.M. and S.L. developed DeepSmallCas9, DeepSpCas9-v2 and the related web tool. J.H.S., D.B. and S.-R.C. performed western blotting to measure the protein levels of the small Cas9s. J.P. contributed substantially to bioinformatics analyses. H.K.K. and M.S. contributed to the design of the study and provided technical assistance to S.-Y.S. in conducting the experiments and analyzing the data. Together with S.-Y.S., H.H.K. conceived of and designed the study. S.-Y.S. and H.H.K. wrote the manuscript.

Corresponding author

Correspondence to Hyongbum Henry Kim.

Ethics declarations

Competing interests

Yonsei University has filed a patent based on this work, in which S.-Y.S., S.L. and H.H.K. are the co-inventors (patent no. 10-2022-0060290). H.H.K. is a consultant for EcoR1 capital. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editors: Lei Tang and Madhura Mukhopadhyay, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Expression of the small Cas9s in HEK293T cells.

a, Schematic of the small Cas9-expressing cassette. LTR, long terminal repeat; psi, psi packaging signal; RRE, Rev response element; CMV, cytomegalovirus promoter; NLS, nuclear localization signal from SV40 T antigen; FLAG, FLAG-tag; P2A, self-cleaving 2A peptide from porcine teschovirus-1; BlastR, blasticidin selection marker; WPRE, woodchuck hepatitis virus post-transcriptional regulatory element. b, Representative Western blots used to measure the amounts of the small Cas9 proteins in HEK293T cells transduced with lentiviral vectors encoding the small Cas9s. Small Cas9 proteins were detected via the FLAG-tag, and β-actin was used as a loading control. Images from two Western blots run in parallel are shown because two gels were required to accommodate all 18 evaluated Cas9 proteins. Unprocessed images are available in Source Data Extended Data Fig. 1. c, Relative levels of the small Cas9 proteins, normalized to β-actin. Data represent mean ± SD. The numbers of replicates (n) are as follows: SaCas9, n = 12; SaCas9-KKH, n = 12; SaCas9*, n = 8; St1Cas9, n = 8; Nm1Cas9, n = 8; Nm2Cas9, n = 8; CjCas9, n = 8; SauriCas9, n = 5; SauriCas9-KKH, n = 5; sRGN3.1, n = 5; SlugCas9, n = 5; SlugCas9-HF, n = 5; Sa-SlugCas9, n = 5; SaCas9-HF, n = 5; efSaCas9, n = 5; eSaCas9, n = 5; SaCas9-KKH-HF, n = 5; enCjCas9, n = 5. Subsets of normalized Cas9 protein levels without statistically significant differences (one-way analysis of variance followed by Bonferroni post-hoc test) are indicated by the letters a, b, c, and d.

Source data
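The quantification behind panel c can be illustrated with a short sketch. This is not the authors' analysis code: it assumes band intensities have already been measured (for example by densitometry), divides each FLAG band by the β-actin band from the same lane, computes a one-way ANOVA F statistic across proteins, and derives the Bonferroni-adjusted per-comparison threshold for all pairwise comparisons. The function names and the family-wise α of 0.05 are illustrative assumptions; turning F and the pairwise tests into P values would additionally require the F and t distributions (for example from SciPy).

```python
from itertools import combinations
from statistics import mean


def normalize_to_actin(flag_bands, actin_bands):
    """Divide each FLAG-tag band intensity by the beta-actin band from the same lane."""
    if len(flag_bands) != len(actin_bands):
        raise ValueError("each FLAG band needs a matching beta-actin band")
    return [flag / actin for flag, actin in zip(flag_bands, actin_bands)]


def one_way_anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square over within-group mean square."""
    values = [v for group in groups for v in group]
    grand_mean = mean(values)
    k, n_total = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def bonferroni_alpha(n_groups, family_alpha=0.05):
    """Per-comparison threshold when every pair of groups is compared (hypothetical helper)."""
    n_pairs = len(list(combinations(range(n_groups), 2)))
    return family_alpha / n_pairs
```

With the 18 Cas9 proteins in panel c, comparing every pair gives 153 tests, so each comparison would be judged against 0.05 / 153 ≈ 3.3 × 10⁻⁴.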

Extended Data Fig. 2 PAM compatibilities of small Cas9s in human cells.

a, Heatmaps showing the average indel frequencies at target sequences with the indicated PAM sequences. Indel frequencies were measured four days after transduction of the paired libraries in cells expressing SaCas9, SaCas9-KKH, eSaCas9, or efSaCas9, and seven days after transduction in cells expressing the other three small Cas9s (SaCas9-HF, SaCas9-KKH-HF, and St1Cas9). Protospacers whose highest indel frequency across candidate PAM sequences was <5% were excluded from the analyses. Fixed positions and nucleotides are indicated above each heatmap; for instance, to evaluate the preferences of SaCas9 at the 3rd, 4th, and 5th nucleotides of the PAM, the 6th and 7th nucleotides of the PAM were fixed as TN. The numbers of analyzed protospacers (n) are as follows: SaCas9, n = 29; SaCas9-KKH, n = 29; eSaCas9, n = 29; efSaCas9, n = 29; SaCas9-HF, n = 29; SaCas9-KKH-HF, n = 29; St1Cas9, n = 27. b, Summary of the analyzed PAM compatibilities.

Source data
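The filtering and averaging described for panel a can be sketched as follows. This is a hypothetical reconstruction from the caption alone, not the authors' pipeline: it assumes a flat list of (protospacer ID, 7-nt PAM window, indel %) records, drops protospacers whose best candidate PAM stays below 5% indels, keeps only PAMs that match the fixed positions, and averages indel frequencies for each base combination at the varied positions (positions 3 to 5, as in the SaCas9 example). The record layout, function names and PAM strings are illustrative.

```python
from collections import defaultdict
from statistics import mean


def matches_fixed(pam, fixed):
    """True if the PAM carries the required base at every fixed 1-based position ('N' = any base)."""
    return all(base == "N" or pam[pos - 1] == base for pos, base in fixed.items())


def pam_heatmap(records, fixed, varied=(3, 4, 5), min_top_indel=5.0):
    """Average indel frequency for each base combination at the varied PAM positions.

    records: iterable of (protospacer_id, pam, indel_percent) tuples (hypothetical layout).
    """
    records = list(records)
    # Highest indel frequency each protospacer reaches across all candidate PAMs.
    top = defaultdict(float)
    for pid, _pam, indel in records:
        top[pid] = max(top[pid], indel)
    kept = {pid for pid, best in top.items() if best >= min_top_indel}

    cells = defaultdict(list)
    for pid, pam, indel in records:
        if pid in kept and matches_fixed(pam, fixed):
            key = "".join(pam[pos - 1] for pos in varied)
            cells[key].append(indel)
    return {key: mean(values) for key, values in cells.items()}


if __name__ == "__main__":
    demo = [
        ("p1", "AAGGATC", 20.0), ("p1", "AAGAATG", 10.0), ("p1", "AAGAACT", 30.0),
        ("p2", "AAGGATC", 2.0), ("p2", "AAGAATG", 1.0), ("p2", "AAGAACT", 4.0),  # best < 5%: dropped
        ("p3", "AAGGATC", 8.0), ("p3", "AAGAATG", 6.0), ("p3", "AAGAACT", 0.0),
    ]
    # Position 6 fixed as T, position 7 free; AAGAACT fails the fixed-position check.
    print(pam_heatmap(demo, fixed={6: "T", 7: "N"}))  # {'GGA': 14.0, 'GAA': 8.0}
```

Note that the 5% filter is applied per protospacer before the fixed-position check, so a protospacer is kept if any candidate PAM, including ones outside the plotted slice, reaches the threshold; whether the original analysis did the same is not stated in the caption.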

Extended Data Fig. 3 Activities of sRGN3.1 and SlugCas9 at diverse potential off-target sequences.

a,b, Comparison of the activities of sRGN3.1 and SlugCas9 at different potential off-target sequences. The average relative indel frequencies are indicated by red plus symbols. The numbers of analyzed target sequences (n) are the same for both nucleases: n = 3,715 (1-bp mismatch), 1,195 (2-bp mismatch), 588 (3-bp mismatch), 1,263 (1-nt deletion), and 1,131 (1-nt insertion). Subsets of the small Cas9-induced relative indel frequencies without statistically significant differences (one-way analysis of variance followed by Bonferroni post-hoc test) are represented with the letters a, b, c, and d. c–h, Heatmaps showing the average specificities of sRGN3.1 (c,e,g) and SlugCas9 (d,f,h) when there were 1-bp mismatches (c,d), 1-nt RNA bulges (e,f), or 1-nt DNA bulges (g,h) between sgRNAs and target sequences with a primary or secondary PAM. Specificity was calculated as 1 − (indel frequency at the mismatched target sequence / indel frequency at the perfectly matched target sequence). c,d, To distinguish mismatch types, wobble, nonwobble, and transversion mismatches are shown in red, green, and blue, respectively. i–l, Box plots showing the effects of deleted (i,j) or inserted (k,l) bases on the activities of sRGN3.1 (i,k) and SlugCas9 (j,l). The numbers of analyzed target sequences (n), again identical for both nucleases, are as follows: RNA bulge, n = 271 (A), 325 (T), 329 (C), and 338 (G); DNA bulge, n = 279 (C), 308 (G), 289 (T), and 255 (A). Subsets of the small Cas9-induced relative indel frequencies without statistically significant differences (one-way analysis of variance followed by Bonferroni post-hoc test) are represented with the letters a and b. a,b,i–l, Indel frequencies were normalized to those at perfectly matched target sequences.
Boxes represent the 25th, 50th, and 75th percentiles and whiskers show the 10th and 90th percentiles.
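
The specificity metric and the three mismatch categories used in panels c,d can be illustrated with a short Python sketch. It is not the authors' code. It assumes that the guide and protospacer are both written 5′→3′ on the non-target strand, so the guide RNA base faces the complement of the protospacer base, and it uses the conventional grouping of rG·dT and rU·dG as wobble pairs, the other transitions (rA·dC, rC·dA) as nonwobble, and all purine–purine or pyrimidine–pyrimidine pairs as transversions. All function names are hypothetical, and positions are counted from the guide's 5′ end, which may differ from the heatmap convention.

```python
from typing import List, Tuple

PURINES = {"A", "G"}
# Complement of a protospacer (non-target-strand) base = the target-strand base
# that sits opposite the guide RNA base in the RNA:DNA heteroduplex.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
WOBBLE_PAIRS = {("G", "T"), ("T", "G")}  # rG:dT and rU:dG (guide U written as T)


def classify_mismatch(guide_base: str, protospacer_base: str) -> str:
    """Label one guide/protospacer mismatch as wobble, nonwobble or transversion."""
    g, p = guide_base.upper(), protospacer_base.upper()
    if g == p:
        raise ValueError("bases are identical; not a mismatch")
    if (g in PURINES) != (p in PURINES):
        # A transversion between guide and protospacer gives a purine:purine or
        # pyrimidine:pyrimidine RNA:DNA pair.
        return "transversion"
    # Transition: the RNA:DNA pair is purine:pyrimidine but not Watson-Crick.
    return "wobble" if (g, COMPLEMENT[p]) in WOBBLE_PAIRS else "nonwobble"


def mismatch_profile(guide: str, protospacer: str) -> List[Tuple[int, str]]:
    """(position, type) for each mismatch; positions are 1-based from the guide's 5' end."""
    if len(guide) != len(protospacer):
        raise ValueError("guide and protospacer must have the same length")
    return [
        (i + 1, classify_mismatch(g, p))
        for i, (g, p) in enumerate(zip(guide.upper(), protospacer.upper()))
        if g != p
    ]


def relative_indel(indel_mismatched: float, indel_matched: float) -> float:
    """Indel frequency at a mismatched target normalized to the matched target."""
    if indel_matched <= 0:
        raise ValueError("matched-target indel frequency must be positive")
    return indel_mismatched / indel_matched


def specificity(indel_mismatched: float, indel_matched: float) -> float:
    """Specificity = 1 - relative indel frequency, as defined in the legend."""
    return 1.0 - relative_indel(indel_mismatched, indel_matched)


# Example: guide G opposite protospacer A forms an rG:dT wobble at position 3.
print(mismatch_profile("ACGT", "ACAT"))  # [(3, 'wobble')]
print(specificity(12.0, 48.0))           # 0.75
```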

Extended Data Fig. 4 Development of DeepSmallCas9.

DeepSmallCas9 is a set of deep learning-based models that predict the activities of the small Cas9s at matched and mismatched target sequences. Additional features include the melting temperature (Tm), the number of G or C nucleotides (GC count), the minimum free energy (MFE), and the mismatch position and type between guide and protospacer sequences (mismatch profile). See also Methods.
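
Several of the listed hand-crafted features can be computed directly from sequence. The sketch below is illustrative only and makes several assumptions not stated here: it uses the simple Wallace rule, Tm = 2·(A+T) + 4·(G+C), as a stand-in for whatever Tm calculation the Methods specify; it omits MFE, which requires an RNA-folding tool such as ViennaRNA; it encodes the mismatch profile as a position-by-substitution one-hot matrix (a hypothetical layout); and it computes the scalar features on the protospacer, an arbitrary choice.

```python
import numpy as np

BASES = "ACGT"
# The 12 ordered (guide base, protospacer base) substitutions; hypothetical column order.
SUBSTITUTIONS = [(g, p) for g in BASES for p in BASES if g != p]


def gc_count(seq: str) -> int:
    """Number of G or C nucleotides."""
    s = seq.upper()
    return s.count("G") + s.count("C")


def wallace_tm(seq: str) -> float:
    """Wallace-rule Tm in deg C: 2*(A+T) + 4*(G+C). A crude stand-in only."""
    s = seq.upper()
    return 2.0 * (s.count("A") + s.count("T")) + 4.0 * gc_count(s)


def one_hot(seq: str) -> np.ndarray:
    """(len(seq), 4) one-hot matrix with columns in A, C, G, T order."""
    mat = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        mat[i, BASES.index(base)] = 1.0
    return mat


def mismatch_profile(guide: str, protospacer: str) -> np.ndarray:
    """(length, 12) matrix marking the position and substitution type of each mismatch."""
    if len(guide) != len(protospacer):
        raise ValueError("guide and protospacer must have the same length")
    mat = np.zeros((len(guide), len(SUBSTITUTIONS)), dtype=np.float32)
    for i, pair in enumerate(zip(guide.upper(), protospacer.upper())):
        if pair[0] != pair[1]:
            mat[i, SUBSTITUTIONS.index(pair)] = 1.0
    return mat


def scalar_features(guide: str, protospacer: str) -> dict:
    """Hand-crafted scalar features; MFE is omitted (it needs an RNA-folding tool)."""
    return {
        "gc_count": gc_count(protospacer),
        "tm": wallace_tm(protospacer),
        "n_mismatches": int(mismatch_profile(guide, protospacer).sum()),
    }
```

A deep model would typically consume `one_hot` matrices alongside such scalar features; how DeepSmallCas9 actually combines its inputs is described in the Methods, not here.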

Extended Data Fig. 5 Performance comparison of algorithms used to develop computational models that predict the activities of the small Cas9s.

Heatmaps showing correlations between the measured and computationally predicted indel frequencies. Average Pearson (top) and Spearman (bottom) correlation coefficients were calculated from five-fold cross-validation. The algorithms that showed the highest average correlation coefficients are shown in bold. XGBoost, extreme gradient boosting; Boosted RT, gradient-boosted regression trees; Lasso, L1-regularized linear regression; Ridge, L2-regularized linear regression; Elastic Net, L1- and L2-regularized linear regression; RF, random forest; SVM, support vector machine.
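
The evaluation protocol in this legend, averaging Pearson and Spearman coefficients over five cross-validation folds, can be sketched with NumPy and SciPy. Closed-form ridge regression stands in here for any of the listed algorithms only because it needs no extra dependencies; the regularization strength, shuffling seed, and synthetic data are assumptions for illustration, not values from the paper.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr


def ridge_fit(X: np.ndarray, y: np.ndarray, alpha: float = 1.0):
    """Closed-form L2-regularized linear regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def cross_validated_correlations(X, y, n_folds=5, alpha=1.0, seed=0):
    """Average Pearson r and Spearman rho between held-out predictions and measurements."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    pearsons, spearmans = [], []
    for k, test_idx in enumerate(folds):
        # Train on the other folds, predict the held-out fold.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        w, b = ridge_fit(X[train_idx], y[train_idx], alpha)
        pred = X[test_idx] @ w + b
        pearsons.append(pearsonr(y[test_idx], pred)[0])
        spearmans.append(spearmanr(y[test_idx], pred)[0])
    return float(np.mean(pearsons)), float(np.mean(spearmans))


# Example on synthetic data (not the paper's data):
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 5))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=200)
r_mean, rho_mean = cross_validated_correlations(X_demo, y_demo)
```

Averaging per-fold coefficients, rather than pooling all held-out predictions into one correlation, matches the "average" wording of the legend; the two can differ slightly.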

Extended Data Fig. 6 Comparison of the performance of DeepSmallCas9 with those of existing computational models predicting SaCas9 activity.

a,b, Evaluation of DeepSmallCas9 and ‘SaCas9 on-target rules’ (ref. 29), an existing computational model predicting SaCas9 activities at matched target sequences, using the subset of the hold-out test dataset comprising matched targets with an NNGRRN PAM; n = 3,975. c,d, Evaluation of DeepSmallCas9 and ‘Model of SaCas9 specificity’ (ref. 30), an existing computational model predicting SaCas9 activities at mismatched target sequences, using the subset of the hold-out test dataset comprising mismatched targets with an NNGRRT PAM; n = 217. Predicted activities at mismatched targets were normalized to those at perfectly matched targets. a,c, The Spearman correlation coefficient (Rho) and the Pearson correlation coefficient (r) are shown. The dashed line represents y = x. b,d, Data are shown as correlation coefficients ± 95% confidence intervals. Statistically significant differences between the two correlations were determined using a two-tailed Steiger’s z-test. The P-values from left to right are < 2.2 × 10⁻¹⁶, < 2.2 × 10⁻¹⁶, 3.1 × 10⁻⁴, and 4.7 × 10⁻⁸.
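
Because both models are scored against the same measured activities, their correlations are dependent, which is why a Steiger test applies rather than a test for independent correlations. The sketch below implements Steiger's (1980) Z for two correlations that share one variable, in the pooled-r̄ form, plus the Fisher-z confidence interval. It is written from the published formula and is not necessarily the exact implementation used in the paper; note that it also needs r_kh, the correlation between the two models' predictions. The function names are hypothetical.

```python
import math


def fisher_ci(r: float, n: int, z_crit: float = 1.959964) -> tuple:
    """Approximate 95% confidence interval for a correlation via Fisher's z transform."""
    z = math.atanh(r)
    half = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)


def steiger_z(r_jk: float, r_jh: float, r_kh: float, n: int) -> tuple:
    """Steiger (1980) Z for H0: rho_jk == rho_jh, pooled-r variant.

    j = measured activity, k = predictions of model 1, h = predictions of model 2.
    Returns (Z, two-tailed P).
    """
    r_bar = (r_jk + r_jh) / 2.0
    psi = r_kh * (1 - 2 * r_bar**2) - 0.5 * r_bar**2 * (1 - 2 * r_bar**2 - r_kh**2)
    c = psi / (1 - r_bar**2) ** 2  # estimated covariance of the two Fisher z values
    z = (math.atanh(r_jk) - math.atanh(r_jh)) * math.sqrt(n - 3) / math.sqrt(2 - 2 * c)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical numbers, only to show the call:
z_demo, p_demo = steiger_z(r_jk=0.7, r_jh=0.5, r_kh=0.6, n=300)
```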

Extended Data Fig. 7 Evaluation and prediction of the activities of four small Cas9s in three different cell lines.

Cell lines expressing sRGN3.1, efSaCas9, SauriCas9-KKH, or Nm2Cas9 were transduced with lentiviral pairwise libraries of sgRNA-encoding sequences and target sequences. Indel frequencies were measured four days after transduction and were also predicted using DeepSmallCas9. a, Measured activities of the four small Cas9s in the three cell lines. Boxes represent the 25th, 50th, and 75th percentiles and whiskers show the 10th and 90th percentiles. Guide formats, PAM sequences, and the numbers of target sequences (n) analyzed for the small Cas9s are as follows: sRGN3.1, G/gN21, NNGGRT, and n = 197 (DLD-1), 197 (HCT116), and 4,809 (HEK293T); efSaCas9, G/gN21, NNGRRT, and n = 394, 394, and 9,514; SauriCas9-KKH, G/gN21, NNGGRT, and n = 197, 197, and 4,855; Nm2Cas9, G/gN22, NNNNCCA, and n = 95, 95, and 2,304. Subsets of the small Cas9-induced indel frequencies without statistically significant differences (one-way analysis of variance followed by Bonferroni post-hoc test) are indicated by the letters a, b, c, and d. b, Correlations between predicted and measured activities of the four small Cas9s. Results for the four Cas9s in each cell line were combined to generate one dataset per cell line. The Spearman correlation coefficient (Rho) and the Pearson correlation coefficient (r) are shown. The red dashed line represents y = x. Guide formats, PAM sequences, and the numbers of target sequences (n) analyzed for the small Cas9s are as follows: sRGN3.1, G/gN21, NNGRRT, and n = 394 (DLD-1), 394 (HCT116), and 951 (HEK293T); efSaCas9, G/gN21, NNGRRT, and n = 394, 394, and 946; SauriCas9-KKH, G/gN21, NNGRRT, and n = 394, 394, and 988; Nm2Cas9, G/gN22, NNNNCCN, and n = 362, 362, and 962.
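
The PAM sets in this legend are written in IUPAC nucleotide codes (N = any base, R = A or G). A small matcher such as the sketch below can check whether a concrete PAM falls under a pattern, and whether one pattern is contained in another; for example, every NNGGRT site also matches NNGRRT, and every NNNNCCA site matches NNNNCCN. The function names are hypothetical, only the standard single-letter IUPAC codes are handled, and `pam_of` assumes the PAM lies immediately 3′ of the protospacer in a 5′→3′ string.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_pam(pam_site: str, pattern: str) -> bool:
    """True if a concrete PAM sequence matches an IUPAC PAM pattern of equal length."""
    if len(pam_site) != len(pattern):
        return False
    return all(base in IUPAC[code] for base, code in zip(pam_site.upper(), pattern.upper()))


def pattern_within(narrow: str, broad: str) -> bool:
    """True if every sequence matching `narrow` also matches `broad`."""
    return len(narrow) == len(broad) and all(
        set(IUPAC[a]) <= set(IUPAC[b]) for a, b in zip(narrow.upper(), broad.upper())
    )


def pam_of(target: str, protospacer_len: int, pam_len: int) -> str:
    """Slice out the PAM that directly follows the protospacer."""
    return target[protospacer_len:protospacer_len + pam_len]
```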

Extended Data Fig. 8 Computational prediction of preferred small Cas9s at targets with diverse PAM sequences.

a, Heatmap showing the most efficient Cas9 among eight highly active small Cas9s (sRGN3.1, SlugCas9, SaCas9, SauriCas9, Sa-SlugCas9, SaCas9-KKH, eSaCas9, and efSaCas9) at target sequences with a given PAM sequence. To compare the activities of the small Cas9s at sites with 4,096 (= 4⁶) PAMs (all possible NNNNNN sequences for the 1st–6th nucleotides of the PAM), 204,800 target sequences were generated by combining 50 randomly designed protospacer sequences with the 4,096 PAM sequences; these were used as input for DeepSmallCas9 to predict the activities (that is, the induced indel frequencies). The color-coded squares represent the small Cas9 that is predicted to be the most efficient at a given PAM sequence when its average indel frequency is higher than 10%. When the predicted average indel frequency of the most efficient small Cas9 at the target sequences with a given PAM is lower than 10%, the square representing that PAM sequence is shown in white. The color code for each Cas9 is shown in b. b, Pie chart showing the number of PAM sequences that could be most efficiently targeted with each Cas9 with an average activity higher than 10%. c, Bar graph showing the number of efficiently targetable PAM sequences out of 4,096 (= 4⁶) PAMs for each Cas9 with an average activity higher than 10%.
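
The in silico design described in this legend (every NNNNNN PAM combined with 50 random protospacers, then the Cas9 with the highest average predicted activity retained only if that average exceeds 10%) can be sketched as follows. DeepSmallCas9 itself is not reproduced: `toy_predictor` is a hypothetical stand-in that a reader would replace with real model predictions, and the 21-nt protospacer length and random seed are assumptions.

```python
import itertools
import random
from collections import Counter
from typing import Callable, List, Optional

# All 4**6 = 4,096 possible NNNNNN PAMs.
ALL_PAMS: List[str] = ["".join(p) for p in itertools.product("ACGT", repeat=6)]


def random_protospacers(n: int = 50, length: int = 21, seed: int = 0) -> List[str]:
    """Random protospacers; the 21-nt length is an assumption."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGT") for _ in range(length)) for _ in range(n)]


def preferred_cas9(pam: str, protospacers: List[str], cas9s: List[str],
                   predict_indel: Callable[[str, str], float],
                   threshold: float = 10.0) -> Optional[str]:
    """Cas9 with the highest mean predicted indel % over all protospacer+PAM targets,
    or None (a white square) if even that mean is not above the threshold."""
    averages = {}
    for cas9 in cas9s:
        scores = [predict_indel(cas9, spacer + pam) for spacer in protospacers]
        averages[cas9] = sum(scores) / len(scores)
    best = max(averages, key=averages.get)
    return best if averages[best] > threshold else None


def count_preferred(pams, protospacers, cas9s, predict_indel, threshold=10.0) -> Counter:
    """Tally of winning Cas9 per PAM (None = no Cas9 above threshold), as in panel b."""
    return Counter(
        preferred_cas9(pam, protospacers, cas9s, predict_indel, threshold) for pam in pams
    )


def toy_predictor(cas9: str, target: str) -> float:
    """Hypothetical stand-in for DeepSmallCas9, only so the sketch runs."""
    pam = target[-6:]
    if cas9 == "CasA":
        return 30.0 if pam[2] == "G" else 0.0
    return 15.0 if pam[5] == "T" else 5.0
```

With 50 protospacers and 4,096 PAMs this produces the 204,800 target sequences mentioned above; the real pipeline would batch these through the model rather than scoring them one at a time.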

Extended Data Fig. 9 SlugCas9-, SaCas9-KKH-, SlugCas9-HF-, Sa-SlugCas9-, or efSaCas9-directed targeting of dominant single-nucleotide variants with or without using DeepSmallCas9 to select sgRNAs.

Pie charts showing the fraction of the dominant single-nucleotide variants in protein-coding sequences in the ClinVar database (refs. 83,94) that can be edited using SlugCas9 (a), SaCas9-KKH (b), SlugCas9-HF (c), Sa-SlugCas9 (d), or efSaCas9 (e) in an efficient and allele-specific manner (on-target activity higher than 10% and off-target activity lower than 2%). Mutations for which no designed sgRNAs met these criteria were classified as either inefficient or nonspecific, and those for which no mutant allele-targeting sgRNAs could be designed due to the lack of a nearby PAM were classified as untargetable. (Left pie charts) The specified small Cas9s were chosen and the most appropriate sgRNAs were designed using DeepSmallCas9 such that both the activity at the mutant allele and the allele-specificity are high. (Right pie charts) The specified small Cas9s were chosen and sgRNAs were designed to target given mutations such that the mutations were located in regions in the target sequence with the following order of preference: i) the PAM, ii) the highly selective protospacer region (within 10 bp from the PAM), and iii) the remaining region in the protospacer. The activities at the mutant and corresponding wild-type alleles were predicted afterwards using DeepSmallCas9. (Box plots) The predicted activities of selected Cas9-sgRNA combinations at mutant and wild-type alleles for the indicated SNVs. Boxes represent the 25th, 50th, and 75th percentiles and whiskers show the 10th and 90th percentiles. The fold differences between the average activities at mutant and wild-type alleles are shown (e.g., 34x).
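
The decision rules in this legend can be written compactly. In the sketch below, each candidate sgRNA for an SNV carries predicted indel frequencies (%) at the mutant and wild-type alleles and the region where the SNV falls; the `Candidate` record and its field names are hypothetical. `classify_snv` applies the stated thresholds (on-target > 10%, off-target < 2%), `pick_by_prediction` mimics the left-chart strategy, and `pick_by_position` mimics the right-chart order of preference. Two points are assumptions, not statements from the legend: how "inefficient" is separated from "nonspecific", and the exact objective used to rank passing sgRNAs.

```python
from dataclasses import dataclass
from typing import List, Optional

ON_TARGET_MIN = 10.0   # mutant-allele indel % must exceed this
OFF_TARGET_MAX = 2.0   # wild-type-allele indel % must stay below this
REGION_RANK = {"pam": 0, "proximal": 1, "distal": 2}  # right-chart preference order


@dataclass
class Candidate:
    """Hypothetical record for one sgRNA targeting an SNV."""
    name: str
    mutant_indel: float    # predicted indel % at the mutant allele (on-target)
    wildtype_indel: float  # predicted indel % at the wild-type allele (off-target)
    region: str            # SNV location: "pam", "proximal" (<=10 bp from PAM) or "distal"


def passes(c: Candidate) -> bool:
    return c.mutant_indel > ON_TARGET_MIN and c.wildtype_indel < OFF_TARGET_MAX


def classify_snv(cands: List[Candidate]) -> str:
    if not cands:
        return "untargetable"  # no mutant-allele sgRNA could be designed (no nearby PAM)
    if any(passes(c) for c in cands):
        return "efficient and allele-specific"
    # Assumed split: no sgRNA above 10% on-target -> inefficient; otherwise nonspecific.
    if not any(c.mutant_indel > ON_TARGET_MIN for c in cands):
        return "inefficient"
    return "nonspecific"


def pick_by_prediction(cands: List[Candidate]) -> Optional[Candidate]:
    """Left-chart strategy (assumed objective): highest mutant-allele activity among
    passing sgRNAs, with lower wild-type activity breaking ties."""
    passing = [c for c in cands if passes(c)]
    return max(passing, key=lambda c: (c.mutant_indel, -c.wildtype_indel), default=None)


def pick_by_position(cands: List[Candidate]) -> Optional[Candidate]:
    """Right-chart strategy: choose by SNV location only; predictions are checked afterwards."""
    return min(cands, key=lambda c: REGION_RANK[c.region], default=None)


def fold_difference(selected: List[Candidate]) -> float:
    """Ratio of mean mutant-allele to mean wild-type-allele activity (e.g. 34x)."""
    mut = sum(c.mutant_indel for c in selected) / len(selected)
    wt = sum(c.wildtype_indel for c in selected) / len(selected)
    return float("inf") if wt == 0 else mut / wt
```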

Extended Data Fig. 10 Allele-specific gene editing using the small Cas9s and SpCas9.

Of the 13,145 dominant SNVs in protein-coding sequences from the ClinVar database, pie charts show the numbers of dominant SNVs that could be most efficiently and allele-specifically targeted with the indicated Cas9s. (Top pie chart) DeepSmallCas9- and DeepSpCas9-v2-assisted selection of Cas9-sgRNA combinations allowed efficient (expected indel frequency at the mutant allele (on-target) > 10%) and allele-specific (expected indel frequency at the wild-type allele (off-target) < 2%) targeting of 10,925 of the 13,145 SNVs. (Bottom pie chart) Random selection of Cas9 and sgRNA pairs resulted in efficient and allele-specific targeting for only 678 SNVs. (Box plots) The predicted activities of selected Cas9-sgRNA combinations at mutant and wild-type alleles for the indicated SNVs. Boxes represent the 25th, 50th, and 75th percentiles and whiskers show the 10th and 90th percentiles. The fold differences between the average activities at mutant and wild-type alleles are shown (e.g., 37x).
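
For scale, the counts in this legend correspond to 10,925/13,145 ≈ 83.1% of SNVs with model-guided selection versus 678/13,145 ≈ 5.2% with random selection. The contrast between the two strategies can be mimicked on toy data with the sketch below. Each SNV maps to a list of (Cas9, predicted mutant-allele %, predicted wild-type %) combinations, a hypothetical layout; in practice the predictions would come from DeepSmallCas9 and DeepSpCas9-v2. Drawing one combination uniformly per SNV is an assumption about what "random selection" means here.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical layout: (Cas9 name, predicted mutant-allele indel %, predicted wild-type %)
Combo = Tuple[str, float, float]


def passes(combo: Combo, on_min: float = 10.0, off_max: float = 2.0) -> bool:
    _, mutant, wildtype = combo
    return mutant > on_min and wildtype < off_max


def guided_successes(snvs: Dict[str, List[Combo]]) -> int:
    """SNVs for which at least one Cas9-sgRNA combination meets both thresholds,
    i.e. a model-guided search over all combinations would find one."""
    return sum(any(passes(c) for c in combos) for combos in snvs.values())


def random_successes(snvs: Dict[str, List[Combo]], seed: int = 0) -> int:
    """SNVs for which a single uniformly drawn combination happens to pass."""
    rng = random.Random(seed)
    return sum(passes(rng.choice(combos)) for combos in snvs.values() if combos)


def best_cas9_counts(snvs: Dict[str, List[Combo]]) -> Counter:
    """Which Cas9 supplies the most active passing combination for each SNV (pie chart)."""
    counts: Counter = Counter()
    for combos in snvs.values():
        passing = [c for c in combos if passes(c)]
        if passing:
            counts[max(passing, key=lambda c: c[1])[0]] += 1
    return counts
```

Because a random draw can only succeed when some passing combination exists, the random count can never exceed the guided count, which is consistent with the gap between the two pie charts.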

Supplementary information

Supplementary Information

Supplementary Texts 1–5, Figs. 1–15, Tables 1–13, Notes 1–3 and references.

Reporting Summary

Supplementary Tables

Supplementary Tables 1–13.

Source data

Source Data Fig. 1

Statistical source data.

Source Data Fig. 2

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Fig. 5

Statistical source data.

Source Data Fig. 6

Statistical source data.

Source Data Extended Data Fig. 1

Statistical source data and unprocessed blots.

Source Data Extended Data Fig. 2

Statistical source data.

Source Data Extended Data Fig. 3

Statistical source data.

Source Data Extended Data Fig. 5

Statistical source data.

Source Data Extended Data Fig. 6

Statistical source data.

Source Data Extended Data Fig. 7

Statistical source data.

Source Data Extended Data Fig. 8

Statistical source data.

Source Data Extended Data Fig. 9

Statistical source data.

Source Data Extended Data Fig. 10

Statistical source data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Seo, SY., Min, S., Lee, S. et al. Massively parallel evaluation and computational prediction of the activities and specificities of 17 small Cas9s. Nat Methods 20, 999–1009 (2023). https://doi.org/10.1038/s41592-023-01875-2

Received: 04 July 2022
Accepted: 10 April 2023
Published: 15 May 2023
Issue Date: July 2023
DOI: https://doi.org/10.1038/s41592-023-01875-2

This article is cited by

Integrating machine learning and genome editing for crop improvement
- Long Chen
- Guanqing Liu
- Tao Zhang
aBIOTECH (2024)
