High-throughput analysis of the activities of xCas9, SpCas9-NG and SpCas9 at matched and mismatched target sequences in human cells


The applications of clustered regularly interspaced short palindromic repeats (CRISPR)-based genome editing can be limited by a lack of compatible protospacer adjacent motifs (PAMs), insufficient on-target activity and off-target effects. Here, we report an extensive comparison of the PAM-sequence compatibilities and the on-target and off-target activities of Cas9 from Streptococcus pyogenes (SpCas9) and the SpCas9 variants xCas9 and SpCas9-NG (which are known to have broader PAM compatibility than SpCas9) at 26,478 lentivirally integrated target sequences and 78 endogenous target sites in human cells. We found that xCas9 has the lowest tolerance for mismatched target sequences and that SpCas9-NG has the broadest PAM compatibility. We also show, on the basis of newly identified non-NGG PAM sequences, that SpCas9-NG and SpCas9 can edit six previously unedited endogenous sites associated with genetic diseases. Moreover, we provide deep-learning models that predict the activities of xCas9 and SpCas9-NG at the target sequences. The resulting deeper understanding of the activities of xCas9, SpCas9-NG and SpCas9 in human cells should facilitate their use.


Fig. 1: High-throughput evaluation of the xCas9, SpCas9-NG and SpCas9 activities.
Fig. 2: Effects of sgRNA expression formats on the activities of xCas9, SpCas9-NG and SpCas9.
Fig. 3: PAM sequence determination using fixed protospacers for xCas9, SpCas9-NG and SpCas9 in human cells.
Fig. 4: PAM sequence determination for xCas9, SpCas9-NG and SpCas9 using a wider range of protospacers and PAM sequences.
Fig. 5: Activities of xCas9, SpCas9-NG and SpCas9 at mismatched target sequences.
Fig. 6: Development and evaluation of the computational models DeepxCas9 and DeepSpCas9-NG, which predict the activity of xCas9 and SpCas9-NG, respectively.

Data availability

The authors declare that all data supporting the results in this study are available within the paper and its Supplementary Information. The deep-sequencing data from this study are available at the NCBI Sequence Read Archive under the accession number SRP158724.

Code availability

The source code for DeepxCas9, DeepSpCas9-NG and the custom Python scripts used for the indel-frequency calculations are available on GitHub at https://github.com/MyungjaeSong/Paired-Library and https://github.com/CRISPRJWCHOI/IndelSearcher.
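As a rough illustration of what an indel-frequency calculation involves, the sketch below computes the percentage of sequencing reads that carry an indel at each target and then averages those percentages per PAM sequence. It is not the authors' code: the function names, the record layout and the example numbers are all hypothetical, and the published scripts may apply further steps (for example, read filtering or background correction) that are not shown here.

```python
from collections import defaultdict
from statistics import mean


def indel_frequency(indel_reads, total_reads):
    """Return the percentage of reads that carry an indel (hypothetical helper)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * indel_reads / total_reads


def mean_frequency_by_pam(records):
    """Average per-target indel frequencies, grouped by PAM sequence.

    `records` is an iterable of (pam, indel_reads, total_reads) tuples;
    this layout is an assumption made for the example.
    """
    grouped = defaultdict(list)
    for pam, indel_reads, total_reads in records:
        grouped[pam].append(indel_frequency(indel_reads, total_reads))
    return {pam: mean(values) for pam, values in grouped.items()}


# Made-up read counts, for illustration only.
example_records = [
    ("AGG", 50, 100),
    ("AGG", 30, 100),
    ("TGA", 5, 100),
]
averages = mean_frequency_by_pam(example_records)
```

Grouping by PAM in this way mirrors the kind of summary described for the supplementary datasets (average indel frequencies grouped by PAM sequence), although the actual grouping keys and filters used in the study are not specified here.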




Acknowledgements

We thank O. Nureki and H. Nishimasu at the University of Tokyo for sharing a plasmid encoding SpCas9-NG. We thank S. Park and C. Lee at Yonsei University for their assistance with the data analysis. We also thank Y. Kim and S. Park at Yonsei University for their assistance with experiments. We thank S. Miller at Harvard University for critical reading of the manuscript. This work was supported in part by the National Research Foundation of Korea (grant nos 2017R1A2B3004198, 2017M3A9B4062403 and 2018R1A5A2025079 to H.H.K.), Brain Korea 21 Plus Project (Yonsei University College of Medicine), Institute for Basic Science (IBS; grant no. IBS-R026-D1), Yonsei University Future-leading Research Initiative of 2015 (grant no. RMS2 2015-22-0092; Challenge Grant), Korean Health Technology R&D Project, Ministry of Health and Welfare of the Republic of Korea (grant nos HI17C0676 and HI16C1012 to H.H.K.), US NIH (grant nos RM1 HG009490, R01 EB022376 and R35 GM118062 to D.R.L.) and HHMI (D.R.L.).

Author information

H.K.K. performed experiments to build high-throughput datasets of xCas9- and SpCas9-induced indel frequencies. Y.K. contributed to the generation of the high-throughput datasets. S.L., S.M. and S.Y. developed the framework and carried out the model training and computational validation. T.P.H. provided the computational identification of the endogenous disease-relevant sites. H.K.K. evaluated the activities of the Cas9 variants at the endogenous sites. J.P. and J.W.C. contributed significantly to the bioinformatics analyses. H.K.K. and H.H.K. conceived and designed the study and analysed the data. H.K.K., D.R.L. and H.H.K. wrote the manuscript with input from all authors.

Correspondence to Hyongbum Henry Kim.

Ethics declarations

Competing interests

The authors declare that Yonsei University has filed a patent based on this work, in which H.K.K. and H.H.K. are the co-inventors (patent no. PCT/KR2019/011166). D.R.L. is a consultant and co-founder of Beam Therapeutics, Prime Medicine, Editas Medicine and Pairwise Plants, which are companies that use genome editing.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Text, Supplementary Figs. and Supplementary Tables.

Reporting Summary

Supplementary Dataset 1

Design and indel frequencies from library A.

Supplementary Dataset 2

Design and indel frequencies from library B.

Supplementary Dataset 3

Datasets obtained from endogenous target sites.

Supplementary Dataset 4

Average indel frequencies in the target sequences, grouped by different potential PAM sequences for xCas9 and SpCas9 on the basis of fixed protospacers.

Supplementary Dataset 5

Average indel frequencies at target sequences, grouped by five-nucleotide PAM sequences.

Supplementary Dataset 6

Average indel frequencies at target sequences, grouped by four-nucleotide PAM sequences.

Supplementary Dataset 7

Primer sequences.

Supplementary Dataset 8

Model selection for DeepxCas9 and DeepSpCas9-NG.

Supplementary Dataset 9

P values and sample sizes for the data in Fig. 5.


Cite this article

Kim, H.K., Lee, S., Kim, Y. et al. High-throughput analysis of the activities of xCas9, SpCas9-NG and SpCas9 at matched and mismatched target sequences in human cells. Nat Biomed Eng 4, 111–124 (2020). https://doi.org/10.1038/s41551-019-0505-1
