
Predicting the efficiency of prime editing guide RNAs in human cells


Prime editing enables the introduction of virtually any small genetic change without requiring donor DNA or double-strand breaks. However, evaluating prime editing efficiency requires time-consuming experiments, and the factors that affect efficiency have not been extensively investigated. In this study, we performed a high-throughput evaluation of prime editor 2 (PE2) activity in human cells using 54,836 pairs of prime editing guide RNAs (pegRNAs) and their target sequences. The resulting data sets allowed us to identify factors affecting PE2 efficiency and to develop three computational models that predict pegRNA efficiency. For a given target sequence, the models predict the efficiencies of pegRNAs with primer binding sites and reverse transcriptase templates of different lengths, for edits of various types and at various positions. When we tested the predictions on data sets that had not been used for training, Spearman correlation coefficients ranged from 0.47 to 0.81. Our computational models and our findings on the factors affecting PE2 efficiency will facilitate the practical application of prime editing.
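For one target and one desired edit, candidate pegRNAs differ only in the lengths of the primer binding site (PBS) and reverse transcriptase template (RTT) that form their 3' extension. The minimal sketch below enumerates that length grid for a single-base substitution. It assumes an SpCas9 NGG PAM directly 3' of a 20-nt protospacer, a nick between protospacer positions 17 and 18, an edit position counted from the nick (+1 is the first base 3' of the nick), sequences written in the DNA alphabet, and illustrative length ranges. `design_extensions` and its parameters are hypothetical names, not the DeepPE interface, and the code only lists candidates without scoring them.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def design_extensions(seq, protospacer_start, edit_pos, alt_base,
                      pbs_lengths=range(7, 18), rtt_lengths=range(10, 21)):
    """Hypothetical helper: list pegRNA 3' extensions (RTT + PBS, 5'->3').

    Assumes an NGG PAM immediately 3' of a 20-nt protospacer that starts at
    seq[protospacer_start], and a nick between protospacer positions 17 and 18.
    edit_pos is 1-based from the nick (+1 = first base 3' of the nick).
    The default length ranges are illustrative assumptions.
    """
    protospacer = seq[protospacer_start:protospacer_start + 20]
    pam = seq[protospacer_start + 20:protospacer_start + 23]
    if len(pam) != 3 or pam[1:] != "GG":
        raise ValueError("expected an NGG PAM after the protospacer")
    nick = protospacer_start + 17           # nicked strand ends at seq[nick - 1]
    idx = nick + edit_pos - 1               # index of the substituted base
    edited = seq[:idx] + alt_base + seq[idx + 1:]
    designs = []
    for pbs_len, rtt_len in product(pbs_lengths, rtt_lengths):
        # The RTT must reach the edit, and both arms must fit inside seq.
        if rtt_len < edit_pos or pbs_len > nick or nick + rtt_len > len(seq):
            continue
        pbs = revcomp(seq[nick - pbs_len:nick])      # pairs with the nicked 3' end
        rtt = revcomp(edited[nick:nick + rtt_len])   # templates the edited flap
        designs.append({"spacer": protospacer, "pbs_len": pbs_len,
                        "rtt_len": rtt_len, "extension": rtt + pbs})
    return designs


# Usage with a made-up target: 10-nt flank, protospacer, TGG PAM, 20-nt flank.
target = "T" * 10 + "ACGTACGTACGTACGTACGT" + "TGG" + "C" * 20
candidates = design_extensions(target, protospacer_start=10, edit_pos=1, alt_base="T")
print(len(candidates), candidates[0]["extension"])  # prints: 121 GGGGCCAACATACGTAC
```

Each `extension` is the RTT followed by the PBS, read 5' to 3'; a predictive model would then be needed to rank these candidates by expected efficiency.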


Fig. 1: High-throughput evaluation of PE2 activity using libraries of pegRNA–target sequence pairs.
Fig. 2: Factors affecting PE2 efficiency.
Fig. 3: Effects of editing type and position on PE2 efficiency.
Fig. 4: Development of computational models for predicting PE2 efficiencies.
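Model accuracy is summarized as the Spearman correlation between predicted and measured efficiencies on held-out test sets. Spearman's rho is the Pearson correlation of the two rank vectors, with tied values assigned the mean of the ranks they occupy. The dependency-free sketch below computes it; `scipy.stats.spearmanr` returns the same coefficient. The function names are illustrative, and the example values are made up rather than taken from the study.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ssx = sum((a - mx) ** 2 for a in rx)
    ssy = sum((b - my) ** 2 for b in ry)
    if ssx == 0 or ssy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (ssx * ssy) ** 0.5


predicted = [0.10, 0.40, 0.35, 0.80]  # made-up model outputs
measured = [2.0, 10.5, 7.1, 30.2]     # made-up PE2 efficiencies (%)
print(spearman(predicted, measured))  # prints 1.0
```

Because only ranks enter the calculation, rho rewards getting the ordering of pegRNAs right even when predicted values are on a different scale from measured efficiencies, which suits the task of choosing the best pegRNA for a target.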

Data availability

The deep sequencing data from this study have been submitted to the National Center for Biotechnology Information Sequence Read Archive under accession number PRJNA624815. The data sets used in this study are provided as Supplementary Tables 3, 4 and 5.

Code availability

The source code for DeepPE and the custom Python script used for the prime editing efficiency calculations are provided as Supplementary Codes 1 and 2 and are also available at
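The authors' efficiency script is provided in the supplementary code and is not reproduced here. As a simplified illustration only, the sketch below assumes that reads have already been trimmed to the target window, that efficiency is the percentage of reads exactly matching the intended edited sequence, and that background is corrected by subtracting the same frequency measured in untreated control cells. The function names and this correction scheme are assumptions, not the published method.

```python
def intended_edit_frequency(windows, intended):
    """Percentage of reads whose target window exactly equals the intended sequence."""
    if not windows:
        raise ValueError("no reads supplied")
    hits = sum(1 for w in windows if w == intended)
    return 100.0 * hits / len(windows)


def corrected_efficiency(treated, control, intended):
    """Assumed background correction: treated minus untreated control, floored at 0."""
    return max(0.0, intended_edit_frequency(treated, intended)
               - intended_edit_frequency(control, intended))


# Made-up read windows: wild type ACGTCGG vs intended ACGTTGG substitution.
intended = "ACGTTGG"
treated = [intended] * 30 + ["ACGTCGG"] * 70
control = [intended] * 1 + ["ACGTCGG"] * 99
print(corrected_efficiency(treated, control, intended))  # prints 29.0
```

Exact matching deliberately excludes reads that carry the intended edit together with unintended changes; a real pipeline would also need alignment and quality filtering, which this sketch omits.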



Acknowledgements


We would like to thank D. Kim, S. Park and Y. Kim for assisting with the experiments. This work was supported, in part, by the National Research Foundation of Korea (grants 2017R1A2B3004198 (H.H.K.), 2017M3A9B4062403 (H.H.K.), 2020R1C1C1003284 (H.K.K.) and 2018R1A5A2025079 (H.H.K.)), the Brain Korea 21 Plus Project (Yonsei University College of Medicine) and the Korean Health Technology R&D Project, Ministry of Health and Welfare, Republic of Korea (grants HI17C0676 (H.H.K.) and HI16C1012 (H.H.K.)).

Author information

Contributions
G.Y. and H.K.K. performed the wet-lab experiments, including the high-throughput evaluation of PE2 efficiencies. S.M., S.L., S.Y. and H.K.K. developed DeepPE and the related web tools. J.P. substantially contributed to bioinformatics analyses and DeepPE development. H.K.K. and H.H.K. conceived of and designed the study. H.K.K., G.Y. and H.H.K. analyzed the data and wrote the manuscript.

Corresponding author

Correspondence to Hyongbum Henry Kim.

Ethics declarations

Competing interests

Yonsei University has filed a patent application based on this work, in which H.K.K., G.Y. and H.H.K. are listed as inventors.


Supplementary information

Supplementary Information

Supplementary Texts 1–3, Supplementary Figs. 1–23 and Supplementary Tables 1 and 2.

Reporting Summary

Supplementary Table 3

Data sets of PE2 efficiencies at endogenous sites

Supplementary Table 4

Data sets HT-training, HT-test, Type-training, Type-test, Position-training and Position-test

Supplementary Table 5

Data sets of PE2 efficiencies generated using HCT116 and MDA-MB-231 cells

Supplementary Table 6

Oligonucleotides used in this study

Supplementary Table 7

Exact P values for Figs. 2 and 3 and Supplementary Fig. 15

Supplementary Software 1

Code for the prime editing efficiency analysis

Supplementary Software 2

Code for DeepPE


About this article


Cite this article

Kim, H.K., Yu, G., Park, J. et al. Predicting the efficiency of prime editing guide RNAs in human cells. Nat Biotechnol 39, 198–206 (2021).

