In vivo high-throughput profiling of CRISPR–Cpf1 activity


CRISPR from Prevotella and Francisella 1 (Cpf1) is an effector endonuclease of the class 2 CRISPR–Cas (clustered regularly interspaced short palindromic repeats–CRISPR-associated proteins) gene editing system. We developed a method for evaluating Cpf1 activity, based on target sequence composition in mammalian cells, in a high-throughput manner. A library of >11,000 target sequence and guide RNA pairs was delivered into human cells using lentiviral vectors. Subsequent delivery of Cpf1 into this cell library induced insertions and deletions (indels) at the integrated synthetic target sequences, which allowed en masse evaluation of Cpf1 activity by deep sequencing. With this approach, we determined protospacer-adjacent motif sequences of two Cpf1 nucleases, one from Acidaminococcus sp. BV3L6 (hereafter referred to as AsCpf1) and the other from Lachnospiraceae bacterium ND2006 (hereafter referred to as LbCpf1). We also defined target-sequence-dependent activity profiles of AsCpf1, which enabled the development of a web tool that predicts the indel frequencies for given target sequences. Both the Cpf1 characterization profile and the in vivo high-throughput evaluation method will greatly facilitate Cpf1-based genome editing.
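
The read-out described above reduces to a per-pair ratio: for each guide RNA and target sequence pair, the share of sequencing reads that carry an indel. The following is a minimal sketch of that calculation, assuming reads have already been assigned to pairs and classified as indel or wild type. The function name, the input layout, and the optional background subtraction are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter


def indel_frequencies(reads, background=None):
    """Return the percent indel frequency for each guide/target pair.

    reads: iterable of (pair_id, has_indel) tuples, one per sequencing read.
    background: optional dict mapping pair_id to a percent indel frequency
        measured without Cpf1 (a hypothetical control); it is subtracted
        and the result is floored at zero.
    """
    total = Counter()
    mutated = Counter()
    for pair_id, has_indel in reads:
        total[pair_id] += 1
        if has_indel:
            mutated[pair_id] += 1

    freqs = {}
    for pair_id, n_reads in total.items():
        pct = 100.0 * mutated[pair_id] / n_reads
        if background is not None:
            pct = max(0.0, pct - background.get(pair_id, 0.0))
        freqs[pair_id] = pct
    return freqs
```

Because every pair is integrated together with its own target sequence, one sequencing run can feed this calculation for the whole library at once.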

Figure 1: Paired library preparation and high-throughput evaluation of Cpf1 activity.
Figure 2: PAM determination for AsCpf1 and LbCpf1 in mammalian cells.
Figure 3: Activity profiling of AsCpf1.
Figure 4: AsCpf1 activity for mismatched target sequences.
Figure 5: AsCpf1 activity with truncated guide RNAs for matched and mismatched target sequences.

Accession codes

Primary accessions

Sequence Read Archive

Referenced accessions

Protein Data Bank


  1. Kim, H. & Kim, J.S. A guide to genome engineering with programmable nucleases. Nat. Rev. Genet. 15, 321–334 (2014).

  2. Hsu, P.D., Lander, E.S. & Zhang, F. Development and applications of CRISPR–Cas9 for genome engineering. Cell 157, 1262–1278 (2014).

  3. Zetsche, B. et al. Cpf1 is a single-RNA-guided endonuclease of a class 2 CRISPR–Cas system. Cell 163, 759–771 (2015).

  4. Ramakrishna, S. et al. Surrogate reporter-based enrichment of cells containing RNA-guided Cas9 nuclease-induced mutations. Nat. Commun. 5, 3378 (2014).

  5. Kim, H. et al. Surrogate reporters for enrichment of cells with nuclease-induced mutations. Nat. Methods 8, 941–943 (2011).

  6. Gibson, D.G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).

  7. Chari, R., Mali, P., Moosburner, M. & Church, G.M. Unraveling CRISPR–Cas9 genome-engineering parameters via a library-on-library approach. Nat. Methods 12, 823–826 (2015).

  8. Schröder, A.R. et al. HIV-1 integration in the human genome favors active genes and local hotspots. Cell 110, 521–529 (2002).

  9. Mitchell, R.S. et al. Retroviral DNA integration: ASLV, HIV, and MLV show distinct target site preferences. PLoS Biol. 2, E234 (2004).

  10. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  11. Shalem, O. et al. Genome-scale CRISPR–Cas9 knockout screening in human cells. Science 343, 84–87 (2014).

  12. Wang, T., Wei, J.J., Sabatini, D.M. & Lander, E.S. Genetic screens in human cells using the CRISPR–Cas9 system. Science 343, 80–84 (2014).

  13. Yamano, T. et al. Crystal structure of Cpf1 in complex with guide RNA and target DNA. Cell 165, 949–962 (2016).

  14. Doench, J.G. et al. Rational design of highly active sgRNAs for CRISPR–Cas9-mediated gene inactivation. Nat. Biotechnol. 32, 1262–1267 (2014).

  15. Kleinstiver, B.P. et al. Genome-wide specificities of CRISPR–Cas Cpf1 nucleases in human cells. Nat. Biotechnol. 34, 869–874 (2016).

  16. Ramakrishna, S. et al. Gene disruption by cell-penetrating peptide-mediated delivery of Cas9 protein and guide RNA. Genome Res. 24, 1020–1027 (2014).

  17. Tsai, S.Q. et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR–Cas nucleases. Nat. Biotechnol. 33, 187–197 (2015).

  18. Kim, D. et al. Genome-wide analysis reveals specificities of Cpf1 endonucleases in human cells. Nat. Biotechnol. 34, 863–868 (2016).

  19. Fu, Y., Sander, J.D., Reyon, D., Cascio, V.M. & Joung, J.K. Improving CRISPR–Cas nuclease specificity using truncated guide RNAs. Nat. Biotechnol. 32, 279–284 (2014).

  20. Lee, H. et al. A high-throughput optomechanical retrieval method for sequence-verified clonal DNA from the NGS platform. Nat. Commun. 6, 6073 (2015).

  21. Sanjana, N.E., Shalem, O. & Zhang, F. Improved vectors and genome-wide libraries for CRISPR screening. Nat. Methods 11, 783–784 (2014).

  22. Lin, C.H., Chen, Y.C. & Pan, T.M. Quantification bias caused by plasmid DNA conformation in quantitative real-time PCR assay. PLoS One 6, e29101 (2011).

  23. Bylund, L., Kytölä, S., Lui, W.O., Larsson, C. & Weber, G. Analysis of the cytogenetic stability of the human embryonal kidney cell line 293 by cytogenetic and STR profiling approaches. Cytogenet. Genome Res. 106, 28–32 (2004).

  24. Doench, J.G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR–Cas9. Nat. Biotechnol. 34, 184–191 (2016).

  25. Wong, N., Liu, W. & Wang, X. WU–CRISPR: characteristics of functional guide RNAs for the CRISPR–Cas9 system. Genome Biol. 16, 218 (2015).

  26. Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).

  27. Xu, H. et al. Sequence determinants of improved CRISPR sgRNA design. Genome Res. 25, 1147–1157 (2015).

  28. Sing, T., Sander, O., Beerenwinkel, N. & Lengauer, T. ROCR: visualizing classifier performance in R. Bioinformatics 21, 3940–3941 (2005).


The authors would like to thank D.-S. Jang (Medical Illustrator, Medical Research Support Section, Yonsei University College of Medicine) for his help with the illustrations, D. Kim for developing Python programs, M. K. Song (Biostatistics Collaboration Unit, Yonsei University College of Medicine) and Y. Kim for the assistance with statistical analysis, E.-S. Lee for proofreading, R. Gopalappa for assistance with sample preparation, Severance Biomedical Science Institute for bioinformatics analysis, and B. Kleinstiver and K. Joung (both at Massachusetts General Hospital) for sharing raw data for Supplementary Figure 9 and Figure 3d. This work was supported in part by the National Research Foundation of Korea (grants 2014R1A1A1A05006189 (H.K.), 2013M3A9B4076544 (H.K.), 2014M3C9A3063541 (J.-W.N.), and 2015R1A2A1A15052668 (H.K.)), the Korean Health Technology R&D Project, Ministry of Health and Welfare, Republic of Korea (grants HI14C2019 (Medistar program) (H.K.), HI16C1012 (H.K.), and HI15C1578 (J.-W.N.)), and a faculty research grant of Yonsei University College of Medicine for 2015 (6-2015-0086 (H.K.)).

Author information




H.K.K. and M.S. performed most of the experiments (including library preparation, lentivirus generation, cell culture, and deep-sequencing experiments), analyzed the data, and contributed to the writing of the manuscript; J.L. generated lentivirus, determined MOI after transduction, and contributed to the writing of the manuscript; A.V.M., Y.-M.K., and J.-W.N. developed the computation model for AsCpf1 indel frequency prediction and contributed to the writing of the manuscript; S.J. significantly contributed to the performance of the experiments including cell culture, lentivirus generation, and deep-sequencing; J.W.C. calculated chromatin accessibility scores and performed related analyses; E.W. contributed to the analysis of target sequence nucleotide preferences based on the structure of AsCpf1 and to the writing of the manuscript; H.C.K. contributed to the writing of the manuscript; and H.K. conceived and supervised the study, analyzed the data, and wrote the manuscript.

Corresponding author

Correspondence to Hyongbum Kim.

Ethics declarations

Competing interests

Yonsei University has filed a patent based on this work, in which H.K.K., M.S., and H.K. are co-inventors.

Integrated supplementary information

Supplementary Figure 1 Plasmid vector map.

(a) Guide RNA and target sequence pairs were lentivirally delivered using the Lenti_gRNA-puro vector. The region following the U6 promoter in the Lenti_gRNA-puro vector was deleted using BsmBI, and PCR-amplified oligonucleotides containing the guide and target sequence pairs were cloned into the Lenti_gRNA-puro vector using Gibson assembly. Psi, packaging signal; RRE, rev response element; WPRE, posttranscriptional regulatory element of woodchuck hepatitis virus; cPPT, central polypurine tract; DR, direct repeat of Cpf1; GS, guide sequence of guide RNA; T, polyT; B, barcode; TS, target sequence; HS, homology sequence; EF1α, elongation factor 1α promoter; PuroR, puromycin resistance gene. (b, c) AsCpf1 (b) and LbCpf1 (c) were expressed with the EFS promoter. Psi, packaging signal; RRE, rev response element; WPRE, posttranscriptional regulatory element of woodchuck hepatitis virus; cPPT, central polypurine tract; EFS, elongation factor 1α short promoter; BlastR, blasticidin resistance gene.

Supplementary Figure 2 The relationship between chromatin accessibility and the correlation coefficient of indel frequencies at endogenous and integrated sequences.

Target sites were divided into smaller subsets based on chromatin accessibility and squared Pearson correlation coefficients between indel frequencies at endogenous target sites and the corresponding integrated target sites were calculated in each subset. The correlation was higher in regions of high chromatin accessibility.
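
The subset analysis in this caption can be sketched as follows: sort sites by chromatin accessibility, split them into equal-sized bins, and compute the squared Pearson correlation coefficient in each bin. The number of bins, the equal-size binning rule, and the function names are assumptions made for illustration; the caption does not state how the authors defined the subsets.

```python
def pearson_r_squared(xs, ys):
    """Squared Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)


def r_squared_by_accessibility(sites, n_bins=2):
    """Return R^2 per accessibility bin, ordered from low to high accessibility.

    sites: list of (accessibility, endogenous_pct, integrated_pct) tuples.
    The last bin absorbs any remainder when len(sites) is not divisible
    by n_bins.
    """
    ordered = sorted(sites, key=lambda site: site[0])
    size = len(ordered) // n_bins
    results = []
    for b in range(n_bins):
        start = b * size
        end = len(ordered) if b == n_bins - 1 else start + size
        chunk = ordered[start:end]
        endogenous = [site[1] for site in chunk]
        integrated = [site[2] for site in chunk]
        results.append(pearson_r_squared(endogenous, integrated))
    return results
```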

Supplementary Figure 3 Variability of relative copy numbers of each pair in the library.

The relative copy numbers of each pair in the oligonucleotide pools, plasmid library, and cell library were calculated as described in the materials and methods from the deep sequencing data. (a) The relative copy numbers of each pair in the oligonucleotide pools, plasmid library, and cell library. (b) The pair copy numbers in the plasmid and cell libraries normalized to those of the oligonucleotide pool and plasmid library, respectively. (c) The relative copy numbers of each pair in the plasmid and cell library shown in the order of copy number in the oligonucleotide pool. For brevity, only 1% of regularly selected pairs are shown. (d) The relative copy number of each pair in the cell library shown in the order of copy number in the plasmid library. For brevity, only 1% of regularly selected pairs are shown. (e-g) The relationship between the pair copy numbers in the oligonucleotide pool and plasmid and cell libraries, evaluated by deep sequencing. Number of analyzed pairs (n) = 7,622.
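
The caption defers the exact definition of relative copy number to the materials and methods, which are not reproduced here. As a rough sketch only, the code below assumes that a pair's relative copy number is its read count divided by the mean read count across pairs, and that panel (b) divides each stage by the preceding one. Both definitions and both function names are assumptions.

```python
def relative_copy_numbers(read_counts):
    """Scale read counts so that the mean across pairs equals 1.

    read_counts: dict mapping pair_id to the number of reads for that pair.
    """
    if not read_counts:
        raise ValueError("read_counts is empty")
    mean_count = sum(read_counts.values()) / len(read_counts)
    if mean_count == 0:
        raise ValueError("all read counts are zero")
    return {pair: count / mean_count for pair, count in read_counts.items()}


def normalize_to_previous_stage(current, previous):
    """Ratio of relative copy numbers between two library stages.

    Pairs that are missing or have zero copies in the previous stage are
    skipped, because the ratio is undefined for them.
    """
    return {
        pair: current[pair] / previous[pair]
        for pair in current
        if previous.get(pair, 0) > 0
    }
```

A ratio near 1 for most pairs would indicate that cloning or transduction did not strongly skew the library composition.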

Supplementary Figure 4 Correlation between indel frequencies in experimental replicates.

Two different cell libraries (library A and library B) were generated from independent lentivirus production and transduction. These two libraries were transfected with plasmids encoding AsCpf1. Four days later, indel frequencies were analyzed in the cell libraries. Number of analyzed pairs (n) = 233.

Supplementary Figure 5 Correlation between indel frequencies after two different methods of AsCpf1 delivery.

A cell library was either transfected with AsCpf1-encoding plasmid or transduced with AsCpf1-encoding lentiviral vector. Indel frequencies in the cell libraries were analyzed four days after AsCpf1 plasmid transfection or five days after AsCpf1 lentivirus transduction. Number of analyzed pairs (n) = 156.

Supplementary Figure 6 Effects of multiplicity of infection (MOI) in the cell library on the AsCpf1-induced indel frequencies.

Three different cell libraries with different MOIs (0.4, 2, and 10) were generated by modifying the transducing amount of guide RNA-expressing lentivirus and AsCpf1 was lentivirally delivered into these cell libraries. Indel frequencies were measured 5 days after AsCpf1 delivery. Number of analyzed pairs (n) = 273 per group. (a) AsCpf1-induced indel frequencies in the cell libraries with different MOIs. Error bars represent s.e.m. **P < 0.01, ***P < 0.001, ANOVA followed by Tukey’s post hoc test. (b-d) Correlation between indel frequencies in cell libraries with different MOIs.

Supplementary Figure 7 Comparison of four different TTTN sequences as potential PAMs.

Indel frequencies for four different TTTN sequences tested as potential PAMs for AsCpf1 (a) and LbCpf1 (b). Error bars represent s.e.m. *P < 0.05, **P < 0.01, ***P < 0.001, ANOVA followed by Tukey’s post hoc test. Number of tested guide sequences (n) = 11, 13, 12, 12 for AsCpf1 and 14, 13, 15, 10 for LbCpf1.
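
The error bars in this comparison are standard errors of the mean (s.e.m.) across the guide sequences tested for each PAM. A minimal sketch of that summary step follows, assuming one indel frequency per guide sequence. The function names and input layout are illustrative, and the ANOVA and Tukey post hoc test are not reproduced.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values):
    """Return (mean, s.e.m.), where s.e.m. = sample standard deviation / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("s.e.m. needs at least two values")
    return mean(values), stdev(values) / sqrt(n)


def summarize_by_pam(measurements):
    """Group indel frequencies by PAM and summarize each group.

    measurements: iterable of (pam, indel_pct) tuples, one per guide sequence.
    Returns a dict mapping each PAM to its (mean, s.e.m.).
    """
    groups = {}
    for pam, pct in measurements:
        groups.setdefault(pam, []).append(pct)
    return {pam: mean_and_sem(values) for pam, values in groups.items()}
```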

Supplementary Figure 8 Comparison of indel frequency rankings between AsCpf1 and SpCas9 using the same and reverse target sequences.

The relationship between indel frequency rankings of SpCas9 target sequences and those of Cpf1 targeting the same (left) or reverse (right) sequences. The sequences that are marked with a red color, 5’-GGG-3’ and 5’-TTTA-3’, were used as the PAMs for SpCas9 and AsCpf1 target sequences, respectively. The known activity rank for the SpCas9 target sequence (ref. 14) is used for comparison.

Supplementary Figure 9 Comparison of indel frequencies induced by AsCpf1, LbCpf1, and SpCas9 using common protospacer sequences.

The relationship between indel frequencies at SpCas9 target sequences and those of AsCpf1 and LbCpf1 targeting the same protospacer sequences. The graphs are re-drawn using data published in reference 15. Number of tested guide sequences (n) = 22.

Supplementary Figure 10 Interaction between the crRNA nucleobase in position 1 and Thr16 of the AsCpf1 WED domain.

The hydroxyl side chain of the Thr16 residue in the WED domain shows a polar interaction with the N2 of a guanine base (blue dotted line inside the red circle). Similar polar interactions with this Thr16 residue may be formed with side chains of other nucleobases such as the O2 of thymine and uracil. Due to the lack of a corresponding moiety in adenine, however, the positioning of a crRNA adenine ribonucleobase in this site could be unstable, consequently resulting in an unfavorable positioning of thymine at position 1 of the target DNA strand, which is next to the PAM motif. There is a complementary interaction between the crRNA ribonucleotide (guanine is shown here) and the target sequence nucleotide (cytosine at position 1 is shown here). The figure is drawn based on the published data of ref. 17 (PDB 5B43).

Supplementary Figure 11 Development of an algorithm to predict the AsCpf1-induced indel frequency for a given guide RNA-target sequence pair.

(a) The workflow for the development of a model to predict the AsCpf1-induced indel frequency using the measured indel frequencies for 1,251 pairs of guide RNA and target sequences. (b) Feature selection using Elastic Net for prediction model development. (Upper panel) The 57 non-zero coefficients with the fewest misclassification errors were automatically selected from a certain shrinkage value range, indicated between the two dotted lines. The misclassification error (y-axis) is the error rate associated with classifying an entity into two classes, and the shrinkage parameter adds information to a raw estimate to regularize the ill-posed problem. (Lower panel) Coefficient values of the 57 selected features.
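
For orientation, the elastic net objective in the formulation of ref. 26 (Friedman et al.) is shown below for a two-class logistic model, which the misclassification-error criterion suggests. The caption does not state which model family or mixing parameter the authors used, so treat this as a sketch under that assumption. Here N is the number of guide RNA and target sequence pairs, x_i the feature vector of pair i, y_i its class label (0 or 1), λ the shrinkage parameter, and α the mixing weight between the ridge and lasso penalties.

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}
\Bigl[\, y_i\,(\beta_0 + x_i^{\top}\beta)
      - \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr) \Bigr]
\;+\;
\lambda\Bigl[\, \tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^{2}
      + \alpha\,\lVert\beta\rVert_1 \Bigr]
```

The lasso term drives many coefficients to exactly zero as λ grows, which is how a reduced set of non-zero features, such as the 57 reported here, emerges from a larger candidate set.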

Supplementary Figure 12 A web tool that calculates AsCpf1 indel frequency scores.

The web tool is available online. When target site sequences are entered, the web tool calculates AsCpf1 indel scores (CINDELs) that correlate with AsCpf1-induced indel frequencies for given pairs of target and guide RNA sequences.

Supplementary Figure 13 Effect of the number of mismatched nucleotides in different regions of the target sequence on AsCpf1-induced off-target indel frequencies.

The off-target indel frequencies are normalized to those at on-target sequences. The locations of multi-nucleotide mismatches are shown. Error bars represent s.e.m.

Supplementary Figure 14 Effect of multi-nucleotide mismatches in different regions of the target sequence on AsCpf1-induced off-target indel frequencies.

The off-target indel frequencies are normalized to those at on-target sequences. The locations of multi-nucleotide mismatches are shown on the x-axis. Error bars represent s.e.m.
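
The normalization used in these mismatch analyses divides each off-target indel frequency by the frequency at the matching on-target sequence. A minimal sketch, assuming percent indel frequencies as input; the function names and the region labels used as dictionary keys are hypothetical.

```python
def normalized_off_target(on_target_pct, off_target_pct):
    """Off-target indel frequency expressed as a fraction of on-target activity."""
    if on_target_pct <= 0:
        raise ValueError("on-target indel frequency must be positive")
    return off_target_pct / on_target_pct


def normalize_by_region(on_target_pct, off_target_by_region):
    """Apply the normalization to every mismatch region of one guide.

    off_target_by_region: dict mapping a mismatch-region label to the
    percent indel frequency measured at that mismatched target.
    """
    return {
        region: normalized_off_target(on_target_pct, pct)
        for region, pct in off_target_by_region.items()
    }
```

A value of 1 means the mismatched target was cut as efficiently as the matched one; values near 0 mean the mismatch largely abolished activity.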

Supplementary Figure 15 A comparison of the cost of evaluating Cpf1 activity at target sequences between conventional individual evaluation and our high-throughput approach.

Costs for materials (left) and labor (right) are compared. Costs are expressed in USD. One labor unit is the maximum workload that one skilled person can complete in one hour. A break in working time that exceeds one hour, such as an incubation period, is not counted as labor.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–15, Supplementary Tables 1–4, Supplementary Results and Supplementary Notes 1–6. (PDF 2579 kb)

Cite this article

Kim, H., Song, M., Lee, J. et al. In vivo high-throughput profiling of CRISPR–Cpf1 activity. Nat Methods 14, 153–159 (2017).
