CRISPR from Prevotella and Francisella 1 (Cpf1) is an effector endonuclease of the class 2 CRISPR–Cas (clustered regularly interspaced short palindromic repeats–CRISPR-associated proteins) gene editing system. We developed a method for evaluating Cpf1 activity, based on target sequence composition in mammalian cells, in a high-throughput manner. A library of >11,000 target sequence and guide RNA pairs was delivered into human cells using lentiviral vectors. Subsequent delivery of Cpf1 into this cell library induced insertions and deletions (indels) at the integrated synthetic target sequences, which allowed en masse evaluation of Cpf1 activity by using deep sequencing. With this approach, we determined protospacer-adjacent motif sequences of two Cpf1 nucleases, one from Acidaminococcus sp. BV3L6 (hereafter referred to as AsCpf1) and the other from Lachnospiraceae bacterium ND2006 (hereafter referred to as LbCpf1). We also defined target-sequence-dependent activity profiles of AsCpf1, which enabled the development of a web tool that predicts the indel frequencies for given target sequences (http://big.hanyang.ac.kr/cindel). Both the Cpf1 characterization profile and the in vivo high-throughput evaluation method will greatly facilitate Cpf1-based genome editing.
Kim, H. & Kim, J.S. A guide to genome engineering with programmable nucleases. Nat. Rev. Genet. 15, 321–334 (2014).
Hsu, P.D., Lander, E.S. & Zhang, F. Development and applications of CRISPR–Cas9 for genome engineering. Cell 157, 1262–1278 (2014).
Zetsche, B. et al. Cpf1 is a single-RNA-guided endonuclease of a class 2 CRISPR–Cas system. Cell 163, 759–771 (2015).
Ramakrishna, S. et al. Surrogate reporter-based enrichment of cells containing RNA-guided Cas9 nuclease-induced mutations. Nat. Commun. 5, 3378 (2014).
Kim, H. et al. Surrogate reporters for enrichment of cells with nuclease-induced mutations. Nat. Methods 8, 941–943 (2011).
Gibson, D.G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).
Chari, R., Mali, P., Moosburner, M. & Church, G.M. Unraveling CRISPR–Cas9 genome-engineering parameters via a library-on-library approach. Nat. Methods 12, 823–826 (2015).
Schröder, A.R. et al. HIV-1 integration in the human genome favors active genes and local hotspots. Cell 110, 521–529 (2002).
Mitchell, R.S. et al. Retroviral DNA integration: ASLV, HIV, and MLV show distinct target site preferences. PLoS Biol. 2, E234 (2004).
ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
Shalem, O. et al. Genome-scale CRISPR–Cas9 knockout screening in human cells. Science 343, 84–87 (2014).
Wang, T., Wei, J.J., Sabatini, D.M. & Lander, E.S. Genetic screens in human cells using the CRISPR–Cas9 system. Science 343, 80–84 (2014).
Yamano, T. et al. Crystal structure of Cpf1 in complex with guide RNA and target DNA. Cell 165, 949–962 (2016).
Doench, J.G. et al. Rational design of highly active sgRNAs for CRISPR–Cas9-mediated gene inactivation. Nat. Biotechnol. 32, 1262–1267 (2014).
Kleinstiver, B.P. et al. Genome-wide specificities of CRISPR–Cas Cpf1 nucleases in human cells. Nat. Biotechnol. 34, 869–874 (2016).
Ramakrishna, S. et al. Gene disruption by cell-penetrating peptide-mediated delivery of Cas9 protein and guide RNA. Genome Res. 24, 1020–1027 (2014).
Tsai, S.Q. et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR–Cas nucleases. Nat. Biotechnol. 33, 187–197 (2015).
Kim, D. et al. Genome-wide analysis reveals specificities of Cpf1 endonucleases in human cells. Nat. Biotechnol. 34, 863–868 (2016).
Fu, Y., Sander, J.D., Reyon, D., Cascio, V.M. & Joung, J.K. Improving CRISPR–Cas nuclease specificity using truncated guide RNAs. Nat. Biotechnol. 32, 279–284 (2014).
Lee, H. et al. A high-throughput optomechanical retrieval method for sequence-verified clonal DNA from the NGS platform. Nat. Commun. 6, 6073 (2015).
Sanjana, N.E., Shalem, O. & Zhang, F. Improved vectors and genome-wide libraries for CRISPR screening. Nat. Methods 11, 783–784 (2014).
Lin, C.H., Chen, Y.C. & Pan, T.M. Quantification bias caused by plasmid DNA conformation in quantitative real-time PCR assay. PLoS One 6, e29101 (2011).
Bylund, L., Kytölä, S., Lui, W.O., Larsson, C. & Weber, G. Analysis of the cytogenetic stability of the human embryonal kidney cell line 293 by cytogenetic and STR profiling approaches. Cytogenet. Genome Res. 106, 28–32 (2004).
Doench, J.G. et al. Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR–Cas9. Nat. Biotechnol. 34, 184–191 (2016).
Wong, N., Liu, W. & Wang, X. WU–CRISPR: characteristics of functional guide RNAs for the CRISPR–Cas9 system. Genome Biol. 16, 218 (2015).
Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).
Xu, H. et al. Sequence determinants of improved CRISPR sgRNA design. Genome Res. 25, 1147–1157 (2015).
Sing, T., Sander, O., Beerenwinkel, N. & Lengauer, T. ROCR: visualizing classifier performance in R. Bioinformatics 21, 3940–3941 (2005).
The authors would like to thank D.-S. Jang (Medical Illustrator, Medical Research Support Section, Yonsei University College of Medicine) for his help with the illustrations, D. Kim for developing Python programs, M. K. Song (Biostatistics Collaboration Unit, Yonsei University College of Medicine) and Y. Kim for the assistance with statistical analysis, E.-S. Lee for proofreading, R. Gopalappa for assistance with sample preparation, Severance Biomedical Science Institute for bioinformatics analysis, and B. Kleinstiver and K. Joung (both at Massachusetts General Hospital) for sharing raw data for Supplementary Figure 9 and Figure 3d. This work was supported in part by the National Research Foundation of Korea (grants 2014R1A1A1A05006189 (H.K.), 2013M3A9B4076544 (H.K.), 2014M3C9A3063541 (J.-W.N.), and 2015R1A2A1A15052668 (H.K.)), the Korean Health Technology R&D Project, Ministry of Health and Welfare, Republic of Korea (grants HI14C2019 (Medistar program) (H.K.), HI16C1012 (H.K.), and HI15C1578 (J.-W.N.)), and a faculty research grant of Yonsei University College of Medicine for 2015 (6-2015-0086 (H.K.)).
Yonsei University has filed a patent based on this work, in which H.K.K., M.S., and H.K. are co-inventors.
Integrated supplementary information
(a) Guide RNA and target sequence pairs were lentivirally delivered using the Lenti_gRNA-puro vector. The region following the U6 promoter in the Lenti_gRNA-puro vector was deleted using BsmBI, and PCR-amplified oligonucleotides containing the guide and target sequence pairs were cloned into the Lenti_gRNA-puro vector using Gibson assembly. Psi, packaging signal; RRE, rev response element; WPRE, posttranscriptional regulatory element of woodchuck hepatitis virus; cPPT, central polypurine tract; DR, direct repeat of Cpf1; GS, guide sequence of guide RNA; T, polyT; B, barcode; TS, target sequence; HS, homology sequence; EF1α, elongation factor 1α promoter; PuroR, puromycin resistance gene. (b, c) AsCpf1 (b) and LbCpf1 (c) were expressed with the EFS promoter. Psi, packaging signal; RRE, rev response element; WPRE, posttranscriptional regulatory element of woodchuck hepatitis virus; cPPT, central polypurine tract; EFS, elongation factor 1α short promoter; BlastR, blasticidin resistance gene.
Supplementary Figure 2 The relationship between chromatin accessibility and the correlation coefficient of indel frequencies at endogenous and integrated sequences.
Target sites were divided into smaller subsets based on chromatin accessibility and squared Pearson correlation coefficients between indel frequencies at endogenous target sites and the corresponding integrated target sites were calculated in each subset. The correlation was higher in regions of high chromatin accessibility.
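The subset analysis described in this caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the equal-sized binning by rank, and the input layout (one tuple per target site holding an accessibility score, the endogenous indel frequency, and the integrated indel frequency) are all assumptions made for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def r_squared_by_subset(sites, n_subsets):
    """Squared Pearson r per accessibility subset, lowest accessibility first.

    sites: list of (accessibility, endogenous_indel, integrated_indel).
    Sites are sorted by accessibility and split into n_subsets bins of
    (nearly) equal size. This binning rule is an assumption.
    """
    ordered = sorted(sites, key=lambda s: s[0])
    size = len(ordered) // n_subsets
    if size < 2:
        raise ValueError("each subset needs at least two sites")
    results = []
    for i in range(n_subsets):
        start = i * size
        # The last bin absorbs any remainder.
        end = (i + 1) * size if i < n_subsets - 1 else len(ordered)
        chunk = ordered[start:end]
        r = pearson_r([c[1] for c in chunk], [c[2] for c in chunk])
        results.append(r * r)
    return results
```

With real data, each returned value would correspond to one accessibility bin, so a rising trend across the list would match the observation that the correlation is higher in regions of high chromatin accessibility.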
The relative copy numbers of each pair in the oligonucleotide pools, plasmid library, and cell library were calculated as described in the materials and methods from the deep sequencing data. (a) The relative copy numbers of each pair in the oligonucleotide pools, plasmid library, and cell library. (b) The pair copy numbers in the plasmid and cell libraries normalized to those of the oligonucleotide pool and plasmid library, respectively. (c) The relative copy numbers of each pair in the plasmid and cell library shown in the order of copy number in the oligonucleotide pool. For brevity, only 1% of regularly selected pairs are shown. (d) The relative copy number of each pair in the cell library shown in the order of copy number in the plasmid library. For brevity, only 1% of regularly selected pairs are shown. (e-g) The relationship between the pair copy numbers in the oligonucleotide pool and plasmid and cell libraries, evaluated by deep sequencing. Number of analyzed pairs (n) = 7,622.
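The exact definition of relative copy number is given in the materials and methods, which are not reproduced here. As a hedged sketch only, the Python below assumes that the relative copy number of a pair is its share of all deep-sequencing reads in a library, and that normalization in panel (b) is the ratio of a pair's relative copy number to its value at the preceding step. The function names and both definitions are assumptions.

```python
def relative_copy_numbers(read_counts):
    """Map each pair ID to its fraction of the total read count.

    read_counts: dict of pair ID -> number of deep-sequencing reads.
    Assumes relative copy number = reads for the pair / all reads.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {pair: count / total for pair, count in read_counts.items()}


def normalize_to_previous(current, previous):
    """Ratio of each pair's relative copy number to the preceding step.

    Pairs that are absent or zero in the preceding step are skipped,
    because the ratio is undefined for them.
    """
    return {
        pair: value / previous[pair]
        for pair, value in current.items()
        if previous.get(pair, 0) > 0
    }


# Hypothetical read counts for two pairs at two consecutive steps.
oligo_pool = relative_copy_numbers({"pair_1": 50, "pair_2": 50})
plasmid_library = relative_copy_numbers({"pair_1": 30, "pair_2": 10})
plasmid_vs_oligo = normalize_to_previous(plasmid_library, oligo_pool)
```

A ratio near 1 for most pairs would indicate that cloning or transduction preserved the representation of the library; the same function applies to the cell library normalized to the plasmid library.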
Two different cell libraries (library A and library B) were generated from independent lentivirus production and transduction. These two libraries were transfected with plasmids encoding AsCpf1. Four days later, indel frequencies were analyzed in the cell libraries. Number of analyzed pairs (n) = 233.
Supplementary Figure 5 Correlation between indel frequencies after two different methods of AsCpf1 delivery.
A cell library was either transfected with AsCpf1-encoding plasmid or transduced with AsCpf1-encoding lentiviral vector. Indel frequencies in the cell libraries were analyzed four days after AsCpf1 plasmid transfection or five days after AsCpf1 lentivirus transduction. Number of analyzed pairs (n) = 156.
Supplementary Figure 6 Effects of multiplicity of infection (MOI) in the cell library on the AsCpf1-induced indel frequencies.
Three cell libraries with different MOIs (0.4, 2, and 10) were generated by varying the amount of guide RNA-expressing lentivirus used for transduction, and AsCpf1 was then lentivirally delivered into these cell libraries. Indel frequencies were measured 5 days after AsCpf1 delivery. Number of analyzed pairs (n) = 273 per group. (a) AsCpf1-induced indel frequencies in the cell libraries with different MOIs. Error bars represent s.e.m. **P < 0.01, ***P < 0.001, ANOVA followed by Tukey’s post hoc test. (b-d) Correlation between indel frequencies in cell libraries with different MOIs.
Indel frequencies for four different TTTN sequences tested as potential PAMs for AsCpf1 (a) and LbCpf1 (b). Error bars represent s.e.m. *P < 0.05, **P < 0.01, ***P < 0.001, ANOVA followed by Tukey’s post hoc test. Number of tested guide sequences (n) = 11, 13, 12, 12 for AsCpf1 and 14, 13, 15, 10 for LbCpf1.
Supplementary Figure 8 Comparison of indel frequency rankings between AsCpf1 and SpCas9 using the same and reverse target sequences.
The relationship between indel frequency rankings of SpCas9 target sequences and those of Cpf1 targeting the same (left) or reverse (right) sequences. The sequences that are marked with a red color, 5’-GGG-3’ and 5’-TTTA-3’, were used as the PAMs for SpCas9 and AsCpf1 target sequences, respectively. The known activity rank for the SpCas9 target sequence (ref. 14) is used for comparison.
Supplementary Figure 10 Interaction between the crRNA nucleobase in position 1 and Thr16 of the AsCpf1 WED domain.
The hydroxyl side chain of the Thr16 residue in the WED domain shows a polar interaction with the N2 of a guanine base (blue dotted line inside the red circle). Similar polar interactions with this Thr16 residue may be formed with side chains of other nucleobases such as the O2 of thymine and uracil. Due to the lack of a corresponding moiety in adenine, however, the positioning of a crRNA adenine ribonucleobase in this site could be unstable, consequently resulting in an unfavorable positioning of thymine at position 1 of the target DNA strand, which is next to the PAM motif. There is a complementary interaction between the crRNA ribonucleotide (guanine is shown here) and the target sequence nucleotide (cytosine at position 1 is shown here). The figure is drawn based on the published data of ref. 17 (PDB 5B43).
Supplementary Figure 11 Development of an algorithm to predict the AsCpf1-induced indel frequency for a given guide RNA-target sequence pair.
(a) The workflow for the development of a model to predict the AsCpf1-induced indel frequency using the measured indel frequencies for 1,251 pairs of guide RNA and target sequences. (b) Feature selection using Elastic Net for prediction model development. (Upper panel) The 57 non-zero coefficients with the fewest misclassification errors were automatically selected from a certain shrinkage value range, indicated between the two dotted lines. The misclassification error (y-axis) is the error rate associated with classifying an entity into two classes, and the shrinkage parameter adds information to a raw estimate to regularize the ill-posed problem. (Lower panel) Coefficient values of the 57 selected features.
The site is available at http://big.hanyang.ac.kr/cindel. When target site sequences are entered, the web tool calculates AsCpf1 indel scores (CINDELs) that correlate with AsCpf1-induced indel frequencies for given pairs of target and guide RNA sequences.
Supplementary Figure 13 Effect of the number of mismatched nucleotides in different regions of the target sequence on AsCpf1-induced off-target indel frequencies.
The off-target indel frequencies are normalized to those at on-target sequences. The locations of multi-nucleotide mismatches are shown. Error bars represent s.e.m.
Supplementary Figure 14 Effect of multi-nucleotide mismatches in different regions of the target sequence on AsCpf1-induced off-target indel frequencies.
The off-target indel frequencies are normalized to those at on-target sequences. The locations of multi-nucleotide mismatches are shown on the x-axis. Error bars represent s.e.m.
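The normalization and error bars used in Supplementary Figures 13 and 14 can be sketched in Python as follows. This is an illustration under assumptions, not the authors' pipeline: normalization is taken to be the off-target indel frequency divided by the matching on-target frequency, s.e.m. is computed from the sample standard deviation, and all names are hypothetical.

```python
from math import sqrt


def normalized_off_target(on_target_pct, off_target_pct):
    """Off-target indel frequency as a fraction of the on-target frequency."""
    if on_target_pct <= 0:
        raise ValueError("on-target indel frequency must be positive")
    return off_target_pct / on_target_pct


def mean_and_sem(values):
    """Mean and standard error of the mean (sample s.d. / sqrt(n))."""
    n = len(values)
    if n == 0:
        raise ValueError("no values given")
    mean = sum(values) / n
    if n < 2:
        return mean, float("nan")  # s.e.m. is undefined for one value
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, sqrt(variance / n)


# Hypothetical (on-target %, off-target %) measurements for one mismatch class.
measurements = [(40.0, 4.0), (50.0, 10.0), (20.0, 6.0)]
ratios = [normalized_off_target(on, off) for on, off in measurements]
group_mean, group_sem = mean_and_sem(ratios)
```

Each mismatch position or mismatch count would form one group of ratios, giving one bar and one error bar per group.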
Supplementary Figure 15 A comparison of the costs of evaluating Cpf1 activity at target sequences using conventional individual evaluation versus our high-throughput approach.
Costs for materials (left) and labor (right) are compared. Costs are given in USD. One labor unit is the maximum amount of work that one skilled person can complete in one hour. Breaks in hands-on work that exceed one hour, such as incubation times, are not counted as labor.
About this article
Cite this article
Kim, H., Song, M., Lee, J. et al. In vivo high-throughput profiling of CRISPR–Cpf1 activity. Nat Methods 14, 153–159 (2017). https://doi.org/10.1038/nmeth.4104