Multiplexed pooled library screening with Cpf1

RNA interference and CRISPR/Cas9-based pooled library screens have revolutionized the field of functional genomics. However, currently available pooled library screens face a trade-off between library effectiveness and library complexity. We developed a multiplexed, high-throughput screening strategy based on an optimized AsCpf1 nuclease that minimizes library size without sacrificing gene targeting efficiency. Our AsCpf1-based multiplexed library performed comparably to currently available CRISPR/Cas9 libraries, but with a single polycistronic crRNA clone targeting each gene. With this strategy, we constructed the smallest whole-genome knockout library available for the human genome, “Mini-human”, which is one-fourth the size of the smallest CRISPR library currently available.

Similar to CRISPR/Cas9, CRISPR/Cpf1 (CRISPR from Prevotella and Francisella 1, or Cas12a) is a class 2 CRISPR system identified in the prokaryotic adaptive immune system that cleaves target DNA under the direction of small RNA guides. 10 The Cpf1 orthologues LbCpf1 and AsCpf1 have been shown to be highly specific, even more so than SpCas9. 11,12 Moreover, Cpf1 is self-sufficient for multiplexed gene editing, unlike Cas9, which requires other Cas proteins and RNase III to process multiple guides in its native host, 13 although modified Cas9 multiplexing systems have been developed, such as tRNA-aided 14 , Cas6/Csy4-aided 15 and ribozyme-aided 16 multiplexing. Based on these properties, we reasoned that a CRISPR/Cpf1 system may enable multiplexing of multiple guides targeting the same gene into a single lentiviral vector to generate high-efficiency pooled sgRNA libraries with substantially decreased complexity, thereby eliminating the trade-off between library efficacy and library complexity.
To assess the performance of Cpf1 multiplexing, we generated a multiplexed AsCpf1 library targeting 342 "core-essential" genes and 345 "non-essential" genes, with three guides per gene 3,4 (2061 guides, 687 constructs). Fitness changes upon knockout of these genes are highly consistent across multiple cell lines, making them "gold-standard" controls. To compare the screen performance of the multiplexed AsCpf1 library with that of conventional mono-cistronic CRISPR knockout libraries, we generated two additional benchmark CRISPR libraries targeting the same group of genes: an SpCas9-based mono-cistronic library (2061 guides, 2061 constructs) and an AsCpf1-based mono-cistronic library (2061 guides, 2061 constructs). The design rules for SpCas9 and AsCpf1 guides are highly similar, apart from nuclease-specific requirements such as different protospacer adjacent motifs (PAMs). The AsCpf1-based mono-cistronic and multiplexed libraries share identical guide sequences; however, the multiplexed AsCpf1 library has only a single construct per gene harboring all three guides (Fig 1a).
Benchmark screens employed K-562 cells infected separately with each pooled CRISPR library at a low multiplicity of infection (MOI). After puromycin selection, the corresponding CRISPR nuclease (AsCpf1 or SpCas9) was delivered by lentiviral transduction followed by blasticidin selection. Triplicate screens were conducted with cells grown for 4 weeks, and each replicate was sampled at intermediate time points to capture the dynamics of guide populations.
Screening performance was measured by separation of "core-essential" vs "non-essential" genes (Fig 1b). Interestingly, the common AsCpf1 variant used (human codon-optimized AsCpf1 with C-terminal nucleoplasmin bipartite nuclear localization signal (NLS), herein AsCpf1-Nuc) 11 Fig 1a, 1c). When infected side-by-side with identical MOI, AsCpf1-3xMYC showed stronger expression and nuclear localization compared to AsCpf1-Nuc, suggesting that the amount of AsCpf1 in the nuclear fraction is critical for optimal gene editing efficiency (Supplementary Fig 1d).
All biological replicates for each of the three library screens correlated well, indicating good reproducibility (Supplementary Fig 2a, 2b, 2c). An essential gene-targeting construct is considered active if it is more depleted than the non-essential gene-targeting constructs, since an active construct should have an anti-proliferative effect. To determine the percentage of active constructs in each of the three libraries, we chose a false positive rate (FPR) of 5% based on the log2-transformed fold change at each time point. The active construct percentage curve of the SpCas9-based mono-cistronic library was relatively flat across all four time points, with a mean value of 49.0% ± 2.9% active constructs. For the AsCpf1-based libraries, the active construct percentage curve plateaued 2 weeks after screen initiation, with mean values of 31.7% ± 0.9% and 77.4% ± 1.4% active constructs for the mono-cistronic and multiplexed libraries, respectively (Fig 1c). The different shapes of the active construct percentage curves for SpCas9- and AsCpf1-based screens indicate different temporal population dynamics and knockout efficiencies for the two CRISPR nucleases. Our data strongly suggest that SpCas9 is more active than AsCpf1 in mammalian cell gene knockout experiments. However, multiplexing different guides targeting the same gene significantly increased the likelihood of gene knockout with AsCpf1. At endpoint, the percentage of active constructs in the AsCpf1 multiplexed library increased only slightly compared to the other libraries when we relaxed the FPR stringency from 5% to 20% (an increase of 4.4% ± 1.9% at 10% FPR and 9.9% ± 0.8% at 20% FPR) (Fig 1d), indicating relatively low noise in the AsCpf1 multiplexed library screens.
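The active-construct call described above can be sketched as follows. This is a minimal illustration, assuming that the cutoff is the log2 fold change below which only the chosen fraction (the FPR) of non-essential constructs fall; the function names and the toy fold changes are hypothetical and are not taken from the screen data.

```python
import math

def percentile(values, fraction):
    """Linear-interpolation percentile; fraction is between 0 and 1."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * fraction
    lo, hi = math.floor(pos), math.ceil(pos)
    weight = pos - lo
    return ordered[lo] * (1 - weight) + ordered[hi] * weight

def active_fraction(essential_lfc, nonessential_lfc, fpr=0.05):
    """Fraction of essential-gene constructs depleted beyond the cutoff that
    only `fpr` of the non-essential constructs exceed (assumed definition)."""
    cutoff = percentile(nonessential_lfc, fpr)
    active = sum(1 for lfc in essential_lfc if lfc < cutoff)
    return active / len(essential_lfc)

# Toy log2 fold changes, invented for illustration only.
nonessential = [-2.0 + 0.1 * i for i in range(21)]   # -2.0 up to 0.0
essential = [-3.0, -2.5, -1.95, -1.7, 0.2]

at_5_percent = active_fraction(essential, nonessential, fpr=0.05)
at_20_percent = active_fraction(essential, nonessential, fpr=0.20)
print(at_5_percent, at_20_percent)
```

Relaxing the FPR moves the cutoff toward zero, so the active fraction can only stay the same or grow; a small increase, as reported for the multiplexed library, means few essential constructs sit close to the non-essential distribution.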
To call significantly depleted genes, we used an adapted Bayesian Analysis of Gene Essentiality (BAGEL) algorithm to analyze construct-level data. Based on the fold change in sgRNA abundance after knockout of each gene in the essential and non-essential training sets, BAGEL uses a Bayesian model selection approach to calculate a Bayes Factor (BF), which is the log2 ratio of the likelihoods that a gene belongs to the essential gene distribution versus the non-essential gene distribution. Because BAGEL is designed for whole-genome CRISPR screens, we designed and used a version of BAGEL optimized for small library screens, "Low Fat BAGEL".
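To make the Bayes Factor concrete, the sketch below computes a log2 likelihood ratio for a single fold change. It is a deliberately simplified stand-in, not the BAGEL or Low Fat BAGEL implementation: it assumes that each training set can be summarized by a fitted normal distribution, and the function name and training values are hypothetical.

```python
import math
from statistics import NormalDist

def bayes_factor(fold_change, essential_train, nonessential_train):
    """log2 of the likelihood ratio: essential model vs non-essential model.

    Assumption: each training distribution is approximated by a normal
    distribution fitted to the training fold changes.
    """
    essential_model = NormalDist.from_samples(essential_train)
    nonessential_model = NormalDist.from_samples(nonessential_train)
    return math.log2(
        essential_model.pdf(fold_change) / nonessential_model.pdf(fold_change)
    )

# Toy log2 fold changes for the two training sets (illustrative only).
essential_train = [-3.5, -2.0, -3.0, -2.4, -4.1]
nonessential_train = [0.4, -0.6, 0.9, -0.3, 0.1]

bf_depleted = bayes_factor(-2.8, essential_train, nonessential_train)
bf_neutral = bayes_factor(0.0, essential_train, nonessential_train)
print(bf_depleted, bf_neutral)
```

A positive BF means the observed fold change is better explained by the essential training distribution, and a negative BF favors the non-essential one.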
Low Fat BAGEL generates BFs at the construct level, which are summed across guides to obtain a gene-level BF. For the AsCpf1-based multiplexed library, each gene has only one corresponding construct; therefore, its construct-level BF corresponds directly to its gene-level BF. To benchmark screen performance across the three libraries, precision-recall curves were plotted based on BFs. The precision-recall curves clearly showed that the SpCas9-based mono-cistronic screen (construct-wise area under the curve (AUC) 0.78 ± 0.01, gene-wise AUC 0.89 ± 0.01) outperformed the AsCpf1-based mono-cistronic screen (construct-wise AUC 0.70 ± 0.01, gene-wise AUC 0.82 ± 0.01) at both the construct (Fig 1e) and the gene level (Fig 1f).
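The two steps in this paragraph, summing construct-level BFs per gene and summarizing the ranking as an area under the precision-recall curve, can be sketched as below. The area is computed here as average precision, which is one common estimator and an assumption on our part; gene names and BF values are invented for illustration.

```python
from collections import defaultdict

def gene_level_bf(construct_bfs):
    """Sum construct-level BFs, given as (gene, BF) pairs, into gene-level BFs."""
    totals = defaultdict(float)
    for gene, bf in construct_bfs:
        totals[gene] += bf
    return dict(totals)

def precision_recall_auc(scores, essential_genes):
    """Average precision of a ranking by descending score.

    `scores` maps gene -> BF; `essential_genes` is the set of true positives.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_positive = sum(1 for gene in ranked if gene in essential_genes)
    hits = 0
    precision_sum = 0.0
    for rank, gene in enumerate(ranked, start=1):
        if gene in essential_genes:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / n_positive

# Toy mono-cistronic data: three constructs per gene (illustrative only).
construct_bfs = [
    ("E1", 4.0), ("E1", 3.0), ("E1", -1.0),
    ("E2", 1.0), ("E2", 0.5), ("E2", -0.5),
    ("N1", 2.0), ("N1", 0.5), ("N1", -0.5),
    ("N2", -3.0), ("N2", -2.0), ("N2", -1.0),
]
gene_bfs = gene_level_bf(construct_bfs)
auc = precision_recall_auc(gene_bfs, essential_genes={"E1", "E2"})
print(gene_bfs, auc)
```

For a multiplexed library the summation step is trivial, because each gene contributes a single (gene, BF) pair.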
However, the AsCpf1-based multiplexed screen (construct-wise and gene-wise AUC 0.89 ± 0.00) performed similarly to the SpCas9 mono-cistronic screen at the gene level and considerably better at the construct level, primarily because of the lower active construct percentage in the SpCas9 screen (Fig 1e, 1f). This indicates a synergistic effect when guides are multiplexed together for screening applications. It is consistent with the reported synergistic effects observed in dCpf1-based activator-induced gene expression experiments 18 as well as with the increase in active constructs we observed for the multiplexed versus mono-cistronic AsCpf1 libraries.
To compare the rate of separation between essential and non-essential genes among the three library screens, we calculated the ratio of the modified area under the precision-recall curve (mAUC) at any given time point to the mAUC at endpoint (ratio of mAUC, rmAUC) (Fig 1g). Because the area under the precision-recall curve (AUC) for this library would be 0.498 when there is no separation between essential and non-essential genes, the mAUC at any given time point was defined as its AUC minus 0.498. In accordance with Fig 1c, the results suggest that separation between essential and non-essential genes was much slower in the AsCpf1-based mono-cistronic screens than in the SpCas9-based mono-cistronic and AsCpf1-based multiplexed screens. This might result from the slower cleavage rate of AsCpf1 relative to SpCas9 19 , as we also saw slightly slower separation between essential and non-essential genes in the AsCpf1-based multiplexed screen than in the SpCas9 screen.
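The rmAUC calculation is simple enough to write out. In the sketch below, the baseline of 0.498 is reproduced as the fraction of essential genes in the benchmark library (342 of 687), which is the precision expected from a random ranking; this reading of the baseline is our interpretation of the number, and the time-point AUC values are hypothetical.

```python
def baseline_auc(n_essential, n_nonessential):
    """Precision expected when essential and non-essential genes are not separated."""
    return n_essential / (n_essential + n_nonessential)

def rmauc(auc_at_time, auc_at_endpoint, baseline):
    """Ratio of modified AUCs: (AUC_t - baseline) / (AUC_end - baseline)."""
    return (auc_at_time - baseline) / (auc_at_endpoint - baseline)

baseline = baseline_auc(342, 345)          # about 0.498

# Hypothetical AUC values at an intermediate time point and at endpoint.
ratio = rmauc(auc_at_time=0.70, auc_at_endpoint=0.90, baseline=baseline)
print(round(baseline, 3), round(ratio, 3))
```

Subtracting the baseline before taking the ratio means that a screen with no separation at a given time point scores 0 and a screen that has already reached its endpoint separation scores 1.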
CRISPR/SpCas9 guide design has been optimized using empirical data from hundreds of screens 4,6,8,9 , but previous AsCpf1 guide optimization algorithms are largely based on a small number of surrogate reporter experiments 17 . Lentivirus is known to have integration-site preferences 20 , and the chromosomal environment is one factor that influences CRISPR nuclease activity 21 . Thus, the gene editing process on surrogate reporters might not fully represent the true biological effect of editing at endogenous loci. Our screen provided the first large-scale action-in-situ dataset to enable prediction of AsCpf1 guide preference based on functional screen data from endogenous loci. We used fold change information for essential gene-targeting guides in the mono-cistronic AsCpf1 library to calculate sequence preference, as effective guides should drop out more efficiently than ineffective guides. Even though all 342 genes are essential, the severity of their knockout phenotypes may differ. To avoid sequence biases introduced by gene-specific effects, among the three guides for each gene, we classified the most depleted guide as the "high-performing" guide and the least depleted guide as the "low-performing" guide. Using a scoring scheme similar to that of Hart et al. 4 , the frequency of each nucleotide at each position of the 20-mer protospacer was calculated separately for the high-performing and low-performing guides. At each position, the nucleotide frequency among low-performing guides was subtracted from that among high-performing guides to produce a table of frequency differences for each nucleotide. This process was repeated across 100 bootstraps, and an aggregate average score table for each protospacer position was obtained.
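The construction of the score table can be sketched as follows. This is an illustrative reading of the procedure, assuming that each bootstrap resamples the high-performing and low-performing guide sets with replacement; the function names are hypothetical, and the toy guides are 4-mers rather than 20-mers to keep the example short.

```python
import random

NUCLEOTIDES = "ACGT"

def pick_extremes(guides_by_gene):
    """For each gene, take the most depleted guide as high-performing and the
    least depleted as low-performing. Values are lists of (sequence, log2 FC)."""
    high, low = [], []
    for guides in guides_by_gene.values():
        ranked = sorted(guides, key=lambda guide: guide[1])  # most negative first
        high.append(ranked[0][0])
        low.append(ranked[-1][0])
    return high, low

def frequency_table(guides, length):
    """table[position][nucleotide] = fraction of guides with that nucleotide."""
    table = [{nt: 0.0 for nt in NUCLEOTIDES} for _ in range(length)]
    for guide in guides:
        for pos, nt in enumerate(guide):
            table[pos][nt] += 1.0 / len(guides)
    return table

def score_table(high, low, length, n_bootstraps=100, seed=0):
    """Average (high minus low) nucleotide frequency over bootstrap resamples."""
    rng = random.Random(seed)
    scores = [{nt: 0.0 for nt in NUCLEOTIDES} for _ in range(length)]
    for _ in range(n_bootstraps):
        high_freq = frequency_table(rng.choices(high, k=len(high)), length)
        low_freq = frequency_table(rng.choices(low, k=len(low)), length)
        for pos in range(length):
            for nt in NUCLEOTIDES:
                diff = high_freq[pos][nt] - low_freq[pos][nt]
                scores[pos][nt] += diff / n_bootstraps
    return scores

# Toy data: three genes, three guides each (sequences and values invented).
guides_by_gene = {
    "GENE_A": [("GACT", -3.0), ("AAGC", -1.5), ("TACG", -0.2)],
    "GENE_B": [("TTCG", -0.1), ("CACT", -2.8), ("ACGA", -1.0)],
    "GENE_C": [("GTCA", -4.0), ("TAGG", 0.1), ("CCAT", -2.0)],
}
high, low = pick_extremes(guides_by_gene)
table = score_table(high, low, length=4)
print(table[0])
```

Ranking guides within each gene, rather than across all guides, is what keeps gene-specific phenotype severity out of the table: a guide is only ever compared with other guides against the same target.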
In agreement with the previous report 17 , thymine (T) was strongly disfavored at position 1 of the protospacer, while guanine (G) and cytosine (C) were favored. We also identified a general trend of G being disfavored from position 16 to position 20 of the protospacer. In addition, we found that T was favored at positions 3, 16, and 18, while C was favored at position 7 but disfavored at position 3 (Figure 2a). The score table served as a metric to predict guide activity in terms of fold change: the sequence score for each guide is the sum of the nucleotide scores at each position. Therefore, a sequence score of zero indicates that a guide is no more similar to effective guides than to ineffective guides. We validated our prediction algorithm on the "median-performing" guides not used to develop the scoring algorithm, that is, the guide for each gene that was neither the highest nor the lowest performing. Each median-performing guide was assigned a sequence score and was classified as predicted "high-performing" or "low-performing" based on a guide score > 0 or ≤ 0, respectively. We then evaluated whether the sequence score and predicted performance classification of each guide were indicative of fold change, and found a significant correlation between guide score and guide performance (Spearman's rho = -0.40, 95% confidence interval: (-0.45, -0.34), p = 4.5 × 10^-40) (Figure 2b).
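Scoring and validation, as described in this paragraph, can be sketched as below. The score table, guide sequences and fold changes are toy values for 4-mers, not the table in Figure 2a, and the helper names are hypothetical. Spearman's rho is computed directly from tie-aware ranks so that the example has no dependencies.

```python
def sequence_score(guide, table):
    """Sum of the per-position nucleotide scores for one guide."""
    return sum(table[pos][nt] for pos, nt in enumerate(guide))

def predict(guide, table):
    """Score > 0 predicts high-performing; score <= 0 predicts low-performing."""
    return "high-performing" if sequence_score(guide, table) > 0 else "low-performing"

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Toy score table for a 4-nt protospacer (illustrative values only).
table = [
    {"A": 0.0, "C": 0.2, "G": 0.3, "T": -0.5},
    {"A": 0.1, "C": 0.0, "G": -0.1, "T": 0.0},
    {"A": 0.0, "C": -0.2, "G": 0.0, "T": 0.2},
    {"A": 0.1, "C": 0.0, "G": -0.1, "T": 0.0},
]
guides = ["GATA", "TGCG", "CCAC", "ACGC"]
fold_changes = [-3.0, -0.2, -2.0, -1.0]   # invented log2 fold changes

scores = [sequence_score(guide, table) for guide in guides]
labels = [predict(guide, table) for guide in guides]
rho = spearman(scores, fold_changes)
print(labels, rho)
```

A negative rho is the expected direction here: a higher sequence score should go with a more negative fold change, that is, stronger depletion.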
Based on our multiplexed AsCpf1 library strategy, we designed the smallest available CRISPR library targeting the entire human protein-coding genome, "Mini-human". The guides for Mini-human were optimized based on activity scores derived from our screen dataset and further filtered for potential off-target effects. Because a previous analysis of published screens determined that four to six gRNAs per gene yield robust results when computational approaches to design sequence-optimized guides are employed 4,6,9 , each construct in Mini-human contains up to 4 gene-targeting guides. The library comprises 16393 gene-targeting constructs with 4 optimized guides, 584 gene-targeting constructs with 3 optimized guides, and 55 non-targeting guide arrays as negative controls. This library is approximately one-fourth the size of the smallest currently available genome-wide CRISPR library and will be publicly available through Addgene.
In conclusion, with an AsCpf1 CRISPR-associated nuclease optimized for improved protein expression and nuclear localization, we have addressed the trade-off between library efficacy and library complexity for pooled libraries by multiplexing multiple guides in a single construct. We demonstrated that the multiplexed AsCpf1-3xMYC library performed as well as, if not better than, a conventional SpCas9-based mono-cistronic library, but with a significantly reduced library size. The new Mini-human library provides an invaluable tool for demanding functional genomics applications, especially in vivo pooled library screens where library size is a concern.
Acknowledgements: We thank Draetta lab members for helpful discussions.
The human codon-optimized AsCpf1
Genome-wide "mini-human" multiplexed library design: Guides were identified by adapting the Cas9 library design algorithm developed by Hart et al. 4 . Candidate guide sequences were obtained from exonic regions using hg38 and were filtered to exclude homopolymers and BsmBI restriction sites. Using Bowtie, we aligned the filtered candidate guides across the genome, allowing for one mismatch outside the "TTTV" PAM sequence. Guides with off-target matches in intronic or exonic regions were excluded, and the remaining guides were ranked based on the number of off-target matches in intergenic regions. A sequence score was assigned to each guide based on the score table presented in Figure 2a.
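The candidate identification and sequence filters in the library design can be sketched as follows. This is a minimal illustration under several assumptions that are not stated in the text: only the given strand is scanned, a homopolymer is taken to be a run of four or more identical nucleotides, and the function name and toy sequence are hypothetical. The Bowtie off-target alignment and ranking steps are not shown.

```python
import re

# TTTV PAM (V = A, C or G) followed by a 20-nt protospacer. The lookahead
# lets overlapping candidates be reported.
PAM_AND_PROTOSPACER = re.compile(r"(?=(TTT[ACG])([ACGT]{20}))")

# BsmBI recognition site and its reverse complement.
BSMBI_SITES = ("CGTCTC", "GAGACG")

# Assumed homopolymer definition: four or more identical bases in a row.
HOMOPOLYMER = re.compile(r"A{4,}|C{4,}|G{4,}|T{4,}")

def candidate_guides(exon_sequence):
    """Return (PAM start index, protospacer) pairs that pass both filters."""
    sequence = exon_sequence.upper()
    guides = []
    for match in PAM_AND_PROTOSPACER.finditer(sequence):
        protospacer = match.group(2)
        if HOMOPOLYMER.search(protospacer):
            continue  # drop guides containing a homopolymer run
        if any(site in protospacer for site in BSMBI_SITES):
            continue  # drop guides containing a BsmBI site
        guides.append((match.start(), protospacer))
    return guides

# Toy exon with three PAM sites: one clean guide, one containing a BsmBI
# site, and one containing an AAAA run.
toy_exon = (
    "GG" + "TTTA" + "GCATGCATGCATGCATGCAT"
    + "CC" + "TTTC" + "GACGTCTCAGTCAGTCAGTC"
    + "AA" + "TTTG" + "ACAAAAGTCAGTCAGTCAGT"
)
kept = candidate_guides(toy_exon)
print(kept)
```

Excluding BsmBI sites from the protospacers is consistent with a cloning strategy that uses BsmBI digestion, since an internal site would cut the guide array itself.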