We developed an in vivo library-on-library methodology to simultaneously assess single guide RNA (sgRNA) activity across ∼1,400 genomic loci. Assaying across multiple human cell types and end-processing enzymes as well as two Cas9 orthologs, we unraveled underlying nucleotide sequence and epigenetic parameters. Our results and software (http://crispr.med.harvard.edu/sgRNAScorer) enable improved design of reagents, shed light on mechanisms of genome targeting, and provide a generalizable framework to study nucleic acid–nucleic acid interactions and biochemistry in high throughput.
We acknowledge J. Aach for help with CasFinder and useful discussion, B. Turczyk for help with custom array oligonucleotide synthesis, S. Byrne (Harvard Medical School) for providing PGP1 induced pluripotent stem cells, and A. Chavez for useful discussion. This work was supported by US National Institutes of Health grant P50 HG005550. R.C. was supported by a Banting Fellowship from the Canadian Institutes of Health Research. P.M. is supported by University of California, San Diego, startup funds and a Burroughs Wellcome Career Award at the Scientific Interface.
The authors declare no competing financial interests.
Integrated supplementary information
Two separate transfections of the sgRNA library and Cas9Sp nuclease were performed (BR1 and BR2). For each transfection, two libraries were prepared (TR1 and TR2), giving four samples in total. (A) Comparison of technical replicates within each biological replicate. (B) Scatter plots of each sample versus each other sample across the two biological replicates. Since there were two samples per transfection, a total of four comparisons are shown. Pearson correlation coefficients ranged from 0.819 to 0.853.
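As a minimal sketch of how such replicate comparisons could be computed, the following Python snippet calculates Pearson correlation coefficients for the four cross-replicate sample pairs. The sample names follow the caption, but the per-site mutation rates are hypothetical placeholders rather than data from the study, and the helper `pearson_r` is an illustrative implementation, not the authors' code.

```python
from itertools import product
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-site mutation rates (fractions), one list per sample.
br1 = {
    "BR1_TR1": [0.02, 0.10, 0.25, 0.40, 0.05],
    "BR1_TR2": [0.03, 0.09, 0.27, 0.38, 0.06],
}
br2 = {
    "BR2_TR1": [0.01, 0.12, 0.20, 0.45, 0.04],
    "BR2_TR2": [0.02, 0.11, 0.22, 0.41, 0.07],
}

# Two samples per transfection give 2 x 2 = 4 cross-replicate comparisons.
cross_pairs = list(product(br1, br2))
for name_a, name_b in cross_pairs:
    r = pearson_r(br1[name_a], br2[name_b])
    print(f"{name_a} vs {name_b}: r = {r:.3f}")
```

Pairing samples with `product` rather than all pairwise combinations keeps technical-replicate comparisons (panel A) separate from the cross-replicate comparisons (panel B).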
Supplementary Figure 2 Distribution of observed NHEJ mutation rates across different Cas9s and end-processing enzymes.
Total observed NHEJ rates in (A) Cas9Sp nuclease, (B) Cas9Sp nickase and (C) Cas9St1 nuclease transfected experiments. For each set of experiments, the respective sgRNA library was transfected alone, as well as in four further transfections, each with a different end-processing enzyme. Control samples represent cells with the integrated target library that were transfected with just Cas9. Of the enzymes we tested, TREX2 and Artemis have the strongest impact on increasing the NHEJ-associated mutation rate.
Supplementary Figure 3 Distribution of observed NHEJ-mediated insertion and deletion rates across different Cas9s and end-processing enzymes.
Box plots of the insertion and deletion rates of NHEJ events for Cas9Sp nuclease (top), Cas9Sp nickase (middle) and Cas9St1 nuclease (bottom). Consistently in both nuclease sets, TREX2 alters the NHEJ pattern by biasing towards deletions and away from insertions. We also note that the insertion rate with the Cas9Sp nuclease is modestly higher in the Tdt and ddrA samples.
(A) Histogram of the fold increase in NHEJ-associated mutagenesis across all sites due to the addition of end-processing enzymes. The top plot illustrates the effect of TREX2, while the bottom plot shows the effect of Artemis. The observed impact appears quite variable, suggesting that sequence context may also be important. (B) Box plot depicting the range of fold increase. In some instances, the effect is little to none, but in other cases, the enhancement could be as much as 10- to 15-fold. (C) Bar plot showing the percentage of sites that showed no mutagenesis with the sgRNA library alone but exhibited mutagenesis upon addition of TREX2 or Artemis. Over half of these sites showed mutation upon addition of TREX2 or Artemis.
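The two quantities in this caption, per-site fold increase and the share of previously silent sites that gain mutagenesis, could be derived along the following lines. This is a sketch only: the function names and the example rates are hypothetical, and sites with a zero baseline are handled separately on the assumption that a fold change is undefined there.

```python
def fold_increase(rate_alone, rate_with_enzyme):
    """Fold increase in mutation rate, or None when the baseline is zero."""
    if rate_alone == 0:
        return None  # ratio undefined; such sites are counted separately
    return rate_with_enzyme / rate_alone


def summarize(sites):
    """Return (fold increases, percent of zero-baseline sites that gain mutations).

    `sites` is a list of (rate with sgRNA library alone, rate with enzyme) pairs.
    """
    folds = []
    zero_baseline = 0
    rescued = 0
    for rate_alone, rate_with_enzyme in sites:
        fold = fold_increase(rate_alone, rate_with_enzyme)
        if fold is None:
            zero_baseline += 1
            if rate_with_enzyme > 0:
                rescued += 1
        else:
            folds.append(fold)
    percent_rescued = 100.0 * rescued / zero_baseline if zero_baseline else 0.0
    return folds, percent_rescued


# Hypothetical (alone, with enzyme) mutation rates for four sites.
example_sites = [(0.125, 0.5), (0.25, 0.25), (0.0, 0.03), (0.0, 0.0)]
folds, percent_rescued = summarize(example_sites)
print(folds, percent_rescued)
```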
Supplementary Figure 5 Impact of end-processing enzymes on the distribution of net deletion sizes observed.
With no end processing enzymes (dashed line), the distribution observed is greatly biased towards smaller deletions. Intriguingly, while the addition of either Artemis or TREX2 increases the rate of observed deletions, Artemis tends to favor smaller deletions (< 5 bp in size) and TREX2 tends to mediate larger deletions (> 5 bp in size). This is consistent across both Cas9 systems (Cas9Sp left, Cas9St1 right).
Three previously published sites (with 3 known off-target sites for each) were assessed for NHEJ-induced mutation in the absence and presence of the exonuclease TREX2. For each site, individual sgRNAs with Cas9 were co-transfected with or without TREX2, and cells were assessed for mutations 72 h post-transfection. Notably, TREX2 increases mutation rates at both on- and off-target sites to varying degrees. This suggests that selecting sgRNA sites with minimal off-target sites is imperative, even more so in the presence of TREX2.
Supplementary Figure 7 Comparison of high- and low-activity St1 sgRNA sequences at both the integrated target site and endogenous loci in 293T cells.
(A) Position-by-position comparison of base distributions between the 82 high-activity sgRNAs and 69 low-activity sgRNAs. A window of 37 bp was used, encompassing the 27 bp target site and 5 bp of flanking sequence on each side. For each position, p-values were calculated using a 2 x 4 Fisher’s exact test comparing the nucleotide distributions and were subsequently corrected for multiple hypothesis testing using the Benjamini-Hochberg method. Position 20 shows the largest peak, with a strong preference for guanine. (B) Comparison of DNaseI hypersensitivity between the top and bottom quartiles of sites with high and low activity at endogenous loci. Data were generated by ENCODE and downloaded from the UCSC Genome Browser. A given region was defined as 200 bp upstream and downstream of the 27 bp target site, for a total size of 427 bp. (C) Comparison of H3K4 trimethylation between the same sites as in (B).
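The multiple-testing step named in panel (A) can be illustrated with a short, self-contained Benjamini-Hochberg adjustment. This sketch covers only the correction; it assumes the per-position p-values have already been obtained from the 2 x 4 Fisher's exact tests, and the values used here are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four positions in the window.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

Applied to the 37 positions in the window, the adjusted values would then be compared against the chosen false discovery rate threshold.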
Supplementary Figure 8 Validation of SVM predicted activity of sgRNAs for Cas9 S. pyogenes (Cas9Sp).
Five sgRNAs predicted to have high activity (blue) and low activity (orange) were transfected individually in seven different cell lines, representing a diverse set of tissue types and lineages. Due to intrinsic factors, efficiency of DNA repair, and transfection efficiencies, different cell lines have different levels of observed mutagenesis. By and large, those predicted to perform better did in fact perform better than those that were predicted to be poor.
Supplementary Figure 9 Validation of SVM-predicted activity of sgRNAs for Cas9 S. thermophilus (Cas9St1).
Five sgRNAs predicted to have high activity (blue) and low activity (orange) were transfected individually in seven different cell lines, representing a diverse set of tissue types and lineages. Due to intrinsic factors, efficiency of DNA repair, and transfection efficiencies, different cell lines have different levels of observed mutagenesis. Across all cell types, the predicted high activity guides for Cas9St1 consistently outperform those that are predicted to be low activity.
Supplementary Figure 10 Comparison of predicted SVM scores from our model with a previously published data set of 1,841 sgRNA sequences.
(A) Comparison of the gene percent rank vs. the SVM prediction score and (B) Comparison of the predicted sgRNA score from a previously published classifier vs. the SVM prediction score from our model. Spearman correlation coefficients were used to quantify the relationships. While there appears to be variability between models, there is a positive relationship between the two classifiers.
Supplementary Figure 11 Comparison of the mutation rates observed at the integrated target site with that observed at the endogenous locus.
Since chromatin effects are controlled for around the integrated site and multiple target sites were incorporated in each cell, the rates of mutagenesis are, as expected, higher than those seen at the endogenous loci. This was consistent for both Cas9Sp and Cas9St1.
Supplementary Figure 12 Correlation analysis of observed mutation rates at lentiviral target sites and endogenous sites.
A moderate correlation is observed between the two experiments when using all of the interrogated sites (r = 0.42; p-value = 1.6e-53). However, given that the endogenous sites span a range of chromatin accessibility while the lentiviral sites are expected to be broadly accessible, the most accessible endogenous sites should bear closer resemblance to the lentiviral target sites. Indeed, this is the case. When taking progressively smaller subsets of the most accessible sites, the correlation improves markedly. Using the top 100 most accessible sites, the observed correlation is 0.69 (p-value = 2.6e-15).
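The subsetting described here (ranking sites by accessibility and recomputing the correlation on the top N) could be sketched as follows. The tuple layout, the function names, and the numbers are hypothetical; the accessibility score stands in for whatever DNaseI-based measure is used, and this is not the authors' analysis code.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def top_n_correlation(sites, n):
    """Correlate lentiviral and endogenous rates over the n most accessible sites.

    Each site is (accessibility score, lentiviral rate, endogenous rate).
    """
    ranked = sorted(sites, key=lambda site: site[0], reverse=True)[:n]
    lentiviral = [site[1] for site in ranked]
    endogenous = [site[2] for site in ranked]
    return pearson_r(lentiviral, endogenous)


# Hypothetical sites: the three most accessible agree well, the rest do not.
sites = [
    (10.0, 0.1, 0.10),
    (9.0, 0.2, 0.20),
    (8.0, 0.3, 0.30),
    (1.0, 0.4, 0.00),
    (0.5, 0.5, 0.05),
]
for n in (5, 3):
    print(n, round(top_n_correlation(sites, n), 3))
```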
Supplementary Figure 13 Representation analysis of the four plasmid libraries that were used in this study.
Plasmid libraries were sequenced using the Illumina MiSeq. The Y-axis represents the percent of total reads and the X-axis represents all of the sequences in alphabetical order. The target libraries had fairly even representation across all sites, with the Cas9Sp showing slightly more uniform representation, while the sgRNA libraries appeared to have fairly uniform representation for both Cas9s.
Supplementary Figure 14 Design of oligonucleotides for the target sites and sgRNA sequences for both Cas9Sp and Cas9St1.
170 base pair oligonucleotides were synthesized using the Custom Array platform. Outer primer sequences were designed for selective amplification and 10 base barcodes were used as additional sequence to aid in mapping of target sites after Cas9-mediated mutagenesis.
Supplementary Figures 1–14 (PDF 2264 kb)
List of individual sgRNAs tested for predicted activity (XLSX 10 kb)
TREX2 off-target analysis (XLSX 9 kb)
List of high and low activity sgRNAs for Cas9Sp and Cas9St1 (XLSX 17 kb)
Human Cas9Sp sites identified by CASFinder with SVM ranking (hg19) (XLSX 35187 kb)
Mouse Cas9Sp sites identified by CASFinder with SVM ranking (mm10) (XLSX 38318 kb)
Human Cas9St1 sites identified by CASFinder with SVM ranking (hg19) (XLSX 9908 kb)
Mouse Cas9St1 sites identified by CASFinder with SVM ranking (mm10) (XLSX 10761 kb)
List of Cas9Sp sites (XLSX 58 kb)
List of Cas9St1 sites (XLSX 62 kb)
List of synthesized oligonucleotides for Cas9Sp (sgRNAs and target sites) (XLSX 165 kb)
List of synthesized oligonucleotides for Cas9St1 (sgRNAs and target sites) (XLSX 175 kb)
Total mutation rates observed in 293T with Cas9Sp nuclease (XLSX 181 kb)
Total mutation rates observed in 293T with Cas9Sp nickase (XLSX 150 kb)
Total mutation rates observed in 293T with Cas9St1 nuclease (XLSX 164 kb)
Total mutation rates observed in K562 with Cas9Sp nuclease (XLSX 79 kb)
Stand-alone software program to identify high activity sgRNAs (ZIP 28269 kb)
About this article
Cite this article
Chari, R., Mali, P., Moosburner, M. et al. Unraveling CRISPR-Cas9 genome engineering parameters via a library-on-library approach. Nat Methods 12, 823–826 (2015). https://doi.org/10.1038/nmeth.3473