Abstract
We developed an in vivo library-on-library methodology to simultaneously assess single guide RNA (sgRNA) activity across ∼1,400 genomic loci. Assaying across multiple human cell types and end-processing enzymes as well as two Cas9 orthologs, we unraveled underlying nucleotide sequence and epigenetic parameters. Our results and software (http://crispr.med.harvard.edu/sgRNAScorer) enable improved design of reagents, shed light on mechanisms of genome targeting, and provide a generalizable framework to study nucleic acid–nucleic acid interactions and biochemistry in high throughput.
Accession codes
Primary accessions
Sequence Read Archive
Referenced accessions
Sequence Read Archive
Acknowledgements
We acknowledge J. Aach for help with CasFinder and useful discussion, B. Turczyk for help with custom array oligonucleotide synthesis, S. Byrne (Harvard Medical School) for providing PGP1 induced pluripotent stem cells, and A. Chavez for useful discussion. This work was supported by US National Institutes of Health grant P50 HG005550. R.C. was supported by a Banting Fellowship from the Canadian Institutes of Health Research. P.M. is supported by University of California, San Diego, startup funds and a Burroughs Wellcome Career Award at the Scientific Interface.
Author information
Contributions
R.C. and P.M. designed the study and performed the experiments. R.C. and P.M. wrote and edited the manuscript. All authors approved the final version of the manuscript. R.C. implemented custom Python software and performed data analysis. M.M. provided technical assistance. G.M.C. supervised the project.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Technical and biological replicates of the library-on-library experiments.
Two separate transfections of the sgRNA library and Cas9Sp nuclease were performed (BR1 and BR2). For each transfection, two libraries were prepared (TR1 and TR2), giving four samples in total. (A) Comparison of technical replicates within each biological replicate. (B) Scatter plots of each sample versus each sample from the other biological replicate. Because there were two samples per transfection, a total of four comparisons are shown. Pearson correlation coefficients ranged from 0.819 to 0.853.
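The replicate comparison described in this legend can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample labels, the function names (`pearson`, `replicate_correlations`) and the toy rates are hypothetical and are not taken from the authors' software. Per-site mutation rates are assumed to be listed in the same site order for every sample.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def replicate_correlations(samples):
    """Split all pairwise correlations into technical and biological sets.

    `samples` maps a (biological_replicate, technical_replicate) label to a
    list of per-site mutation rates, with sites in the same order in each list.
    """
    technical, biological = {}, {}
    for a, b in combinations(sorted(samples), 2):
        r = pearson(samples[a], samples[b])
        # Same transfection -> technical comparison; otherwise biological.
        (technical if a[0] == b[0] else biological)[(a, b)] = r
    return technical, biological


# Toy, made-up rates for four samples (two transfections x two libraries).
toy = {
    ("BR1", "TR1"): [0.10, 0.25, 0.02, 0.40, 0.15],
    ("BR1", "TR2"): [0.11, 0.24, 0.03, 0.38, 0.16],
    ("BR2", "TR1"): [0.08, 0.30, 0.01, 0.35, 0.20],
    ("BR2", "TR2"): [0.09, 0.28, 0.02, 0.36, 0.18],
}
technical, biological = replicate_correlations(toy)
```

With four samples there are six possible pairs: two fall within a transfection (panel A) and four span the two transfections (panel B), which matches the four comparisons mentioned in the legend.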
Supplementary Figure 2 Distribution of observed NHEJ mutation rates across different Cas9s and end-processing enzymes.
Total observed NHEJ rates in (A) Cas9Sp nuclease, (B) Cas9Sp nickase and (C) Cas9St1 nuclease transfected experiments. For each set of experiments, the respective sgRNA library was transfected alone, as well as in four additional transfections, each with a different end-processing enzyme. Control samples represent cells with the integrated target library that were transfected with Cas9 alone. Of the enzymes we tested, TREX2 and Artemis had the strongest effect on increasing the NHEJ-associated mutation rate.
Supplementary Figure 3 Distribution of observed NHEJ-mediated insertion and deletion rates across different Cas9s and end-processing enzymes.
Box plots of the insertion and deletion rates of NHEJ events for Cas9Sp nuclease (top), Cas9Sp nickase (middle) and Cas9St1 nuclease (bottom). Consistently across both nuclease sets, TREX2 alters the NHEJ pattern by biasing towards deletions and away from insertions. We also note that the insertion rate with the Cas9Sp nuclease is modestly higher in the Tdt and ddrA samples.
Supplementary Figure 4 Impact of TREX2 and Artemis across the target-site library.
(A) Histogram of the fold increase in NHEJ-associated mutagenesis across all sites due to the addition of end-processing enzymes. The top plot illustrates the effect of TREX2 and the bottom plot shows the effect of Artemis. The observed impact is quite variable, suggesting that sequence context may also be important. (B) Box plot depicting the range of fold increase. In some instances there is little to no effect, but in other cases the enhancement is as much as 10- to 15-fold. (C) Bar plot showing the percentage of sites that showed no mutagenesis with the sgRNA library alone but exhibited mutagenesis upon addition of TREX2 or Artemis. Over half of these sites showed mutation upon addition of TREX2 or Artemis.
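As a rough illustration of the quantities plotted in panels A to C, the per-site fold increase and the fraction of "rescued" sites could be computed as below. This is a sketch, not the authors' code: the function names and toy percentages are hypothetical, and the denominator for panel C is assumed to be the set of sites with zero observed mutagenesis at baseline.

```python
def fold_increases(baseline, with_enzyme):
    """Per-site fold change in mutation rate; None where the baseline is zero."""
    if len(baseline) != len(with_enzyme):
        raise ValueError("rate lists must be aligned site by site")
    return [e / b if b > 0 else None for b, e in zip(baseline, with_enzyme)]


def rescued_fraction(baseline, with_enzyme):
    """Among sites with zero baseline mutagenesis, the fraction that show
    any mutagenesis once the end-processing enzyme is added."""
    if len(baseline) != len(with_enzyme):
        raise ValueError("rate lists must be aligned site by site")
    rescued = [e > 0 for b, e in zip(baseline, with_enzyme) if b == 0]
    if not rescued:
        return None  # no zero-baseline sites, so the fraction is undefined
    return sum(rescued) / len(rescued)


# Toy, made-up mutation rates (percent) for four sites.
baseline_rates = [0.0, 2.0, 0.0, 10.0]
enzyme_rates = [1.0, 20.0, 0.0, 10.0]

folds = fold_increases(baseline_rates, enzyme_rates)
fraction = rescued_fraction(baseline_rates, enzyme_rates)
```

Sites with a zero baseline are kept out of the fold-change list (as `None`) because the ratio is undefined there; they are exactly the sites that panel C summarizes separately.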
Supplementary Figure 5 Impact of end-processing enzymes on the distribution of net deletion sizes observed.
With no end processing enzymes (dashed line), the distribution observed is greatly biased towards smaller deletions. Intriguingly, while the addition of either Artemis or TREX2 increases the rate of observed deletions, Artemis tends to favor smaller deletions (< 5 bp in size) and TREX2 tends to mediate larger deletions (> 5 bp in size). This is consistent across both Cas9 systems (Cas9Sp left, Cas9St1 right).
Supplementary Figure 6 Impact of TREX2 on both on-target and off-target mutagenesis rates.
Three previously published sites (with three known off-target sites for each) were assessed for NHEJ-induced mutation in the absence and presence of the exonuclease TREX2. For each site, individual sgRNAs with Cas9 were co-transfected with or without TREX2, and cells were assessed for mutations 72 h post-transfection. Notably, TREX2 increases mutation rates at both on-target and off-target sites to varying degrees. This suggests that selecting sgRNA sites with minimal off-target sites is imperative, and even more so in the presence of TREX2.
Supplementary Figure 7 Comparison of high- and low-activity St1 sgRNA sequences at both the integrated target site and endogenous loci in 293T cells.
(A) Position-by-position comparison of base distributions between the 82 high-activity sgRNAs and 69 low-activity sgRNAs. A window of 37 bp was used, encompassing the 27 bp target site and 5 bp of flanking sequence on each side. For each position, p-values were calculated using a 2 x 4 Fisher’s exact test comparing the nucleotide distributions and were subsequently corrected for multiple hypothesis testing using the Benjamini-Hochberg method. Position 20 shows the largest peak, with a strong preference for guanine. (B) Comparison of DNaseI hypersensitivity between the top and bottom quartiles of sites with high and low activity at endogenous loci. Data were generated by ENCODE and downloaded from the UCSC Genome Browser. A given region was defined as 200 bp upstream and downstream of the target site, for a total size of 427 bp. (C) Comparison of H3K4 trimethylation between the same sites as in (B).
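The multiple-testing step in panel A can be illustrated with a minimal Benjamini-Hochberg adjustment in plain Python. This sketch covers only the correction step; the per-position p-values are assumed to have been computed already (for example by a 2 x 4 exact test), and the function name and toy p-values are hypothetical rather than taken from the authors' software.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy, made-up p-values for four positions in the window.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
```

Each raw p-value is scaled by m / rank, and the running minimum taken from the largest p-value downward keeps the adjusted values monotone, so a position with a smaller raw p-value never ends up with a larger adjusted one.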
Supplementary Figure 8 Validation of SVM predicted activity of sgRNAs for Cas9 S. pyogenes (Cas9Sp).
Five sgRNAs predicted to have high activity (blue) and low activity (orange) were transfected individually in seven different cell lines, representing a diverse set of tissue types and lineages. Due to intrinsic factors, efficiency of DNA repair, and transfection efficiencies, different cell lines have different levels of observed mutagenesis. By and large, the guides predicted to have high activity outperformed those predicted to have low activity.
Supplementary Figure 9 Validation of SVM-predicted activity of sgRNAs for Cas9 S. thermophilus (Cas9St1).
Five sgRNAs predicted to have high activity (blue) and low activity (orange) were transfected individually in seven different cell lines, representing a diverse set of tissue types and lineages. Due to intrinsic factors, efficiency of DNA repair, and transfection efficiencies, different cell lines have different levels of observed mutagenesis. Across all cell types, the predicted high activity guides for Cas9St1 consistently outperform those that are predicted to be low activity.
Supplementary Figure 10 Comparison of predicted SVM scores from our model with a previously published data set of 1,841 sgRNA sequences.
(A) Comparison of the gene percent rank vs. the SVM prediction score and (B) Comparison of the predicted sgRNA score from a previously published classifier vs. the SVM prediction score from our model. Spearman correlation coefficients were used to quantify the relationships. While there appears to be variability between models, there is a positive relationship between the two classifiers.
Supplementary Figure 11 Comparison of the mutation rates observed at the integrated target site with that observed at the endogenous locus.
Because chromatin effects are controlled for around the integrated site and multiple target sites were incorporated in each cell, the rates of mutagenesis are, as expected, higher than those seen at the endogenous loci. This was consistent for both Cas9Sp and Cas9St1.
Supplementary Figure 12 Correlation analysis of observed mutation rates at lentiviral target sites and endogenous sites.
A modest correlation is observed across all of the interrogated sites between the two experiments (r = 0.42; p-value = 1.6e-53). However, the endogenous sites span a range of chromatin accessibility whereas the lentiviral sites are expected to be broadly accessible, so the most accessible endogenous sites should bear closer resemblance to the lentiviral target sites. Indeed, this is the case: when taking progressively smaller subsets of the most accessible sites, the correlation improves markedly. Using the top 100 most accessible sites, the observed correlation is 0.69 (p-value = 2.6e-15).
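The subset analysis in this legend can be sketched as follows: rank sites by an accessibility score, keep the top N, and recompute the correlation between the two experiments. Several things here are assumptions made for illustration: the legend does not state which correlation coefficient was used (Pearson is assumed), and the function names, tuple layout and toy values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient (assumed here; the legend only says 'r')."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def correlation_by_accessibility(sites, top_ns):
    """Correlate lentiviral and endogenous rates within the top-N most
    accessible sites, for each N in `top_ns`.

    `sites` is a list of (accessibility, lentiviral_rate, endogenous_rate).
    """
    ranked = sorted(sites, key=lambda s: s[0], reverse=True)
    results = {}
    for n in top_ns:
        subset = ranked[:n]
        results[n] = pearson_r([s[1] for s in subset], [s[2] for s in subset])
    return results


# Toy, made-up data: the three most accessible sites agree perfectly,
# while the two least accessible sites show little endogenous activity.
toy_sites = [
    (10.0, 1.0, 1.0),
    (9.0, 2.0, 2.0),
    (8.0, 3.0, 3.0),
    (1.0, 4.0, 0.0),
    (0.5, 5.0, 1.0),
]
by_subset = correlation_by_accessibility(toy_sites, [3, 5])
```

In the toy data the correlation is higher for the three most accessible sites than for all five, mirroring the direction of the trend described above; the actual values reported in the legend come from the study's data, not from this example.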
Supplementary Figure 13 Representation analysis of the four plasmid libraries that were used in this study.
Plasmid libraries were sequenced using the Illumina MiSeq. The Y-axis represents the percent of total reads and the X-axis represents all of the sequences in alphabetical order. The target libraries had fairly even representation across all sites, with the Cas9Sp showing slightly more uniform representation, while the sgRNA libraries appeared to have fairly uniform representation for both Cas9s.
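A representation plot of this kind boils down to counting reads per library member and expressing each count as a percentage of all matched reads, with members listed alphabetically. The sketch below assumes exact-match counting of reads against the designed sequences, which may differ from how the reads were actually mapped; the function name and toy sequences are hypothetical.

```python
from collections import Counter


def representation(reads, library):
    """Percent of matched reads for each library sequence, in alphabetical order.

    Reads that do not exactly match a library sequence are ignored, and
    library members with no reads are reported as 0.0.
    """
    members = set(library)
    counts = Counter(read for read in reads if read in members)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads matched the library")
    return [(seq, 100.0 * counts[seq] / total) for seq in sorted(members)]


# Toy, made-up library and reads; "GGGG" is not in the library and is skipped.
toy_library = ["ACGT", "AAAA", "TTTT"]
toy_reads = ["ACGT", "ACGT", "AAAA", "GGGG"]
profile = representation(toy_reads, toy_library)
```

A perfectly uniform library of k members would give 100 / k percent for every sequence, so deviations from that flat line are what the plot makes visible.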
Supplementary Figure 14 Design of oligonucleotides for the target sites and sgRNA sequences for both Cas9Sp and Cas9St1.
170 base pair oligonucleotides were synthesized using the Custom Array platform. Outer primer sequences were designed for selective amplification and 10 base barcodes were used as additional sequence to aid in mapping of target sites after Cas9-mediated mutagenesis.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–14 (PDF 2264 kb)
Supplementary Table 1
List of individual sgRNAs tested for predicted activity (XLSX 10 kb)
Supplementary Table 2
TREX2 off-target analysis (XLSX 9 kb)
Supplementary Data 1
List of high and low activity sgRNAs for Cas9Sp and Cas9St1 (XLSX 17 kb)
Supplementary Data 2
Human Cas9Sp sites identified by CASFinder with SVM ranking (hg19) (XLSX 35187 kb)
Supplementary Data 3
Mouse Cas9Sp sites identified by CASFinder with SVM ranking (hg19) (XLSX 38318 kb)
Supplementary Data 4
Human Cas9St1 sites identified by CASFinder with SVM ranking (hg19) (XLSX 9908 kb)
Supplementary Data 5
Mouse Cas9St1 sites identified by CASFinder with SVM ranking (hg19) (XLSX 10761 kb)
Supplementary Data 6
List of Cas9Sp sites (XLSX 58 kb)
Supplementary Data 7
List of Cas9St1 sites (XLSX 62 kb)
Supplementary Data 8
List of synthesized oligonucleotides for Cas9Sp (sgRNAs and target sites) (XLSX 165 kb)
Supplementary Data 9
List of synthesized oligonucleotides for Cas9St1 (sgRNAs and target sites) (XLSX 175 kb)
Supplementary Data 10
Total mutation rates observed in 293T with Cas9Sp nuclease (XLSX 181 kb)
Supplementary Data 11
Total mutation rates observed in 293T with Cas9Sp nickase (XLSX 150 kb)
Supplementary Data 12
Total mutation rates observed in 293T with Cas9St1 nuclease (XLSX 164 kb)
Supplementary Data 13
Total mutation rates observed in K562 with Cas9Sp nuclease (XLSX 79 kb)
Supplementary Software
Stand-alone software program to identify high activity sgRNAs (ZIP 28269 kb)
About this article
Cite this article
Chari, R., Mali, P., Moosburner, M. et al. Unraveling CRISPR-Cas9 genome engineering parameters via a library-on-library approach. Nat Methods 12, 823–826 (2015). https://doi.org/10.1038/nmeth.3473
DOI: https://doi.org/10.1038/nmeth.3473