Unraveling CRISPR-Cas9 genome engineering parameters via a library-on-library approach

Abstract

We developed an in vivo library-on-library methodology to simultaneously assess single guide RNA (sgRNA) activity across 1,400 genomic loci. Assaying across multiple human cell types and end-processing enzymes as well as two Cas9 orthologs, we unraveled underlying nucleotide sequence and epigenetic parameters. Our results and software (http://crispr.med.harvard.edu/sgRNAScorer) enable improved design of reagents, shed light on mechanisms of genome targeting, and provide a generalizable framework to study nucleic acid–nucleic acid interactions and biochemistry in high throughput.

Figure 1: Schematic of the library-on-library approach employed in our study.
Figure 2: Versatility of the library-on-library approach.
Figure 3: Analysis of parameters modulating Cas9-sgRNA gene targeting.

Accession codes

Primary accessions

Sequence Read Archive

Referenced accessions

Sequence Read Archive

Acknowledgements

We acknowledge J. Aach for help with CasFinder and useful discussion, B. Turczyk for help with custom array oligonucleotide synthesis, S. Byrne (Harvard Medical School) for providing PGP1 induced pluripotent stem cells, and A. Chavez for useful discussion. This work was supported by US National Institutes of Health grant P50 HG005550. R.C. was supported by a Banting Fellowship from the Canadian Institutes of Health Research. P.M. is supported by University of California, San Diego, startup funds and a Burroughs Wellcome Career Award at the Scientific Interface.

Author information

Contributions

R.C. and P.M. designed the study and performed the experiments. R.C. and P.M. wrote and edited the manuscript. All authors approved the final version of the manuscript. R.C. implemented custom Python software and performed data analysis. M.M. provided technical assistance. G.M.C. supervised the project.

Corresponding authors

Correspondence to Prashant Mali or George M Church.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Technical and biological replicates of the library-on-library experiments.

Two separate transfections of the sgRNA library and Cas9Sp nuclease were performed (BR1 and BR2). For each transfection, two libraries were prepared (TR1 and TR2), giving four samples in total. (A) Comparison of technical replicates within each biological replicate. (B) Scatter plots comparing each sample from one biological replicate with each sample from the other. Because each transfection yielded two samples, four comparisons are shown. Pearson correlation coefficients ranged from 0.819 to 0.853.
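The replicate comparison described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the sample names mirror the BR/TR labels in the caption, but the per-site mutation rates are invented placeholders and the helper names (`pearson`, `pairwise_correlations`) are assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(samples):
    """Return {(name_a, name_b): r} for every unordered pair of samples."""
    return {
        (a, b): pearson(samples[a], samples[b])
        for a, b in combinations(sorted(samples), 2)
    }


# Hypothetical per-site mutation rates (%); NOT data from the study.
samples = {
    "BR1_TR1": [1.0, 4.0, 2.5, 8.0, 0.5],
    "BR1_TR2": [1.2, 3.8, 2.9, 7.5, 0.4],
    "BR2_TR1": [0.8, 4.5, 2.0, 9.0, 0.9],
    "BR2_TR2": [1.1, 4.1, 2.2, 8.4, 0.7],
}

correlations = pairwise_correlations(samples)
for (a, b), r in correlations.items():
    # Pairs sharing a BR prefix are technical replicates; the rest cross biological replicates.
    kind = "technical" if a[:3] == b[:3] else "biological"
    print(f"{a} vs {b} ({kind}): r = {r:.3f}")
```

Four samples give six unordered pairs: two technical-replicate comparisons (panel A) and four comparisons across biological replicates (panel B).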

Source data

Supplementary Figure 2 Distribution of observed NHEJ mutation rates across different Cas9s and end-processing enzymes.

Total observed NHEJ rates in (A) Cas9Sp nuclease, (B) Cas9Sp nickase and (C) Cas9St1 nuclease transfection experiments. For each set of experiments, the respective sgRNA library was transfected alone, as well as in four further transfections, each with a different end-processing enzyme. Control samples represent cells with the integrated target library that were transfected with Cas9 only. Of the enzymes we tested, TREX2 and Artemis had the strongest impact on increasing the NHEJ-associated mutation rate.

Supplementary Figure 3 Distribution of observed NHEJ-mediated insertion and deletion rates across different Cas9s and end-processing enzymes.

Box plots of the insertion and deletion rates of NHEJ events for Cas9Sp nuclease (top), Cas9Sp nickase (middle) and Cas9St1 nuclease (bottom). In both nuclease sets, TREX2 consistently alters the NHEJ pattern, biasing it toward deletions and away from insertions. The insertion rate with Cas9Sp nuclease is also modestly higher in the Tdt and ddrA samples.

Supplementary Figure 4 Impact of TREX2 and Artemis across the target-site library.

(A) Histogram of the fold increase in NHEJ-associated mutagenesis across all sites due to the addition of end-processing enzymes. The top plot shows the effect of TREX2 and the bottom plot shows the effect of Artemis. The observed impact is quite variable, suggesting that sequence context may also be important. (B) Box plot depicting the range of fold increase. In some instances there is little to no effect, but in other cases the enhancement can be as much as 10- to 15-fold. (C) Bar plot showing the percentage of sites that showed no mutagenesis with the sgRNA library alone but exhibited mutagenesis upon addition of TREX2 or Artemis. Over half of these sites showed mutation upon addition of TREX2 or Artemis.
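The two quantities summarized in this figure, the per-site fold increase and the percentage of previously unmutated sites that become mutated, can be sketched as below. The function names and the example rates are hypothetical, and treating a zero baseline as an undefined fold change is an assumption of this sketch rather than a detail stated in the caption.

```python
def fold_increase(baseline, treated):
    """Per-site fold increase; None where the baseline rate is zero (ratio undefined)."""
    return [t / b if b > 0 else None for b, t in zip(baseline, treated)]


def percent_rescued(baseline, treated):
    """Percent of sites with zero baseline mutagenesis that are mutated after treatment."""
    silent = [t for b, t in zip(baseline, treated) if b == 0]
    if not silent:
        return 0.0
    return 100.0 * sum(1 for t in silent if t > 0) / len(silent)


# Hypothetical mutation rates (%) per target site; NOT data from the study.
sgrna_alone = [0.0, 2.0, 0.0, 1.0, 4.0]
with_enzyme = [0.5, 6.0, 0.0, 12.0, 4.0]

folds = fold_increase(sgrna_alone, with_enzyme)
rescued = percent_rescued(sgrna_alone, with_enzyme)
print(folds)
print(rescued)
```

Keeping zero-baseline sites out of the fold-change list and reporting them separately mirrors the split between panels A/B and panel C.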

Supplementary Figure 5 Impact of end-processing enzymes on the distribution of net deletion sizes observed.

With no end processing enzymes (dashed line), the distribution observed is greatly biased towards smaller deletions. Intriguingly, while the addition of either Artemis or TREX2 increases the rate of observed deletions, Artemis tends to favor smaller deletions (< 5 bp in size) and TREX2 tends to mediate larger deletions (> 5 bp in size). This is consistent across both Cas9 systems (Cas9Sp left, Cas9St1 right).

Source data

Supplementary Figure 6 Impact of TREX2 on both on-target and off-target mutagenesis rates.

Three previously published sites (each with three known off-target sites) were assessed for NHEJ-induced mutation in the absence and presence of the exonuclease TREX2. For each site, individual sgRNAs with Cas9 were co-transfected with or without TREX2, and cells were assessed for mutations 72 h post-transfection. Notably, TREX2 increases mutation rates at both on-target and off-target sites to varying degrees. This suggests that selecting sgRNA sites with minimal off-target sites is imperative, even more so in the presence of TREX2.

Source data

Supplementary Figure 7 Comparison of high- and low-activity St1 sgRNA sequences at both the integrated target site and endogenous loci in 293T cells.

(A) Position-by-position comparison of base distributions between the 82 high-activity sgRNAs and 69 low-activity sgRNAs. A window of 37 bp was used, encompassing the 27 bp target site and 5 bp of flanking sequence on each side. For each position, p-values were calculated using a 2 x 4 Fisher's exact test comparing the nucleotide distributions and were subsequently corrected for multiple hypothesis testing using the Benjamini-Hochberg method. Position 20 shows the largest peak, with a strong preference for guanine. (B) Comparison of DNase I hypersensitivity between the top and bottom quartiles of sites with high and low activity at endogenous loci. Data were generated by ENCODE and downloaded from the UCSC Genome Browser. A given region was defined as 225 bp upstream and downstream of the target site for a total size of 427 bp. (C) Comparison of H3K4 trimethylation between the same sites as in (B).
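The multiple-testing step in panel A can be illustrated with a small, dependency-free implementation of the Benjamini-Hochberg adjustment. This is a sketch of the standard procedure, not the authors' code; the function name and the example p-values are hypothetical. In the setting of panel A, one p-value per window position (37 in total) would be passed in.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    # and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four positions; NOT values from the study.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print(adj)
```

A position is then called significant if its adjusted p-value falls below the chosen false discovery rate.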

Supplementary Figure 8 Validation of SVM predicted activity of sgRNAs for Cas9 S. pyogenes (Cas9Sp).

Five sgRNAs predicted to have high activity (blue) and low activity (orange) were transfected individually in seven different cell lines, representing a diverse set of tissue types and lineages. Owing to intrinsic factors, efficiency of DNA repair, and transfection efficiencies, different cell lines show different levels of observed mutagenesis. By and large, the sgRNAs predicted to have high activity did outperform those predicted to have low activity.

Source data

Supplementary Figure 9 Validation of SVM-predicted activity of sgRNAs for Cas9 S. thermophilus (Cas9St1).

Five sgRNAs predicted to have high activity (blue) and low activity (orange) were transfected individually in seven different cell lines, representing a diverse set of tissue types and lineages. Due to intrinsic factors, efficiency of DNA repair, and transfection efficiencies, different cell lines have different levels of observed mutagenesis. Across all cell types, the predicted high activity guides for Cas9St1 consistently outperform those that are predicted to be low activity.

Source data

Supplementary Figure 10 Comparison of predicted SVM scores from our model with a previously published data set of 1,841 sgRNA sequences.

(A) Comparison of the gene percent rank vs. the SVM prediction score. (B) Comparison of the predicted sgRNA score from a previously published classifier vs. the SVM prediction score from our model. Spearman correlation coefficients were used to quantify the relationships. Although the models show variability, there is a positive relationship between the two classifiers.

Source data

Supplementary Figure 11 Comparison of the mutation rates observed at the integrated target site with that observed at the endogenous locus.

Because chromatin effects are controlled for around the integrated site and multiple target sites were incorporated in each cell, the rates of mutagenesis are, as expected, higher than those seen at the endogenous loci. This was consistent for both Cas9Sp and Cas9St1.

Source data

Supplementary Figure 12 Correlation analysis of observed mutation rates at lentiviral target sites and endogenous sites.

A strong correlation is observed between the two experiments when all of the interrogated sites are used (r = 0.42; p-value = 1.6e-53). However, because the endogenous sites span a range of chromatin accessibility whereas the lentiviral sites are expected to be broadly accessible, the most accessible endogenous sites should bear closer resemblance to the lentiviral target sites. Indeed, this is the case: when progressively smaller subsets of the most accessible sites are taken, the correlation improves dramatically. Using the 100 most accessible sites, the observed correlation is 0.69 (p-value = 2.6e-15).
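The subsetting analysis described here, ranking sites by accessibility and recomputing the correlation on the top N, can be sketched as follows. The record fields (`accessibility`, `lenti`, `endo`), the function names, and all values are hypothetical; how accessibility was actually scored is not specified in this caption.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_for_top_accessible(sites, top_n):
    """Pearson r between lentiviral and endogenous rates for the top_n most accessible sites."""
    ranked = sorted(sites, key=lambda s: s["accessibility"], reverse=True)[:top_n]
    return pearson([s["lenti"] for s in ranked], [s["endo"] for s in ranked])


# Hypothetical records; NOT data from the study.
sites = [
    {"accessibility": 9.0, "lenti": 10.0, "endo": 5.0},
    {"accessibility": 8.0, "lenti": 20.0, "endo": 10.0},
    {"accessibility": 7.0, "lenti": 30.0, "endo": 15.0},
    {"accessibility": 2.0, "lenti": 15.0, "endo": 12.0},
    {"accessibility": 1.5, "lenti": 25.0, "endo": 1.0},
    {"accessibility": 1.0, "lenti": 35.0, "endo": 3.0},
]

r_all = correlation_for_top_accessible(sites, len(sites))
r_top = correlation_for_top_accessible(sites, 3)
print(f"all sites: r = {r_all:.2f}; top 3 accessible: r = {r_top:.2f}")
```

In this toy data the three most accessible sites are perfectly collinear, so restricting to them raises the correlation, which is the qualitative pattern the caption reports.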

Source data

Supplementary Figure 13 Representation analysis of the four plasmid libraries that were used in this study.

Plasmid libraries were sequenced using the Illumina MiSeq. The Y-axis represents the percent of total reads and the X-axis represents all of the sequences in alphabetical order. The target libraries had fairly even representation across all sites, with the Cas9Sp library showing slightly more uniform representation, while the sgRNA libraries had fairly uniform representation for both Cas9s.
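The quantity plotted here, each library member's share of total reads ordered alphabetically by sequence, can be computed as in the sketch below. The function name and the read counts are hypothetical placeholders.

```python
def read_representation(counts):
    """Return [(sequence, percent_of_total_reads)] sorted alphabetically by sequence."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads to summarize")
    return [(seq, 100.0 * counts[seq] / total) for seq in sorted(counts)]


# Hypothetical read counts per library member; NOT data from the study.
counts = {"GATTACA": 30, "ACGTACG": 50, "TTGACCA": 20}
representation = read_representation(counts)
for seq, pct in representation:
    print(f"{seq}\t{pct:.1f}%")
```

A perfectly uniform library of N members would show 100/N percent for every sequence, so deviations from that value indicate skew in representation.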

Supplementary Figure 14 Design of oligonucleotides for the target sites and sgRNA sequences for both Cas9Sp and Cas9St1.

Oligonucleotides of 170 base pairs were synthesized using the Custom Array platform. Outer primer sequences were designed for selective amplification, and 10-base barcodes were included as additional sequence to aid in mapping target sites after Cas9-mediated mutagenesis.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–14 (PDF 2264 kb)

Supplementary Table 1

List of individual sgRNAs tested for predicted activity (XLSX 10 kb)

Supplementary Table 2

TREX2 off-target analysis (XLSX 9 kb)

Supplementary Data 1

List of high and low activity sgRNAs for Cas9Sp and Cas9St1 (XLSX 17 kb)

Supplementary Data 2

Human Cas9Sp sites identified by CASFinder with SVM ranking (hg19) (XLSX 35187 kb)

Supplementary Data 3

Mouse Cas9Sp sites identified by CASFinder with SVM ranking (hg19) (XLSX 38318 kb)

Supplementary Data 4

Human Cas9St1 sites identified by CASFinder with SVM ranking (hg19) (XLSX 9908 kb)

Supplementary Data 5

Mouse Cas9St1 sites identified by CASFinder with SVM ranking (hg19) (XLSX 10761 kb)

Supplementary Data 6

List of Cas9Sp sites (XLSX 58 kb)

Supplementary Data 7

List of Cas9St1 sites (XLSX 62 kb)

Supplementary Data 8

List of synthesized oligonucleotides for Cas9Sp (sgRNAs and target sites) (XLSX 165 kb)

Supplementary Data 9

List of synthesized oligonucleotides for Cas9St1 (sgRNAs and target sites) (XLSX 175 kb)

Supplementary Data 10

Total mutation rates observed in 293T with Cas9Sp nuclease (XLSX 181 kb)

Supplementary Data 11

Total mutation rates observed in 293T with Cas9Sp nickase (XLSX 150 kb)

Supplementary Data 12

Total mutation rates observed in 293T with Cas9St1 nuclease (XLSX 164 kb)

Supplementary Data 13

Total mutation rates observed in K562 with Cas9Sp nuclease (XLSX 79 kb)

Supplementary Software

Stand-alone software program to identify high activity sgRNAs (ZIP 28269 kb)

Source data

About this article

Cite this article

Chari, R., Mali, P., Moosburner, M. et al. Unraveling CRISPR-Cas9 genome engineering parameters via a library-on-library approach. Nat Methods 12, 823–826 (2015). https://doi.org/10.1038/nmeth.3473
