Unraveling CRISPR-Cas9 genome engineering parameters via a library-on-library approach

Chari, Raj; Mali, Prashant; Moosburner, Mark; Church, George M

doi:10.1038/nmeth.3473

Brief Communication
Published: 13 July 2015

Raj Chari, Prashant Mali, Mark Moosburner & George M Church

Nature Methods volume 12, pages 823–826 (2015)

Subjects

Genetic engineering

Abstract

We developed an in vivo library-on-library methodology to simultaneously assess single guide RNA (sgRNA) activity across ∼1,400 genomic loci. Assaying across multiple human cell types and end-processing enzymes as well as two Cas9 orthologs, we unraveled underlying nucleotide sequence and epigenetic parameters. Our results and software (http://crispr.med.harvard.edu/sgRNAScorer) enable improved design of reagents, shed light on mechanisms of genome targeting, and provide a generalizable framework to study nucleic acid–nucleic acid interactions and biochemistry in high throughput.

**Figure 1: Schematic of the library-on-library approach employed in our study.**

**Figure 2: Versatility of the library-on-library approach.**

**Figure 3: Analysis of parameters modulating Cas9-sgRNA gene targeting.**

Accession codes

Primary accessions

Sequence Read Archive

SRP048540

Acknowledgements

We acknowledge J. Aach for help with CasFinder and useful discussion, B. Turczyk for help with custom array oligonucleotide synthesis, S. Byrne (Harvard Medical School) for providing PGP1 induced pluripotent stem cells, and A. Chavez for useful discussion. This work was supported by US National Institutes of Health grant P50 HG005550. R.C. was supported by a Banting Fellowship from the Canadian Institutes of Health Research. P.M. is supported by University of California, San Diego, startup funds and a Burroughs Wellcome Career Award at the Scientific Interface.

Author information

Raj Chari and Prashant Mali: These authors contributed equally to this work.

Authors and Affiliations

Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
Raj Chari & George M Church
Department of Bioengineering, University of California, San Diego, La Jolla, California, USA
Prashant Mali
Scripps Institution of Oceanography, University of California, San Diego, La Jolla, California, USA
Mark Moosburner
Wyss Institute for Biologically Inspired Engineering, Harvard University, Cambridge, Massachusetts, USA
George M Church

Contributions

R.C. and P.M. designed the study and performed the experiments. R.C. and P.M. wrote and edited the manuscript. All authors approved the final version of the manuscript. R.C. implemented custom Python software and performed data analysis. M.M. provided technical assistance. G.M.C. supervised the project.

Corresponding authors

Correspondence to Prashant Mali or George M Church.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Technical and biological replicates of the library-on-library experiments.

Two separate transfections of the sgRNA library and Cas9_Sp nuclease were performed (BR1 and BR2). For each transfection, two libraries were prepared (TR1 and TR2), giving four samples in total. (A) Comparison of technical replicates for each biological replicate. (B) Scatter plots of each sample versus each other sample across the two biological replicates. Because there were two samples per transfection, a total of four comparisons are shown. Pearson correlation coefficients ranged from 0.819 to 0.853.
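
The cross-replicate comparison in panel (B) can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the BR/TR labels follow the caption, but the per-site mutation rates are made-up placeholder values and `pearson` is a hypothetical helper written for this example.

```python
from itertools import product
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Placeholder per-site mutation rates (illustrative values only, not study data).
samples = {
    ("BR1", "TR1"): [0.10, 0.32, 0.05, 0.51, 0.22],
    ("BR1", "TR2"): [0.12, 0.30, 0.07, 0.48, 0.25],
    ("BR2", "TR1"): [0.09, 0.35, 0.04, 0.55, 0.18],
    ("BR2", "TR2"): [0.11, 0.29, 0.08, 0.50, 0.21],
}

br1 = [key for key in samples if key[0] == "BR1"]
br2 = [key for key in samples if key[0] == "BR2"]

# Two samples per transfection give 2 x 2 = 4 cross-replicate comparisons.
cross_replicate = {
    (a, b): pearson(samples[a], samples[b]) for a, b in product(br1, br2)
}
```

Each key of `cross_replicate` is a pair of sample labels and each value is the correlation between their per-site rates.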

Supplementary Figure 2 Distribution of observed NHEJ mutation rates across different Cas9s and end-processing enzymes.

Total observed NHEJ rates in (A) Cas9_Sp nuclease, (B) Cas9_Sp nickase and (C) Cas9_St1 nuclease transfected experiments. For each set of experiments, the respective sgRNA library was transfected alone, as well as in four transfections, each with a different end-processing enzyme. Control samples represent cells with the integrated target library that were transfected with just Cas9. Of the enzymes we tested, TREX2 and Artemis had the strongest effect in increasing the NHEJ-associated mutation rate.

Supplementary Figure 3 Distribution of observed NHEJ-mediated insertion and deletion rates across different Cas9s and end-processing enzymes.

Box plots of the insertion and deletion rates of NHEJ events for Cas9_Sp nuclease (top), Cas9_Sp nickase (middle) and Cas9_St1 nuclease (bottom). Consistently across both nuclease sets, TREX2 alters the NHEJ pattern by biasing towards deletions and away from insertions. We also note that the insertion rate with Cas9_Sp nuclease is modestly higher in the Tdt and ddrA samples.

Supplementary Figure 4 Impact of TREX2 and Artemis across the target-site library.

(A) Histograms of the fold increase in NHEJ-associated mutagenesis across all of the sites due to the addition of end-processing enzymes. The top plot illustrates the effect of TREX2 and the bottom plot shows the effect of Artemis. The observed impact is quite variable, suggesting that sequence context may also be important. (B) Box plot depicting the range of fold increase. In some instances there is little to no effect, but in other cases the enhancement is as much as 10- to 15-fold. (C) Bar plot showing the percentage of sites that showed no mutagenesis with the sgRNA library alone but exhibited mutagenesis upon addition of TREX2 or Artemis. Over half of these sites showed mutation upon addition of TREX2 or Artemis.
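
The two quantities summarized in this figure, the per-site fold increase and the share of previously silent sites that become mutated, can be sketched as below. This is a rough illustration under stated assumptions rather than the authors' pipeline: the function names are hypothetical, the rates are invented, and sites with a zero baseline are simply excluded from the fold-change calculation.

```python
def fold_increase(baseline, treated):
    """Per-site fold change; sites with zero baseline are returned as None."""
    return [t / b if b > 0 else None for b, t in zip(baseline, treated)]


def rescued_fraction(baseline, treated):
    """Fraction of baseline-silent sites that show mutagenesis after treatment."""
    silent = [t for b, t in zip(baseline, treated) if b == 0]
    if not silent:
        return 0.0
    return sum(1 for t in silent if t > 0) / len(silent)


# Invented per-site mutation rates: sgRNA library alone vs. library plus enzyme.
baseline = [0.02, 0.10, 0.0, 0.0, 0.05, 0.0]
with_enzyme = [0.20, 0.12, 0.03, 0.0, 0.60, 0.01]

folds = fold_increase(baseline, with_enzyme)
rescued = rescued_fraction(baseline, with_enzyme)
```

Keeping the zero-baseline sites separate avoids dividing by zero and mirrors the split between panels (A) and (B), which report fold change, and panel (C), which reports newly mutated sites.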

Supplementary Figure 5 Impact of end-processing enzymes on the distribution of net deletion sizes observed.

With no end processing enzymes (dashed line), the distribution observed is greatly biased towards smaller deletions. Intriguingly, while the addition of either Artemis or TREX2 increases the rate of observed deletions, Artemis tends to favor smaller deletions (< 5 bp in size) and TREX2 tends to mediate larger deletions (> 5 bp in size). This is consistent across both Cas9 systems (Cas9_Sp left, Cas9_St1 right).

Supplementary Figure 6 Impact of TREX2 on both on-target and off-target mutagenesis rates.

Three previously published sites (with three known off-target sites for each) were assessed for NHEJ-induced mutation in the absence and presence of the exonuclease TREX2. For each site, individual sgRNAs with Cas9 were co-transfected with or without TREX2, and cells were assessed for mutations 72 h post-transfection. Notably, TREX2 increases mutation rates at both on-target and off-target sites, to varying degrees. This suggests that selecting sgRNA sites with minimal off-target sites is imperative, even more so in the presence of TREX2.

Supplementary Figure 7 Comparison of high- and low-activity St1 sgRNA sequences at both the integrated target site and endogenous loci in 293T cells.

(A) Position-by-position comparison of base distributions between the 82 high-activity sgRNAs and 69 low-activity sgRNAs. A 37-bp window was used, encompassing the 27-bp target site and 5 bp of flanking sequence on each side. For each position, p-values were calculated using a 2 × 4 Fisher's exact test comparing the nucleotide distributions and were subsequently corrected for multiple hypothesis testing using the Benjamini-Hochberg method. Position 20 shows the largest peak, with a strong preference for guanine. (B) Comparison of DNaseI hypersensitivity between the top and bottom quartiles of sites (high and low activity, respectively) at endogenous loci. Data were generated by ENCODE and downloaded from the UCSC Genome Browser. A given region was defined as 200 bp upstream and downstream of the target site, for a total size of 427 bp. (C) Comparison of H3K4 trimethylation between the same sites as in (B).
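
The multiple-testing step named in panel (A) can be sketched in a few lines. This is a minimal, dependency-free version of the Benjamini-Hochberg adjustment, offered as an illustration and not as the authors' implementation: the function name is hypothetical, the p-values are invented, and the per-position Fisher's exact tests that would produce the real p-values are assumed to have been run elsewhere.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented p-values, one per window position (illustrative only).
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
```

Walking from the largest p-value downward lets a single running minimum enforce the requirement that adjusted values never decrease with rank.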

Supplementary Figure 8 Validation of SVM predicted activity of sgRNAs for Cas9 S. pyogenes (Cas9_Sp).

Five sgRNAs predicted to have high activity (blue) and low activity (orange) were transfected individually in seven different cell lines, representing a diverse set of tissue types and lineages. Due to intrinsic factors, efficiency of DNA repair, and transfection efficiencies, different cell lines have different levels of observed mutagenesis. By and large, those predicted to perform better did in fact perform better than those that were predicted to be poor.

Supplementary Figure 9 Validation of SVM-predicted activity of sgRNAs for Cas9 S. thermophilus (Cas9_St1).

Five sgRNAs predicted to have high activity (blue) and low activity (orange) were transfected individually in seven different cell lines, representing a diverse set of tissue types and lineages. Due to intrinsic factors, efficiency of DNA repair, and transfection efficiencies, different cell lines have different levels of observed mutagenesis. Across all cell types, the predicted high activity guides for Cas9_St1 consistently outperform those that are predicted to be low activity.

Supplementary Figure 10 Comparison of predicted SVM scores from our model with a previously published data set of 1,841 sgRNA sequences.

(A) Comparison of the gene percent rank vs. the SVM prediction score and (B) Comparison of the predicted sgRNA score from a previously published classifier vs. the SVM prediction score from our model. Spearman correlation coefficients were used to quantify the relationships. While there appears to be variability between models, there is a positive relationship between the two classifiers.

Supplementary Figure 11 Comparison of the mutation rates observed at the integrated target site with that observed at the endogenous locus.

Because chromatin effects are controlled for around the integrated site and multiple target sites were incorporated in each cell, the rates of mutagenesis at the integrated sites are, as expected, higher than those seen at the endogenous loci. This was consistent for both Cas9_Sp and Cas9_St1.

Supplementary Figure 12 Correlation analysis of observed mutation rates at lentiviral target sites and endogenous sites.

A significant positive correlation is observed between the two experiments when all of the interrogated sites are used (r = 0.42; p-value = 1.6 × 10^-53). However, because the endogenous sites span a range of chromatin accessibility whereas the lentiviral sites are expected to be broadly accessible, the most accessible endogenous sites should bear the closest resemblance to the lentiviral target sites. Indeed, this is the case: when progressively smaller subsets of the most accessible sites are taken, the correlation improves markedly. Using the top 100 most accessible sites, the observed correlation is 0.69 (p-value = 2.6 × 10^-15).
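
The subsetting logic described here (rank sites by accessibility, then recompute the correlation on the top N) can be sketched as follows. This is an illustrative sketch only: the dictionary keys `accessibility`, `lenti` and `endo`, the function names, and all numeric values are assumptions made for the example, not the study's data or code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_in_top_accessible(sites, top_n):
    """Correlate lentiviral and endogenous rates over the top_n most accessible sites."""
    ranked = sorted(sites, key=lambda s: s["accessibility"], reverse=True)
    subset = ranked[:top_n]
    return pearson([s["lenti"] for s in subset], [s["endo"] for s in subset])


# Invented sites: accessibility score plus mutation rates at both site types.
sites = [
    {"accessibility": 9.0, "lenti": 0.50, "endo": 0.40},
    {"accessibility": 8.0, "lenti": 0.30, "endo": 0.25},
    {"accessibility": 7.5, "lenti": 0.10, "endo": 0.08},
    {"accessibility": 2.0, "lenti": 0.45, "endo": 0.02},
    {"accessibility": 1.0, "lenti": 0.20, "endo": 0.15},
    {"accessibility": 0.5, "lenti": 0.35, "endo": 0.01},
]

r_all = correlation_in_top_accessible(sites, len(sites))
r_top = correlation_in_top_accessible(sites, 3)
```

In this toy data the poorly accessible sites have low endogenous rates regardless of their lentiviral rates, so restricting to the most accessible sites raises the correlation, which is the pattern the caption reports.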

Supplementary Figure 13 Representation analysis of the four plasmid libraries that were used in this study.

Plasmid libraries were sequenced using the Illumina MiSeq. The Y-axis represents the percent of total reads and the X-axis represents all of the sequences in alphabetical order. The target libraries had fairly even representation across all sites, with the Cas9_Sp showing slightly more uniform representation, while the sgRNA libraries appeared to have fairly uniform representation for both Cas9s.

Supplementary Figure 14 Design of oligonucleotides for the target sites and sgRNA sequences for both Cas9_Sp and Cas9_St1.

170 base pair oligonucleotides were synthesized using the Custom Array platform. Outer primer sequences were designed for selective amplification and 10 base barcodes were used as additional sequence to aid in mapping of target sites after Cas9-mediated mutagenesis.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–14 (PDF 2264 kb)

Supplementary Table 1

List of individual sgRNAs tested for predicted activity (XLSX 10 kb)

Supplementary Table 2

TREX2 off-target analysis (XLSX 9 kb)

Supplementary Data 1

List of high and low activity sgRNAs for Cas9Sp and Cas9St1 (XLSX 17 kb)

Supplementary Data 2

Human Cas9Sp sites identified by CASFinder with SVM ranking (hg19) (XLSX 35187 kb)

Supplementary Data 3

Mouse Cas9Sp sites identified by CASFinder with SVM ranking (mm10) (XLSX 38318 kb)

Supplementary Data 4

Human Cas9St1 sites identified by CASFinder with SVM ranking (hg19) (XLSX 9908 kb)

Supplementary Data 5

Mouse Cas9St1 sites identified by CASFinder with SVM ranking (mm10) (XLSX 10761 kb)

Supplementary Data 6

List of Cas9Sp sites (XLSX 58 kb)

Supplementary Data 7

List of Cas9St1 sites (XLSX 62 kb)

Supplementary Data 8

List of synthesized oligonucleotides for Cas9Sp (sgRNAs and target sites) (XLSX 165 kb)

Supplementary Data 9

List of synthesized oligonucleotides for Cas9St1 (sgRNAs and target sites) (XLSX 175 kb)

Supplementary Data 10

Total mutation rates observed in 293T with Cas9_Sp nuclease (XLSX 181 kb)

Supplementary Data 11

Total mutation rates observed in 293T with Cas9_Sp nickase (XLSX 150 kb)

Supplementary Data 12

Total mutation rates observed in 293T with Cas9_St1 nuclease (XLSX 164 kb)

Supplementary Data 13

Total mutation rates observed in K562 with Cas9_Sp nuclease (XLSX 79 kb)

Supplementary Software

Stand-alone software program to identify high activity sgRNAs (ZIP 28269 kb)

Source data

Source data to Fig. 1

Source data to Fig. 2

Source data to Supplementary Fig. 3

Source data to Supplementary Fig. 4

Source data to Supplementary Fig. 5

Source data to Supplementary Fig. 6

Source data to Supplementary Fig. 7

Source data to Supplementary Fig. 8

Source data to Supplementary Fig. 9

Source data to Supplementary Fig. 10

About this article

Cite this article

Chari, R., Mali, P., Moosburner, M. et al. Unraveling CRISPR-Cas9 genome engineering parameters via a library-on-library approach. Nat Methods 12, 823–826 (2015). https://doi.org/10.1038/nmeth.3473

Received: 30 January 2015
Accepted: 26 May 2015
Published: 13 July 2015
Issue Date: September 2015
DOI: https://doi.org/10.1038/nmeth.3473
