Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9

Journal name: Nature Biotechnology
Volume: 34
Pages: 184–191
DOI: 10.1038/nbt.3437

Abstract

CRISPR-Cas9–based genetic screens are a powerful new tool in biology. By simply altering the sequence of the single-guide RNA (sgRNA), one can reprogram Cas9 to target different sites in the genome with relative ease, but the on-target activity and off-target effects of individual sgRNAs can vary widely. Here, we use recently devised sgRNA design rules to create human and mouse genome-wide libraries, perform positive and negative selection screens and observe that libraries designed with these rules produce improved results. Additionally, we profile the off-target activity of thousands of sgRNAs and develop a metric to predict off-target sites. We incorporate these findings from large-scale, empirical data to improve our computational design rules and create optimized sgRNA libraries that maximize on-target activity and minimize off-target effects, enabling more effective and efficient genetic screens and genome engineering.
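The abstract notes that retargeting Cas9 only requires changing the sgRNA sequence. As a minimal sketch, assuming the canonical SpCas9 layout of a 20-nt spacer immediately 5′ of an NGG PAM (the PAM is discussed in Figure 5a, but its full sequence is not spelled out here), candidate target sites can be enumerated on both strands as below. The function names are hypothetical, and the sketch does no on-target (Rule Set 2) or off-target (CFD) scoring.

```python
"""Hypothetical sketch: list candidate SpCas9 sites (20-nt spacer + NGG PAM)."""
from typing import List, Tuple

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_ngg_sites(seq: str, spacer_len: int = 20) -> List[Tuple[str, int, str]]:
    """Return (spacer, start, strand) for each spacer followed by an NGG PAM.

    `start` is the 0-based index of the spacer's first base on the scanned
    strand; hits on '-' are indexed on the reverse-complement sequence.
    """
    seq = seq.upper()
    hits: List[Tuple[str, int, str]] = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # The 3-nt PAM starts right after the spacer; 'N' is any base, so only
        # its last two positions must be 'GG'.
        for i in range(len(s) - spacer_len - 2):
            if s[i + spacer_len + 1 : i + spacer_len + 3] == "GG":
                hits.append((s[i : i + spacer_len], i, strand))
    return hits


if __name__ == "__main__":
    demo = "A" * 20 + "TGG" + "CCA" + "T" * 20
    for spacer, start, strand in find_ngg_sites(demo):
        print(strand, start, spacer)
```

Scanning the reverse complement with the same forward logic keeps the PAM test in one place, at the cost of reporting minus-strand hits in reverse-complement coordinates.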

Figures

    Figure 1: Comparative performance of the Avana library.

    (a) Distribution of Rule Set 1 scores across libraries. The box represents the 25th, 50th and 75th percentiles; whiskers show 5th and 95th percentiles. (b) Comparison of the FDR-corrected q-values determined by STARS for the top 100 ranked genes in the vemurafenib resistance assay in A375 cells. (c) Validation of individual sgRNAs for vemurafenib resistance in a competition assay in A375 cells. Horizontal bars represent the average of the individual sgRNAs for each gene. Previously validated genes are labeled in blue. ETP, early time point. (d) Subsampling analysis of the Avana library: the number of genes that pass different FDR thresholds with STARS when all six subpools are analyzed (first number in legend) and the average number of these genes that still score at each FDR threshold after subpools are removed (second number in legend). LG, lentiGuide vector; LC, lentiCRISPRv2 vector. Error bars, 1 s.d. from the mean of different combinations of subpools. (e) ROC-AUC analysis of individual sgRNAs targeting core essential genes in dropout screens in A375 cells. AUC values are indicated in parentheses.
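Panel e summarizes, with an ROC-AUC, how well sgRNAs against core essential genes separate from other sgRNAs when ranked by depletion. A minimal sketch using the rank (Mann-Whitney) form of the AUC, which equals the area under the empirical ROC curve, is shown below. The function name and the toy fold-changes are hypothetical; this is not the authors' analysis code.

```python
"""Hypothetical sketch: ROC-AUC for sgRNAs against essential genes, ranked by depletion."""
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Probability that a random positive scores above a random negative.

    This rank (Mann-Whitney) form equals the area under the empirical ROC
    curve; ties count as half. It is O(P*N), which is fine for a demo.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: log2 fold-changes in a dropout screen; True = targets an essential gene.
log2_fc = [-3.1, -2.4, -0.2, 0.1, 0.05, 0.3]
targets_essential = [True, True, False, False, True, False]
depletion = [-x for x in log2_fc]  # more depleted -> higher score
print(round(roc_auc(depletion, targets_essential), 3))  # 0.889 (8 of 9 pairs ordered correctly)
```

For genome-scale libraries, a sort-based implementation (or an existing library routine) would replace the quadratic double loop.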

    Figure 2: HPRT1 and NUDT5 confer 6-thioguanine resistance.

    (a) For each of six sgRNAs targeting these genes, fold-enrichment of the indicated sgRNA after 2 weeks of selection with 6-thioguanine, relative to its starting abundance, assayed in three different cell lines. (b) TIDE analysis of indels for sgRNAs 4 and 6 targeting NUDT5 (numbered as in a), tested in three cell lines. Asterisk indicates a sample where no cells survived. (c) Schematic of purine metabolism. Proteins are shown in blue circles, small molecules in italics. PRPS1 is also known as PRPP synthetase; PRPP is phosphoribosyl pyrophosphate.
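Panel a reports each sgRNA's fold-enrichment after selection relative to its starting abundance. One common way to compute this from sequencing read counts is sketched below. It assumes reads-per-million normalization and a pseudocount of 1; the caption does not specify the authors' exact normalization, and the function names and toy counts are hypothetical.

```python
"""Hypothetical sketch: per-sgRNA log2 fold-change relative to starting abundance."""
import math
from typing import Dict


def reads_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw read counts so each sample sums to one million."""
    total = sum(counts.values())
    return {sg: c / total * 1e6 for sg, c in counts.items()}


def log2_fold_change(start: Dict[str, int], end: Dict[str, int],
                     pseudocount: float = 1.0) -> Dict[str, float]:
    """log2((RPM_end + pc) / (RPM_start + pc)) for every sgRNA in `start`.

    RPM scaling removes differences in sequencing depth; the pseudocount
    keeps sgRNAs that drop to zero reads finite. Both choices are assumptions.
    """
    rpm_start = reads_per_million(start)
    rpm_end = reads_per_million(end)
    return {
        sg: math.log2((rpm_end.get(sg, 0.0) + pseudocount) / (rpm_start[sg] + pseudocount))
        for sg in rpm_start
    }


if __name__ == "__main__":
    start = {"sgA": 500, "sgB": 500}
    end = {"sgA": 900, "sgB": 100}
    for sg, lfc in log2_fold_change(start, end).items():
        print(sg, round(lfc, 2))
```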

    Figure 3: Tiled library screen for resistance genes.

    (a) Performance of sgRNAs by gene for each of three small-molecule challenges. The box represents the 25th, 50th and 75th percentiles, whiskers show 5th and 95th percentiles, and outliers are shown as individual dots. (b) For sgRNAs targeting MED12, comparison of the log2 fold-change when challenged with vemurafenib and selumetinib. (c) Activity of sgRNAs as a function of target site within the protein, divided by deciles, for 17 proteins. The box represents the 25th, 50th and 75th percentiles; whiskers show 10th and 90th percentiles. Asterisk indicates statistically significant difference in activity (adjusted P-values < 0.02, one-way ANOVA with repeated measures, with Tukey's correction for multiple comparisons).
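Panel c plots sgRNA activity by the decile of the protein in which each sgRNA cuts. A minimal sketch of that binning follows. It assumes each cut site has already been mapped to a 1-based amino-acid residue, and it summarizes each decile by its median (the center line of the box plots). The function names and toy values are hypothetical.

```python
"""Hypothetical sketch: group sgRNA activity by the decile of the protein they cut."""
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple


def position_decile(cut_aa: int, protein_length: int) -> int:
    """Return 1-10: the tenth of the protein containing 1-based residue `cut_aa`.

    Integer arithmetic avoids floating-point edge cases at bin boundaries.
    """
    if not 1 <= cut_aa <= protein_length:
        raise ValueError("cut site lies outside the protein")
    return (cut_aa - 1) * 10 // protein_length + 1


def median_activity_by_decile(
        guides: Iterable[Tuple[int, int, float]]) -> Dict[int, float]:
    """guides: (cut residue, protein length, activity) tuples -> {decile: median}."""
    bins = defaultdict(list)
    for cut_aa, length, activity in guides:
        bins[position_decile(cut_aa, length)].append(activity)
    return {d: median(v) for d, v in sorted(bins.items())}


if __name__ == "__main__":
    toy = [(3, 100, -2.0), (8, 100, -1.0), (97, 100, -0.2)]
    print(median_activity_by_decile(toy))  # {1: -1.5, 10: -0.2}
```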

    Figure 4: Development of Rule Set 2 for prediction of sgRNA on-target activity.

    (a) Comparison of classification models. Spearman correlation between measured activity and predicted activity score is plotted. Error bars show the s.d. across genes with a leave-one-gene-out approach. SVM + LogReg (Rule Set 1) performs better than the next-best model for all three data sets (left to right, P = 1.8 × 10⁻⁸, 5.2 × 10⁻¹³ and P < 10⁻¹⁶, using the statistical test for differences in Spearman correlation; ref. 48). (b) Addition of new features improves performance using L1 linear regression. Significance determined as in a; **P = 4.2 × 10⁻³; ***P = 2.32 × 10⁻⁴; ****P < 10⁻¹⁶. (c) Comparison of regression models, as well as the best-performing classification model, SVM + LogReg. Significance values are shown for the comparison between gradient-boosted regression trees (Boosted RT) and L1 regression, using the same measure of significance as in a; P = 0.054 (n.s., not significant); ***P = 4.9 × 10⁻⁴; ****P = 5.3 × 10⁻⁵. (d) Assessment of modeling performance with increasing number of genes used in each training set. Error bars indicate one s.d. across genes with a leave-one-gene-out approach. (e) Rule Set 2 performance on independently generated negative selection data sets. From left to right, for the three comparisons P = 5.9 × 10⁻⁸⁰, 2.1 × 10⁻²⁴ and 3.9 × 10⁻³⁵ (two-sample Kolmogorov-Smirnov test). (f) Rule Set 2 performance on independently generated CRISPRa/i data sets. From left to right, for the three comparisons ****P = 1.8 × 10⁻⁴⁰, ***P = 1.1 × 10⁻⁴ and P = 0.14 (n.s.) (two-sample Kolmogorov-Smirnov test).
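Panels a–c report P-values from the test for differences in Spearman correlation of ref. 48 (Steiger, 1980). A minimal sketch of one Z statistic from that paper follows, for two correlations that share a variable (here, measured activity) and are computed on the same observations. The statistic is derived for Pearson-type correlations, so applying it to Spearman correlations is an approximation. The caption does not say which variant the authors used, and the function name and toy inputs are hypothetical.

```python
"""Hypothetical sketch of Steiger's (1980) Z test for two dependent correlations.

r_jk and r_jh share variable j (e.g. measured sgRNA activity) and come from
the same n observations; r_kh correlates the two competing predictors k and h.
"""
import math
from typing import Tuple


def steiger_z(r_jk: float, r_jh: float, r_kh: float, n: int) -> Tuple[float, float]:
    """Return (Z, two-sided P) comparing r_jk with r_jh."""
    z_jk, z_jh = math.atanh(r_jk), math.atanh(r_jh)  # Fisher z-transform
    r_bar = (r_jk + r_jh) / 2.0
    # Covariance term for correlations sharing variable j, evaluated at r_bar.
    psi = r_kh * (1 - 2 * r_bar ** 2) - 0.5 * r_bar ** 2 * (1 - 2 * r_bar ** 2 - r_kh ** 2)
    s_bar = psi / (1 - r_bar ** 2) ** 2
    z = (z_jk - z_jh) * math.sqrt(n - 3) / math.sqrt(2 - 2 * s_bar)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal tail
    return z, p


if __name__ == "__main__":
    # Toy inputs: model k correlates 0.60 with activity, model h 0.55,
    # and the two models' predictions correlate 0.80, over 1,000 sgRNAs.
    z, p = steiger_z(0.60, 0.55, 0.80, 1000)
    print(f"Z = {z:.2f}, P = {p:.2g}")
```

The r_kh term matters: two models whose predictions are highly correlated with each other make even a small gap in their correlation with activity more significant.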

    Figure 5: CFD score for assessing off-target activity of sgRNAs.

    (a) Activity of sgRNAs as a function of the final two nucleotides of the PAM. The box represents the 25th, 50th and 75th percentiles, whiskers show 5th and 95th percentiles, and outliers are shown as individual dots. (b) Distribution of log2 fold-change values for three classifications of sgRNAs assessed by flow cytometry for activity against CD33. (c) Heat map of the percent-active values for all sgRNA-DNA interactions where one nucleotide was removed from the sgRNA, creating a bulged DNA base. (d) Same as in c but with an insertion of a nucleotide in the sgRNA to create a bulged RNA base. (e) Same as in c and d but with symmetric mismatches. Grayscale is the same for c–e. (f) Comparison of the correlation of three off-target scoring metrics to measured off-target activity of 89 sgRNAs with mismatches to the cell surface receptor H2-K. (g) AUC values for GUIDE-seq reads as a function of number of mismatches assessed by three scoring metrics; same color scheme as in f. (h) Distribution of sgRNAs targeting nonessential genes in a dropout screen in A375 cells. All 109,463 sgRNAs in the Avana library screened in A375 cells were ranked by their depletion and binned by decile, and the number of the 4,950 sgRNAs targeting nonessential genes that fall in each bin is plotted. (i) For the sgRNAs targeting nonessential genes plotted in h, the distribution of the number of off-target sites in protein-coding regions with CFD scores > 0.2. The box represents the 25th, 50th and 75th percentiles, whiskers show 10th and 90th percentiles. ****P < 10⁻⁴, Kruskal-Wallis test; the most-depleted sgRNAs. The x-axis is the same for panels h and i.
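Panels c–e report percent-active values for single-base bulges and mismatches, and panel a reports activity by the final two PAM nucleotides. Supplementary Table 19 lists the percent-active values used to calculate the CFD score. The sketch below shows a CFD-style score computed as the product of one activity value per mismatched position and one for the PAM, then counts sites above the 0.2 cutoff used in panel i. All lookup values are made-up placeholders rather than the published matrix, and bulges are not handled. Positions are assumed to run from 1 (PAM-distal) to 20 (PAM-proximal), with both sequences written 5′→3′ on the protospacer strand.

```python
"""Hypothetical sketch: CFD-style off-target score as a product of lookup values."""
from math import prod
from typing import Dict, Iterable, Tuple

# (position 1-20, sgRNA base, off-target base) -> fraction of activity retained.
# PLACEHOLDER numbers for illustration only; not the published CFD matrix.
MISMATCH_ACTIVITY: Dict[Tuple[int, str, str], float] = {
    (5, "A", "C"): 0.8,
    (20, "G", "T"): 0.2,
}
# Final two PAM nucleotides -> fraction of activity retained (also placeholders).
PAM_ACTIVITY: Dict[str, float] = {"GG": 1.0, "AG": 0.26}


def cfd_like_score(spacer: str, site: str, pam: str) -> float:
    """Multiply the activity value of each mismatched position and of the PAM.

    Raises KeyError if a mismatch or PAM is missing from the lookup tables.
    """
    spacer, site = spacer.upper(), site.upper()
    if len(spacer) != len(site):
        raise ValueError("spacer and site must have the same length")
    factors = [
        MISMATCH_ACTIVITY[(pos, s, t)]
        for pos, (s, t) in enumerate(zip(spacer, site), start=1)
        if s != t
    ]
    return prod(factors) * PAM_ACTIVITY[pam.upper()[-2:]]


def count_sites_above(scores: Iterable[float], cutoff: float = 0.2) -> int:
    """Number of candidate off-target sites scoring strictly above `cutoff`."""
    return sum(1 for s in scores if s > cutoff)


if __name__ == "__main__":
    guide = "A" * 19 + "G"
    sites = [
        (guide, "TGG"),                      # perfect match, NGG PAM
        ("AAAAC" + "A" * 14 + "T", "TGG"),   # mismatches at positions 5 and 20
        (guide, "CAG"),                      # perfect match, NAG PAM
    ]
    scores = [cfd_like_score(guide, s, pam) for s, pam in sites]
    print(scores, count_sites_above(scores))
```

Because the score is a product, a perfect match with an NGG PAM scores 1.0, and each additional mismatch can only lower it.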

    Figure 6: On-target and off-target properties of the Brunello and Brie libraries.

    (a) Distribution of Rule Set 2 on-target activity scores across libraries. The box represents the 25th, 50th and 75th percentiles, whiskers show 5th and 95th percentiles. (b,c) Cumulative distribution of the number of off-target sites with CFD scores > 0.2 in protein-coding regions across human libraries (b) and mouse libraries (c).
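Panels b and c plot cumulative distributions of the per-sgRNA count of protein-coding off-target sites with CFD scores > 0.2. An empirical CDF of such counts can be sketched as follows. The counts are toy values and the function name is hypothetical; computing the counts themselves (a genome-wide site search plus CFD scoring) is outside this sketch.

```python
"""Hypothetical sketch: empirical CDF of per-sgRNA off-target site counts."""
from bisect import bisect_right
from typing import List, Sequence, Tuple


def ecdf(values: Sequence[int]) -> List[Tuple[int, float]]:
    """Return (x, fraction of values <= x) for each distinct x, ascending."""
    if not values:
        raise ValueError("need at least one value")
    xs = sorted(values)
    n = len(xs)
    return [(x, bisect_right(xs, x) / n) for x in sorted(set(xs))]


if __name__ == "__main__":
    # Toy counts of off-target sites (CFD > 0.2) for six sgRNAs in one library.
    counts = [0, 0, 1, 3, 0, 2]
    for x, frac in ecdf(counts):
        print(f"<= {x} sites: {frac:.2f}")
```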

Accession codes

Primary accessions

Sequence Read Archive

References

  1. Jinek, M. et al. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816–821 (2012).
  2. Mali, P. et al. RNA-guided human genome engineering via Cas9. Science 339, 823–826 (2013).
  3. Cong, L. et al. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819–823 (2013).
  4. Jinek, M. et al. RNA-programmed genome editing in human cells. eLife 2, e00471 (2013).
  5. Hartenian, E. & Doench, J.G. Genetic screens and functional genomics using CRISPR/Cas9 technology. FEBS J. 282, 1383–1393 (2015).
  6. Shalem, O. et al. Genome-scale CRISPR-Cas9 knockout screening in human cells. Science 343, 84–87 (2014).
  7. Wang, T., Wei, J.J., Sabatini, D.M. & Lander, E.S. Genetic screens in human cells using the CRISPR-Cas9 system. Science 343, 80–84 (2014).
  8. Koike-Yusa, H., Li, Y., Tan, E.-P., Velasco-Herrera, Mdel.C. & Yusa, K. Genome-wide recessive genetic screening in mammalian cells with a lentiviral CRISPR-guide RNA library. Nat. Biotechnol. 32, 267–273 (2014).
  9. Fu, Y. et al. High-frequency off-target mutagenesis induced by CRISPR-Cas nucleases in human cells. Nat. Biotechnol. 31, 822–826 (2013).
  10. Veres, A. et al. Low incidence of off-target mutations in individual CRISPR-Cas9 and TALEN targeted human stem cell clones detected by whole-genome sequencing. Cell Stem Cell 15, 27–30 (2014).
  11. Ran, F.A. et al. Double nicking by RNA-guided CRISPR Cas9 for enhanced genome editing specificity. Cell 154, 1380–1389 (2013).
  12. Guilinger, J.P., Thompson, D.B. & Liu, D.R. Fusion of catalytically inactive Cas9 to FokI nuclease improves the specificity of genome modification. Nat. Biotechnol. 32, 577–582 (2014).
  13. Hsu, P.D. et al. DNA targeting specificity of RNA-guided Cas9 nucleases. Nat. Biotechnol. 31, 827–832 (2013).
  14. Doench, J.G. et al. Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation. Nat. Biotechnol. 32, 1262–1267 (2014).
  15. Sanjana, N.E., Shalem, O. & Zhang, F. Improved vectors and genome-wide libraries for CRISPR screening. Nat. Methods 11, 783–784 (2014).
  16. Whittaker, S.R. et al. A genome-scale RNA interference screen implicates NF1 loss in resistance to RAF inhibition. Cancer Discov. 3, 350–362 (2013).
  17. Bollag, G. et al. Clinical efficacy of a RAF inhibitor needs broad target blockade in BRAF-mutant melanoma. Nature 467, 596–599 (2010).
  18. Johannessen, C.M. et al. COT drives resistance to RAF inhibition through MAP kinase pathway reactivation. Nature 468, 968–972 (2010).
  19. Davies, B.R. et al. AZD6244 (ARRY-142886), a potent inhibitor of mitogen-activated protein kinase/extracellular signal-regulated kinase kinase 1/2 kinases: mechanism of action in vivo, pharmacokinetic/pharmacodynamic relationship, and potential for combination in preclinical models. Mol. Cancer Ther. 6, 2209–2219 (2007).
  20. Storey, J.D. & Tibshirani, R. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. USA 100, 9440–9445 (2003).
  21. Li, W. et al. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol. 15, 554 (2014).
  22. Lawrence, M.S. et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature 505, 495–501 (2014).
  23. Bae, S. et al. TRIAD1 inhibits MDM2-mediated p53 ubiquitination and degradation. FEBS Lett. 586, 3057–3063 (2012).
  24. Gamper, A.M. & Roeder, R.G. Multivalent binding of p53 to the STAGA complex mediates coactivator recruitment after UV damage. Mol. Cell. Biol. 28, 2517–2527 (2008).
  25. Hart, T., Brown, K.R., Sircoulomb, F., Rottapel, R. & Moffat, J. Measuring error rates in genomic perturbation screens: gold standards for human functional genomics. Mol. Syst. Biol. 10, 733 (2014).
  26. Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. USA 102, 15545–15550 (2005).
  27. Wang, T. et al. Identification and characterization of essential genes in the human genome. Science 350, 1096–1101 (2015).
  28. Caskey, C.T. & Kruh, G.D. The HPRT locus. Cell 16, 1–9 (1979).
  29. Brinkman, E.K., Chen, T., Amendola, M. & van Steensel, B. Easy quantitative assessment of genome editing by sequence trace decomposition. Nucleic Acids Res. 42, e168 (2014).
  30. Zha, M. et al. Molecular mechanism of ADP-ribose hydrolysis by human NUDT5 from structural and kinetic studies. J. Mol. Biol. 379, 568–578 (2008).
  31. Cheok, M.H. & Evans, W.E. Acute lymphoblastic leukaemia: a model for the pharmacogenomics of cancer therapy. Nat. Rev. Cancer 6, 117–129 (2006).
  32. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).
  33. Liberzon, A. et al. Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739–1740 (2011).
  34. Shi, J. et al. Discovery of cancer drug targets by CRISPR-Cas9 screening of protein domains. Nat. Biotechnol. 33, 661–667 (2015).
  35. Chari, R., Mali, P., Moosburner, M. & Church, G.M. Unraveling CRISPR-Cas9 genome engineering parameters via a library-on-library approach. Nat. Methods 12, 823–826 (2015).
  36. Xu, H. et al. Sequence determinants of improved CRISPR sgRNA design. Genome Res. 25, 1147–1157 (2015).
  37. Jinek, M. et al. Structures of Cas9 endonucleases reveal RNA-mediated conformational activation. Science 343, 1247997 (2014).
  38. Sternberg, S.H., Redding, S., Jinek, M., Greene, E.C. & Doudna, J.A. DNA interrogation by the CRISPR RNA-guided endonuclease Cas9. Nature 507, 62–67 (2014).
  39. Bae, S., Kweon, J., Kim, H.S. & Kim, J.-S. Microhomology-based choice of Cas9 nuclease target sites. Nat. Methods 11, 705–706 (2014).
  40. Gilbert, L.A. et al. Genome-scale CRISPR-mediated control of gene repression and activation. Cell 159, 647–661 (2014).
  41. Lin, Y. et al. CRISPR/Cas9 systems have off-target activity with insertions or deletions between target DNA and guide RNA sequences. Nucleic Acids Res. 42, 7473–7485 (2014).
  42. Stemmer, M., Thumberger, T., Del Sol Keyer, M., Wittbrodt, J. & Mateo, J.L. CCTop: an intuitive, flexible and reliable CRISPR/Cas9 target prediction tool. PLoS One 10, e0124633–e11 (2015).
  43. Tsai, S.Q. et al. GUIDE-seq enables genome-wide profiling of off-target cleavage by CRISPR-Cas nucleases. Nat. Biotechnol. 33, 187–197 (2015).
  44. Langmead, B. & Salzberg, S.L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
  45. Heigwer, F., Kerr, G. & Boutros, M. E-CRISP: fast CRISPR target site identification. Nat. Methods 11, 122–123 (2014).
  46. Bae, S., Park, J., Kim, J.S. & Kim, J.S. Cas-OFFinder: a fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics 30, 1473–1475 (2014).
  47. Kampmann, M. et al. Next-generation libraries for robust RNA interference-based genome-wide screens. Proc. Natl. Acad. Sci. USA 112, E3384–E3391 (2015).
  48. Steiger, J.H. Tests for comparing elements of a correlation matrix. Psychol. Bull. 87, 245–251 (1980).
  49. Blasi, E., Radzioch, D., Durum, S.K. & Varesio, L. A murine macrophage cell line, immortalized by v-raf and v-myc oncogenes, exhibits normal macrophage functions. Eur. J. Immunol. 17, 1491–1498 (1987).
  50. Stansley, B., Post, J. & Hensley, K. A comparative review of cell culture systems for the study of microglial biology in Alzheimer's disease. J. Neuroinflammation 9, 115 (2012).
  51. Ben-Hur, A., Ong, C.S., Sonnenburg, S., Schölkopf, B. & Rätsch, G. Support vector machines and kernels for computational biology. PLoS Comput. Biol. 4, e1000173 (2008).
  52. Cock, P.J.A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
  53. Le Novère, N. MELTING, computing the melting temperature of nucleic acid duplex. Bioinformatics 17, 1226–1227 (2001).
  54. Steiger, J.H. Tests for comparing elements of a correlation matrix. Psychol. Bull. 87, 245–251 (1980).

Author information

  1. These authors contributed equally to this work.

    • John G Doench,
    • Nicolo Fusi,
    • Meagan Sullender,
    • Mudra Hegde,
    • Emma W Vaimberg &
    • Jennifer Listgarten

Affiliations

  1. Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA.

    • John G Doench,
    • Meagan Sullender,
    • Mudra Hegde,
    • Emma W Vaimberg,
    • Katherine F Donovan,
    • Ian Smith,
    • Zuzana Tothova &
    • David E Root
  2. Microsoft Research New England, Cambridge, Massachusetts, USA.

    • Nicolo Fusi &
    • Jennifer Listgarten
  3. Dana-Farber Cancer Institute, Division of Hematologic Malignancies, Boston, Massachusetts, USA.

    • Zuzana Tothova
  4. Department of Pathology and Immunology, Washington University School of Medicine, St. Louis, Missouri, USA.

    • Craig Wilen,
    • Robert Orchard &
    • Herbert W Virgin

Contributions

J.G.D., M.S., E.W.V., Z.T., C.W. and R.O. designed experiments; M.S., E.W.V., K.F.D., Z.T., C.W. and R.O. performed experiments; J.G.D., M.H. and I.S. analyzed experiments; N.F. and J.L. performed the computational modeling; J.G.D., N.F., J.L. and D.E.R. wrote the manuscript with assistance from other authors; J.G.D., H.W.V. and D.E.R. supervised the research.

Competing financial interests

N.F. and J.L. are employed by Microsoft Research.

Supplementary information

PDF files

  1. Supplementary Figures (2,248 KB)

    Supplementary Figures 1–22

Zip files

  1. Supplementary Tables 1–23 (128,255 KB)

    Supplementary Table 1. Rounds of selection used to design Avana and Asiago libraries
    Supplementary Table 2. sgRNAs in the six subpools of Avana library
    Supplementary Table 3. sgRNAs in the six subpools of Asiago library
    Supplementary Table 4. Screening data for vemurafenib in A375 cells for all biological replicates screened with Avana libraries (divided by subpools) as well as GeCKOv1 and GeCKOv2 libraries
    Supplementary Table 5. RIGER analysis of vemurafenib screens using weighted-sum option
    Supplementary Table 6. STARS analysis of vemurafenib screens
    Supplementary Table 7. List of PanCancer genes
    Supplementary Table 8. Screening data for selumetinib in A375 cells for all biological replicates screened with Avana library
    Supplementary Table 9. STARS analysis of selumetinib screens
    Supplementary Table 10. Negative selection screening data in HT29 and A375 cells with GeCKO libraries
    Supplementary Table 11. Negative selection screening data in HT29 and A375 cells with GeCKO libraries and the set of 291 core essential genes annotated by Hart and colleagues
    Supplementary Table 12. STARS analysis of the negative selection screening data for GeCKO and Avana libraries individually
    Supplementary Table 13. STARS analysis of the negative selection screening data for GeCKO and Avana libraries merged
    Supplementary Table 14. Screening data for 6-thioguanine screen in 293T, A375 and HT29 cells
    Supplementary Table 15. Screening data for interferon-gamma treatment of BV2 cells and output of STARS analysis
    Supplementary Table 16. Screening data for the tiling of resistance genes
    Supplementary Table 17. Gini importance of individual features in the gradient-boosted regression trees model, Rule Set 2
    Supplementary Table 18. Screening data for off-target analysis of CD33 in MOLM13 cells
    Supplementary Table 19. Percent-active, delta-log-fold-change, and one-sided Welch's t-test p-value calculations for the CD33 off-target dataset that is used to calculate the CFD score
    Supplementary Table 20. Activity of sgRNAs designed against H2-D1 that have up to 6 mismatches to H2-K
    Supplementary Table 21. sgRNAs in the Brunello library
    Supplementary Table 22. sgRNAs in the Brie library
    Supplementary Table 23. sgRNA sequences and primers used for individual follow-up experiments

  2. Supplementary Code (802 KB)
