Abstract

Off-target effects of the CRISPR–Cas9 system can lead to suboptimal gene-editing outcomes and are a bottleneck in its development. Here, we introduce two interdependent machine-learning models for the prediction of off-target effects of CRISPR–Cas9. The approach, which we named Elevation, scores individual guide–target pairs and also aggregates these scores into a single, overall summary guide score. We demonstrate that Elevation consistently outperforms competing approaches on both tasks. We also introduce an evaluation method that balances errors between active and inactive guides, thereby encapsulating a range of practical use cases. Because predicting off-target activities is computationally demanding at large scale, we have developed a fast cloud-based service (https://crispr.ml) for end-to-end guide-RNA design. The service makes use of pre-computed on-target and off-target activity predictions for every genic region in the human genome.


Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Acknowledgements

We thank A. Annavajhala for Azure cloud support, C. Kadie for use and support of his HPC cluster code, J. Jernigan, O. Losinets and the HPC team for cluster support, M. Hegde for help with the data, J. Lopez and M. Aryee for assistance with GUIDE-seq data analysis, M. Haeussler for help accessing the data from his paper, and J.-P. Concordet for feedback on the manuscript. M.W. is supported by a UCLA Collaboratory Fellowship. This work used computational and storage services associated with the Hoffman2 Shared Cluster provided by the UCLA Institute for Digital Research and Education’s Research Technology Group, and also an Azure-for-Research grant to UCLA. We acknowledge the ENCODE Consortium, the UW ENCODE group for generating these data, and UCSC for processing these data and making them available for download.

Author information

Author notes

  1. Jennifer Listgarten, Michael Weinstein, John G. Doench and Nicolo Fusi contributed equally to this work.

Affiliations

  1. Microsoft Research, Cambridge, MA, USA

    Jennifer Listgarten, Jake Crawford, Kevin Gao, Luong Hoang, Melih Elibol & Nicolo Fusi

  2. Molecular, Cell, and Developmental Biology, and Quantitative and Computational Biosciences Institute, University of California Los Angeles, Los Angeles, CA, USA

    Michael Weinstein

  3. Zymo Research, Irvine, CA, USA

    Michael Weinstein

  4. Molecular Pathology Unit & Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA, USA

    Benjamin P. Kleinstiver, Alexander A. Sousa & J. Keith Joung

  5. Center for Computational and Integrative Biology, Massachusetts General Hospital, Charlestown, MA, USA

    Benjamin P. Kleinstiver, Alexander A. Sousa & J. Keith Joung

  6. Department of Pathology, Harvard Medical School, Boston, MA, USA

    Benjamin P. Kleinstiver & J. Keith Joung

  7. Broad Institute of MIT and Harvard, Cambridge, MA, USA

    John G. Doench

Contributions

J.L. and N.F. designed, implemented and evaluated the machine learning and statistical methods (Elevation-score and Elevation-aggregate). M.W. designed and implemented the Elevation-search infrastructure, also known as dsNickFury. J.G.D. provided biological expertise. B.P.K., J.K.J., J.L., N.F. and J.G.D. selected validation gRNAs. B.P.K., A.A.S. and J.K.J. assayed the validation gRNAs for off-target activity. J.L., N.F., M.W. and J.G.D. designed the web interface. L.H. and K.G. created the front-end webpage for the cloud service. M.E. and J.C. helped run the experiments and populated the cloud server. J.L., M.W., J.G.D., N.F., B.P.K. and J.K.J. wrote the paper.

Competing interests

J.L., L.H., M.E., J.C. and N.F. performed research related to this manuscript while employed by Microsoft. J.K.J. has financial interests in Beam Therapeutics, Editas Medicine, Monitor Biotechnologies, Pairwise Plants, Poseida Therapeutics and Transposagen Biopharmaceuticals. J.K.J.’s interests were reviewed and are managed by Massachusetts General Hospital and Partners HealthCare in accordance with their conflict of interest policies.

Corresponding authors

Correspondence to Jennifer Listgarten or Michael Weinstein or John G. Doench or Nicolo Fusi.

Supplementary information

  1. Supplementary Information

    Supplementary tables and figures.

  2. Life Sciences Reporting Summary

  3. Supplementary Table 1

    GUIDE-seq details for validation dataset 2.

About this article

DOI

https://doi.org/10.1038/s41551-017-0178-6
