Deep learning improves prediction of CRISPR–Cpf1 guide RNA activity

Abstract

We present two algorithms for predicting the activity of AsCpf1 guide RNAs. We trained Seq-deepCpf1, a deep-learning model based on a convolutional neural network, on indel frequencies measured for 15,000 target sequences. We then incorporated chromatin accessibility information to create DeepCpf1, which performs better in cell lines for which such information is available. Both algorithms outperform previous machine-learning algorithms on our own and published data sets.
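The general idea in the abstract can be illustrated with a minimal sketch: a target sequence is one-hot encoded, passed through a one-dimensional convolution, pooled, and mapped to a single activity score, with an optional chromatin accessibility flag supplied as an extra input. This is not the published Seq-deepCpf1 or DeepCpf1 architecture. The filter count, filter width, average pooling, random untrained weights, and the choice to append the accessibility flag as one extra feature are all assumptions made for illustration, as are the names `one_hot`, `conv1d_relu`, `init_params`, and `predict`. Training against measured indel frequencies is omitted.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a (length, 4) matrix with one 1.0 per row."""
    idx = [BASES.index(b) for b in seq.upper()]
    out = np.zeros((len(seq), 4))
    out[np.arange(len(seq)), idx] = 1.0
    return out


def conv1d_relu(x, kernels, bias):
    """Valid 1D convolution along the sequence, followed by ReLU.

    x: (L, 4) one-hot sequence
    kernels: (n_filters, width, 4)
    bias: (n_filters,)
    Returns an array of shape (L - width + 1, n_filters).
    """
    width = kernels.shape[1]
    n_pos = x.shape[0] - width + 1
    # Stack every window of `width` consecutive positions: (n_pos, width, 4)
    windows = np.stack([x[i:i + width] for i in range(n_pos)])
    # Contract the window and base axes against each filter
    out = np.tensordot(windows, kernels, axes=([1, 2], [1, 2])) + bias
    return np.maximum(out, 0.0)


def init_params(n_filters=8, width=5, seed=0):
    """Random, untrained weights (hypothetical sizes, for illustration only)."""
    rng = np.random.default_rng(seed)
    return {
        "kernels": rng.normal(scale=0.1, size=(n_filters, width, 4)),
        "conv_bias": np.zeros(n_filters),
        # One weight per pooled filter plus one for the accessibility flag
        "w_out": rng.normal(scale=0.1, size=n_filters + 1),
        "b_out": 0.0,
    }


def predict(seq, accessible, params):
    """Return an unscaled activity score for one target sequence.

    `accessible` is 0 or 1; appending it as an extra feature is an
    assumption of this sketch, not the authors' integration scheme.
    """
    h = conv1d_relu(one_hot(seq), params["kernels"], params["conv_bias"])
    pooled = h.mean(axis=0)  # average over sequence positions
    features = np.append(pooled, float(accessible))
    return float(features @ params["w_out"] + params["b_out"])
```

Calling `predict("TTTAGCTGACCGTAACGGTCATTGCAAGCTTAGC", 1, init_params())` returns a single float. With untrained weights the value is meaningless; in practice the parameters would be fitted to measured indel frequencies.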

Figure 1: Deep learning outperforms conventional machine learning for the task of predicting Cpf1 activity based on the target sequence composition.
Figure 2: Consideration of chromatin accessibility significantly improves the prediction of Cpf1 activities at endogenous target sites.

Accession codes

Primary accessions

Sequence Read Archive

Acknowledgements

The authors thank E.-S. Lee for proofreading and R. Gopalappa, N. Kim, S. Park, and J. Park for assisting in sample preparation. This work was supported in part by the National Research Foundation of Korea (grants 2017R1A2B3004198 (H.K.), 2017M3A9B4062403 (H.K.), 2013M3A9B4076544 (H.K.), 2014M3C9A3063541 (S.Y.)), Brain Korea 21 Plus Project (Yonsei University College of Medicine), Brain Korea 21 Plus Project (SNU ECE) in 2017, Institute for Basic Science (IBS; IBS-R026-D1), and the Korean Health Technology R&D Project, Ministry of Health and Welfare, Republic of Korea (grants HI17C0676 (H.K.), and HI16C1012 (H.K.)).

Author information

Contributions

H.K.K., M.S., and S.J. performed experiments to build data sets of AsCpf1 indel frequencies. S.M. and S.Y. developed the framework and carried out the model training and computational validation. J.W.C. performed bioinformatic analyses. Y.K. and S.L. made substantial contributions to the experiments, including cell culture and deep sequencing. H.H.K. conceived and designed the study. H.K.K., S.M., S.Y., and H.H.K. analyzed the data and wrote the manuscript.

Corresponding authors

Correspondence to Sungroh Yoon or Hyongbum (Henry) Kim.

Ethics declarations

Competing interests

Yonsei University and Seoul National University have filed a patent based on this work, in which H.K.K., S.M., M.S., S.J., S.Y., and H.K. are co-inventors.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–14 and Supplementary Note (PDF 2816 kb)

Life Sciences Reporting Summary (PDF 130 kb)

Supplementary Tables

Supplementary Tables 2, 4, and 6, combined in a single file (PDF 521 kb)

Supplementary Table 1

Source data used for this study. (XLSX 2463 kb)

Supplementary Table 3

Model selection results of Seq-deepCpf1 (XLSX 19 kb)

Supplementary Table 5

Oligonucleotides used in this study (XLSX 40 kb)

Supplementary Table 7

Confidence intervals for the result values (XLSX 15 kb)

Supplementary Code

The source code of Seq-deepCpf1 and DeepCpf1 (ZIP 750 kb)

About this article

Cite this article

Kim, H., Min, S., Song, M. et al. Deep learning improves prediction of CRISPR–Cpf1 guide RNA activity. Nat Biotechnol 36, 239–241 (2018). https://doi.org/10.1038/nbt.4061
