Selene: a PyTorch-based deep learning library for sequence data

Abstract

To enable the application of deep learning in biology, we present Selene (https://selene.flatironinstitute.org/), a PyTorch-based deep learning library for fast and easy development, training, and application of deep learning model architectures for any biological sequence data. We demonstrate on DNA sequences how Selene allows researchers to easily train a published architecture on new data, develop and evaluate a new architecture, and use a trained model to answer biological questions of interest.

Fig. 1: Overview of Selene.
Fig. 2: Visualizations generated by using Selene to train and apply a model to sequences.
Fig. 3: Using Selene to train a model and obtain model predictions for variants in an Alzheimer’s GWAS study.

Code availability

Selene is open-source software (license BSD 3-Clause Clear). Project homepage: https://selene.flatironinstitute.org. GitHub: https://github.com/FunctionLab/selene. Archived version: https://github.com/FunctionLab/selene/archive/0.2.0.tar.gz.

Data availability

Cistrome (ref. 14), Cistrome file ID 33545, measurements from GSM970258: http://dc2.cistrome.org/api/downloads/eyJpZCI6IjMzNTQ1In0%3A1fujCu%3ArNvWLCNoET6o9SdkL8fEv13uRu4b/. ENCODE (ref. 21) and Roadmap Epigenomics (ref. 22) chromatin profiles: files listed in Supplementary Table 1 of ref. 4. IGAP age at onset survival (refs. 16, 17): https://www.niagads.org/datasets/ng00058 (P-values-only file). The case studies used processed datasets from these sources. They can be downloaded at the following Zenodo links: Cistrome, https://zenodo.org/record/2214130/files/data.tar.gz; ENCODE and Roadmap Epigenomics chromatin profiles, https://zenodo.org/record/2214970/files/chromatin_profiles.tar.gz; IGAP age at onset survival, https://zenodo.org/record/1445556/files/variant_effect_prediction_data.tar.gz. Source data for Figs. 2 and 3 are available online.
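As a convenience, the three processed archives listed above could be fetched and unpacked with a short script. The following is a minimal sketch using only the Python standard library. It is not part of Selene: the dictionary keys, the helper names `archive_path` and `fetch_and_extract`, and the output directory are hypothetical choices made for illustration. Only the URLs are taken from the statement above.

```python
import tarfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# URLs copied from the Data availability statement.
# The short keys are illustrative labels, not names used by Selene.
ZENODO_ARCHIVES = {
    "cistrome": "https://zenodo.org/record/2214130/files/data.tar.gz",
    "chromatin_profiles": "https://zenodo.org/record/2214970/files/chromatin_profiles.tar.gz",
    "igap_variant_effect": "https://zenodo.org/record/1445556/files/variant_effect_prediction_data.tar.gz",
}


def archive_path(name: str, url: str, root: Path) -> Path:
    """Return root/<name>/<file name taken from the last URL path segment>."""
    return root / name / Path(urlparse(url).path).name


def fetch_and_extract(name: str, url: str, root: Path) -> Path:
    """Download one archive (skipped if already on disk) and unpack it.

    Returns the directory the archive was extracted into.
    """
    target = archive_path(name, url, root)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        urllib.request.urlretrieve(url, str(target))
    with tarfile.open(target, "r:gz") as tar:
        tar.extractall(target.parent)
    return target.parent
```

A possible invocation, assuming network access and enough disk space, would be `for name, url in ZENODO_ARCHIVES.items(): fetch_and_extract(name, url, Path("selene_case_study_data"))`. Each archive gets its own subdirectory so that files from different case studies do not overwrite one another.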

References

  1. LeCun, Y., Bengio, Y. & Hinton, G. Nature 521, 436–444 (2015).
  2. Ching, T. et al. J. R. Soc. Interface 15, 20170387 (2018).
  3. Segler, M. H. S., Preuss, M. & Waller, M. P. Nature 555, 604–610 (2018).
  4. Zhou, J. & Troyanskaya, O. G. Nat. Methods 12, 931–934 (2015).
  5. Alipanahi, B., Delong, A., Weirauch, M. T. & Frey, B. J. Nat. Biotechnol. 33, 831–838 (2015).
  6. Kelley, D. R., Snoek, J. & Rinn, J. L. Genome Res. 26, 990–999 (2016).
  7. Angermueller, C., Lee, H. J., Reik, W. & Stegle, O. Genome Biol. 18, 67 (2017).
  8. Kelley, D. R. et al. Genome Res. 28, 739–750 (2018).
  9. Quang, D. & Xie, X. Nucleic Acids Res. 44, e107 (2016).
  10. Sundaram, L. et al. Nat. Genet. 50, 1161–1170 (2018).
  11. Min, S., Lee, B. & Yoon, S. Brief. Bioinform. 18, 851–869 (2017).
  12. Budach, S. & Marsico, A. Bioinformatics 34, 3035–3037 (2018).
  13. Avsec, Z. et al. bioRxiv Preprint at https://www.biorxiv.org/content/10.1101/375345v1 (2018).
  14. Mei, S. et al. Nucleic Acids Res. 45, D658–D662 (2017).
  15. Troyanskaya, O. G. et al. Selene CLI operations and outputs. Selene https://selene.flatironinstitute.org/overview/cli.html (2018).
  16. Ruiz, A. et al. Transl. Psychiatry 4, e358 (2014).
  17. Huang, K.-L. et al. Nat. Neurosci. 20, 1052–1061 (2017).
  18. Li, H. et al. Bioinformatics 25, 2078–2079 (2009).
  19. Li, H. Bioinformatics 27, 718–719 (2011).
  20. ENCODE Project. Reference sequences. ENCODE: Encyclopedia of DNA Elements https://www.encodeproject.org/data-standards/reference-sequences/ (2016).
  21. ENCODE Project Consortium. Nature 489, 57–74 (2012).
  22. Kundaje, A. et al. Nature 518, 317–330 (2015).

Acknowledgements

The authors acknowledge all members of the Troyanskaya lab for helpful discussions. In addition, the authors thank D. Simon for setting up the website and automating updates to the site. The authors are pleased to acknowledge that this work was performed using the high-performance computing resources at Simons Foundation and the TIGRESS computer center at Princeton University. This work was supported by NIH grants R01HG005998, U54HL117798, R01GM071966, and T32HG003284; HHS grant HHSN272201000054C; and Simons Foundation grant 395506, all to O.G.T. O.G.T. is a CIFAR fellow.

Author information

Contributions

K.M.C. and J.Z. conceived the Selene library. K.M.C. and E.M.C. designed, implemented, and documented Selene. K.M.C. performed the analyses described in the manuscript. O.G.T. supervised the project. K.M.C., E.M.C., and O.G.T. wrote the manuscript.

Corresponding author

Correspondence to Olga G. Troyanskaya.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Chen, K.M., Cofer, E.M., Zhou, J. et al. Selene: a PyTorch-based deep learning library for sequence data. Nat Methods 16, 315–318 (2019). https://doi.org/10.1038/s41592-019-0360-8
