Selene: a PyTorch-based deep learning library for sequence data

Chen, Kathleen M.; Cofer, Evan M.; Zhou, Jian; Troyanskaya, Olga G.

doi:10.1038/s41592-019-0360-8

Brief Communication
Published: 28 March 2019

Nature Methods volume 16, pages 315–318 (2019)

Abstract

To enable the application of deep learning in biology, we present Selene (https://selene.flatironinstitute.org/), a PyTorch-based deep learning library for fast and easy development, training, and application of deep learning model architectures for any biological sequence data. We demonstrate on DNA sequences how Selene allows researchers to easily train a published architecture on new data, develop and evaluate a new architecture, and use a trained model to answer biological questions of interest.

**Fig. 2: Visualizations generated by using Selene to train and apply a model to sequences.**

**Fig. 3: Using Selene to train a model and obtain model predictions for variants in an Alzheimer’s GWAS study.**

Code availability

Selene is open-source software (license BSD 3-Clause Clear). Project homepage: https://selene.flatironinstitute.org. GitHub: https://github.com/FunctionLab/selene. Archived version: https://github.com/FunctionLab/selene/archive/0.2.0.tar.gz.

Data availability

Cistrome¹⁴, Cistrome file ID 33545, measurements from GSM970258: http://dc2.cistrome.org/api/downloads/eyJpZCI6IjMzNTQ1In0%3A1fujCu%3ArNvWLCNoET6o9SdkL8fEv13uRu4b/. ENCODE²¹ and Roadmap Epigenomics²² chromatin profiles: files listed in Supplementary Table 1 of ref. ⁴. IGAP age at onset survival¹⁶,¹⁷: https://www.niagads.org/datasets/ng00058 (P-values-only file). The case studies used processed datasets from these sources. They can be downloaded at the following Zenodo links: Cistrome, https://zenodo.org/record/2214130/files/data.tar.gz; ENCODE and Roadmap Epigenomics chromatin profiles, https://zenodo.org/record/2214970/files/chromatin_profiles.tar.gz; IGAP age at onset survival, https://zenodo.org/record/1445556/files/variant_effect_prediction_data.tar.gz. Source data for Figs. 2 and 3 are available online.
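
The processed archives listed above could be retrieved with a short script such as the sketch below. It is not part of Selene and uses only the Python standard library. The three URLs are copied from the statement above; the dictionary keys, the function names, and the `selene_data` directory layout are assumptions made for illustration.

```python
import tarfile
import urllib.request
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

# Zenodo archives named in the Data availability statement.
# The short keys are illustrative labels, not names used by the authors.
ZENODO_ARCHIVES = {
    "cistrome": "https://zenodo.org/record/2214130/files/data.tar.gz",
    "chromatin_profiles": "https://zenodo.org/record/2214970/files/chromatin_profiles.tar.gz",
    "igap_variants": "https://zenodo.org/record/1445556/files/variant_effect_prediction_data.tar.gz",
}


def archive_filename(url: str) -> str:
    """Return the file name at the end of a URL path."""
    return PurePosixPath(urlparse(url).path).name


def fetch_and_extract(name: str, dest_dir: str = "selene_data") -> Path:
    """Download one archive and unpack it into dest_dir/name (assumed layout)."""
    url = ZENODO_ARCHIVES[name]
    target = Path(dest_dir) / name
    target.mkdir(parents=True, exist_ok=True)
    archive_path = target / archive_filename(url)
    # Skip the download when a copy is already on disk.
    if not archive_path.exists():
        urllib.request.urlretrieve(url, archive_path)
    # Only unpack archives from a source you trust.
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(target)
    return target
```

Calling `fetch_and_extract("cistrome")` would download `data.tar.gz` and unpack it under `selene_data/cistrome`. The archive sizes are not stated here, so check available disk space before fetching all three.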

References

1. LeCun, Y., Bengio, Y. & Hinton, G. Nature 521, 436–444 (2015).
2. Ching, T. et al. J. R. Soc. Interface 15, 20170387 (2018).
3. Segler, M. H. S., Preuss, M. & Waller, M. P. Nature 555, 604–610 (2018).
4. Zhou, J. & Troyanskaya, O. G. Nat. Methods 12, 931–934 (2015).
5. Alipanahi, B., Delong, A., Weirauch, M. T. & Frey, B. J. Nat. Biotechnol. 33, 831–838 (2015).
6. Kelley, D. R., Snoek, J. & Rinn, J. L. Genome Res. 26, 990–999 (2016).
7. Angermueller, C., Lee, H. J., Reik, W. & Stegle, O. Genome Biol. 18, 67 (2017).
8. Kelley, D. R. et al. Genome Res. 28, 739–750 (2018).
9. Quang, D. & Xie, X. Nucleic Acids Res. 44, e107 (2016).
10. Sundaram, L. et al. Nat. Genet. 50, 1161–1170 (2018).
11. Min, S., Lee, B. & Yoon, S. Brief. Bioinform. 18, 851–869 (2017).
12. Budach, S. & Marsico, A. Bioinformatics 34, 3035–3037 (2018).
13. Avsec, Z. et al. bioRxiv Preprint at https://www.biorxiv.org/content/10.1101/375345v1 (2018).
14. Mei, S. et al. Nucleic Acids Res. 45, D658–D662 (2017).
15. Troyanskaya, O. G. et al. Selene CLI operations and outputs. Selene https://selene.flatironinstitute.org/overview/cli.html (2018).
16. Ruiz, A. et al. Transl. Psychiatry 4, e358 (2014).
17. Huang, K.-L. et al. Nat. Neurosci. 20, 1052–1061 (2017).
18. Li, H. et al. Bioinformatics 25, 2078–2079 (2009).
19. Li, H. Bioinformatics 27, 718–719 (2011).
20. ENCODE Project. Reference sequences. ENCODE: Encyclopedia of DNA Elements https://www.encodeproject.org/data-standards/reference-sequences/ (2016).
21. ENCODE Project Consortium. Nature 489, 57–74 (2012).
22. Kundaje, A. et al. Nature 518, 317–330 (2015).

Acknowledgements

The authors acknowledge all members of the Troyanskaya lab for helpful discussions. In addition, the authors thank D. Simon for setting up the website and automating updates to the site. The authors are pleased to acknowledge that this work was performed using the high-performance computing resources at Simons Foundation and the TIGRESS computer center at Princeton University. This work was supported by NIH grants R01HG005998, U54HL117798, R01GM071966, and T32HG003284; HHS grant HHSN272201000054C; and Simons Foundation grant 395506, all to O.G.T. O.G.T. is a CIFAR fellow.

Author information

These authors contributed equally: Kathleen M. Chen, Evan M. Cofer.

Authors and Affiliations

Flatiron Institute, Simons Foundation, New York, NY, USA
Kathleen M. Chen, Jian Zhou & Olga G. Troyanskaya
Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
Evan M. Cofer, Jian Zhou & Olga G. Troyanskaya
Graduate Program in Quantitative and Computational Biology, Princeton University, Princeton, NJ, USA
Evan M. Cofer
Department of Computer Science, Princeton University, Princeton, NJ, USA
Olga G. Troyanskaya

Contributions

K.M.C. and J.Z. conceived the Selene library. K.M.C. and E.M.C. designed, implemented, and documented Selene. K.M.C. performed the analyses described in the manuscript. O.G.T. supervised the project. K.M.C., E.M.C., and O.G.T. wrote the manuscript.

Corresponding author

Correspondence to Olga G. Troyanskaya.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Reporting Summary

Source data

Source Data Fig. 2

Source Data Fig. 3

About this article

Cite this article

Chen, K.M., Cofer, E.M., Zhou, J. et al. Selene: a PyTorch-based deep learning library for sequence data. Nat Methods 16, 315–318 (2019). https://doi.org/10.1038/s41592-019-0360-8

Received: 08 October 2018
Accepted: 20 February 2019
Published: 28 March 2019
Issue Date: April 2019
DOI: https://doi.org/10.1038/s41592-019-0360-8
