Skip to main content

Thank you for visiting nature.com. You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

Orchestrating high-throughput genomic analysis with Bioconductor

Abstract

Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors.

This is a preview of subscription content, access via your institution

Access options

Rent or buy this article

Prices vary by article type

from$1.95

to$39.95

Prices may be subject to local taxes which are calculated during checkout

Figure 1: Example uses of the Ranges algebra.
Figure 2: The integrative data container SummarizedExperiment.
Figure 3: Visualization along genomic coordinates with ggbio.

References

  1. Gentleman, R.C. et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 5, R80 (2004).

    Article  Google Scholar 

  2. R Development Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2014).

  3. Hahne, F., Huber, W., Gentleman, R. & Falcon, S. Bioconductor Case Studies (Springer, 2008).

    Book  Google Scholar 

  4. Lawrence, M. et al. Software for computing and annotating genomic ranges. PLoS Comput. Biol. 9, e1003118 (2013).

    Article  CAS  Google Scholar 

  5. The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).

  6. Ohnishi, Y. et al. Cell-to-cell expression variability followed by signal reinforcement progressively segregates early mouse lineages. Nat. Cell Biol. 16, 27–37 (2014).

    Article  CAS  Google Scholar 

  7. Finak, G. et al. OpenCyto: an open source infrastructure for scalable, robust, reproducible, and automated, end-to-end flow cytometry data analysis. PLoS Comput. Biol. 10, e1003806 (2014).

    Article  Google Scholar 

  8. Chelaru, F., Smith, L., Goldstein, N. & Corrada Bravo, H. Epiviz: interactive visual analytics for functional genomics data. Nat. Methods 11, 938–940 (2014).

    Article  CAS  Google Scholar 

  9. Gentleman, R. Reproducible research: a bioinformatics case study. Stat. Appl. Genet. Mol. Biol. 4, Article 2 (2005).

    Article  Google Scholar 

  10. Anders, S. & Huber, W. Differential expression analysis for sequence count data. Genome Biol. 11, R106 (2010).

    Article  CAS  Google Scholar 

  11. Laufer, C., Fischer, B., Billmann, M., Huber, W. & Boutros, M. Mapping genetic interactions in human cancer cells with RNAi and multiparametric phenotyping. Nat. Methods 10, 427–431 (2013).

    Article  CAS  Google Scholar 

  12. Waldron, L. et al. Comparative meta-analysis of prognostic gene signatures for late-stage ovarian cancer. J. Natl. Cancer Inst. 106, dju049 (2014).

    PubMed  PubMed Central  Google Scholar 

  13. Riester, M. et al. Risk prediction for late-stage ovarian cancer by meta-analysis of 1525 patient samples. J. Natl. Cancer Inst. 106, dju048 (2014).

    Article  Google Scholar 

  14. McMurdie, P.J. & Holmes, S. Waste not, want not: why rarefying microbiome data is inadmissible. PLoS Comput. Biol. 10, e1003531 (2014).

    Article  Google Scholar 

  15. Goecks, J., Nekrutenko, A., Taylor, J. & The Galaxy Team. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 11, R86 (2010).

    Article  Google Scholar 

  16. Pérez, F. & Granger, B.E. IPython: a system for interactive scientific computing. Comput. Sci. Eng. 9, 21–29 (2007).

    Article  Google Scholar 

  17. Anonymous. Credit for code. Nat. Genet. 46, 1 (2014).

  18. Altschul, S. et al. The anatomy of successful computational biology software. Nat. Biotechnol. 31, 894–897 (2013).

    Article  CAS  Google Scholar 

  19. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

    Article  Google Scholar 

  20. Lawrence, M. & Morgan, M. Scalable genomics with R and Bioconductor. Stat. Sci. 29, 214–226 (2014).

    Article  Google Scholar 

  21. Brazma, A. et al. Minimum information about a microarray experiment (MIAME) - toward standards for microarray data. Nat. Genet. 29, 365–371 (2001).

    Article  CAS  Google Scholar 

  22. Cabezas-Wallscheid, N. et al. Identification of regulatory networks in HSCs and their immediate progeny via integrated proteome, transcriptome, and DNA methylome analysis. Cell Stem Cell 15, 507–522 (2014).

    Article  CAS  Google Scholar 

  23. Anders, S., Reyes, A. & Huber, W. Detecting differential usage of exons from RNA-seq data. Genome Res. 22, 2008–2017 (2012).

    Article  CAS  Google Scholar 

  24. Obenchain, V. et al. VariantAnnotation: a Bioconductor package for exploration and annotation of genetic variants. Bioinformatics 30, 2076 (2014).

    Article  CAS  Google Scholar 

Download references

Acknowledgements

We thank all contributors to the Bioconductor and R projects. Bioconductor is supported by the National Human Genome Research Institute of the US National Institutes of Health (U41HG004059 to M.M.). Additional support is from the US National Science Foundation (1247813 to M.M.) and the European Commission FP7 project RADIANT (to W.H.). A. Bruce provided graphics support for Figure 2.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Wolfgang Huber.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Huber, W., Carey, V., Gentleman, R. et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat Methods 12, 115–121 (2015). https://doi.org/10.1038/nmeth.3252

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1038/nmeth.3252

This article is cited by

Search

Quick links

Nature Briefing

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

Get the most important science stories of the day, free in your inbox. Sign up for Nature Briefing