Orchestrating high-throughput genomic analysis with Bioconductor

Huber, Wolfgang; Carey, Vincent J; Gentleman, Robert; Anders, Simon; Carlson, Marc; Carvalho, Benilton S; Bravo, Hector Corrada; Davis, Sean; Gatto, Laurent; Girke, Thomas; Gottardo, Raphael; Hahne, Florian; Hansen, Kasper D; Irizarry, Rafael A; Lawrence, Michael; Love, Michael I; MacDonald, James; Obenchain, Valerie; Oleś, Andrzej K; Pagès, Hervé; Reyes, Alejandro; Shannon, Paul; Smyth, Gordon K; Tenenbaum, Dan; Waldron, Levi; Morgan, Martin

doi:10.1038/nmeth.3252

Perspective
Published: 29 January 2015

Wolfgang Huber ORCID: orcid.org/0000-0002-0474-2218¹,
Vincent J Carey²,³,
Robert Gentleman⁴,
Simon Anders ORCID: orcid.org/0000-0003-4868-1805¹,
Marc Carlson⁵,
Benilton S Carvalho ORCID: orcid.org/0000-0001-5122-5646⁶,
Hector Corrada Bravo⁷,
Sean Davis ORCID: orcid.org/0000-0002-8991-6458⁸,
Laurent Gatto ORCID: orcid.org/0000-0002-1520-2268⁹,
Thomas Girke¹⁰,
Raphael Gottardo¹¹,
Florian Hahne¹²,
Kasper D Hansen¹³,¹⁴,
Rafael A Irizarry³,¹⁵,
Michael Lawrence⁴,
Michael I Love ORCID: orcid.org/0000-0001-8401-0545³,¹⁵,
James MacDonald¹⁶,
Valerie Obenchain⁵,
Andrzej K Oleś¹,
Hervé Pagès⁵,
Alejandro Reyes¹,
Paul Shannon⁵,
Gordon K Smyth¹⁷,¹⁸,
Dan Tenenbaum⁵,
Levi Waldron¹⁹ &
Martin Morgan ORCID: orcid.org/0000-0002-5874-8148⁵

Nature Methods volume 12, pages 115–121 (2015)

Subjects

Computational platforms and environments

Abstract

Bioconductor is an open-source, open-development software project for the analysis and comprehension of high-throughput data in genomics and molecular biology. The project aims to enable interdisciplinary research, collaboration and rapid development of scientific software. Based on the statistical programming language R, Bioconductor comprises 934 interoperable packages contributed by a large, diverse community of scientists. Packages cover a range of bioinformatic and statistical applications. They undergo formal initial review and continuous automated testing. We present an overview for prospective users and contributors.

**Figure 1: Example uses of the Ranges algebra.**

**Figure 2: The integrative data container *SummarizedExperiment*.**

**Figure 3: Visualization along genomic coordinates with *ggbio*.**

Acknowledgements

We thank all contributors to the Bioconductor and R projects. Bioconductor is supported by the National Human Genome Research Institute of the US National Institutes of Health (U41HG004059 to M.M.). Additional support is from the US National Science Foundation (1247813 to M.M.) and the European Commission FP7 project RADIANT (to W.H.). A. Bruce provided graphics support for Figure 2.

Author information

Authors and Affiliations

European Molecular Biology Laboratory, Heidelberg, Germany
Wolfgang Huber, Simon Anders, Andrzej K Oleś & Alejandro Reyes
Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
Vincent J Carey
Harvard School of Public Health, Boston, Massachusetts, USA
Vincent J Carey, Rafael A Irizarry & Michael I Love
Genentech, South San Francisco, California, USA
Robert Gentleman & Michael Lawrence
Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
Marc Carlson, Valerie Obenchain, Hervé Pagès, Paul Shannon, Dan Tenenbaum & Martin Morgan
Department of Medical Genetics, School of Medical Sciences, State University of Campinas, Campinas, Brazil
Benilton S Carvalho
Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA
Hector Corrada Bravo
Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
Sean Davis
Department of Biochemistry, University of Cambridge, Cambridge, UK
Laurent Gatto
Institute for Integrative Genome Biology, University of California, Riverside, Riverside, California, USA
Thomas Girke
Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
Raphael Gottardo
Novartis Institutes for Biomedical Research, Basel, Switzerland
Florian Hahne
McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University, Baltimore, Maryland, USA
Kasper D Hansen
Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, USA
Kasper D Hansen
Dana-Farber Cancer Institute, Boston, Massachusetts, USA
Rafael A Irizarry & Michael I Love
Department of Environmental and Occupational Health Sciences, University of Washington, Seattle, Washington, USA
James MacDonald
Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria, Australia
Gordon K Smyth
Department of Mathematics and Statistics, University of Melbourne, Parkville, Victoria, Australia
Gordon K Smyth
School of Urban Public Health at Hunter College, City University of New York, New York, New York, USA
Levi Waldron

Corresponding author

Correspondence to Wolfgang Huber.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

About this article

Cite this article

Huber, W., Carey, V., Gentleman, R. et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat Methods 12, 115–121 (2015). https://doi.org/10.1038/nmeth.3252

Received: 30 July 2014
Accepted: 09 December 2014
Published: 29 January 2015
Issue Date: February 2015
DOI: https://doi.org/10.1038/nmeth.3252
