Wisdom of crowds for robust gene network inference

Abstract

Reconstructing gene regulatory networks from high-throughput data is a long-standing challenge. Through the Dialogue on Reverse Engineering Assessment and Methods (DREAM) project, we performed a comprehensive blind assessment of over 30 network inference methods on Escherichia coli, Staphylococcus aureus, Saccharomyces cerevisiae and in silico microarray data. We characterize the performance, data requirements and inherent biases of different inference approaches, and we provide guidelines for algorithm application and development. We observed that no single inference method performs optimally across all data sets. In contrast, integration of predictions from multiple inference methods shows robust and high performance across diverse data sets. We thereby constructed high-confidence networks for E. coli and S. aureus, each comprising 1,700 transcriptional interactions at a precision of 50%. We experimentally tested 53 previously unobserved regulatory interactions in E. coli, of which 23 (43%) were supported. Our results establish community-based methods as a powerful and robust tool for the inference of transcriptional gene regulatory networks.
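The abstract does not spell out how predictions from multiple methods are integrated. The following is a minimal sketch that assumes a simple rank-averaging (Borda-count-style) scheme: each method supplies an edge list ordered from most to least confident, and every candidate edge is re-scored by its mean rank across methods. The function name `community_ranking` and the penalty for edges a method did not list are illustrative assumptions, not the authors' code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (regulator, target)


def community_ranking(predictions: List[List[Edge]]) -> List[Edge]:
    """Combine several ranked edge lists by average rank (best first).

    Each element of `predictions` is one method's edge list, ordered from
    most to least confident. Assumption: an edge missing from a method's
    list receives that method's worst rank + 1.
    """
    all_edges = {e for preds in predictions for e in preds}
    rank_sum: Dict[Edge, float] = defaultdict(float)
    for preds in predictions:
        rank = {e: i + 1 for i, e in enumerate(preds)}  # 1 = most confident
        missing_rank = len(preds) + 1
        for e in all_edges:
            rank_sum[e] += rank.get(e, missing_rank)
    n_methods = len(predictions)
    # Lower mean rank means higher community confidence; ties broken by edge name.
    return sorted(all_edges, key=lambda e: (rank_sum[e] / n_methods, e))
```

Because only ranks are used, methods whose raw scores live on different scales (correlations, mutual information, regression weights) can be combined without calibrating those scores against each other.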

Figure 1: The DREAM5 network inference challenge.
Figure 2: Evaluation of network inference methods.
Figure 3: Analysis of community networks compared to individual inference methods.
Figure 4: E. coli and S. aureus community networks.

Acknowledgements

We thank all challenge participants for their invaluable contribution; R. Norel and J. Saez Rodriguez, who participated in different aspects of the organization and scoring of DREAM5; and P. Carr, M. Reich, J. Mesirov and the rest of the GenePattern team for providing software and support. This work was funded by the US National Institutes of Health (NIH) National Centers for Biomedical Computing Roadmap Initiative (U54CA121852), Howard Hughes Medical Institute, NIH Director's Pioneer Award DP1 OD003644 and a fellowship from the Swiss National Science Foundation to D.M. Challenge participants acknowledge: grants ANR-07-BLAN-0311-03 and ANR-09-BLAN-0051-04 from the French National Research Agency (A.-C.H., P.V.-L., F.M., J.-P.V.); the Interuniversity Attraction Poles Programme (IAP P6/25 BIOMAGNET), initiated by the Belgian State, Science Policy Office, the French Community of Belgium (ARC Biomod) and the European Network of Excellence PASCAL2 (V.A.H.-T., A.I., L.W., Y.S., P.G.); the European Community's 7th Framework Program, grant no. HEALTH-F4-2007-200767 for the APO-SYS program, and a doctoral fellowship from the Edmond J. Safra Bioinformatics Program at Tel Aviv University (G.K., R.S.); the Irish Research Council for Science, Engineering and Technology for financial support under the EMBARK scheme, and the Irish Centre for High-End Computing for provision of computational facilities and technical support (A. Sîrbu, H.J.R., M.C.); the US National Cancer Institute grant U54CA132383 and US National Science Foundation grant HRD-0420407 (Z.O., Y.Z., H.W., M.S.); and the Sardinian Regional Authorities (A.F., A.P., N.S., V.L.). V.A.H.-T. is the recipient of a fellowship from the Fonds pour la formation à la Recherche dans l'Industrie et dans l'Agriculture (F.R.I.A., Belgium); Y.S. is a postdoctoral fellow of the Fonds voor Wetenschappelijk Onderzoek - Vlaanderen (FWO, Belgium); P.G. is a Research Associate of the Fonds National de la Recherche Scientifique (FNRS, Belgium).

Author information

Contributions

D.M., J.C.C., D.M.C., R.J.P., M.K., J.J.C. and G.S. conceived the challenge; R.J.P. and G.S. performed team scoring; N.M.V. and K.R.A. performed experimental validation; D.M., J.C.C., R.K., R.J.P. and G.S. performed research; D.M., J.C.C., R.K., N.M.V., R.J.P., K.R.A., M.K., J.J.C. and G.S. analyzed results; D.M., J.C.C., R.K., M.K., J.J.C. and G.S. wrote the paper; and challenge participants performed network inference and provided method descriptions.

Corresponding author

Correspondence to Gustavo Stolovitzky.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Notes 1–10 (PDF 19429 kb)

Supplementary Data 1

DREAM5 network inference challenge (expression data, experiment descriptions, gene names, gold standards and evaluation scripts) (ZIP 37067 kb)

Supplementary Data 2

DREAM5 method scores (AUPR, AUROC and overall score) and summaries (XLS 57 kb)

Supplementary Data 3

DREAM5 network predictions of all individual methods and the community (ZIP 77113 kb)

Supplementary Data 4

E. coli, S. aureus and S. cerevisiae community networks (all predictions) (ZIP 5322 kb)

Supplementary Data 5

E. coli and S. aureus community networks at 50% precision cutoff (ZIP 192 kb)

Supplementary Data 6

E. coli and S. aureus network modules (XLS 595 kb)

Supplementary Data 7

E. coli experimental support for tested interactions (XLS 390 kb)
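Supplementary Data 2 reports AUPR and AUROC scores for each method, and Supplementary Data 5 cuts the community networks at 50% precision. The following is a minimal sketch of these three quantities for one ranked edge list scored against a gold standard. It assumes the ranking covers every candidate edge exactly once, and it approximates AUPR by average precision; the official DREAM5 evaluation scripts (Supplementary Data 1) may treat partial lists and curve interpolation differently. All function names are illustrative.

```python
from typing import List, Sequence, Set, Tuple

Edge = Tuple[str, str]  # (regulator, target)


def pr_roc_points(ranked: Sequence[Edge], gold: Set[Edge]):
    """Precision, recall and false-positive rate after each of the top-k predictions.

    Assumes `ranked` lists every candidate edge exactly once (best first), so
    every gold-standard edge appears in it and all other edges are negatives.
    """
    if not gold <= set(ranked):
        raise ValueError("every gold-standard edge must appear in the ranking")
    n_pos = len(gold)
    n_neg = len(ranked) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative edge")
    tp = fp = 0
    prec: List[float] = []
    rec: List[float] = []
    fpr: List[float] = []
    for edge in ranked:
        if edge in gold:
            tp += 1
        else:
            fp += 1
        prec.append(tp / (tp + fp))
        rec.append(tp / n_pos)
        fpr.append(fp / n_neg)
    return prec, rec, fpr


def auroc(rec: List[float], fpr: List[float]) -> float:
    """Area under the ROC curve by the trapezoidal rule, starting at (0, 0)."""
    area, prev_f, prev_r = 0.0, 0.0, 0.0
    for f, r in zip(fpr, rec):
        area += (f - prev_f) * (r + prev_r) / 2
        prev_f, prev_r = f, r
    return area


def aupr(ranked: Sequence[Edge], gold: Set[Edge], prec: List[float]) -> float:
    """AUPR approximated by average precision: mean precision at each true edge."""
    hits = [p for edge, p in zip(ranked, prec) if edge in gold]
    return sum(hits) / len(gold)


def cutoff_at_precision(prec: List[float], threshold: float = 0.5) -> int:
    """Largest k such that the top-k predictions have precision >= threshold."""
    best = 0
    for k, p in enumerate(prec, start=1):
        if p >= threshold:
            best = k
    return best
```

`cutoff_at_precision` returns the deepest prefix of the ranking whose precision still meets the threshold, which is one way to read a statement such as "1,700 interactions at a precision of 50%".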

About this article

Cite this article

Marbach, D., Costello, J., Küffner, R. et al. Wisdom of crowds for robust gene network inference. Nat Methods 9, 796–804 (2012). https://doi.org/10.1038/nmeth.2016
