Unlocking the potential of metagenomics through replicated experimental design

Journal name:
Nature Biotechnology
Year published:
Published online


Metagenomics holds enormous promise for discovering novel enzymes and organisms that are biomarkers or drivers of processes relevant to disease, industry and the environment. In the past two years, we have seen a paradigm shift in metagenomics toward cross-sectional and longitudinal studies, enabled by advances in DNA sequencing and high-performance computing. These technologies now make it possible to assess microbial diversity and function broadly, opening the largely unexplored frontier of microbial life to systematic investigation. To achieve this, the global scientific community must collaborate and agree on common objectives and data standards that enable comparative research across the Earth's microbiome. More comparable data will facilitate the study of biotechnologically relevant processes, such as bioprospecting for new glycoside hydrolases or identifying novel energy sources.
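As a concrete illustration of what agreeing on data standards can mean in practice, the sketch below checks a sample record against a small set of required contextual fields before the record is accepted for comparative analysis. The field names, accepted types and range rules are hypothetical. They are chosen only for illustration, loosely in the spirit of community checklists such as the MIxS specification (ref. 39), and they are not the official MIxS terms.

```python
from datetime import date

# Hypothetical required fields and accepted types (illustrative only;
# not the official MIxS term list).
REQUIRED_FIELDS = {
    "sample_id": (str,),
    "collection_date": (date,),
    "latitude": (int, float),
    "longitude": (int, float),
    "environment": (str,),
}


def validate_sample(record: dict) -> list:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field, accepted in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], accepted):
            names = ", ".join(t.__name__ for t in accepted)
            problems.append(f"wrong type for {field}: expected {names}")
    # Range checks run only when the coordinate is present and numeric.
    lat, lon = record.get("latitude"), record.get("longitude")
    if isinstance(lat, (int, float)) and not -90 <= lat <= 90:
        problems.append("latitude out of range")
    if isinstance(lon, (int, float)) and not -180 <= lon <= 180:
        problems.append("longitude out of range")
    return problems


# A hypothetical, complete record.
sample = {
    "sample_id": "example-001",
    "collection_date": date(2010, 6, 1),
    "latitude": 50.0,
    "longitude": -4.0,
    "environment": "coastal seawater",
}
print(validate_sample(sample))
```

One way such a rule could be enforced is to check records when they are submitted, rather than when they are analyzed. This way, incomplete metadata are flagged before they can silently drop samples from a cross-study comparison.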

At a glance


    Figure 1: Conceptual diagram of why replicated samples, especially across a gradient or along a time series, are critical for interpretation of results.

    Structure imposed externally by the study design greatly improves our ability to recover biologically meaningful relationships, rather than simply finding statistical differences between samples. This is especially important because every pair of biological samples will differ if sequenced deeply enough. Here we show samples from the L4 Western English Channel ocean time series (graph reprinted from Gilbert et al.22). Sampling only during the summer (blue shading) would reveal just the tip of the iceberg of variability in this ecosystem, which is driven by seasonal change. Similar principles apply to other ecosystems that have other major drivers of variation. When these drivers are overlooked, they can make results puzzling or give a misleading picture of variation.
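Both points in this caption can be sketched numerically. The example below uses a hypothetical taxon whose relative abundance follows a smooth 52-week cycle. These are synthetic values, not the L4 data. It compares the variance seen with year-round weekly sampling against the variance seen in 13 summer weeks only. It then applies a pooled two-proportion z-test to a fixed and biologically trivial difference in abundance (10.0% versus 10.5%, also hypothetical) at increasing sequencing depth. The test assumes that the observed proportions equal the true ones.

```python
import math
from statistics import pvariance


def seasonal_abundance(week: int, base: float = 0.10, amp: float = 0.08,
                       peak_week: int = 28) -> float:
    """Hypothetical relative abundance of one taxon following a 52-week cycle."""
    return base + amp * math.cos(2 * math.pi * (week - peak_week) / 52)


full_year = [seasonal_abundance(w) for w in range(52)]         # weekly, all year
summer_only = [seasonal_abundance(w) for w in range(22, 35)]   # 13 weeks around the peak

full_var = pvariance(full_year)
summer_var = pvariance(summer_only)
print(f"variance, full year:   {full_var:.2e}")
print(f"variance, summer only: {summer_var:.2e}")


def two_proportion_p_value(p1: float, p2: float, n: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test with n reads per sample,
    assuming the observed proportions are exactly p1 and p2."""
    pooled = (p1 + p2) / 2  # equal n in both samples, so the pooled proportion is the mean
    se = math.sqrt(pooled * (1 - pooled) * (2 / n))
    z = abs(p2 - p1) / se
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


for depth in (1_000, 100_000, 1_000_000):
    print(depth, two_proportion_p_value(0.100, 0.105, depth))
```

In this sketch, the summer-only window captures only a small fraction of the year-round variance. Meanwhile, the same 0.5-percentage-point difference moves from non-significant to overwhelmingly significant as depth grows. This is why replication and externally imposed structure, rather than p-values alone, are needed to judge biological relevance.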

    Figure 2: Importance of metadata-enabled studies.

    Matched-pair diagrams showing visualizations from recently published, high-impact studies. Standard clustering of the data (left) is contrasted with the same diagram in which each data point is colored according to metadata (right). (a) A principal coordinate analysis plot of UniFrac distances between human body habitat–associated communities reveals that the communities cluster by habitat type (Reprinted by permission of AAAS from Costello et al. (N.F., J.I.G., R.K. and colleagues)46). (b) A bipartite network diagram shows that mammalian fecal communities cluster mainly by diet (Reprinted by permission of AAAS from Ley et al. (R.K., J.I.G. and colleagues)47). (c) A nonmetric multidimensional scaling plot of UniFrac distances between soil communities shows that the main factor driving variation in these communities is pH (Reprinted by permission of PNAS from Fierer et al. (N.F., R.K. and colleagues)48). These relationships become immediately and intuitively obvious once the points are colored by the right metadata, but would be essentially impossible to see otherwise.
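Panel (a) rests on principal coordinate analysis (PCoA) of a distance matrix, with the metadata applied only afterwards. The sketch below shows classical PCoA, which double-centers the squared distances and keeps the leading eigenvectors. It then asks whether a metadata label explains the spread, using a permutation test on the one-factor PERMANOVA pseudo-F statistic. The published panels used UniFrac distances, which require a phylogenetic tree and are typically computed with tools such as QIIME (ref. 41). Here, a toy Euclidean matrix over ten hypothetical samples stands in for them. PERMANOVA is a swapped-in illustration, not necessarily the method used in the cited studies.

```python
import numpy as np


def pcoa(dist: np.ndarray, n_axes: int = 2) -> np.ndarray:
    """Classical principal coordinate analysis of a symmetric distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering   # double-centered matrix
    eigvals, eigvecs = np.linalg.eigh(gram)              # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_axes]           # largest eigenvalues first
    kept = np.clip(eigvals[order], 0.0, None)            # guard against small negative values
    return eigvecs[:, order] * np.sqrt(kept)


def pseudo_f(dist: np.ndarray, labels: np.ndarray) -> float:
    """One-factor PERMANOVA pseudo-F computed directly from pairwise distances."""
    n = len(labels)
    groups = np.unique(labels)
    sq = dist ** 2
    ss_total = sq[np.triu_indices(n, 1)].sum() / n
    ss_within = 0.0
    for g in groups:
        idx = np.flatnonzero(labels == g)
        sub = sq[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), 1)].sum() / len(idx)
    a = len(groups)
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))


def permanova(dist, labels, permutations=999, seed=0):
    """Permutation p-value: how often shuffled labels give a pseudo-F at least as large."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    f_obs = pseudo_f(dist, labels)
    hits = sum(int(pseudo_f(dist, rng.permutation(labels)) >= f_obs)
               for _ in range(permutations))
    return f_obs, (hits + 1) / (permutations + 1)


# Toy data: ten hypothetical samples in two well-separated groups.
points = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5]], dtype=float)
points = np.vstack([points, points + 5.0])
labels = np.array(["habitat_A"] * 5 + ["habitat_B"] * 5)
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = pcoa(dist)                      # ordination computed without the metadata
f_stat, p_value = permanova(dist, labels)
print(f"pseudo-F = {f_stat:.1f}, permutation p = {p_value:.3f}")
```

Plotting `coords` colored by `labels` is the code-level counterpart of the right-hand panels. The ordination is computed blind to the metadata, which are then used to reveal the structure. The permutation test checks whether that structure is stronger than random labeling would produce.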


  1. Whitman, W.B., Coleman, D.C. & Wiebe, W.J. Prokaryotes: the unseen majority. Proc. Natl. Acad. Sci. USA 95, 6578–6583 (1998).
  2. Falkowski, P.G., Fenchel, T. & DeLong, E.F. The microbial engines that drive Earth's biogeochemical cycles. Science 320, 1034–1039 (2008).
  3. Field, D. et al. The Genomic Standards Consortium. PLoS Biol. 9, e1001088 (2011).
  4. Fisher, R.A. The Design of Experiments (Macmillan, 1935).
  5. Gilbert, J.A. et al. Bioprospecting metagenomics for new glycoside hydrolases. in Methods in Molecular Biology vol. 908, Biomass Conversion (ed. Himmel, M.E.) (Humana Press, 2012).
  6. Jansson, J. Towards “Tera-Terra”: Terabase Sequencing of Terrestrial Metagenomes. ASM Microbe 6, 309–315 (2011).
  7. Davies, N., Field, D. & the Genomic Observatories Network. Sequencing data: A genomic network to monitor Earth. Nature 481, 145 (2012).
  8. Tyson, G.W. et al. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428, 37–43 (2004).
  9. Mackelprang, R. et al. Metagenomic analysis of a permafrost microbial community reveals a rapid response to thaw. Nature 480, 368–371 (2011).
  10. Delmont, T.O. et al. Structure, fluctuation and magnitude of a natural grassland soil metagenome. ISME J. published online, doi:10.1038/ismej.2011.197 (2 February 2012).
  11. Rusch, D.B. et al. The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific. PLoS Biol. 5, e77 (2007).
  12. DeLong, E.F. et al. Community genomics among stratified microbial assemblages in the ocean's interior. Science 311, 496–503 (2006).
  13. Gilbert, J.A. et al. The taxonomic and functional diversity of microbes at a temperate coastal site: a 'multi-omic' study of seasonal and diel temporal variation. PLoS ONE 5, e15545 (2010).
  14. Warnecke, F. et al. Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite. Nature 450, 560–565 (2007).
  15. Hess, M. et al. Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science 331, 463–467 (2011).
  16. Gill, S.R. et al. Metagenomic analysis of the human distal gut microbiome. Science 312, 1355–1359 (2006).
  17. Turnbaugh, P.J. et al. An obesity-associated gut microbiome with increased capacity for energy harvest. Nature 444, 1027–1031 (2006).
  18. Delmont, T.O. et al. Metagenomic mining for microbiologists. ISME J. 5, 1837–1843 (2011).
  19. Dinsdale, E.A. et al. Functional metagenomic profiling of nine biomes. Nature 452, 629–632 (2008).
  20. Tringe, S.G. et al. Comparative metagenomics of microbial communities. Science 308, 554–557 (2005).
  21. Prosser, J.I. Replicate or lie. Environ. Microbiol. 12, 1806–1810 (2010).
  22. Gilbert, J.A. et al. Defining seasonal marine microbial community dynamics. ISME J. 6, 298–308 (2012).
  23. Caporaso, J.G. et al. Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. Proc. Natl. Acad. Sci. USA 108 (suppl. 1), 4516–4522 (2011).
  24. Caporaso, J.G. et al. Moving pictures of the human microbiome. Genome Biol. 12, R50 (2011).
  25. Parsons, R.J., Breitbart, M., Lomas, M.W. & Carlson, C.A. Ocean time-series reveals recurring seasonal patterns of virioplankton dynamics in the northwestern Sargasso Sea. ISME J. 6, 273–284 (2012).
  26. Farnelid, H. et al. Nitrogenase gene amplicons from global marine surface waters are dominated by genes of non-cyanobacteria. PLoS ONE 6, e19223 (2011).
  27. Desai, N., Antonopoulos, D., Gilbert, J.A., Glass, E.M. & Meyer, F. From genomics to metagenomics. Curr. Opin. Biotechnol. 23, 72–76 (2012).
  28. Thauer, R.K., Kaster, A.K., Seedorf, H., Buckel, W. & Hedderich, R. Methanogenic archaea: ecologically relevant differences in energy conservation. Nat. Rev. Microbiol. 6, 579–591 (2008).
  29. Qin, J. et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464, 59–65 (2010).
  30. Larsen, P., Field, D. & Gilbert, J.A. Predicting bacterial community assemblages using an artificial neural network approach. Nat. Methods advance online publication, doi:10.1038/nmeth.1975 (15 April 2012).
  31. Caporaso, J.G., Paszkiewicz, K., Field, D., Knight, R. & Gilbert, J.A. The Western English Channel contains a persistent microbial seed bank. ISME J. 6, 1089–1093 (2012).
  32. Mahowald, M.A. et al. Characterizing a model human gut microbiota composed of members of its two dominant bacterial phyla. Proc. Natl. Acad. Sci. USA 106, 5859–5864 (2009).
  33. Gilbert, J.A. et al. The seasonal structure of microbial communities in the Western English Channel. Environ. Microbiol. 11, 3132–3139 (2009).
  34. Arumugam, M. et al. Enterotypes of the human gut microbiome. Nature 473, 174–180 (2011).
  35. Wu, G.D. et al. Linking long-term dietary patterns with gut microbial enterotypes. Science 334, 105–108 (2011).
  36. Kuczynski, J. et al. Direct sequencing of the human microbiome readily reveals community differences. Genome Biol. 11, 210 (2010).
  37. Gilbert, J.A. et al. The Earth Microbiome Project: meeting report of the “1st EMP meeting on sample selection and acquisition” at Argonne National Laboratory October 6 2010. Stand. Genomic Sci. 3, 249–253 (2010).
  38. Woelfle, M., Olliaro, P. & Todd, M.H. Open science is a research accelerator. Nat. Chem. 3, 745–748 (2011).
  39. Yilmaz, P. et al. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nat. Biotechnol. 29, 415–420 (2011).
  40. Lozupone, C.A. & Knight, R. Global patterns in bacterial diversity. Proc. Natl. Acad. Sci. USA 104, 11436–11440 (2007).
  41. Caporaso, J.G. et al. QIIME allows analysis of high-throughput community sequencing data. Nat. Methods 7, 335–336 (2010).
  42. Meyer, F. et al. The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9, 386 (2008).
  43. Wu, D. et al. A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea. Nature 462, 1056–1060 (2009).
  44. Markowitz, V.M. et al. IMG: the Integrated Microbial Genomes database and comparative analysis system. Nucleic Acids Res. 40, D115–D122 (2012).
  45. Markowitz, V.M. et al. IMG/M: the integrated metagenome data management and comparative analysis system. Nucleic Acids Res. 40, D123–D129 (2012).
  46. Costello, E.K. et al. Bacterial community variation in human body habitats across space and time. Science 326, 1694–1697 (2009).
  47. Ley, R.E. et al. Evolution of mammals and their gut microbes. Science 320, 1647–1651 (2008).
  48. Fierer, N. et al. Forensic identification using skin bacterial communities. Proc. Natl. Acad. Sci. USA 107, 6477–6481 (2010).
  49. Zinger, L. et al. Global patterns of bacterial beta-diversity in seafloor and seawater ecosystems. PLoS ONE 6, e24570 (2011).
  50. Muegge, B.D. et al. Diet drives convergence in gut microbiome functions across mammalian phylogeny and within humans. Science 332, 970–974 (2011).
  51. Claesson, M.J. et al. Composition, variability, and temporal stability of the intestinal microbiota of the elderly. Proc. Natl. Acad. Sci. USA 108 (suppl. 1), 4586–4591 (2011).
  52. Frank, D.N. et al. Molecular-phylogenetic characterization of microbial community imbalances in human inflammatory bowel diseases. Proc. Natl. Acad. Sci. USA 104, 13780–13785 (2007).
  53. Turnbaugh, P.J. et al. A core gut microbiome in obese and lean twins. Nature 457, 480–484 (2009).
  54. Reyes, A. et al. Viruses in the faecal microbiota of monozygotic twins and their mothers. Nature 466, 334–338 (2010).
  55. Ravel, J. et al. Vaginal microbiome of reproductive-age women. Proc. Natl. Acad. Sci. USA 108 (suppl. 1), 4680–4687 (2011).
  56. Rasche, F. et al. Seasonality and resource availability control bacterial and archaeal communities in soils of a temperate beech forest. ISME J. 5, 389–402 (2011).


Author information


  1. Howard Hughes Medical Institute and Department of Chemistry & Biochemistry, University of Colorado at Boulder, Boulder, Colorado, USA.

    • Rob Knight
  2. Lawrence Berkeley National Laboratory, Earth Sciences Division, Berkeley, California, USA.

    • Janet Jansson
  3. Lawrence Berkeley National Laboratory, Joint Genome Institute, Walnut Creek, California, USA.

    • Janet Jansson
  4. Joint Bioenergy Institute, Emeryville, California, USA.

    • Janet Jansson
  5. Centre for Ecology & Hydrology, Wallingford, Oxford, UK.

    • Dawn Field &
    • Mark J Bailey
  6. Department of Ecology and Evolutionary Biology, Cooperative Institute for Research in Environmental Sciences, University of Colorado at Boulder, Boulder, Colorado, USA.

    • Noah Fierer
  7. Argonne National Laboratory, Argonne, Illinois, USA.

    • Narayan Desai,
    • Folker Meyer,
    • Rick Stevens &
    • Jack A Gilbert
  8. Department of Biological Sciences, University of Southern California, Los Angeles, California, USA.

    • Jed A Fuhrman
  9. Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences & Institute for Molecular Bioscience, The University of Queensland, St. Lucia, Australia.

    • Phil Hugenholtz
  10. RTI, Research Triangle Park, Durham, North Carolina, USA.

    • Daniel van der Lelie
  11. The Computation Institute, University of Chicago, Chicago, Illinois, USA.

    • Folker Meyer &
    • Rick Stevens
  12. Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St. Louis, Missouri, USA.

    • Jeffrey I Gordon
  13. Netherlands Institute of Ecology (NIOO-KNAW), Wageningen, The Netherlands.

    • George A Kowalchuk
  14. Institute of Ecological Science, VU University Amsterdam, Amsterdam, The Netherlands.

    • George A Kowalchuk
  15. Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, USA.

    • Jack A Gilbert

Competing financial interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to:
