Metagenomic species profiling using universal phylogenetic marker genes

Journal name:
Nature Methods
Year published:
Published online

To quantify known and unknown microorganisms at species-level resolution using shotgun sequencing data, we developed a method that establishes metagenomic operational taxonomic units (mOTUs) based on single-copy phylogenetic marker genes. Applied to 252 human fecal samples, the method revealed that on average 43% of the species abundance and 58% of the richness cannot be captured by current reference genome–based methods. An implementation of the method is available at

At a glance


  1. Figure 1: Phylogenetic marker gene–based mOTUs.

    (a) Schematic showing the mOTUs that contained at least one marker gene (MG) that originated from a sequenced reference genome and at least one metagenomic MG (mOTURefMeta) and mOTUs that contained at least one metagenomic MG but no reference MG (mOTUMeta). Black and red lines indicate known and unknown topologies, respectively. (b) Mean fractions of mOTUMeta and mOTURefMeta of the observed mOTU richness per sample, and mean relative abundances based on mOTU abundance profiles of 252 human fecal samples.

  2. Figure 2: Phylogenetic analysis of mOTU linkage groups.

    (a) Maximum likelihood phylogenetic tree of prokaryotic species used to infer the topology of mOTU-LGs (Online Methods). US National Center for Biotechnology Information phylum-level taxonomy is color-coded on the outer ring, and placements of mOTU-LGs are shown as circles on tree edges. Dashed lines indicate a clade of Oscillibacter valericigenes and related mOTU-LGs that are highlighted in c. (b) Phylum-level breakdown of new mOTU-LGs (Online Methods). *, for mOTU-LGs that had no consistent annotation across their mOTU members at the phylum level, BLASTp identities of the member sequences are shown in the inset. Median protein identities (n = 6–10) with interquartile ranges (box) are shown with whiskers extending up to 1.5 times the interquartile range. (c) Maximum likelihood tree for a subset of mOTU-LGs that represent previously unidentified (new) species in the genus Oscillibacter.

  3. Figure 3: Performance and application of mOTU linkage groups.

    (a) Fraction of samples originating from 43 individuals that were sampled at least twice (total 88 samples) for which the most similar sample originated from the same individual using mOTU-LG (red), a subset of mOTU-LGs that represent reference species (reference mOTU-LG) and clade-specific genes (ref. 7) at species level (MetaPhlAn). (b) Shannon diversity index for samples originating from US individuals (AM; n = 97), asymptomatic European individuals (EU; n = 85) and individuals diagnosed with IBD (IBD; n = 25). Individual samples are shown as closed circles and collective data for each group superimposed as box plots. Padj. denotes Bonferroni-adjusted P values of Wilcoxon's rank-sum test results. (c) Relative abundances of mOTU-LGs that were significantly different between fecal samples from a cohort of UC patients (n = 21) and matched asymptomatic individuals (n = 35). The mean (across mOTU-LG members) protein identity for best BLASTp hits is shown as a proxy for phylogenetic distance to the closest organism for which a reference genome sequence was available. Padj. values denote FDR-adjusted P values of Wilcoxon test results.
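
As a rough illustration of two quantities named in the Figure 3 legend, the Python sketch below computes a Shannon diversity index from an abundance profile and the fraction of samples whose most similar other sample comes from the same individual. The toy profiles, the function names and the use of Bray-Curtis dissimilarity as the similarity measure are assumptions made for this example; the legend does not state which distance was used, and this is not the authors' implementation.

```python
import math


def shannon_index(abundances):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over non-zero abundances."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("abundances must sum to a positive value")
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity (assumed measure) between two profiles."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator if denominator > 0 else 0.0


def same_individual_fraction(profiles, individuals):
    """Fraction of samples whose nearest other sample shares the individual."""
    hits = 0
    for i, profile in enumerate(profiles):
        others = (j for j in range(len(profiles)) if j != i)
        nearest = min(others, key=lambda j: bray_curtis(profile, profiles[j]))
        if individuals[nearest] == individuals[i]:
            hits += 1
    return hits / len(profiles)


# Hypothetical relative-abundance profiles: two individuals, two time points.
profiles = [
    [0.50, 0.30, 0.20, 0.00],  # individual A, first sample
    [0.45, 0.35, 0.20, 0.00],  # individual A, second sample
    [0.10, 0.10, 0.10, 0.70],  # individual B, first sample
    [0.05, 0.15, 0.10, 0.70],  # individual B, second sample
]
individuals = ["A", "A", "B", "B"]

diversity = [shannon_index(p) for p in profiles]
fraction = same_individual_fraction(profiles, individuals)
```

In this toy data set every sample is closest to the other sample from the same individual, so the fraction is 1; with real profiles the value depends on the chosen distance and on how well the taxonomic units resolve individual-specific composition.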


  1. Klappenbach, J.A., Saxman, P.R., Cole, J.R. & Schmidt, T.M. Nucleic Acids Res. 29, 181–184 (2001).
  2. Engelbrektson, A. et al. ISME J. 4, 642–647 (2010).
  3. Claesson, M.J. et al. Nucleic Acids Res. 38, e200 (2010).
  4. Gevers, D. et al. Nat. Rev. Microbiol. 3, 733–739 (2005).
  5. Arumugam, M. et al. Nature 473, 174–180 (2011).
  6. Liu, B., Gibbons, T., Ghodsi, M., Treangen, T. & Pop, M. BMC Genomics 12 (suppl. 2), S4 (2011).
  7. Segata, N. et al. Nat. Methods 9, 811–814 (2012).
  8. Ciccarelli, F. et al. Science 311, 1283–1287 (2006).
  9. Sorek, R. et al. Science 318, 1449–1452 (2007).
  10. von Mering, C. et al. Science 315, 1126–1130 (2007).
  11. Mende, D.R., Sunagawa, S., Zeller, G. & Bork, P. Nat. Methods 10, 881–884 (2013).
  12. Qin, J. et al. Nature 464, 59–65 (2010).
  13. The Human Microbiome Project Consortium. Nature 486, 215–221 (2012).
  14. Nelson, K.E. et al. Science 328, 994–999 (2010).
  15. Walker, A.W. et al. ISME J. 5, 220–230 (2011).
  16. Mondot, S. et al. Inflamm. Bowel Dis. 17, 185–192 (2011).
  17. Schloissnig, S. et al. Nature 493, 45–50 (2013).
  18. Turnbaugh, P.J. et al. Nature 457, 480–484 (2009).
  19. Rajilic-Stojanovic, M., Heilig, H.G., Tims, S., Zoetendal, E.G. & de Vos, W.M. Environ. Microbiol. 15, 1146–1159 (2012).
  20. Manichanh, C., Borruel, N., Casellas, F. & Guarner, F. Nat. Rev. Gastroenterol. Hepatol. 9, 599–608 (2012).
  21. Rajilic-Stojanovic, M., Shanahan, F., Guarner, F. & de Vos, W.M. Inflamm. Bowel Dis. 19, 481–488 (2013).
  22. Png, C.W. et al. Am. J. Gastroenterol. 105, 2420–2428 (2010).
  23. Qin, J. et al. Nature 490, 55–60 (2012).
  24. Forslund, K. et al. Genome Res. 23, 1163–1169 (2013).
  25. Kultima, J.R. et al. PLoS ONE 7, e47656 (2012).
  26. Eddy, S.R. PLoS Comput. Biol. 7, e1002195 (2011).
  27. Muller, J., Creevey, C.J., Thompson, J.D., Arendt, D. & Bork, P. Bioinformatics 26, 263–265 (2010).
  28. Powell, S. et al. Nucleic Acids Res. 40, D284–D289 (2012).
  29. Mende, D.R. et al. PLoS ONE 7, e31386 (2012).
  30. Edgar, R.C. Bioinformatics 26, 2460–2461 (2010).
  31. Stamatakis, A. Bioinformatics 22, 2688–2690 (2006).
  32. Stamatakis, A. & Aberer, A. in Proceedings of the 2013 IEEE 27th International Symposium on Parallel and Distributed Processing 1195–1204 (2013).
  33. Berger, S.A. & Stamatakis, A. Bioinformatics 27, 2068–2075 (2011).
  34. Berger, S.A., Krompass, D. & Stamatakis, A. Syst. Biol. 60, 291–302 (2011).
  35. Letunic, I. & Bork, P. Bioinformatics 23, 127–128 (2007).


Author information


  1. European Molecular Biology Laboratory, Heidelberg, Germany.

    • Shinichi Sunagawa,
    • Daniel R Mende,
    • Georg Zeller,
    • Jens Roat Kultima,
    • Luis Pedro Coelho,
    • Manimozhiyan Arumugam,
    • Julien Tap &
    • Peer Bork
  2. The Exelixis Lab, Scientific Computing Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany.

    • Fernando Izquierdo-Carrasco,
    • Simon A Berger &
    • Alexandros Stamatakis
  3. The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.

    • Manimozhiyan Arumugam,
    • Oluf Pedersen &
    • Jun Wang
  4. Beijing Genomics Institute (BGI) Shenzhen, Shenzhen, China.

    • Manimozhiyan Arumugam,
    • Jun Wang &
    • Junhua Li
  5. Unité de Service 1367 Metagenopolis, Institut National de la Recherche Agronomique, Jouy en Josas, France.

    • Julien Tap,
    • Joël Doré &
    • S Dusko Ehrlich
  6. Center for Biological Sequence Analysis, Technical University of Denmark, Kongens Lyngby, Denmark.

    • Henrik Bjørn Nielsen,
    • Simon Rasmussen &
    • Søren Brunak
  7. Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark.

    • Henrik Bjørn Nielsen &
    • Søren Brunak
  8. Hagedorn Research Institute, Gentofte, Denmark.

    • Oluf Pedersen
  9. Institute of Biomedical Science, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.

    • Oluf Pedersen
  10. Faculty of Health Sciences, Aarhus University, Aarhus, Denmark.

    • Oluf Pedersen
  11. Digestive System Research Unit, University Hospital Vall d'Hebron, Barcelona, Spain.

    • Francisco Guarner
  12. Laboratory of Microbiology, Wageningen University, Wageningen, The Netherlands.

    • Willem M de Vos
  13. Department of Bacteriology and Immunology, University of Helsinki, Helsinki, Finland.

    • Willem M de Vos
  14. King Abdulaziz University, Jeddah, Saudi Arabia.

    • Jun Wang
  15. Department of Biology, University of Copenhagen, Copenhagen, Denmark.

    • Jun Wang
  16. Macau University of Science and Technology, Macau, China.

    • Jun Wang
  17. BGI Hong Kong Research Institute, Hong Kong, China.

    • Junhua Li
  18. School of Bioscience and Biotechnology, South China University of Technology, Guangzhou, China.

    • Junhua Li
  19. Unité Mixte de Recherche 1319 Micalis, Institut National de la Recherche Agronomique, Jouy en Josas, France.

    • Joël Doré
  20. Karlsruhe Institute of Technology, Institute for Theoretical Informatics, Karlsruhe, Germany.

    • Alexandros Stamatakis
  21. Max Delbrück Centre for Molecular Medicine, Berlin, Germany.

    • Peer Bork


P.B. and S.S. conceived the study, S.S., D.R.M., G.Z., F.I.-C., S.A.B., M.A., J.T. and A.S. designed and performed the analyses, S.S., D.R.M., G.Z., J.R.K., L.P.C. and J.L. developed and implemented the program, O.P., F.G., J.D. and J.W. provided data, S.S., D.R.M., G.Z. and P.B. wrote the manuscript, and M.A., J.T., H.B.N., S.R., O.P., F.G., W.M.d.V., S.D.E. and A.S. gave conceptual advice and revised the manuscript.

Competing financial interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to:


Supplementary information

PDF files

  1. Supplementary Text and Figures (2,102 KB)

    Supplementary Figures 1–8, and Supplementary Tables 4, 5 and 7

Excel files

  1. Supplementary Table 1 (162 KB)

    Prokaryotic reference genomes used in this study.

  2. Supplementary Table 2 (13 KB)

    Summary of benchmark results for speed and accuracy of marker gene identification.

  3. Supplementary Table 3 (21 KB)

    Metagenomic data sets used in this study.

  4. Supplementary Table 6 (68 KB)

    Summary of mOTU linkage groups with taxonomic annotations.

Zip files

  1. Supplementary Software (88,181 KB)

    mOTU profiling tool.
