Metagenomic species profiling using universal phylogenetic marker genes

Journal name: Nature Methods
Published online

To quantify known and unknown microorganisms at species-level resolution using shotgun sequencing data, we developed a method that establishes metagenomic operational taxonomic units (mOTUs) based on single-copy phylogenetic marker genes. Applied to 252 human fecal samples, the method revealed that on average 43% of the species abundance and 58% of the richness cannot be captured by current reference genome–based methods. An implementation of the method is available at
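The abstract's central step is grouping single-copy marker-gene (MG) sequences into mOTUs. The sketch below illustrates that step in a minimal form. It assumes that pairwise sequence identities have already been computed, for example from alignments. It then links any two sequences whose identity meets a cutoff, which is single-linkage clustering implemented with union-find. The function name `cluster_motus`, the 0.965 cutoff and the single-linkage rule are all illustrative assumptions. They are not the published procedure, which is described in the Online Methods.

```python
from typing import Dict, List, Tuple


def cluster_motus(
    seq_ids: List[str],
    identities: Dict[Tuple[str, str], float],
    cutoff: float = 0.965,  # hypothetical identity cutoff, for illustration only
) -> List[List[str]]:
    """Group marker-gene sequences into OTUs by single linkage.

    Two sequences share a cluster if they are connected by a chain of pairs
    whose precomputed identity is >= cutoff. Hypothetical helper, not the
    published mOTU procedure.
    """
    parent = {s: s for s in seq_ids}

    def find(x: str) -> str:
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), ident in identities.items():
        if ident >= cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters: Dict[str, List[str]] = {}
    for s in seq_ids:
        clusters.setdefault(find(s), []).append(s)
    # Sort members and clusters so the output order is deterministic.
    return sorted(sorted(c) for c in clusters.values())


if __name__ == "__main__":
    ids = ["ref_A", "meta_1", "meta_2", "meta_3"]
    pairs = {
        ("ref_A", "meta_1"): 0.99,
        ("meta_1", "meta_2"): 0.97,
        ("meta_2", "meta_3"): 0.80,
    }
    print(cluster_motus(ids, pairs))  # [['meta_1', 'meta_2', 'ref_A'], ['meta_3']]
```

Single linkage is used here only because it is short to write. It can chain distinct groups together through intermediate sequences, which is one reason stricter linkage rules are often preferred for OTU clustering.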

At a glance


    Figure 1: Phylogenetic marker gene–based mOTUs.

    (a) Schematic distinguishing two kinds of mOTUs. The first kind (mOTU_RefMeta) contained at least one marker gene (MG) originating from a sequenced reference genome and at least one metagenomic MG. The second kind (mOTU_Meta) contained at least one metagenomic MG but no reference MG. Black and red lines indicate known and unknown topologies, respectively. (b) Mean fractions of the observed mOTU richness per sample accounted for by mOTU_Meta and mOTU_RefMeta, and their mean relative abundances, based on mOTU abundance profiles of 252 human fecal samples.
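Panel (b) summarizes two per-sample quantities for the two mOTU classes of panel (a), labelled here 'RefMeta' and 'Meta'. The first is the fraction of observed mOTUs (richness). The second is the fraction of total relative abundance. A minimal sketch is shown below. It assumes each sample's profile is a mapping from mOTU identifier to relative abundance, with at least one non-zero entry. It also assumes each mOTU is classified by whether any member MG comes from a reference genome. The names `classify_motu`, `class_fractions` and `mean_fractions` are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Set

Profile = Dict[str, float]  # mOTU identifier -> relative abundance


def classify_motu(members: List[str], reference_ids: Set[str]) -> str:
    """'RefMeta' if any member MG comes from a reference genome, else 'Meta'.

    Assumes every mOTU contains at least one metagenomic MG, as in panel (a).
    """
    return "RefMeta" if any(m in reference_ids for m in members) else "Meta"


def class_fractions(profile: Profile, motu_class: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Per-sample richness and abundance fractions for each mOTU class."""
    observed = {m: a for m, a in profile.items() if a > 0}  # detected mOTUs only
    total_abundance = sum(observed.values())
    result = {}
    for cls in ("Meta", "RefMeta"):
        members = [m for m in observed if motu_class[m] == cls]
        result[cls] = {
            "richness": len(members) / len(observed),
            "abundance": sum(observed[m] for m in members) / total_abundance,
        }
    return result


def mean_fractions(profiles: Sequence[Profile], motu_class: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Average the per-sample fractions across all samples."""
    per_sample = [class_fractions(p, motu_class) for p in profiles]
    return {
        cls: {key: mean(s[cls][key] for s in per_sample) for key in ("richness", "abundance")}
        for cls in ("Meta", "RefMeta")
    }


if __name__ == "__main__":
    reference_ids = {"ref_A"}
    motus = {"m1": ["ref_A", "meta_1"], "m2": ["meta_2"], "m3": ["meta_3"]}
    motu_class = {name: classify_motu(mem, reference_ids) for name, mem in motus.items()}
    profiles = [{"m1": 0.6, "m2": 0.4, "m3": 0.0}, {"m1": 0.2, "m2": 0.3, "m3": 0.5}]
    print(mean_fractions(profiles, motu_class))
```

Richness and abundance fractions can differ substantially for the same sample. A class made of many low-abundance mOTUs contributes more to richness than to abundance. This matches the two different percentages reported in the abstract.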

    Figure 2: Phylogenetic analysis of mOTU linkage groups.

    (a) Maximum likelihood phylogenetic tree of prokaryotic species used to infer the topology of mOTU-LGs (Online Methods). Phylum-level taxonomy from the US National Center for Biotechnology Information (NCBI) is color-coded on the outer ring, and placements of mOTU-LGs are shown as circles on tree edges. Dashed lines indicate a clade of Oscillibacter valericigenes and related mOTU-LGs, which are highlighted in c. (b) Phylum-level breakdown of new mOTU-LGs (Online Methods). Asterisk (*): for mOTU-LGs with no consistent phylum-level annotation across their mOTU members, BLASTp identities of the member sequences are shown in the inset. Median protein identities (n = 6–10) are shown with interquartile ranges (boxes) and whiskers extending up to 1.5 times the interquartile range. (c) Maximum likelihood tree for a subset of mOTU-LGs that represent previously unidentified (new) species in the genus Oscillibacter.
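The box-plot convention in panel (b) uses a median, an interquartile-range box, and whiskers reaching at most 1.5 times the interquartile range beyond the box. The sketch below makes this concrete. Quartiles are computed with Python's `statistics.quantiles` using `method="inclusive"`. The quartile method used for the figure is not stated, so that choice is an assumption. The function name `box_stats` and the example identity values are also hypothetical.

```python
from statistics import median, quantiles
from typing import Dict, List


def box_stats(values: List[float]) -> Dict[str, float]:
    """Median, quartiles and Tukey-style whisker ends for one box plot."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # needs >= 2 values
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "median": median(values),
        "q1": q1,
        "q3": q3,
        # Whiskers stop at the most extreme observations inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "n_outliers": len(values) - len(inside),
    }


if __name__ == "__main__":
    # Hypothetical BLASTp identities (%) for seven member sequences.
    identities = [62.0, 71.0, 73.0, 74.0, 76.0, 78.0, 99.0]
    print(box_stats(identities))
```

With these example values the box spans 72 to 77. The whiskers end at 71 and 78, and 62 and 99 fall outside the fences as outliers.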

    Figure 3: Performance and application of mOTU linkage groups.

    (a) Fraction of samples for which the most similar sample originated from the same individual. The samples came from 43 individuals who were each sampled at least twice (88 samples in total). Similarity was computed using mOTU-LGs (red), a subset of mOTU-LGs that represent reference species (reference mOTU-LG), and species-level clade-specific genes (MetaPhlAn; ref. 7). (b) Shannon diversity index for samples from US individuals (AM; n = 97), asymptomatic European individuals (EU; n = 85) and individuals diagnosed with inflammatory bowel disease (IBD; n = 25). Individual samples are shown as closed circles, with box plots of each group superimposed. Padj denotes Bonferroni-adjusted P values from Wilcoxon rank-sum tests. (c) Relative abundances of mOTU-LGs that differed significantly between fecal samples from a cohort of patients with ulcerative colitis (UC; n = 21) and matched asymptomatic individuals (n = 35). The mean protein identity of the best BLASTp hits across mOTU-LG members is shown as a proxy for phylogenetic distance to the closest organism with an available reference genome sequence. Padj denotes false discovery rate (FDR)-adjusted P values from Wilcoxon tests.
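Three computations underlie this figure. The sketch below shows a minimal version of each, under stated assumptions. The Shannon index uses natural logarithms. The "most similar sample" is taken as the nearest neighbour under Bray-Curtis dissimilarity, although the dissimilarity measure is not specified here. The FDR adjustment is shown as Benjamini-Hochberg, one common FDR procedure that was not necessarily the one used. The Wilcoxon tests themselves are omitted, and the adjustment functions take their P values as input. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Profile = Dict[str, float]  # taxon (e.g. mOTU-LG) -> abundance


def shannon(profile: Profile) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero abundance."""
    total = sum(profile.values())
    return -sum((a / total) * math.log(a / total) for a in profile.values() if a > 0)


def bray_curtis(x: Profile, y: Profile) -> float:
    """Bray-Curtis dissimilarity (assumed measure): 0 = identical, 1 = disjoint."""
    keys = set(x) | set(y)
    num = sum(abs(x.get(k, 0.0) - y.get(k, 0.0)) for k in keys)
    den = sum(x.get(k, 0.0) + y.get(k, 0.0) for k in keys)
    return num / den


def self_match_fraction(profiles: List[Profile], individuals: List[str]) -> float:
    """Fraction of samples from repeatedly sampled individuals whose nearest
    neighbour (lowest Bray-Curtis dissimilarity) is from the same individual."""
    counts = Counter(individuals)
    hits = total = 0
    for i, p in enumerate(profiles):
        if counts[individuals[i]] < 2:
            continue  # only individuals sampled at least twice are scored
        nearest = min(
            (j for j in range(len(profiles)) if j != i),
            key=lambda j: bray_curtis(p, profiles[j]),
        )
        total += 1
        if individuals[nearest] == individuals[i]:
            hits += 1
    return hits / total


def bonferroni(pvals: Sequence[float]) -> List[float]:
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values (one FDR procedure)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    samples = [{"x": 0.9, "y": 0.1}, {"x": 0.8, "y": 0.2},
               {"x": 0.1, "y": 0.9}, {"x": 0.2, "y": 0.8}, {"x": 0.5, "y": 0.5}]
    people = ["A", "A", "B", "B", "C"]  # C was sampled once and is not scored
    print(self_match_fraction(samples, people))
    print([round(shannon(s), 3) for s in samples])
    print(bonferroni([0.01, 0.04, 0.5]), benjamini_hochberg([0.01, 0.04, 0.03]))
```

Bonferroni controls the chance of any false positive and is conservative when many taxa are tested. An FDR procedure instead controls the expected share of false positives among the reported hits. This fits the many-taxa comparison in panel (c).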


Author contributions

P.B. and S.S. conceived the study; S.S., D.R.M., G.Z., F.I.-C., S.A.B., M.A., J.T. and A.S. designed and performed the analyses; S.S., D.R.M., G.Z., J.R.K., L.P.C. and J.L. developed and implemented the program; O.P., F.G., J.D. and J.W. provided data; S.S., D.R.M., G.Z. and P.B. wrote the manuscript; and M.A., J.T., H.B.N., S.R., O.P., F.G., W.M.d.V., S.D.E. and A.S. gave conceptual advice and revised the manuscript.

Competing financial interests

The authors declare no competing financial interests.