Resource | Published:

A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life

Nature Biotechnology volume 36, pages 996–1004 (2018)

Abstract

Taxonomy is an organizing principle of biology and is ideally based on evolutionary relationships among organisms. Development of a robust bacterial taxonomy has been hindered by an inability to obtain most bacteria in pure culture and, to a lesser extent, by the historical use of phenotypes to guide classification. Culture-independent sequencing technologies have matured sufficiently that a comprehensive genome-based taxonomy is now possible. We used a concatenated protein phylogeny as the basis for a bacterial taxonomy that conservatively removes polyphyletic groups and normalizes taxonomic ranks on the basis of relative evolutionary divergence. Under this approach, 58% of the 94,759 genomes comprising the Genome Taxonomy Database had changes to their existing taxonomy. This result includes the description of 99 phyla, including six major monophyletic units from the subdivision of the Proteobacteria, and amalgamation of the Candidate Phyla Radiation into a single phylum. Our taxonomy should enable improved classification of uncultured bacteria and provide a sound basis for ecological and evolutionary studies.
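The abstract names relative evolutionary divergence (RED) as the basis for rank normalization but does not spell out the calculation. The sketch below is one way to compute it, assuming the commonly described interpolation rule: the root has RED 0, every leaf has RED 1, and an internal node gets p + (d / u) × (1 − p), where p is its parent's RED, d is the branch length to the parent, and u is the mean distance from the parent to the leaves below the node. The `Node` class, the function names, and the toy tree are hypothetical and are not taken from the authors' software.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List


@dataclass
class Node:
    """Minimal rooted-tree node (hypothetical structure for illustration)."""
    name: str
    length: float = 0.0  # branch length to the parent; ignored for the root
    children: List["Node"] = field(default_factory=list)


def leaf_distances(node: Node) -> List[float]:
    """Return the path length from `node` to each leaf beneath it."""
    if not node.children:
        return [0.0]
    return [c.length + d for c in node.children for d in leaf_distances(c)]


def red_values(root: Node) -> Dict[str, float]:
    """Assign RED to every node, assuming the interpolation rule above.

    Assumes unique node names and positive branch lengths.
    """
    red = {root.name: 0.0}
    stack = [root]
    while stack:
        parent = stack.pop()
        p = red[parent.name]
        for child in parent.children:
            if not child.children:
                red[child.name] = 1.0  # leaves are fixed at 1
            else:
                # u: mean distance from the parent to the leaves under child
                u = mean(child.length + d for d in leaf_distances(child))
                red[child.name] = p + (child.length / u) * (1.0 - p)
            stack.append(child)
    return red


# Toy tree: root -> A (length 1) -> a1, a2 (length 1 each); root -> b (length 2)
toy = Node("root", 0.0, [
    Node("A", 1.0, [Node("a1", 1.0), Node("a2", 1.0)]),
    Node("b", 2.0),
])
red = red_values(toy)
print(red["A"])
```

On this toy tree the internal node A sits halfway between the root and its leaves, so it receives a RED of 0.5. Under a scheme like this, taxa of the same rank would be expected to fall within a similar RED interval, which is the sense in which ranks can be normalized.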

Accessions

Primary accessions

BioProject

Acknowledgements

We thank P. Yilmaz for helpful discussions on the proposed genome-based taxonomy; QFAB Bioinformatics for providing computational resources; and members of ACE for beta-testing GTDB. The project was primarily supported by an Australian Research Council Laureate Fellowship (FL150100038) awarded to P.H.

Author information

Affiliations

  1. Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Queensland, Australia.

    • Donovan H Parks
    • Maria Chuvochina
    • David W Waite
    • Christian Rinke
    • Adam Skarshewski
    • Pierre-Alain Chaumeil
    • Philip Hugenholtz

Contributions

D.H.P., D.W.W. and P.H. wrote the paper, and all other authors provided constructive suggestions. D.H.P. and P.H. designed the study. M.C. and P.H. performed the taxonomic curation. D.H.P., D.W.W., C.R., A.S., and P.-A.C. performed the bioinformatic analyses. P.-A.C. designed the website.

Competing interests

The authors declare no competing financial interests.

Corresponding author

Correspondence to Philip Hugenholtz.

Supplementary information

PDF files

  1. Supplementary Text and Figures

    Supplementary Figures 1–8

  2. Life Sciences Reporting Summary

Excel files

  1. Supplementary Table 1

    Robustness of GTDB taxonomy under varying marker sets, maximum-likelihood tree inference software, subsets of taxa, and evolutionary models.

  2. Supplementary Table 2

    16S rRNA-based taxa names adopted in the GTDB taxonomy and their associated rank and number of circumscribed genomes.

  3. Supplementary Table 3

    Correspondence between standardly named NCBI and GTDB taxa ordered by degree of polyphyly.

  4. Supplementary Table 4

    Taxa found to be polyphyletic in one or more of the trees inferred with FastTree, IQ-TREE, or ExaML on species- or genus-dereplicated genome sets, or in trees inferred with FastTree using the ribosomal proteins (rp1) marker set or 16S rRNA gene.

  5. Supplementary Table 5

    Genomes with conflicting or unresolved taxonomic assignments when applying the GTDB taxonomy to the species-dereplicated FastTree, IQ-TREE, or ExaML trees, or trees inferred from the concatenation of 16 ribosomal proteins (rp1) or the 16S rRNA gene.

  6. Supplementary Table 6

    Pairwise comparison of trees inferred with varying inference methods and marker sets.

  7. Supplementary Table 7

    Percentage of GTDB taxa at each taxonomic rank that are monophyletic, operationally monophyletic, or polyphyletic in each gene within the bac120 marker set.

  8. Supplementary Table 8

    NCBI taxa that have been 'retired' in the GTDB taxonomy and brief explanations for their retirement.

  9. Supplementary Table 9

    Correspondence between NCBI and SILVA taxa ordered by degree of polyphyly.

  10. Supplementary Table 10

    Comparison of NCBI and GTDB genus and species classifications to those proposed by Beaz-Hidalgo et al. (2015), Kook et al. (2017), and Bobay & Ochman (2017).

  11. Supplementary Table 11

    Comparison of clostridia classifications proposed by Yutin & Galperin (2013) to the GTDB taxonomy.

  12. Supplementary Table 12

    Draft genomes with 16S rRNA genes that did not meet the selection criteria for inclusion in the 16S rRNA tree (Online Methods).

About this article

Publication history

Received

Accepted

Published

DOI

https://doi.org/10.1038/nbt.4229
