Original Article

Subject Category: Evolutionary genetics

The ISME Journal (2012) 6, 610–618; doi:10.1038/ismej.2011.139; published online 1 December 2011

An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea

Daniel McDonald1, Morgan N Price2, Julia Goodrich1,8, Eric P Nawrocki3, Todd Z DeSantis5,8, Alexander Probst4,9, Gary L Andersen4, Rob Knight1,6 and Philip Hugenholtz7

  1. Department of Chemistry & Biochemistry and Biofrontiers Institute, University of Colorado, Boulder, CO, USA
  2. Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, CA, USA
  3. Janelia Farm Research Campus, Howard Hughes Medical Institute, Ashburn, VA, USA
  4. Lawrence Berkeley National Laboratory, Center for Environmental Biotechnology, Berkeley, CA, USA
  5. Department of Bioinformatics, Second Genome Inc., San Bruno, CA, USA
  6. Howard Hughes Medical Institute, Boulder, CO, USA
  7. Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences and Institute for Molecular Bioscience, St Lucia, Queensland, Australia

Correspondence: P Hugenholtz, Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences and Institute for Molecular Bioscience, University of Queensland, Molecular Biosciences Building 76, St Lucia, Queensland, 4072, Australia. E-mail: p.hugenholtz@uq.edu.au

8Current address: Department of Biology, Cornell University, Ithaca, NY, USA.

9Current address: Institute for Microbiology and Archaea Centre, University of Regensburg, Regensburg, Germany.

Received 10 May 2011; Revised 23 August 2011; Accepted 25 August 2011; Published online 1 December 2011.

Abstract

Reference phylogenies are crucial for providing a taxonomic framework for interpretation of marker gene and metagenomic surveys, which continue to reveal novel species at a remarkable rate. Greengenes is a dedicated full-length 16S rRNA gene database that provides users with a curated taxonomy based on de novo tree inference. We developed a ‘taxonomy to tree’ approach for transferring group names from an existing taxonomy to a tree topology, and used it to apply the Greengenes, National Center for Biotechnology Information (NCBI) and cyanoDB (Cyanobacteria only) taxonomies to a de novo tree comprising 408,315 sequences. We also added the explicit rank information provided by the NCBI taxonomy to group names (by prefixing rank designations) for better user orientation and classification consistency. The resulting merged taxonomy improved the classification of 75% of the sequences by one or more ranks relative to the original NCBI taxonomy, with the most pronounced improvements occurring in under-classified environmental sequences. We also assessed candidate phyla (divisions) currently defined by NCBI and present recommendations for consolidation of 34 redundantly named groups. All intermediate results from the pipeline, which includes tree inference, jackknifing and transfer of a donor taxonomy to a recipient tree (tax2tree), are available for download. The improved Greengenes taxonomy should provide important infrastructure for a wide range of megasequencing projects studying ecosystems on scales ranging from our own bodies (the Human Microbiome Project) to the entire planet (the Earth Microbiome Project). The software implementation can be obtained from http://sourceforge.net/projects/tax2tree/.
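As an illustration of what prefixing rank designations to group names can look like, the following minimal Python sketch builds a rank-prefixed lineage string in the `k__`, `p__`, ... style used in Greengenes taxonomy strings. The helper name `prefix_ranks`, the fixed seven-rank list and the empty placeholders for unassigned ranks are assumptions made for this example; it is not the tax2tree implementation.

```python
# Illustrative sketch only: the names and the seven-rank list are assumptions,
# not code taken from tax2tree.

# One-letter prefixes for kingdom, phylum, class, order, family, genus, species.
RANK_PREFIXES = ["k", "p", "c", "o", "f", "g", "s"]


def prefix_ranks(lineage):
    """Return a lineage string with an explicit rank prefix on every level.

    `lineage` is a list of group names ordered from the highest rank down.
    Ranks without a name are kept as empty placeholders (e.g. "g__"), so
    every string carries the same number of levels.
    """
    if len(lineage) > len(RANK_PREFIXES):
        raise ValueError("lineage has more levels than known ranks")
    # Pad the lineage with empty names so it covers all ranks.
    padded = list(lineage) + [""] * (len(RANK_PREFIXES) - len(lineage))
    return "; ".join(
        f"{prefix}__{name}" for prefix, name in zip(RANK_PREFIXES, padded)
    )


example = prefix_ranks(["Bacteria", "Cyanobacteria"])
print(example)
```

With explicit prefixes, a reader or a downstream script can tell at which rank a sequence stops being classified without counting positions in the lineage.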

Keywords:

evolution; phylogenetics; taxonomy