A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea

Wu, Dongying; Hugenholtz, Philip; Mavromatis, Konstantinos; Pukall, Rüdiger; Dalin, Eileen; Ivanova, Natalia N.; Kunin, Victor; Goodwin, Lynne; Wu, Martin; Tindall, Brian J.; Hooper, Sean D.; Pati, Amrita; Lykidis, Athanasios; Spring, Stefan; Anderson, Iain J.; D’haeseleer, Patrik; Zemla, Adam; Singer, Mitchell; Lapidus, Alla; Nolan, Matt; Copeland, Alex; Han, Cliff; Chen, Feng; Cheng, Jan-Fang; Lucas, Susan; Kerfeld, Cheryl; Lang, Elke; Gronow, Sabine; Chain, Patrick; Bruce, David; Rubin, Edward M.; Kyrpides, Nikos C.; Klenk, Hans-Peter; Eisen, Jonathan A.

doi:10.1038/nature08656

Letter
Open access
Published: 24 December 2009

Nature volume 462, pages 1056–1060 (2009)

Abstract

Sequencing of bacterial and archaeal genomes has revolutionized our understanding of the many roles played by microorganisms¹. There are now nearly 1,000 completed bacterial and archaeal genomes available², most of which were chosen for sequencing on the basis of their physiology. As a result, the perspective provided by the currently available genomes is limited by a highly biased phylogenetic distribution^3,4,5. To explore the value added by choosing microbial genomes for sequencing on the basis of their evolutionary relationships, we have sequenced and analysed the genomes of 56 culturable species of Bacteria and Archaea selected to maximize phylogenetic coverage. Analysis of these genomes demonstrated pronounced benefits (compared to an equivalent set of genomes randomly selected from the existing database) in diverse areas including the reconstruction of phylogenetic history, the discovery of new protein families and biological properties, and the prediction of functions for known genes from other organisms. Our results strongly support the need for systematic ‘phylogenomic’ efforts to compile a phylogeny-driven ‘Genomic Encyclopedia of Bacteria and Archaea’ in order to derive maximum knowledge from existing microbial genome data as well as from genome sequences to come.

Main

Since the publication of the first complete bacterial genome, sequencing of the microbial world has accelerated beyond expectations. The inventory of bacterial and archaeal isolates with complete or draft sequences is approaching the two thousand mark². Most of these genome sequences are the product of studies in which one or a few isolates were targeted because of an interest in a specific characteristic of the organism. Although large-scale multi-isolate genome sequencing studies have been performed, they have tended to be focused on particular habitats or on the relatives of specific organisms. This overall lack of broad phylogenetic considerations in the selection of microbial genomes for sequencing, combined with a cultivation bottleneck⁶, has led to a strongly biased representation of recognized microbial phylogenetic diversity^3,4,5. Although some projects have attempted to correct this (for example, see ref. 5), they have all been small in scope. To evaluate the potential benefits of a more systematic effort, we embarked on a pilot project to sequence approximately 100 genomes selected solely for their phylogenetic novelty: the ‘Genomic Encyclopedia of Bacteria and Archaea’ (GEBA).

Organisms were selected on the basis of their position in a phylogenetic tree of small subunit (SSU) ribosomal RNA, the best sampled gene from across the tree of life⁷. Working from the root to the tips of the tree, we identified the most divergent lineages that lacked representatives with sequenced genomes (completed or in progress)⁸ and for which a species has been formally described⁹ and a type strain designated and deposited in a publicly accessible culture collection¹⁰. From hundreds of candidates, 200 type strains were selected both to obtain broad coverage across Bacteria and Archaea and to perform in-depth sampling of a single phylum. The Gram-positive bacterial phylum Actinobacteria was chosen for the latter purpose because of the availability of many phylogenetically and phenotypically diverse cultured strains, and because it had the lowest percentage of sequenced isolates of any phylum (1% versus an average of 2.3%)¹¹. Of the 200 targeted isolates, 159 were designated as ‘high’ priority primarily on the basis of phylum-level novelty and the ability to obtain microgram quantities of high quality DNA. The genomes of these 159 are being sequenced, assembled, annotated (including recommended metadata¹²) and finished, and relevant data are being released through a dedicated Integrated Microbial Genomes database portal¹³ and deposited into GenBank. Currently, data from 106 genomes (62 of which are finished) are available.

To assess the ramifications of this tree-based selection of organisms, we focused our analyses on the first 56 genomes for which the shotgun phase of sequencing was completed. The 53 bacteria and 3 archaea (Supplementary Table 1) represent both a broad sampling of bacterial diversity and a deeper sampling of the phylum Actinobacteria (26 GEBA genomes). An initial question we addressed was whether selection on the basis of phylogenetic novelty of SSU rRNA genes reliably identifies genomes that are phylogenetically novel on the basis of other criteria. This question arises because it is known that single genes, even SSU rRNA genes, do not perfectly predict genome-wide phylogenetic patterns^14,15. To investigate this, we created a ‘genome tree’ (ref. 16) of completed bacterial genomes (Fig. 1) and then measured the relative contribution of the GEBA project using the phylogenetic diversity metric¹⁷. We found that the 53 GEBA bacteria accounted for 2.8–4.4 times more phylogenetic diversity than randomly sampled subsets of 53 non-GEBA bacterial genomes. A similar degree of improvement in phylogenetic diversity was seen for the more intensively sampled actinobacteria (Table 1). These analyses indicate that although SSU rRNA genes are not a perfect indicator of organismal evolution, their phylogenetic relationships are a sound predictor of phylogenetic novelty within the universal gene core present in bacterial genomes.
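The comparison described above can be illustrated with a minimal sketch. It assumes phylogenetic diversity (PD) is the summed length of all branches on the paths from the chosen tips to the root; the toy tree, the tip names, and both function names are hypothetical and are not taken from the study.

```python
import random

# Hypothetical rooted toy tree: node -> (parent, branch length to parent).
TREE = {
    "n1": ("root", 1.0),
    "n2": ("root", 1.0),
    "A": ("n1", 0.1),
    "B": ("n1", 0.1),
    "C": ("n1", 0.2),
    "D": ("n2", 0.5),
    "E": ("n2", 2.0),
    "F": ("root", 3.0),
}


def phylogenetic_diversity(tree, tips):
    """Sum the lengths of all distinct branches between `tips` and the root."""
    seen = set()
    total = 0.0
    for tip in tips:
        node = tip
        # Walk towards the root, stopping once a branch was already counted.
        while node in tree and node not in seen:
            seen.add(node)
            parent, length = tree[node]
            total += length
            node = parent
    return total


def random_subset_pd(tree, pool, k, trials, seed=0):
    """PD of `trials` random subsets of size `k` drawn from `pool`."""
    rng = random.Random(seed)
    return [phylogenetic_diversity(tree, rng.sample(pool, k)) for _ in range(trials)]


tips = ["A", "B", "C", "D", "E", "F"]
selected = ["E", "F"]  # stand-in for a phylogenetically chosen set
baseline = random_subset_pd(TREE, tips, k=len(selected), trials=1000)
ratio = phylogenetic_diversity(TREE, selected) / (sum(baseline) / len(baseline))
print(f"selected set holds {ratio:.2f}x the PD of an average random pair")
```

Reporting the selected set's PD as a multiple of the random-subset PD mirrors the form of the 2.8–4.4-fold figure quoted in the text, although the real analysis was run on the genome tree rather than on a toy example.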

Table 1 Effect of SSU rRNA tree-based selection of organisms on comparative genomic metrics

The discovery and characterization of new gene families and their associated novel functions provide one incentive for sequencing additional genomes, analysis of which has helped to redefine the protein family universe¹⁸. We explored the quantitative effect of tree-based genome selection on the pace of discovery of novel proteins and functions. Specifically, we compared the rate of discovery of novel protein families when progressively adding more closely related genomes versus when adding more distantly related ones (Fig. 2). Granted, many factors contribute to protein family diversity, such as ecological niche; nevertheless, higher rates of novel protein family discovery were found in the more phylogenetically diverse taxa (Fig. 2). In addition, of the 16,797 families identified in the 56 GEBA genomes, 1,768 showed no significant sequence similarity to any proteins, indicating the presence of novel functional diversity. These results highlight the utility of tree-based genome selection as a means to maximize the identification of novel protein families and argue against lateral gene transfer significantly redistributing genetic novelty between distantly related lineages.
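The idea of a discovery rate can be made concrete with a small sketch. It assumes each genome is already reduced to a set of protein family identifiers; the function name and the toy family labels are hypothetical and only illustrate how an accumulation curve of the kind plotted in Fig. 2 might be computed.

```python
def discovery_curve(genomes):
    """Cumulative count of distinct protein families after each genome is added."""
    seen = set()
    curve = []
    for families in genomes:
        seen |= set(families)
        curve.append(len(seen))
    return curve


# Hypothetical family content of three close relatives and three distant taxa.
close_relatives = [{"f1", "f2", "f3"}, {"f1", "f2", "f4"}, {"f1", "f2", "f3"}]
distant_taxa = [{"f1", "f2", "f3"}, {"f1", "f5", "f6"}, {"f7", "f8", "f9"}]

print(discovery_curve(close_relatives))
print(discovery_curve(distant_taxa))
```

A curve that keeps climbing as genomes are added indicates that new families are still being found, whereas a curve that flattens indicates saturation.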

Figure 2: **Rate of discovery of protein families as a function of phylogenetic breadth of genomes.**

Novel proteins also can serve to link distantly related homologues whose relatedness would otherwise go undetected. Forty-six such links were identified in the 56 GEBA genomes compared to an average of only three new links in equivalent sets of randomly sampled non-GEBA genomes (Table 1). A useful complement to homology-based predictions of gene function are ‘non-homology methods’ (ref. 19) such as gene context-based inference that relies on the conserved clustering of functionally related genes across multiple genomes, often in operons or as gene fusions²⁰. We identified over 70,000 genes in new chromosomal cassettes of two or more genes in the GEBA genomes. This represents a three- to sixfold increase over equivalent sets of non-GEBA genomes (Table 1). Similarly, the number of new gene fusions identified in the GEBA genomes is 4 to ∼13 times greater than in randomly selected genome sets (Table 1). Because the GEBA data set produced a several-fold improvement over random sets for all metrics examined (Table 1), we predict that other aspects of sequence-based biological discovery will similarly benefit from tree-based genome sequencing.
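One way to picture gene fusion detection is the sketch below. It assumes a fusion candidate is a query gene with two non-overlapping regions that match two different genes in another genome; the input tuple layout, the function name, and the toy hits are hypothetical, and the criteria actually used in the study are not given in this text.

```python
from collections import defaultdict
from itertools import combinations


def find_candidate_fusions(hits):
    """Return {query: (subject1, subject2)} for queries that look like fusions.

    `hits` holds (query, subject, q_start, q_end) tuples, with 1-based
    inclusive coordinates on the query protein.
    """
    by_query = defaultdict(list)
    for query, subject, q_start, q_end in hits:
        by_query[query].append((subject, q_start, q_end))

    fusions = {}
    for query, regions in by_query.items():
        for (s1, a1, b1), (s2, a2, b2) in combinations(regions, 2):
            non_overlapping = b1 < a2 or b2 < a1
            # Two distinct subjects covering separate parts of the query.
            if s1 != s2 and non_overlapping:
                fusions[query] = tuple(sorted((s1, s2)))
                break
    return fusions


# Hypothetical similarity hits.
toy_hits = [
    ("geneX", "a", 1, 200),
    ("geneX", "b", 210, 400),
    ("geneY", "a", 1, 200),
    ("geneY", "c", 50, 180),
]
print(find_candidate_fusions(toy_hits))
```

A real pipeline would also need alignment score and coverage filters, which are omitted here for brevity.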

The GEBA genomes also show significant phylogenetic expansions within known protein families. For example, although only two of the 56 GEBA organisms are known cellulose degraders, we identified in the set of genomes a variety of glycoside hydrolase (GH) genes that may participate in the breakdown of cellulose and hemicelluloses. Among these are 28 and 7 phylogenetically divergent members of the endoglucanase- and processive exoglucanase-containing GH6 and GH48 families, respectively. Halorhabdus utahensis, a halophilic archaeon known to have β-xylanase and β-xylosidase activities²¹, has a chromosomal cluster including two GH10 family β-xylanases and six novel GH5 family proteins of unknown specificity.

The enrichment of genetic diversity is also seen within families of non-coding RNAs, transposable elements, and other cellular components. For example, the genome of the marine myxobacterium Haliangium ochraceum contains 807 CRISPR (clustered regularly interspaced short palindromic repeats) units including the largest single CRISPR array known, comprising 382 spacer/repeat units. CRISPR is a newly recognized, but ancient and widespread, system in bacteria and archaea that confers resistance to viruses and other invading foreign DNAs²².

Results from the GEBA pilot project challenge our current understanding of the taxonomic distribution of known gene families. The most striking example is the discovery of an actin homologue in H. ochraceum. Actin and its close relatives are structural components of the eukaryotic cytoskeleton that are found in every eukaryote and only in eukaryotes. Bacteria and archaea encode instead the shape-determining protein MreB. Although MreBs have some functional and structural similarities to eukaryotic actins, they are regarded as, at best, distantly related homologues²³ and possibly not even homologous. Like other bacteria, H. ochraceum encodes a bona fide MreB protein, but in addition, it encodes a protein that is clearly a member of the actin family, which we have named BARP (bacterial actin-related protein; Fig. 3). Although we do not yet have evidence for its precise function, BARP is expressed in H. ochraceum (Fig. 3b). Assuming that the H. ochraceum mreB orthologue performs the same function as in other bacteria, and given that the myxobacteria, to which this species belongs, are known to synthesize actin-targeting toxins²⁴, we propose that this BARP may be a dominant-negative inhibitor of eukaryotic actin polymerization. Regardless of its precise function, this first, and so far only, discovery of an expressed homologue of eukaryotic actin in a member of the Bacteria highlights the potential for novel and surprising biological discoveries given a wider genomic sampling of the tree of life.

Figure 3: **A bacterial homologue of actin.**

We conclude that targeting microorganisms for genome sequencing solely on the basis of phylogenetic considerations offers significant far-reaching benefits in diverse areas. Furthermore, the benefits of phylogenetically driven genome sequencing show no sign of saturating with these first 56 genomes. A key question then lies in determining how much bacterial and archaeal diversity remains to be sampled. Using SSU rRNA gene sequences as a proxy for organismal diversity (Fig. 4), we estimate that sequencing the genomes of only 1,520 phylogenetically selected isolates could encompass half of the phylogenetic diversity represented by known cultured bacteria and archaea. Given the continuing reductions in both the cost and difficulty in sequencing genomes²⁵, this is certainly a tractable target in the next few years.
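An estimate of this kind can be sketched as a greedy search: repeatedly pick the isolate that adds the most not-yet-covered branch length until the chosen set holds the target fraction of total phylogenetic diversity. The toy tree, the names, and the greedy strategy itself are assumptions for illustration; the procedure used for the 1,520-isolate estimate is not described in this text.

```python
# Hypothetical rooted toy tree: node -> (parent, branch length to parent).
TREE = {
    "n1": ("root", 1.0),
    "A": ("n1", 0.5),
    "B": ("n1", 0.5),
    "C": ("root", 2.0),
    "D": ("root", 4.0),
}


def path_to_root(tree, tip):
    """Yield every node on the path from `tip` up to (not including) the root."""
    node = tip
    while node in tree:
        yield node
        node = tree[node][0]


def genomes_needed(tree, tips, fraction=0.5):
    """Greedily choose tips until `fraction` of the total branch length is covered."""
    total = sum(length for _, length in tree.values())
    covered = set()
    covered_len = 0.0
    chosen = []
    remaining = list(tips)

    def gain(tip):
        # Branch length this tip would add beyond what is already covered.
        return sum(tree[n][1] for n in path_to_root(tree, tip) if n not in covered)

    while covered_len < fraction * total and remaining:
        best = max(remaining, key=gain)
        covered_len += gain(best)
        covered.update(path_to_root(tree, best))
        chosen.append(best)
        remaining.remove(best)
    return chosen, covered_len / total


print(genomes_needed(TREE, ["A", "B", "C", "D"], fraction=0.5))
```

On a real SSU rRNA tree the same loop would run over many thousands of tips, so an implementation meant for that scale would cache gains rather than recompute them at every step.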

Figure 4: **Phylogenetic diversity of bacteria and archaea on the basis of SSU rRNA genes.**

However, the great majority of recognized bacterial and archaeal diversity is not represented by pure cultures and an additional 9,218 genome sequences from currently uncultured species would be required to capture 50% of this recognized diversity (Fig. 4). Such an undertaking will require new approaches to culturing or processing of multi-species samples using methods such as metagenomics²⁶ or physical isolation of cells from mixed populations followed by whole genome amplification methods²⁷. Obtaining reference genomes for the uncultured microbial majority will be a natural extension of the GEBA project, the ultimate goal of which is to provide a phylogenetically balanced genomic representation of the microbial tree of life. The pilot study presented here is a dedicated first step in this direction.

Methods Summary

Starting with a phylogenetic tree of SSU rRNA genes⁷, we identified major branches that had no available genome sequences but for which cultured isolates were available in the DSMZ or ATCC culture collections. Selected isolates (Supplementary Table 1a, b) from these branches were grown and DNA isolated (Supplementary Table 1c) and quality checked. DNA was then used for shotgun genome sequencing by Sanger/ABI, Roche/454 and/or Illumina/Solexa technologies (Supplementary Table 2). Sequence reads were assembled separately with different assembly methods and the best draft assembly was used for annotation and as a starting point for genome completion (current genome status is in Supplementary Table 2). Annotation (gene identification, functional prediction, etc.) was performed using the IMG system (http://img.jgi.doe.gov/geba); this was done both after shotgun sequencing and again after genome completion. For ‘whole genome tree’ analysis, a PHYML maximum likelihood phylogenetic tree of a concatenated alignment of 31 marker genes was built using AMPHORA¹⁶. Phylogenetic diversity was calculated as the sum of branch lengths in this and other trees. Protein families were built for various genome sets by using the Markov clustering algorithm (MCL)²⁸ to group proteins on the basis of ‘all versus all’ blastp searches. For analysis of phylogenetic diversity of organisms, a phylogenetic tree was built for a combined alignment of SSU rRNA sequences from published genomes and a non-redundant subset of greengenes SSU rRNA⁷. Further analysis of the genomes was done using IMG database queries and new computational analyses as described in the main text, legends and Supplementary Methods.
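The protein family step can be illustrated with a deliberately simplified sketch. It does not implement MCL; instead it groups proteins by single-linkage clustering (connected components) over pairwise hits that pass an E-value cutoff. The cutoff value, the function name, and the toy hits are all assumptions made for illustration.

```python
def cluster_families(proteins, hits, max_evalue=1e-5):
    """Group proteins into families by single linkage over significant hits.

    `hits` holds (protein_a, protein_b, evalue) tuples from an all-versus-all
    search. This is a simpler stand-in for MCL, not a reimplementation of it.
    """
    parent = {p: p for p in proteins}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b, evalue in hits:
        if evalue <= max_evalue:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    families = {}
    for p in proteins:
        families.setdefault(find(p), set()).add(p)
    return sorted(sorted(members) for members in families.values())


# Hypothetical proteins and pairwise hits.
toy_proteins = ["p1", "p2", "p3", "p4", "p5"]
toy_hits = [("p1", "p2", 1e-30), ("p2", "p3", 1e-12), ("p4", "p5", 0.5)]
print(cluster_families(toy_proteins, toy_hits))
```

Single linkage tends to merge families that share only a common domain, which is one reason a flow-based method such as MCL is often preferred for this task.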

Accession codes

Data deposits

Genome sequence and annotation data are available at the JGI IMG-GEBA page http://img.jgi.doe.gov/geba and have been submitted to GenBank with accessions ABSZ00000000, ABTA00000000, ABTB00000000, ABTC00000000, ABTD00000000, CP001618, ABTF00000000, ABTG00000000, ABTH00000000, ABTI00000000, ABTJ00000000, ABTK00000000, ABTM00000000, NZ_ABTN00000000, NZ_ABTO00000000, ABTP00000000, ABTQ00000000, NZ_ABTR00000000, NZ_ABTS00000000, NZ_ABTT00000000, NZ_ABTU00000000, NZ_ABTV00000000, NZ_ABTW00000000, NZ_ABTX00000000, NZ_ABTY00000000, NZ_ABTZ00000000, NZ_ABUA00000000, NZ_ABUB00000000, NZ_ABUC00000000, NZ_ABUD00000000, NZ_ABUE00000000, NZ_ABUF00000000, NZ_ABUG00000000, NZ_ABUH00000000, NZ_ABUI00000000, NZ_ABUJ00000000, NZ_ABUK00000000, ABUL00000000, NZ_ABUM00000000, NZ_ABUO00000000, NZ_ABUP00000000, ABUQ00000000, NZ_ABUR00000000, ABUS00000000, NZ_ABUT00000000, NZ_ABUU00000000, NZ_ABUV00000000, NZ_ABUW00000000, NZ_ABUX00000000, NZ_ABUZ00000000, NZ_ABVA00000000, NZ_ABVB00000000 and NZ_ABVC00000000. All strains that have been sequenced are available from the DSMZ culture collection and culture accessions are available in Supplementary Information. Further details on sequencing and genome properties of each organism are being published in the journal Standards in Genomic Sciences (SIGS) (http://standardsingenomics.org/).

References

1. Fraser, C. M., Eisen, J. A. & Salzberg, S. L. Microbial genome sequencing. Nature 406, 799–803 (2000)
2. Liolios, K., Mavromatis, K., Tavernarakis, N. & Kyrpides, N. C. The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res. 36 (database issue), D475–D479 (2008)
3. Hugenholtz, P. Exploring prokaryotic diversity in the genomic era. Genome Biol. 3, REVIEWS0003.1–REVIEWS0003.8 (2002)
4. Eisen, J. A. Assessing evolutionary relationships among microbes from whole-genome analysis. Curr. Opin. Microbiol. 3, 475–480 (2000)
5. Wu, D. et al. Complete genome sequence of the aerobic CO-oxidizing thermophile Thermomicrobium roseum. PLoS One 4, e4207 (2009)
6. Pace, N. R. A molecular view of microbial diversity and the biosphere. Science 276, 734–740 (1997)
7. DeSantis, T. Z. et al. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol. 72, 5069–5072 (2006)
8. Bernal, A., Ear, U. & Kyrpides, N. Genomes OnLine Database (GOLD): a monitor of genome projects world-wide. Nucleic Acids Res. 29, 126–127 (2001)
9. Lapage, S. P. et al. International Code of Nomenclature of Bacteria, 1990 Revision. (American Society for Microbiology, 1992)
10. Ward, N., Eisen, J., Fraser, C. & Stackebrandt, E. Sequenced strains must be saved from extinction. Nature 414, 148 (2001)
11. Hugenholtz, P. & Kyrpides, N. C. A changing of the guard. Environ. Microbiol. 11, 551–553 (2009)
12. Field, D. et al. The minimum information about a genome sequence (MIGS) specification. Nature Biotechnol. 26, 541–547 (2008)
13. Markowitz, V. M. et al. The integrated microbial genomes (IMG) system in 2007: data content and analysis tool extensions. Nucleic Acids Res. 36 (database issue), D528–D533 (2008)
14. Achtman, M. & Wagner, M. Microbial diversity and the genetic nature of microbial species. Nature Rev. Microbiol. 6, 431–440 (2008)
15. Beiko, R. G., Doolittle, W. F. & Charlebois, R. L. The impact of reticulate evolution on genome phylogeny. Syst. Biol. 57, 844–856 (2008)
16. Wu, M. & Eisen, J. A. A simple, fast, and accurate method of phylogenomic inference. Genome Biol. 9, R151 (2008)
17. Pardi, F. & Goldman, N. Resource-aware taxon selection for maximizing phylogenetic diversity. Syst. Biol. 56, 431–444 (2007)
18. Kunin, V., Cases, I., Enright, A. J., de Lorenzo, V. & Ouzounis, C. A. Myriads of protein families, and still counting. Genome Biol. 4, 401 (2003)
19. Marcotte, E. M. et al. Detecting protein function and protein-protein interactions from genome sequences. Science 285, 751–753 (1999)
20. Enright, A. J., Iliopoulos, I., Kyrpides, N. C. & Ouzounis, C. A. Protein interaction maps for complete genomes based on gene fusion events. Nature 402, 86–90 (1999)
21. Wainø, M. & Ingvorsen, K. Production of β-xylanase and β-xylosidase by the extremely halophilic archaeon Halorhabdus utahensis. Extremophiles 7, 87–93 (2003)
22. Barrangou, R. et al. CRISPR provides acquired resistance against viruses in prokaryotes. Science 315, 1709–1712 (2007)
23. Doolittle, R. F. & York, A. L. Bacterial actins? An evolutionary perspective. Bioessays 24, 293–296 (2002)
24. Sasse, F., Kunze, B., Gronewold, T. M. & Reichenbach, H. The chondramides: cytostatic agents from myxobacteria acting on the actin cytoskeleton. J. Natl. Cancer Inst. 90, 1559–1563 (1998)
25. Shendure, J. & Ji, H. Next-generation DNA sequencing. Nature Biotechnol. 26, 1135–1145 (2008)
26. Kunin, V., Copeland, A., Lapidus, A., Mavromatis, K. & Hugenholtz, P. A bioinformatician’s guide to metagenomics. Microbiol. Mol. Biol. Rev. 72, 557–578 (2008)
27. Ishoey, T., Woyke, T., Stepanauskas, R., Novotny, M. & Lasken, R. S. Genomic sequencing of single microbial cells from environmental samples. Curr. Opin. Microbiol. 11, 198–204 (2008)
28. Enright, A. J., Van Dongen, S. & Ouzounis, C. A. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30, 1575–1584 (2002)
29. Matsuura, Y. et al. Structural basis for the higher Ca²⁺-activation of the regulated actin-activated myosin ATPase observed with Dictyostelium/Tetrahymena actin chimeras. J. Mol. Biol. 296, 579–595 (2000)
30. Moulton, V., Semple, C. & Steel, M. Optimizing phylogenetic diversity under constraints. J. Theor. Biol. 246, 186–194 (2007)

Acknowledgements

We thank the following people for assistance in aspects of the project including planning and discussions (R. Stevens, G. Olsen, R. Edwards, J. Bristow, N. Ward, S. Baker, T. Lowe, J. Tiedje, G. Garrity, A. Darling, S. Giovannoni), analysis of genomes whose work could not be included in this report (B. Henrissat, G. Xie, J. Kinney, I. Paulsen, N. Rawlings, M. Huntemann), project management (M. Miller, M. Fenner, M. McGowen, A. Greiner), sequencing and finishing (K. Ikeda, M. Chovatia, P. Richardson, T. Glavinadelrio, C. Detter), culture growth, DNA extraction, and metadata (D. Gleim, E. Brambilla, S. Schneider, M. Schröder, M. Jando, G. Gehrich-Schröter, C. Wahrenburg, K. Steenblock, S. Welnitz, M. Kopitz, R. Fähnrich, H. Pomrenke, A. Schütze, M. Rohde, M. Göker), and manuscript editing (M. Youle). This work was performed under the auspices of the US Department of Energy’s Office of Science, Biological and Environmental Research Program, and by the University of California, Lawrence Berkeley National Laboratory under contract no. DE-AC02-05CH11231, Lawrence Livermore National Laboratory under contract no. DE-AC52-07NA27344, and Los Alamos National Laboratory under contract no. DE-AC02-06NA25396. Support for J.A.E., D.W. and M.W. was provided by the Gordon and Betty Moore Foundation Grant no. 1660 to J.A.E. Support for work at DSMZ was provided under DFG INST 599/1-1.

Author Contributions D.W. (rRNA analysis, gene families, actin tree, manuscript preparation), P.H. (selection of strains, analysis, manuscript preparation, project coordination), L.G. and D.B. (project management), R.P., B.J.T., E.L., S.G., S.S. (strain curation and growth), K.M., N.N.I., I.J.A., S.D.H., A.P., A.Ly. (annotation, genome analysis), V.K. (CRISPRs, actin), M.W. (whole genome tree), P.D., C.K., A.Z. and M.S. (actin studies), M.N., S.L., J.-F.C., F.C. and E.D. (sequencing), C.H., A.La., M.N. and A.C. (finishing), P.C. (analysis), E.M.R. (manuscript preparation), N.C.K. (selection of strains, annotation, analysis), H.-P.K. (strain selection and growth, DNA preparation, manuscript preparation), J.A.E. (project lead and coordination, analysis, manuscript preparation).

Author information

Authors and Affiliations

DOE Joint Genome Institute, Walnut Creek, California 94598, USA
Dongying Wu, Philip Hugenholtz, Konstantinos Mavromatis, Eileen Dalin, Natalia N. Ivanova, Victor Kunin, Sean D. Hooper, Amrita Pati, Athanasios Lykidis, Iain J. Anderson, Patrik D’haeseleer, Alla Lapidus, Matt Nolan, Alex Copeland, Feng Chen, Jan-Fang Cheng, Susan Lucas, Cheryl Kerfeld, Patrick Chain, Edward M. Rubin, Nikos C. Kyrpides & Jonathan A. Eisen
University of California, Davis, Davis, California 95616, USA
Dongying Wu, Mitchell Singer & Jonathan A. Eisen
DSMZ, German Collection of Microorganisms and Cell Cultures, 38124 Braunschweig, Germany
Rüdiger Pukall, Brian J. Tindall, Stefan Spring, Elke Lang, Sabine Gronow & Hans-Peter Klenk
DOE Joint Genome Institute-Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
Lynne Goodwin, Cliff Han, Patrick Chain & David Bruce
University of Virginia, Charlottesville, Virginia 22904, USA
Martin Wu
Lawrence Livermore National Laboratory, Livermore, California 94550, USA
Patrik D’haeseleer & Adam Zemla

Corresponding author

Correspondence to Jonathan A. Eisen.

Supplementary information

Supplementary Information

This file contains Supplementary Methods, Supplementary References, Supplementary Figure 1 with Legend and Supplementary Tables 1 A-C and 2. (PDF 2302 kb)

Rights and permissions

This article is distributed under the terms of the Creative Commons Attribution-Non-Commercial-Share Alike licence (http://creativecommons.org/licenses/by-nc-sa/3.0/), which permits distribution, and reproduction in any medium, provided the original author and source are credited. This licence does not permit commercial exploitation, and derivative works must be licensed under the same or similar licence.

About this article

Cite this article

Wu, D., Hugenholtz, P., Mavromatis, K. et al. A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea. Nature 462, 1056–1060 (2009). https://doi.org/10.1038/nature08656

Received: 03 June 2009
Accepted: 30 October 2009
Issue Date: 24 December 2009
DOI: https://doi.org/10.1038/nature08656
