Species phylogenies derived from comparisons of single genes are rarely consistent with each other, due to horizontal gene transfer (ref. 1), unrecognized paralogy and highly variable rates of evolution (ref. 2). The advent of completely sequenced genomes allows the construction of a phylogeny that is less sensitive to such inconsistencies and more representative of whole genomes than are single-gene trees. Here, we present a distance-based phylogeny (ref. 3) constructed on the basis of gene content, rather than on sequence identity, of 13 completely sequenced genomes of unicellular species. The similarity between two species is defined as the number of genes that they have in common divided by their total number of genes. In this type of phylogenetic analysis, evolutionary distance can be interpreted in terms of evolutionary events such as the acquisition and loss of genes, whereas the underlying properties (the gene content) can be interpreted in terms of function. As such, it takes a position intermediate between phylogenies based on single genes and phylogenies based on phenotypic characteristics. Although our comprehensive genome phylogeny is independent of phylogenies based on the level of sequence identity of individual genes, it correlates with the standard reference of prokaryotic phylogeny based on sequence similarity of 16S rRNA (ref. 4). Thus, shared gene content between genomes is quantitatively determined by phylogeny, rather than by phenotype, and horizontal gene transfer has only a limited role in determining the gene content of genomes.
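The similarity measure described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. The abstract does not say exactly how "their total number of genes" is computed, so the sketch assumes normalization by the gene count of the smaller genome and lets the caller swap in another normalizer. It also assumes that genes "in common" have already been identified and are represented by shared identifiers, and that distance is taken as one minus similarity. The function names and the toy genomes are hypothetical.

```python
from itertools import combinations


def gene_content_similarity(genes_a, genes_b, normalizer=min):
    """Number of shared genes divided by a genome-size normalizer.

    ASSUMPTION: the normalizer defaults to the size of the smaller genome;
    the abstract only says "their total number of genes".
    """
    a, b = set(genes_a), set(genes_b)
    if not a or not b:
        raise ValueError("each genome needs at least one gene")
    shared = len(a & b)  # genes present in both genomes
    return shared / normalizer(len(a), len(b))


def distance_matrix(genomes, normalizer=min):
    """Pairwise distances, with distance = 1 - similarity (an assumption)."""
    names = sorted(genomes)
    dist = {name: {name: 0.0} for name in names}
    for x, y in combinations(names, 2):
        sim = gene_content_similarity(genomes[x], genomes[y], normalizer)
        dist[x][y] = dist[y][x] = 1.0 - sim
    return dist


# Toy gene sets (hypothetical identifiers, not real genomes).
genomes = {
    "species_A": {"g1", "g2", "g3", "g4"},
    "species_B": {"g1", "g2", "g5"},
    "species_C": {"g1", "g6", "g7", "g8", "g9"},
}

matrix = distance_matrix(genomes)
```

A matrix of this kind is the input that a distance-based tree-building method such as neighbor-joining (ref. 3) would take. Passing `normalizer=max` instead would penalize size differences between genomes more heavily, which is one reason the choice of normalizer is flagged here as an assumption.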
Doolittle, W.F. & Logsdon, J.M. Archaeal genomics: do Archaea have a mixed heritage? Curr. Biol. 8, R209–R211 (1998).
Huynen, M.A. & Bork, P. Measuring genome evolution. Proc. Natl Acad. Sci. USA 95, 5849–5856 (1998).
Saitou, N. & Nei, M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425 (1987).
Olsen, G.J., Woese, C.R. & Overbeek, R. The winds of (evolutionary) change: breathing new life into microbiology. J. Bacteriol. 176, 1–6 (1994).
Fitch, W.M. Distinguishing homologous from analogous proteins. Syst. Zool. 19, 99–110 (1970).
Maidak, B.L. et al. The RDP (Ribosomal Database Project). Nucleic Acids Res. 25, 109–111 (1997).
Klenk, H. & Zillig, W. DNA-dependent RNA polymerase subunit B as a tool for phylogenetic reconstructions: branching topology of the archaeal domain. J. Mol. Evol. 38, 420–432 (1994).
Baldauf, S., Palmer, J.D. & Doolittle, W.F. The root of the universal tree and the origin of eukaryotes based on elongation factor phylogeny. Proc. Natl Acad. Sci. USA 93, 7749–7754 (1996).
Gruber, T.M. & Bryant, D.A. Molecular systematic studies of eubacteria, using σ70-type σ factors of group 1 and group 2. J. Bacteriol. 179, 1734–1747 (1997).
Huynen, M.A., Dandekar, T. & Bork, P. Differential genome analysis applied to the species-specific features of Helicobacter pylori. FEBS Lett. 426, 1–5 (1998).
Lawrence, J.G. & Ochman, H. Molecular archaeology of the Escherichia coli genome. Proc. Natl Acad. Sci. USA 95, 9413–9417 (1998).
Smith, T. & Waterman, M.S. Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981).
Pearson, W. Empirical statistical estimates for sequence similarity searches. J. Mol. Biol. 276, 71–84 (1998).
Brenner, S., Chothia, C. & Hubbard, T.J. Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proc. Natl Acad. Sci. USA 95, 6073–6078 (1998).
Tatusov, R.L., Koonin, E.V. & Lipman, D.J. A genomic perspective on protein families. Science 278, 631–637 (1997).
Fleischmann, R. et al. Whole-genome random sequencing and assembly of Haemophilus influenzae. Science 269, 496–512 (1995).
Fraser, C.M. et al. The minimal gene complement of Mycoplasma genitalium. Science 270, 397–403 (1995).
Kaneko, T. et al. Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions. DNA Res. 3, 109–136 (1996).
Bult, C.J. et al. Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science 273, 1058–1072 (1996).
Blattner, F.R. et al. The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1462 (1997).
Smith, D.R. et al. Complete genome sequence of Methanobacterium thermoautotrophicum ΔH: functional analysis and comparative genomics. J. Bacteriol. 179, 7135–7155 (1997).
Tomb, J.-F. et al. The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature 388, 539–547 (1997).
Klenk, H.P. et al. The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus. Nature 390, 364–370 (1997).
Kunst, F. et al. The complete genome sequence of the Gram-positive bacterium Bacillus subtilis. Nature 390, 249–256 (1997).
Fraser, C.M. et al. Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi. Nature 390, 580–586 (1997).
Mewes, H.W. et al. Overview of the yeast genome. Nature 387, 7–65 (1997).
Deckert, G. et al. The complete genome of the hyperthermophilic bacterium Aquifex aeolicus. Nature 392, 353–358 (1998).
Kawarabayasi, Y. et al. Complete sequence and gene organization of the genome of a hyper-thermophilic archaebacterium Pyrococcus horikoshii OT3. DNA Res. 5, 55–76 (1998).
Wu, C.F.J. Jackknife, bootstrap and other resampling methods in regression analysis. Ann. Stat. 14, 1261–1295 (1986).
Himmelreich, R., Plagens, H., Hilbert, H., Reiner, B. & Herrmann, R. Comparative analysis of the genomes of the bacteria Mycoplasma pneumoniae and Mycoplasma genitalium. Nucleic Acids Res. 24, 4420–4449 (1996).
This work was supported by BMBF.
About this article
Cite this article
Snel, B., Bork, P. & Huynen, M. Genome phylogeny based on gene content. Nat Genet 21, 108–110 (1999). https://doi.org/10.1038/5052