Genome phylogeny based on gene content

Abstract

Species phylogenies derived from comparisons of single genes are rarely consistent with each other, owing to horizontal gene transfer (ref. 1), unrecognized paralogy and highly variable rates of evolution (ref. 2). The advent of completely sequenced genomes allows the construction of a phylogeny that is less sensitive to such inconsistencies and more representative of whole genomes than are single-gene trees. Here, we present a distance-based phylogeny (ref. 3) of 13 completely sequenced genomes of unicellular species, constructed on the basis of gene content rather than sequence identity. The similarity between two species is defined as the number of genes that they have in common divided by their total number of genes. In this type of phylogenetic analysis, evolutionary distance can be interpreted in terms of evolutionary events such as the acquisition and loss of genes, whereas the underlying properties (the gene content) can be interpreted in terms of function. As such, the approach occupies a position intermediate between phylogenies based on single genes and phylogenies based on phenotypic characteristics. Although our comprehensive genome phylogeny is independent of phylogenies based on the level of sequence identity of individual genes, it correlates with the standard reference of prokaryotic phylogeny based on sequence similarity of 16S rRNA (ref. 4). Thus, shared gene content between genomes is quantitatively determined by phylogeny rather than by phenotype, and horizontal gene transfer has only a limited role in determining the gene content of genomes.
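The abstract defines similarity as the number of shared genes divided by "their total number of genes", but it does not say how shared genes are identified or exactly which total is used. The sketch below is illustrative only and rests on three assumptions that are not stated in the abstract: each genome has already been reduced to a set of hypothetical ortholog-family labels, the denominator is either the smaller genome's gene count or the union of both gene sets (the `denominator` parameter is an invented switch), and similarity S is converted to a distance as 1 - S.

```python
from itertools import combinations


def gene_content_similarity(genes_a, genes_b, denominator="smaller"):
    """Shared genes divided by a normalizing gene count (assumed choices)."""
    shared = len(genes_a & genes_b)  # families present in both genomes
    if denominator == "smaller":
        total = min(len(genes_a), len(genes_b))  # size of the smaller genome
    elif denominator == "union":
        total = len(genes_a | genes_b)  # distinct families in either genome
    else:
        raise ValueError(f"unknown denominator: {denominator!r}")
    return shared / total if total else 0.0


def gene_content_distances(genomes, denominator="smaller"):
    """Return sorted genome names and a symmetric matrix of 1 - similarity."""
    names = sorted(genomes)
    n = len(names)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        s = gene_content_similarity(genomes[names[i]], genomes[names[j]], denominator)
        dist[i][j] = dist[j][i] = 1.0 - s  # assumed similarity-to-distance map
    return names, dist


# Hypothetical toy genomes: each is a set of ortholog-family labels.
toy = {
    "A": {"f1", "f2", "f3", "f4"},
    "B": {"f1", "f2", "f3", "f5"},
    "C": {"f1", "f6", "f7"},
}
names, dist = gene_content_distances(toy)
print(names, [round(x, 3) for x in dist[0]])  # ['A', 'B', 'C'] [0.0, 0.25, 0.667]
```

Here A and B share three of their four families, which gives a small distance. C shares only one family with either of them, so it ends up farther away. A loss or gain of a family changes the distance directly, and this is what lets a gene-content distance be read in terms of gene acquisition and loss.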

Figure 1: Relationship between the number of genes in a genome and the number of genes that have a closest relative (Table 1) in another genome.
Figure 2: Genome phylogeny.
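The distance-based phylogeny in the abstract cites Saitou and Nei's neighbor-joining method (ref. 3). What follows is a minimal textbook neighbor-joining sketch in plain Python, not the authors' code. It takes a list of taxon names and a symmetric distance matrix, such as one derived from gene content, and returns an unrooted Newick string. The five-taxon matrix in the usage lines is a standard additive toy example and does not come from the paper.

```python
def neighbor_joining(names, dist):
    """Build an unrooted tree by neighbor joining; return it as Newick."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)                 # Newick labels of current clusters
    d = [row[:] for row in dist]        # working copy of the distance matrix
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]     # net divergence of each cluster
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]  # Q-criterion
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from the joined pair to their new parent node.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:.4f},{nodes[j]}:{lj:.4f})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distances from the new node to every remaining cluster.
        new_row = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[a][b] for b in keep] for a in keep]
        for row, val in zip(d, new_row):
            row.append(val)
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Three clusters left: resolve them as one unrooted trifurcation.
    a, b, c = nodes
    dab, dac, dbc = d[0][1], d[0][2], d[1][2]
    la = 0.5 * (dab + dac - dbc)
    return f"({a}:{la:.4f},{b}:{dab - la:.4f},{c}:{dac - la:.4f});"


taxa = ["a", "b", "c", "d", "e"]
toy_dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(taxa, toy_dist))
# (d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);
```

Neighbor joining recovers the exact tree when distances are additive. With real gene-content distances, which are generally not additive, it can yield zero or negative branch lengths. Ties in the Q-criterion are broken here by taking the first pair found, which is an arbitrary choice.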

References

  1. Doolittle, W.F. & Logsdon, J.M. Archaeal genomics: do Archaea have a mixed heritage? Curr. Biol. 8, R209–R211 (1998).

  2. Huynen, M.A. & Bork, P. Measuring genome evolution. Proc. Natl Acad. Sci. USA 95, 5849–5856 (1998).

  3. Saitou, N. & Nei, M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425 (1987).

  4. Olsen, G.J., Woese, C.R. & Overbeek, R. The winds of (evolutionary) change: breathing new life into microbiology. J. Bacteriol. 176, 1–6 (1994).

  5. Fitch, W.M. Distinguishing homologous from analogous proteins. Syst. Zool. 19, 99–110 (1970).

  6. Maidak, B.L. et al. The RDP (Ribosomal Database Project). Nucleic Acids Res. 25, 109–111 (1997).

  7. Klenk, H. & Zillig, W. DNA-dependent RNA polymerase subunit B as a tool for phylogenetic reconstructions: branching topology of the archaeal domain. J. Mol. Evol. 38, 420–432 (1994).

  8. Baldauf, S., Palmer, J.D. & Doolittle, W.F. The root of the universal tree and the origin of eukaryotes based on elongation factor phylogeny. Proc. Natl Acad. Sci. USA 93, 7749–7754 (1996).

  9. Gruber, T.M. & Bryant, D.A. Molecular systematic studies of eubacteria, using σ70-type σ factors of group 1 and group 2. J. Bacteriol. 179, 1734–1747 (1997).

  10. Huynen, M.A., Dandekar, T. & Bork, P. Differential genome analysis applied to the species-specific features of Helicobacter pylori. FEBS Lett. 426, 1–5 (1998).

  11. Lawrence, J.G. & Ochman, H. Molecular archaeology of the Escherichia coli genome. Proc. Natl Acad. Sci. USA 95, 9413–9417 (1998).

  12. Smith, T. & Waterman, M.S. Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197 (1981).

  13. Pearson, W. Empirical statistical estimates for sequence similarity searches. J. Mol. Biol. 276, 71–84 (1998).

  14. Brenner, S.E., Chothia, C. & Hubbard, T.J. Assessing sequence comparison methods with reliable structurally identified distant evolutionary relationships. Proc. Natl Acad. Sci. USA 95, 6073–6078 (1998).

  15. Tatusov, R.L., Koonin, E.V. & Lipman, D.J. A genomic perspective on protein families. Science 278, 631–637 (1997).

  16. Fleischmann, R. et al. Whole-genome random sequencing and assembly of Haemophilus influenzae. Science 269, 496–512 (1995).

  17. Fraser, C.M. et al. The minimal gene complement of Mycoplasma genitalium. Science 270, 397–403 (1995).

  18. Kaneko, T. et al. Sequence analysis of the genome of the unicellular cyanobacterium Synechocystis sp. strain PCC6803. II. Sequence determination of the entire genome and assignment of potential protein-coding regions. DNA Res. 3, 109–136 (1996).

  19. Bult, C.J. et al. Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii. Science 273, 1058–1072 (1996).

  20. Blattner, F.R. et al. The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1462 (1997).

  21. Smith, D.R. et al. Complete genome sequence of Methanobacterium thermoautotrophicum δH: functional analysis and comparative genomics. J. Bacteriol. 179, 7135–7155 (1997).

  22. Tomb, J.-F. et al. The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature 388, 539–547 (1997).

  23. Klenk, H.P. et al. The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus. Nature 390, 364–370 (1997).

  24. Kunst, F. et al. The complete genome sequence of the Gram-positive bacterium Bacillus subtilis. Nature 390, 249–256 (1997).

  25. Fraser, C.M. et al. Genomic sequence of a Lyme disease spirochaete, Borrelia burgdorferi. Nature 390, 580–586 (1997).

  26. Mewes, H.W. et al. Overview of the yeast genome. Nature 387, 7–65 (1997).

  27. Deckert, G. et al. The complete genome of the hyperthermophilic bacterium Aquifex aeolicus. Nature 392, 353–358 (1998).

  28. Kawarabayasi, Y. et al. Complete sequence and gene organization of the genome of a hyper-thermophilic archaebacterium Pyrococcus horikoshii OT3. DNA Res. 5, 55–76 (1998).

  29. Wu, C.F.J. Jackknife, bootstrap and other resampling methods in regression analysis. Ann. Stat. 14, 1261–1295 (1986).

  30. Himmelreich, R., Plagens, H., Hilbert, H., Reiner, B. & Herrmann, R. Comparative analysis of the genomes of the bacteria Mycoplasma pneumoniae and Mycoplasma genitalium. Nucleic Acids Res. 24, 4420–4449 (1996).

Acknowledgements

This work was supported by BMBF.

Author information

Correspondence to Peer Bork.

About this article

Cite this article

Snel, B., Bork, P. & Huynen, M. Genome phylogeny based on gene content. Nat Genet 21, 108–110 (1999). https://doi.org/10.1038/5052