Key Points
- A minimal set of genes that is necessary and sufficient for sustaining a functional cell can be delineated either by computational comparisons of microbial genomes or experimentally by knocking out genes in simple microbes.
- The minimal gene-set needs to be defined together with the environmental conditions under which these genes are sufficient to support a cell. For the most favourable conditions, with all nutrients provided and no environmental stress, computational and experimental approaches agree on 250–300 genes as the size of the minimal set.
- For most essential cellular functions, two or more unrelated or distantly related proteins have evolved; only ∼60 proteins (primarily those involved in translation and the basic aspects of transcription) are conserved in all cellular life-forms. Therefore, even for the same conditions, there can be many versions of the minimal gene-set.
- The reconstruction of ancestral life-forms is based on the principle of evolutionary parsimony: the simplest scenario is developed so as to reconcile the observed distribution of genes among species with the species tree. The size and composition of the reconstructed ancestral gene repertoires depend on relative rates of gene loss and horizontal gene-transfer, two phenomena that have been central to microbial evolution.
- The parsimony approach suggests that the last universal common ancestor (LUCA) of all extant life forms might have had as few as 500–600 genes. The gene set of LUCA that is derived in this fashion might resemble the minimal gene-set for a free-living prokaryote. However, arguments have also been made for a more complex LUCA.
- The experimental investigation of various versions of the minimal gene-set for cellular life and reconstructed ancestral life-forms might be an important research direction in the second and third decades of the twenty-first century.
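The difference between proteins that are conserved everywhere and functions that are covered everywhere can be illustrated with a small sketch. This is a toy example under stated assumptions, not the procedure used in any published analysis: the genome names, function labels and protein-family identifiers below are all hypothetical, and each function is assumed to be performed by exactly one family per genome.

```python
# Hypothetical toy data: for each genome, the protein family that
# performs each essential function in that organism.
GENOMES = {
    "genome_A": {"translation": "fam_T1", "replication": "fam_R1",
                 "lysyl_tRNA_charging": "fam_K1", "sugar_transport": "fam_S1"},
    "genome_B": {"translation": "fam_T1", "replication": "fam_R2",
                 "lysyl_tRNA_charging": "fam_K1"},
    "genome_C": {"translation": "fam_T1", "replication": "fam_R1",
                 "lysyl_tRNA_charging": "fam_K2"},
}


def conserved_families(genomes):
    """Protein families found in every genome (strict intersection)."""
    family_sets = [set(g.values()) for g in genomes.values()]
    return set.intersection(*family_sets)


def conserved_functions(genomes):
    """Functions covered in every genome, whichever family does the job."""
    function_sets = [set(g.keys()) for g in genomes.values()]
    return set.intersection(*function_sets)


def displaced_functions(genomes):
    """Universally covered functions that are performed by more than one
    family, i.e. candidates for non-orthologous gene displacement."""
    displaced = {}
    for function in conserved_functions(genomes):
        families = {g[function] for g in genomes.values()}
        if len(families) > 1:
            displaced[function] = families
    return displaced


if __name__ == "__main__":
    print("families in all genomes: ", conserved_families(GENOMES))
    print("functions in all genomes:", conserved_functions(GENOMES))
    print("displaced functions:     ", displaced_functions(GENOMES))
```

In this toy data only one family survives the strict intersection, whereas three functions are covered in all three genomes. A minimal gene-set built on functions therefore has to pick one family per displaced function, which is why several versions of the set can exist for the same conditions.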
Abstract
Comparative genomics, using computational and experimental methods, enables the identification of a minimal set of genes that is necessary and sufficient for sustaining a functional cell. For most essential cellular functions, two or more unrelated or distantly related proteins have evolved; only about 60 proteins, primarily those involved in translation, are common to all cellular life. The reconstruction of ancestral life-forms is based on the principle of evolutionary parsimony, but the size and composition of the reconstructed ancestral gene-repertoires depend on relative rates of gene loss and horizontal gene-transfer. The present estimate suggests a simple last universal common ancestor with only 500–600 genes.
Acknowledgements
I gratefully acknowledge my intellectual debt to A. Mushegian (minimal gene-set analysis) and B. Mirkin (reconstruction of evolutionary scenarios) and constructive discussions with F. Doolittle and P. Forterre on the nature of the last universal common ancestor. I thank A. Osterman for useful discussions on minimal gene-sets and for providing me with his data prior to publication.
Glossary
- ESSENTIAL GENE: A gene for which knockout is lethal under certain conditions.
- ORTHOLOGUES: Homologous genes in different species that originate from the same ancestral gene in the last common ancestor of the species compared.
- NON-ORTHOLOGOUS GENE DISPLACEMENT: Displacement of a gene responsible for a particular biological function in a certain set of species by a non-orthologous (unrelated or paralogous) gene in a different set of species.
- PHYLETIC PATTERN: The pattern of presence or absence (representation by orthologues) of a gene in different lineages across the species tree.
- SYNTHETIC LETHALS: Genes for which simultaneous knockout is lethal, whereas individual knockouts are viable.
- SPECIES TREE: A phylogenetic tree that represents evolutionary relationships between species as a whole, as opposed to phylogenetic trees for individual genes.
- EVOLUTIONARY PARSIMONY: A methodological approach in evolutionary biology that aims to explain an observed distribution of character states (for example, the phyletic pattern of a gene in a species tree) by postulating the minimal number of events in the course of evolution that could have led to that distribution.
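The parsimony idea defined above can be made concrete with a minimal sketch that scores one phyletic pattern on a species tree. It is an illustration under stated assumptions, not the algorithm of any particular study: the four-species tree and the species names are hypothetical, a gene loss is assumed to cost 1, a gain (for example by horizontal gene-transfer) costs `gain_penalty`, and a tie at the root is resolved as "absent".

```python
# Hypothetical species tree as nested tuples; leaves are species names.
TREE = (("sp_A", "sp_B"), ("sp_C", "sp_D"))


def parsimony_cost(tree, present, gain_penalty=1.0):
    """Return (cost_if_absent, cost_if_present) for the gene at the root
    of `tree`, given the set `present` of species that carry the gene.
    A loss costs 1; a gain costs `gain_penalty` (an assumed parameter)."""
    if isinstance(tree, str):  # leaf: the observed state is fixed
        inf = float("inf")
        return (inf, 0.0) if tree in present else (0.0, inf)
    cost_absent = 0.0
    cost_present = 0.0
    for child in tree:
        child_absent, child_present = parsimony_cost(child, present, gain_penalty)
        # Parent lacks the gene: child lacks it too, or gained it.
        cost_absent += min(child_absent, child_present + gain_penalty)
        # Parent has the gene: child kept it, or lost it.
        cost_present += min(child_present, child_absent + 1.0)
    return cost_absent, cost_present


def in_ancestor(tree, present, gain_penalty=1.0):
    """True if placing the gene in the root ancestor is strictly cheaper."""
    cost_absent, cost_present = parsimony_cost(tree, present, gain_penalty)
    return cost_present < cost_absent


if __name__ == "__main__":
    pattern = {"sp_A", "sp_C"}  # patchy phyletic pattern
    for g in (0.5, 1.0, 2.0):
        print(g, parsimony_cost(TREE, pattern, g), in_ancestor(TREE, pattern, g))
```

With this patchy pattern the gene is assigned to the ancestor when gains are expensive relative to losses and excluded when gains are cheap, which shows in miniature why the size of a reconstructed ancestral gene set depends on the assumed relative rates of gene loss and horizontal gene-transfer.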
Cite this article
Koonin, E. Comparative genomics, minimal gene-sets and the last universal common ancestor. Nat Rev Microbiol 1, 127–136 (2003). https://doi.org/10.1038/nrmicro751