Introduction: the fundamental difference in recombination biology of bacteria and eukaryotes

Living organisms are of two very different kinds: bacteria and eukaryotes. This basic dichotomy is seen in their fundamentally different cell organisation and radically different life cycles and mechanisms of genetic exchange. Although eukaryotes are typified by the presence of a cell nucleus that multiplies by mitosis, still more fundamental is the presence of an endomembrane system and internal cell skeleton: the cytoskeleton. The eukaryotic cytoskeleton comprises actin microfilaments that mediate the division of the cytoplasm and tubulin microtubules involved in nuclear multiplication; the associated mechanical movements are produced by molecular motors: myosin having arms that walk along actin filaments and dynein and kinesin with arms that walk along microtubules. These cytoskeletal elements and molecular motors are absent from bacteria though distant homologues exist. Instead bacteria have an exoskeleton, ancestrally comprised of peptidoglycan, with a key role in cell division and the segregation of their (typically circular) chromosome that is attached directly to the cytoplasmic membrane. The endomembrane system includes the nuclear envelope and other endoplasmic reticulum (ER) membranes and the Golgi apparatus, both topologically separate from each other and the cell surface membrane, the plasma membrane. The bacterial cytoplasmic membrane grows by the direct insertion of individual lipid and protein molecules, as does the ER/nuclear envelope in eukaryotes; but neither the Golgi nor the plasma membrane grow thus. Instead coated vesicles, entirely absent from bacteria, bud from the ER and are carried by molecular motors to the Golgi membranes where they fuse with its membranes allowing them to grow. Other kinds of coated vesicles bud from the Golgi and fuse with other endomembranes (eg, lysosomes – the digestive organelles) or with the plasma membrane to cause their growth.

True sex – syngamy, nuclear fusion and meiosis – is found only in eukaryotes. The fundamental enabling mechanism is cell fusion, almost unknown in bacteria. Gamete cell fusion is mediated by fusogenic cell surface glycoproteins; these proteins are made co-translationally by ribosomes attached to the rough ER where the first sugars are also added to make a core oligosaccharide, which is further modified and extended in the Golgi complex en route to the plasma membrane. Bacteria lack gamete fusion and sex but have parasexual mechanisms of gene exchange in which individual DNA molecules are transferred from cell to cell: by plasmid conjugation, transformation and viral transduction. The extreme rarity of cell fusion in bacteria and the absence of a cytoskeleton able to support a cell internally are probably the major reasons why sex never evolved in bacteria. One purpose of the present paper is to outline the evidence that sex evolved during the very origin of the eukaryotic cell about 850 million years ago and to discuss how, and more briefly why, this happened. The ability to make surface glycoproteins, on which sex depends, evolved shortly before this in the immediate common ancestor of eukaryotes and archaebacteria (Cavalier-Smith, 1987a, 2002a). This ancestor evolved by the drastic modification of a Gram-positive bacterium with a cell wall of peptidoglycan, like other eubacteria. The replacement of eubacterial peptidoglycan by archaebacterial/eukaryotic glycoprotein that ultimately enabled sex to evolve was just one of numerous associated evolutionary changes that generated the common ancestor of eukaryotes and archaebacteria. To emphasise the key importance of this change in cell wall chemistry I use the term ‘neomura’ (Cavalier-Smith, 1987a) to refer collectively to eukaryotes and their archaebacterial sisters. 
The ‘neomuran revolution’, when their common features arose, was the most radical upheaval in all bacterial history, with repercussions on all the DNA-handling machinery and the chemistry and organisation of chromosomes (Cavalier-Smith, 2002a).

Although the recombination enzymes were greatly modified during the neomuran revolution, this did not alter their fundamental properties that arose about 3000 million years earlier – during or even prior to the origin of the first cell (Cavalier-Smith, 2001). Before discussing the origins of the DNA breakage and rejoining machinery, I shall outline key features of the history of cell evolution significant for placing the origins of recombination and sex in a realistic phylogenetic context and the palaeontological evidence for their timing. As detailed arguments concerning the timing and mechanisms of the origin of bacteria, the ‘neomuran revolution’, the origin of archaebacteria, and the even more dramatic origin of eukaryotes can be found elsewhere (Cavalier-Smith, 2001, 2002a, 2002b), I shall confine myself to a simple overview and not repeat the references to the extensive evidence here.

The great antiquity of eubacteria and the relative recency of neomura

Figure 1 summarises the main features of cell evolution and phylogeny relevant to the origins of recombination and sex. Bacteria are grouped into two subkingdoms: Negibacteria with cells bounded by an envelope of two membranes, as in Escherichia coli, and Unibacteria bounded by a single cytoplasmic membrane, as in eukaryotes (Cavalier-Smith, 1998). In negibacteria the peptidoglycan wall, when present, lies between the two membranes. Cell fusion has never been observed in any negibacteria and may be made impossible by this unique envelope structure (Cavalier-Smith, 2001). Interestingly, mitochondria and chloroplasts, which evolved from proteobacteria and cyanobacteria respectively, have both retained the double negibacterial envelope and evolved the ability to fuse following syngamy – an important feature of sex in many eukaryotes. In chloroplasts such fusion occurs in green algae (Cavalier-Smith, 1970), which like red algal chloroplasts and mitochondria lost the intervening peptidoglycan layer, but not in glaucophytes that have retained it. Thus it is not the doubleness of the negibacterial envelope per se that precludes cell fusion, but the overall complexity of the negibacterial envelope, including the presence of the intervening rigid peptidoglycan layer – an effective chastity belt – that has remained in eukaryotes only in the chloroplast envelope of glaucophyte algae, which are not known to undergo chloroplast fusion or even sex. Chloroplast fusion has never been demonstrated in sexual chromalveolates, where the extra membranes surrounding the chloroplasts (Cavalier-Smith, 2000) would make its evolution much more complex.

Figure 1

The tree of life based on an integration of evidence from palaeontology, cell biology and molecular phylogeny. For reasons detailed elsewhere (Cavalier-Smith, 2001, 2002a), the tree is rooted within the negibacterial eubacteria (traditional Gram-negative bacteria and their relatives) not between eubacteria and archaebacteria as widely thought by many molecular biologists who ignore the decisive fossil evidence. The position of the root of the eukaryote part of the tree is a little uncertain. That shown is most likely, but it might actually be within Amoebozoa, between them and other eukaryotes, or even just below the Rhizaria; however, it is not within the amitochondrial excavates, which are secondarily, not primarily amitochondrial as used to be thought (Cavalier-Smith, 2002b). Rhizaria are a new major protozoan group (infrakingdom) comprising Radiolaria, Foraminifera, Cercozoa, Apusozoa and Heliozoa (Cavalier-Smith, 2002b). Dashed lines indicate the symbiogenetic origin of chloroplasts in the ancestral plant and the secondary acquisition of a red algal chloroplast by the common ancestor of chromalveolates (kingdom Chromista plus the protozoan infrakingdom Alveolata: see Cavalier-Smith, 2000), the most complex of all cells.

Unibacteria comprise two phyla: Archaebacteria, with glycoproteins and prenyl ether lipid membranes that adapt them to hot acid conditions, and Posibacteria (Gram-positive bacteria, heliobacteria, Togobacteria, all with peptidoglycan, and mycoplasmas without it). Posibacteria have an acyl ester membrane like those of Negibacteria, from which they evolved by the loss of the outer membrane, as do eukaryotes which evolved from them. Posibacteria comprise two subphyla: Endobacteria, which typically have endospores like the well studied Bacillus and Clostridium, and Actinobacteria, which have exospores and often much more complex aerobic morphology as in the actinomycetes and Streptomycetes. The presence in some Streptomycetes of chitin, widespread in eukaryotes but absent from other bacteria, histone-H1 like proteins, numerous serine/threonine protein kinases like those involved in eukaryotic cell cycle regulation, calmodulin-like proteins, inositol-phospholipids, and in Mycobacterium of cholesterol biosynthesis, together strongly indicate that actinobacteria are ancestral to eukaryotes, and to neomura as a whole (Cavalier-Smith, 2002a).

Fossil isotopic evidence indicates that as long ago as 3500 million years ago, and possibly even earlier, ecosystems were based on photosynthetic carbon fixation by the enzyme rubisco, an ability restricted nowadays to Negibacteria – general in the phyla Cyanobacteria and Proteobacteria and more rarely in green non-sulphur bacteria belonging to the Eobacteria. Most Eobacteria either use other carbon fixation enzymes with different isotopic fractionation or are heterotrophs such as the highly radiation-resistant bacterium Deinococcus or the thermophile Thermus that first made PCR possible. The different isotopic fractionation of most photosynthetic green non-sulphur bacteria, the exceptional radiation resistance of Deinococcus, and the marked divergence of the phylum on molecular trees suggest that Eobacteria may be the oldest extant phylum. Evidence for proteobacteria as long ago as 3500 million years ago comes from isotopic ratios in thermally unstable mineral deposits of this age indicating biogenic sulphur reduction at moderate temperatures, known only from this phylum. Good morphological fossil evidence for cyanobacteria is absent before the major snowball earth glaciation of 2400 million years ago (Kirschvink et al, 2000), so I suggest they may have arisen immediately afterwards. The origin of oxygenic photosynthesis only then may explain why an oxidising atmosphere, witnessed by the red beds and manganese deposits from 2000 million years ago, was so long delayed after the origin of life.

Fossil steranes dating back to 2700 million years ago have been attributed to eukaryotes (Brocks et al, 1999). But I consider this a misinterpretation, since not only the posibacterium Mycobacterium, but two groups of proteobacteria make sterols, so it is much more likely that these are eubacterial in origin. (Eubacteria is an ancestral grade of organisation that embraces Negibacteria and Posibacteria, ie all bacteria with acyl ester lipids and usually also the peptidoglycan murein; the latter was independently lost in mycoplasmas and Planctobacteria.) In my view all morphological fossils so far described between 2.4 and 0.85 Gy ago are probably cyanobacteria, reports of red and other eukaryotic algae all being optimistic misinterpretations. The most convincing early eukaryotic fossils are the flask-shaped fossils and others with complex surface structures dated 800 million years ago (Porter and Knoll, 2000). Only such fossils have characteristics that seem genuinely indicative of both endomembranes and a cytoskeleton, the twin hallmarks of eukaryotes. There is no direct fossil evidence for archaebacteria (chemical evidence for their lipids) until very much later still. We cannot directly estimate the age of posibacteria from the fossil record, but in view of the suggestive, but not decisive, evidence from sequence trees that they may be sisters of cyanobacteria it is possible that they date only from about 2500 million years ago and that all bacteria were negibacteria for roughly the first 1000 million years of life.

In view of the compelling evidence that eukaryotes and archaebacteria are sisters, it is hard to escape the conclusion that the neomuran revolution took place only around 850 My ago, ending an immensely long period of about 1.3 Gy of ecological stability dominated by cyanobacteria and with no global glaciations or other major innovations following a step rise in atmospheric oxygen (possibly even to present levels) just before 2 Gy ago. Figure 2 summarises this historical interpretation of the palaeontological and other evidence.

Figure 2

Major features of the fossil record interpreted in the light of cell and molecular biology. For further details and references see Cavalier-Smith, 2002a.

Many molecular evolutionists have assumed that both eukaryotes and archaebacteria originated much earlier than suggested by this critical re-evaluation of the geological record. However, such views are based on overconfidence in the theoretically and empirically unsound notion of a molecular clock and naïve extrapolation backwards from known fossil dates. As argued elsewhere in detail (Cavalier-Smith, 2002a), quantum evolution (temporary hyper-acceleration of rates of molecular change) distorted the dimensions of all molecular trees used to root the tree of life even more grossly than I earlier argued (Cavalier-Smith, 1991). Sustained major rate increases in certain lineages also misleadingly altered the topology of many molecular trees. If these distortions are allowed for, one can harmonise the more reliable features of molecular sequence trees with the dates from a critically interpreted fossil record and with the groupings of organisms revealed by comparative cell biology and ultrastructure. The key is a critical integration of many lines of evidence (Cavalier-Smith, 2002a) rather than undue reliance on the invalid assumption that a single molecule provides a Rosetta stone for the interpretation of the history of life (Pace et al, 1986). This reappraisal strengthens my earlier arguments for a eubacterial root of the universal tree, but has overturned the view that eukaryotes were primitively amitochondrial (Cavalier-Smith, 1991); sex and mitochondria probably evolved virtually simultaneously at almost the same time as the universal features of the eukaryotic cell (Cavalier-Smith, 2002a), not successively as earlier thought (Cavalier-Smith, 1995).

The molecular machinery of recombination

Recombination has two aspects: (1) the mechanisms whereby genes from cells of different parentage become combined in a single cell, which are very different in bacteria and eukaryotes, and (2) those within cells that make chimaeric DNA molecules having genes from both sources, which are fundamentally the same in all cells.

Most basic is the cutting and rejoining of DNA. As the enzymes that catalyse this are present in all cells and evolutionarily related in sequence, they must have evolved very early in the evolution of life – significantly before the last common ancestor, also known as the cenancestor. But as the enzymes are proteins they must have evolved after the evolution of coded protein synthesis (Cavalier-Smith, 2001). Cutting is thermodynamically spontaneous (but too slow without an enzyme to be biologically useful), whereas joining (ligation) is not and requires a source of free energy that may be provided in two fundamentally different ways. The first of these is mediated by DNA ligase enzymes, which use ATP or NAD nucleotides as energy source; they catalyse the joining of adjacent 3′OH and 5′phosphate ends of two separate oligonucleotides or polynucleotides that are correctly base paired with a continuous DNA template. Such adjacent ends without any missing intervening nucleotides (technically a nick) can be made by the cutting of a previously intact double helical nucleotide on one strand only by an endonuclease enzyme. They are also always formed during DNA replication since DNA polymerisation cannot be continuous on both antiparallel strands (Figure 3). It has to be discontinuous on at least one of them, the so-called lagging strand, which has to grow in short segments in the opposite direction to the movement of the replication fork, since DNA polymerase can only join nucleotides to the 3′end of an oligo/polynucleotide. DNA ligase therefore probably evolved to join together these segments in the earliest stages of DNA replication before the origin of cells; it is simpler than many enzymes, consisting of a single polypeptide chain.

Figure 3

Roles in DNA replication of enzymes later recruited for recombination. For simplicity the helical character of the DNA is not shown. In relaxed DNA each strand would be twisted round its partner once every 10 base pairs. In cells it is very slightly untwisted, either passively by being wrapped round nucleosomes (most eukaryotes and many archaebacteria) or actively by DNA gyrase (a type II topoisomerase that can reversibly cut and rejoin both strands, found in all eubacteria, chloroplasts and mitochondria and a few secondarily mesophilic archaebacteria), and is thus thrown into negative supercoils. The replication fork is moved from left to right by the DNA helicase which prises apart the parental strands, followed by the leading strand polymerase which catalyses the addition of nucleotides to the growing 3′OH end of the continuously growing daughter strand (marked by arrowheads). Helicases and polymerases are associated and fixed to a membrane (in bacteria) or nuclear matrix (in eukaryotes). The consequent positive supercoiling and tightening of the unreplicated parental duplex would quickly prevent further progression of the fork but for the DNA topoisomerase I that cuts one strand allowing it to relax before rejoining it. DNA primase that makes the RNA primers to initiate daughter molecule synthesis (dashed line) is bound to the DNA helicase in eubacteria but to the DNA polymerase in neomura. A lagging strand DNA polymerase adds DNA nucleotides to the RNA primer to make a daughter DNA strand; a 5′ exonuclease activity (either a part of this polymerase as in eubacteria or separate molecules as in neomura) removes the RNA primer. When the polymerase has added enough nucleotides to the growing end for it to abut this newly created 5′ phosphate end, DNA ligase joins the two ends to make a continuous daughter strand. Parental strands are black and daughter strands grey.

The second type of DNA joining is by DNA topoisomerase enzymes that combine properties similar to an endonuclease and a DNA ligase in a single molecule. Topoisomerases both cut DNA and rejoin the ends. But unlike an endonuclease, cutting is achieved not by hydrolysis but by the covalent joining of the protein to a phosphate of the DNA backbone, thus releasing a free 3′OH end. Since energy is conserved in the phosphodiester bond between the 5′ end of the cut chain and the enzyme, the latter can religate that end to the 3′OH end at the same time as cleaving itself free of the DNA. This reversible reaction is not pointless to the cell but has several useful features. Type I topoisomerases, which cut only one strand of the DNA by introducing nicks, allow free rotation of the cut strand around the uncut one. This relieves the tension caused by positive supercoiling of DNA which occurs ahead of replication forks as a result of the unwinding of the parental strand to create two daughter templates and also locally in front of sites of transcription. For circular DNA as in bacteria or linear DNA attached at intervals to a nuclear skeleton, such tension would quickly bring replication to a stop but for the ability of topoisomerase I to allow changes in supercoiling. It is the release of tension and changes in coiling that is universally biologically important, not the interconversion between different topological isomers for which the enzyme's name was given and that is only true for circular molecules.

Topoisomerase Ia is involved in recombination, as are type II DNA topoisomerases that temporarily cut both strands of the DNA. If they rejoin the ends from the same molecule they simply recreate the same molecule, or in the case of circular molecules very often a topological isomer of it. If, however, two separate topoisomerase II enzymes cut two different DNA molecules it would in principle be possible for the ends of the two different molecules to be joined to form recombinant chimaeras. Recombination is seldom, if ever, this simple, for only if the cuts were at exactly homologous positions would the recombinant molecules be viable. Even though topoisomerase II binds preferentially to certain nucleotide sequences these are too short to provide the requisite specificity.
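The point about homologous cut positions can be illustrated with a toy model. The following is only a minimal sketch under simplifying assumptions: each DNA molecule is represented as a plain string, letter case marks parentage, and the sequences and the function name `crossover` are hypothetical and chosen purely for illustration. It shows that swapping ends cut at the same position keeps both products full length, whereas cuts at different positions give one product with a deletion and one with a duplication.

```python
def crossover(mol_a: str, mol_b: str, cut_a: int, cut_b: int) -> tuple[str, str]:
    """Cut each molecule once and swap the right-hand ends (toy model)."""
    rec_1 = mol_a[:cut_a] + mol_b[cut_b:]
    rec_2 = mol_b[:cut_b] + mol_a[cut_a:]
    return rec_1, rec_2


# Hypothetical homologous molecules; upper/lower case only marks parentage.
parent_a = "AAGCTTGGCC"
parent_b = "aagcttggcc"

# Cuts at exactly homologous positions: both recombinants keep every position once.
homologous = crossover(parent_a, parent_b, 4, 4)

# Cuts at non-homologous positions: one product loses sequence, the other duplicates it.
non_homologous = crossover(parent_a, parent_b, 4, 7)

for label, (rec_1, rec_2) in (("homologous", homologous),
                              ("non-homologous", non_homologous)):
    print(label, rec_1, len(rec_1), rec_2, len(rec_2))
```

In this sketch the non-homologous products are 7 and 13 positions long instead of 10, which is the string analogue of the inviable recombinants described above.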

Therefore viable recombination requires an homology search in addition to cutting and rejoining. Homologous regions are found by base pairing; free 3′OH ends can each unpair from its own sister strand and invade the double-strand of the other molecule and pair with the complementary region there instead (Figure 4a). Because of the stability of the double helices, such a search would be immensely slow except at temperatures too high for non-thermophiles to survive. Therefore all organisms catalyse the first step by a DNA helicase protein that actively unwinds a long stretch of DNA with a free 3′OH end using ATP hydrolysis energy; a second protein that can also bind double-stranded DNA coats the single strand and helps it invade the target double-stranded DNA and pair with it to form a three-stranded complex (Gupta et al, 2001). In eubacteria the enzyme that promotes homologous pairing is RecA protein; its descendants in neomura are called RadA in archaebacteria and Rad51 in mitotic and Dmc1 in meiotic cells of eukaryotes. These enzymes bind to DNA in the form of helical filaments (Yu et al, 2001). I shall refer to them collectively as synaptases (they are sometimes called recombinases but this is potentially confusing as the entirely different cutting and rejoining enzymes that mediate site-specific recombination of bacterial episomes are also called recombinases). If the invading strand becomes improperly paired, the synaptase can actively unpair it, hydrolysing ATP in the process to allow it to try again and pair correctly (Zhang et al, 2001).
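The logic of a homology search by base pairing can be sketched in a few lines. This is an illustrative toy, not a model of synaptase biochemistry: it assumes exact Watson-Crick complementarity with no mismatches, treats strands as 5′ to 3′ strings, and the names `reverse_complement` and `find_homology` and the example sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the strand that would pair antiparallel with `strand`."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def find_homology(invading: str, target_strand: str) -> int:
    """Index on target_strand (written 5'->3') where the invading single
    strand could base-pair along its whole length, or -1 if there is none."""
    return target_strand.find(reverse_complement(invading))


# Hypothetical sequences: the invading end pairs with the site "TGTAATC".
print(find_homology("GATTACA", "CCCTGTAATCGGG"))  # position of the pairing site
print(find_homology("GATTACA", "CCCCCCCC"))       # no complementary region
```

A real search tolerates and tests imperfect pairings, which is where the ATP-dependent unpairing by the synaptase mentioned above comes in; the exact-match search here only captures the idea that the pairing partner is located by complementarity.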

Figure 4

The roles of key recombination enzymes. (a) After an endonuclease cuts a DNA strand it is coated by RecA and ancillary proteins that help it invade a homologous duplex and pair with its complement. (b) When this has happened to both molecules they are held together by two crossing polynucleotide chains; this Holliday junction is recognised by a specific endonuclease (the resolvase) that cuts two more strands. Cutting those marked × allows the free ends to swap, re-pair and be rejoined by DNA ligases, yielding reciprocal recombinants. Cutting the other two strands and exchange instead may cause gene conversion but will yield no crossing over of flanking markers. In principle endonuclease and ligase activities can be combined in a single DNA topoisomerase enzyme. Typically the cuts on the two recombining molecules are not precisely opposite, causing temporary gaps and overlaps, which are repaired by polymerases and nucleases respectively prior to religation.

In this way DNA cutting and mutual strand invasion by only two synaptase-coated free ends, as would be generated by a topoisomerase I, precisely aligns the two parental DNA molecules which are held together by two crossing single strands (Figure 4b): this key intermediate structure – the Holliday junction – can be separated into two distinct molecules by an endonuclease enzyme known as the Holliday junction resolvase (RuvC in bacteria), which recognises and cuts two of the unpaired strands at the junction. Cutting the previously uncut strands gives reciprocal recombination, whereas cleaving the previously cut strands regenerates the parental molecules and may lead to gene conversion in the region of hybrid DNA if their sequences were not identical. The cuts made by the resolvase are repaired by DNA ligase. As it is highly improbable that both cuts will be in homologous positions, ligation will not be possible until a repair DNA polymerase fills gaps containing missing nucleotides and redundant overlapping DNA tails are removed by repair endonucleases and/or exonucleases. Thus certain components of the excision repair systems that proof read newly replicated DNA and replace mismatched nucleotides (and repair damaged DNA) are also generally required for recombination.
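The two ways of resolving a Holliday junction can be summarised as a small bookkeeping sketch. It is a deliberately abstract toy under stated assumptions: each parental molecule is reduced to a pair of flanking markers, the region of hybrid DNA is recorded only as a flag, and the names `Product` and `resolve` are hypothetical rather than taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    left_marker: str
    right_marker: str
    has_hybrid_dna: bool  # hybrid region where gene conversion may occur


def resolve(parent_1: tuple[str, str], parent_2: tuple[str, str],
            cut_uncut_strands: bool) -> list[Product]:
    """Toy outcome of resolvase cutting at a Holliday junction.

    cut_uncut_strands=True  -> the previously uncut strands are cut:
                               flanking markers are exchanged reciprocally.
    cut_uncut_strands=False -> the previously cut strands are cut again:
                               flanking markers stay in parental combination.
    """
    (left_1, right_1), (left_2, right_2) = parent_1, parent_2
    if cut_uncut_strands:
        return [Product(left_1, right_2, True), Product(left_2, right_1, True)]
    return [Product(left_1, right_1, True), Product(left_2, right_2, True)]


crossover_products = resolve(("A", "B"), ("a", "b"), cut_uncut_strands=True)
parental_products = resolve(("A", "B"), ("a", "b"), cut_uncut_strands=False)
print(crossover_products)
print(parental_products)
```

In both outcomes the products retain a stretch of hybrid DNA, which is why gene conversion remains possible even when the flanking markers are not exchanged.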

In sum, recombination depends on seven successive enzyme activities: (1) cutting by an endonuclease or DNA topoisomerase; (2) unwinding the donor strand by a DNA helicase; (3) invasion of a double helix by the single strand for homology search, where spontaneous base pairing is accelerated by a synaptase, to form a Holliday junction at homologous regions; (4) cutting two strands at the Holliday junction by the resolvase; (5) repair of gaps by repair polymerases; (6) removal of overlaps by repair nucleases; and (7) ligation of the remaining nicks by DNA ligase or DNA topoisomerase. Most steps are actually more complex than indicated and helped by accessory proteins, eg, single-stranded binding proteins that coat single-stranded DNA help both strand separation/invasion by the synaptase and repair by DNA polymerase; and recombination mediator proteins help assemble the synaptase on single strands precoated by single strand-binding protein (Gasior et al, 2001).

Overall the recombination pathway is very complex and must have evolved in stages. In many cells there is functional redundancy between alternative enzymes. For example in eubacteria the main enzyme that helps load RecA protein onto the donor double-stranded DNA adjacent to the 3′OH cut end is the RecBCD DNA helicase, which like other helicases prises double-stranded DNA apart while hydrolysing ATP as energy source. But there is also another DNA-binding ATPase (RecFOR) that can function similarly.

To complicate matters further most cells can also undergo non-homologous crossing over by different minor machinery but with logically equivalent functions. It is rather common for cells to have more than one near-equivalent way of doing things; often this redundancy has arisen by gene duplications of parts of pre-existing machinery rather than through radical innovation. Moreover most cells harbour transposons or DNA viruses (eg, phage mu in Escherichia coli) that can insert themselves into cellular DNA by illegitimate recombination and usually encode key elements of their own DNA cutting and joining machinery often substantially different from that of their cellular hosts.

Understanding the evolution of the recombination machinery is often also complicated by the fact that many of its components have multiple roles in different cellular processes, none completely understood. Such pleiotropy and likely shifts of function, as well as differences among organisms, mean that guesses as to the original function of particular components can only be tentative. It is reasonable to suppose that the transposases that catalyse insertion of transposons and viruses were selected specifically for integrating them into foreign DNA by their successful selfish spread through host populations. Presumably they arose from ancestral molecules originally useful to their hosts. I shall discuss them only briefly here and will first concentrate on the evolution of the cellular recombination machinery – specifically the major RecA-related pathway.

Origin of the basic machinery for general recombination

In contrast to transposases, it is unlikely that any components of the basic RecA-related recombination machinery originated specifically for recombination between DNA from different individual organisms, since all components are essential simply for the reliable replication and segregation of cellular DNA. All were present in a highly evolved state in the cenancestor. Most basic ones are unlikely to have first evolved for DNA repair as sometimes suggested (Bernstein et al, 1987). Replication is even more fundamental than repair. So-called repair polymerases, which may be less processive than the replication polymerases and specialise in filling gaps produced by the excision of DNA damage or replication errors, probably originated to fill the gaps formed at intervals on the lagging strand by the removal of RNA primers. Almost from the beginning of replication DNA ligase was essential to join these lagging strand polynucleotides into a continuum. Neither would have been necessary in an imaginary precellular stage where all DNA molecules were linear if the original replicative DNA polymerase were able to initiate polynucleotide synthesis, as DNA primase now can (Frick and Richardson, 2001), as postulated must have been true when DNA replication first evolved (Cavalier-Smith, 1987b). Bocquier et al (2001) discovered in the thermophilic archaebacterium Pyrococcus furiosus a DNA primase homologue with just this ability to initiate and extend DNA chains. This shows that a simple DNA polymerase consisting of a single small polypeptide chain (41 kD) with the requisite properties to replicate DNA without the added complexities of RNA primers and attendant excision properties is mechanistically possible. Although such an enzyme must have evolved when DNA replication began in precellular organisms, it would be a mistake to see this archaebacterial enzyme as a relic of a primitive system. Archaebacteria almost certainly evolved from a posibacterium 3 Gy after their ancestors evolved the RNA priming system using DnaG primase (Cavalier-Smith, 2002a). As soon as organisms evolved such a differentiation between a DNA primase making short primers and a more complex processive DNA polymerase unable to initiate chains, RNA excision and gap filling by a polymerase were essential. It has been argued that such differentiation evolved to ensure replication fidelity, on the grounds that a DNA polymerase could be selected for higher fidelity if it has only to extend chains and not initiate them (Kornberg and Baker, 1992). If this is correct the P. furiosus primase should have a markedly lower fidelity than a standard DNA polymerase. RNA primers may also be interpreted as relics of an earlier stage when the genome itself was RNA (Cavalier-Smith, 1987b).

DNA topoisomerases also must date back to the time of the first circular DNA chromosomes. Topoisomerase Ia was required for releasing strain ahead of the replication fork, while topoisomerase II was essential for decatenating daughter circles after the termination of replication to allow their separation into separate daughters. Since such decatenation often produces circular dimers, additional special termination enzymes able to cut and rejoin DNA probably evolved during the origin of efficient circle replication. Since circles are the simplest way of avoiding both the end replication problem caused by the origin of separate primase and replication polymerases and the digestion of ends by exonucleases (whether the organism's own or those of predators or competitors: Cavalier-Smith, 2001), it is likely that both kinds of topoisomerases and the DnaG DNA primase evolved almost simultaneously during a huge bout of enzymatic innovation of many basic DNA-handling enzymes.

Clear support for this comes from a remarkable relationship between several key DNA handling enzymes; several enzymes that interact with double-stranded DNA to catalyse apparently distinct reactions share a 100-amino acid domain called the Toprim domain because it is found in most topoisomerases and bacterial DnaG primases (Aravind et al, 1998). The Toprim domain carries the active centre that cuts and joins the DNA in topoisomerases of both the Ia and II classes; in DnaG primase this domain is flanked by longer DNA regions: an upstream one with a zinc finger domain and a downstream one with a domain for interaction with DnaB, a DNA helicase protein that helps load it onto the lagging strand. The larger and more complex topoisomerases have quite different domains associated with the Toprim domain. In topoisomerase Ia (the eukaryote homologue is confusingly called topoisomerase III) the single polypeptide chain has a Toprim domain at its N-terminal end and three C4 ‘little finger’ DNA-binding domains near the C-terminus.

By contrast each of the two dissimilar (but evolutionarily related) polypeptides of topoisomerase II has two other domains upstream from the Toprim domain: an N-terminal MutL/Hsp90 ATPase domain is separated from the Toprim domain by an S5 domain shared with the S5 protein of the small ribosome subunit. MutL is centrally important to the universal long patch mismatch excision repair mechanism, acting as a scaffold to connect the ATPase (MutS) that recognises the mismatch to the endonuclease that nicks the DNA near the damage to allow the repair exonuclease to remove it. It also loads DNA helicase II onto the DNA so it may be unwound. This ATPase domain is essential for the active negative supercoiling of DNA by DNA gyrase, a type II topoisomerase universal in eubacteria, which also probably evolved when chromosomes first became circular, as its underwinding of DNA is essential for efficient transcription of circles and segregation of compact nucleoids in eubacteria.

A Toprim domain is also found in another ancient eubacterial recombination protein, RecR, a non-enzymatic component of the minor (RecFOR) recombination pathway. In this small single polypeptide it is downstream of a C4 finger. Other kinds of proteins with Toprim domains are less universal and may be less ancient. The scattered distribution of OLD endonucleases with Toprim domains, often virally encoded, and of various phage proteins with such domains suggests that they may have arisen secondarily and been distributed by lateral gene transfer after the cenancestor (Aravind et al, 1998).

The key catalyst for homology search during recombination, RecA, is evolutionarily related to the eubacterial replicative DNA helicase, DnaB, and probably evolved from it by gene duplication and divergence prior to the cenancestor. As replication is much more basic and essential than recombination, the reverse suggestion that DnaB evolved from RecA (Leipe et al, 2000) is highly improbable and based on the widespread, but palaeontologically refuted, view that the universal tree is rooted between eubacteria and neomura and reluctance to accept the loss of DnaB by neomura (explained below).

Thus common protein domains are found in a remarkably wide variety of DNA-handling proteins, which must have been formed during precellular evolution by duplication and shuffling by illegitimate recombination of a fairly small number of domains. Note that contrary to some assumptions (Gilbert, 1987) there is no good reason to think that introns were involved in this early domain shuffling. Nor is there any reason to think that it occurred in an RNA world – indeed the very existence of an RNA world is doubtful (Cavalier-Smith, 2001). There was however very likely to have been either a temporary RNA-protein world before the evolution of DNA replication or a NA/protein world with mixed nucleotides. Probably most of the basic DNA handling proteins evolved by gene duplication and gene chimaerisation by domain shuffling of those that interacted originally with RNA or which did not discriminate strongly between the two types of nucleotides. Distant similarities can even be detected between DNA and RNA polymerases and reverse transcriptase (Joyce and Steitz, 1994).

So many different enzymes are now needed for DNA replication that it is hard to see which came first. What we can be sure of is that the present complexity of the DNA handling systems must have been preceded by a far simpler system with many fewer components. It is a fallacy to suppose that any modern cells are relics of such an early stage of evolution. Phylogenetic evidence makes it absolutely clear that the cenancestor was a very complex cell with well in excess of a thousand different genes. It was almost certainly a eubacterium that must have already evolved a DNA handling machinery equivalent in complexity and redundancy to that of any modern eubacterium. We do not know whether all proteins evolved from a single common ancestor or from a tiny handful of separately arising proteins. However the major fraction of their diversity must have arisen through gene duplication and gene chimaerisation by domain shuffling.

Aravind et al (1998) detected a family of small proteins of unknown function in archaebacteria and a few eubacteria that consist of little more than a single Toprim domain, and suggest that they may be ancestral to the others bearing this domain. However simplicity can also arise by secondary simplification. They recognise this for the small Toprim protein of mycoplasma, where they have direct phylogenetic evidence for it being a simplified derivative of DnaG primase. I suggest that the archaebacterial small Toprim proteins also arose in the same way by degeneration of DnaG in their common ancestor after DnaG was functionally replaced in the neomuran cenancestor by a DNA polymerase subunit (reasons for this replacement are discussed below). I suggest also that the Toprim domain first evolved in DnaG itself and that DNA topoisomerases did not evolve until after RNA primer removal allowed reasonably accurate DNA replication. The complexity of DNA topoisomerase II, with two dissimilar polypeptides each with multiple domains having different enzymic activities, seems too great to have preceded topoisomerase I. The fact that DNA topoisomerase I is essential for bacterial DNA segregation (Zhu et al, 2001) is consistent with an early origin.

The recombination events that shuffled the domains making up the more complex enzymes such as topoisomerase II must themselves have depended on the prior evolution of DNA ligase at least, so DNA ligase may have been almost the earliest DNA handling enzyme – after the origin of the first DNA polymerase itself. The first DNA ligase was probably the universal ATP-dependent type. I have argued that it must have evolved in precellular evolution (Cavalier-Smith, 1987b, 2001). The NAD-dependent type restricted to eubacteria and viruses probably evolved only after the evolution of the first protocell allowed the origin of secondary metabolism and the biosynthesis of more complex nucleotide cofactors like NAD (Cavalier-Smith, 2001). Possibly it was first adopted by a DNA virus to help its replication and/or was only later recruited by the host cell as a secondary enzyme (before the cenancestor); it was probably lost by the neomuran cenancestor prior to the divergence of eukaryotes and archaebacteria.

When the first endonuclease evolved is less clear. Because DNA is made in pieces and because hydrolysis of misincorporated ribonucleotides and accidental mechanical breakage may have been quite frequent before high fidelity replication and repair, endonucleases might initially not have been needed; uncontrolled, they would have been more a hazard than a benefit. The distinction between a nuclease and a DNA topoisomerase can be evolutionarily slight. Thus the restriction endonuclease NaeI can be converted into a topoisomerase II (unrelated to the natural ones) by a single amino acid substitution (Huai et al, 2000). The fact that early eukaryotes evolved a DNA topoisomerase I entirely unrelated in sequence or 3D structure (hence called Ib) to topoisomerases Ia and II also indicates that it is mechanistically relatively easy to evolve a topoisomerase from other enzymes: topoisomerase Ib is related to the site-specific integrase (sometimes called recombinase) of phage λ (Aravind et al, 1998). As it is found also in eukaryotic viruses like vaccinia it was probably secondarily acquired by an early eukaryote from an infecting virus – a nice example of lateral gene transfer. In keeping with its independent origin, the integrase/topoisomerase Ib family cuts the DNA differently from topoisomerase Ia and II, generating a 5′-OH free end.

Recently our perspective on the evolution of the recombination machinery has been dramatically changed by the discovery that the primary function of RecA and Ruv and probably their eukaryotic homologues is to rescue broken or stalled replication forks (Cox, 2001). When a fork meets a nick in the parental DNA one daughter becomes detached from the fork. Unless this broken strand is reattached one daughter cell will have only part of a chromosome and die. Reattachment is mediated by the RecA and RecBCD machinery that allows the free end to invade an homologous region of its intact sister to form a Holliday junction that is converted into a functional fork by the resolvase and a special primosome. When a fork meets an obstacle, whether an attached protein, a lesion in the DNA or simply a hard to replicate sequence it stalls, and needs to be actively restarted. This is done by the special DNA helicase RecG that unpairs the nascent strands each side of the fork (McGlynn et al, 2001), allowing the RecA and RecFOR machinery to pair them with each other to form a structure topologically equivalent to a Holliday junction, which is converted into an active fork by resolvase and the rescue primosome (Cox, 2001). In bacteria the majority of cell cycles suffer from such problems, so viability would be over two-fold lower in the absence of the generalised recombination machinery even in the absence of external sources of DNA damage. In eukaryotes with substantially larger genomes probably every cell cycle must require such rescue of damaged replication forks. We must therefore regard the so-called general recombination machinery as a fundamental part of the basic replication machinery of all cells, rather than as an auxiliary adaptation for much rarer repair of DNA damaged by external agents or the even rarer genetic exchange among cells. 
As argued above it is likely that the replicative helicase DnaB evolved first to move forks actively and RecA evolved later to resuscitate stalled forks after genome size increased sufficiently to make their stalling cause serious reductions in viability. The RecA-based homologous recombination machinery is really a fundamental mechanism essential for replication fidelity and high viability under all conditions. I am sceptical of the view that it played an important role in gene assembly and large-scale evolutionary change; the idea that DNA's greater suitability for homologous recombination was an important factor in its replacement of RNA genomes (Shibata et al, 2001) could hardly be right if, as I have argued, DNA and DNA helicases evolved before RecA.

Origin of site-specific recombination

Site-specific recombination probably evolved for a similar reason. Chromosome circularity and the active generalised recombination machinery mean that an odd number of crossovers will inevitably generate dimers (Cavalier-Smith, 1975) that will be broken at cell division, probably causing death of at least one daughter. All bacteria use a site-specific cutting and rejoining machinery involving the recombinases XerC and XerD (and their homologues) that recognise a special sequence near the terminus to reform two daughter circles (Perals et al, 2001). These enzymes had probably also evolved before the cenancestor. Homologous enzymes have a similar role in a variety of plasmids and viruses such as the phage λ integrase and also act as the transposases of certain transposons. Thus there is little doubt that these various selfish parasites recruited their integrases and transposases from the host enzymes that evolved simply to maintain the viability of the cell. There is a considerably wider range of transposase families among bacterial transposons and insertion sequences. I suggest that they all evolved from useful host enzymes but will not attempt to trace their sources. The reverse process of the incidental acquisition of viral proteins by bacteria has also certainly occurred now and again in evolution and been useful to the bacterial host. In some cases, as in the DNA cutting/rejoining invertases that mediate reversible flagellar phase changes in bacteria by inverting DNA segments and have homologues in phages, it is not obvious which was recruited from which.
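The parity argument for dimer formation can be illustrated with a minimal sketch. It assumes, purely for illustration, that the number of crossovers between sister circles in one cell cycle follows a Poisson distribution; that distribution, the parameter `mean_crossovers` and both function names are hypothetical and are not taken from the text.

```python
import math


def forms_dimer(crossovers: int) -> bool:
    """An odd number of crossovers between two sister circles joins them
    into a single dimeric circle; an even number leaves two monomers."""
    return crossovers % 2 == 1


def dimer_probability(mean_crossovers: float) -> float:
    """Probability of an odd crossover count, ASSUMING the count is Poisson
    distributed with the given mean (an illustrative assumption only).
    For a Poisson variable, P(odd) = (1 - exp(-2 * mean)) / 2."""
    return (1.0 - math.exp(-2.0 * mean_crossovers)) / 2.0
```

Under this assumption the dimer frequency rises with the crossover rate but can never exceed one half, so even a very active recombination machinery leaves a large fraction of cell cycles needing dimer resolution.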

Origin of gene transfer machinery: transduction and transformation

Viruses can transfer bacterial genes from cell to cell, but such accidental transduction of bacterial genes did not evolve to help the bacterium. It is merely the accidental consequence of viral gene transfer mechanisms evolved by selection for selfish viral genomes at the expense of their cellular hosts.

By contrast, the less widespread genetic transformation is almost certainly an adaptation of the bacterial cell. However, the widespread assumption that it is an adaptation for recombination and the generation of bacterial diversity is almost certainly wrong. Redfield (2001) has presented convincing arguments that it is a trophic, not a genetic adaptation, a way of getting extra nucleotides and energy. Bacteria develop competence for DNA uptake to get nucleotides for food – their sequences are not what fundamentally matters; genetic transformation is an inevitable low frequency by-product of the way they feed, just as is the occasional incorporation of the genes of their food by eukaryotes (Doolittle, 1999). Protozoa did not evolve phagotrophy so that one of their descendants could evolve a chloroplast to form the first plant (Cavalier-Smith, 2000). This was an accidental but serendipitous consequence of something that evolved for other reasons. So also – probably – is the rapid exchange of short DNA segments that characterises those relatively few bacteria that have a highly developed mechanism for absorbing DNA as food. Competence for transformation should really be called competence for DNA absorption; it is primarily a novel mechanism of saprotrophy – a refined cannibalism – eating the dead, not sex with the dead.

I think that predation on the nucleic acids of others was a major feature of precellular evolution in the inside-out cell or obcell phase of cell evolution that I have argued necessarily preceded cells with a bounding membrane (Cavalier-Smith, 2001). An obcell in which genes were attached to the outside of membranes may have been essential for the early evolution of complex genetic systems in the prebiotic soup. Many nucleases may have evolved in this phase of evolution to digest foreign nucleic acids and nucleic acid binding proteins to help protect one's own DNA. A major reason for the evolution of DNA circularity and the folding or fusion in pairs of obcells to form the first cells bounded by a double envelope may have been protection from such digestion (Cavalier-Smith, 2001).

Ever since Weismann (1886) evolutionists have been tempted to argue that recombination has an evolutionary ‘function’ because it generates diversity on which selection may act, and have assumed that it must have been positively selected for this reason. But this is to put the cart before the horse. Variation precedes selection and is not directly caused by it. The whole history of bacterial evolution is dominated by remarkable stasis over thousands of millions of years (Cavalier-Smith, 2002a); the dominant force in the evolution of bacterial genomes appears to have been powerful selection to avoid foreign DNA and protect one's own and increase the fidelity of replication. Unlike elephants, bacteria have no need of homologous recombination to generate diversity. Even with mutation rates selected to be almost as low as physically possible, bacteria are so uncountably numerous that mutation is generating variation on a huge scale, enough to fuel much more evolution than has actually occurred. A spoonful of rich mud or culture medium may contain more bacteria of one kind than individual humans in the entire world (one of the most abundant vertebrates) with genetic variants in and accidental duplication of every single gene. Bacteria do not need homologous recombination to generate diversity or to avoid Muller's ratchet, which is irrelevant not only to bacteria but also to unicellular eukaryotes (despite claims to the contrary: Bell, 1988) because of their immense populations and rapid generations that allow mutations to spread immensely faster than in macroorganisms when subject to comparable selective intensity.
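The scale of this mutational supply can be sketched with simple arithmetic. The population size and per-gene mutation rate below are hypothetical, illustrative values chosen only to show the order of magnitude; they are not figures from the text, and the function names are likewise invented for this sketch.

```python
import math


def expected_new_mutants(population_size: float, mutation_rate_per_gene: float) -> float:
    """Expected number of cells acquiring a new mutation in one given gene
    during a single generation."""
    return population_size * mutation_rate_per_gene


def prob_at_least_one_mutant(population_size: float, mutation_rate_per_gene: float) -> float:
    """Poisson approximation to the chance that at least one cell in the
    population acquires a new mutation in the gene this generation."""
    return -math.expm1(-population_size * mutation_rate_per_gene)


# Hypothetical values for illustration only (assumptions, not data).
POPULATION_SIZE = 1e10          # cells of one kind in a dense sample
MUTATION_RATE_PER_GENE = 1e-7   # new mutations per gene per cell per generation

if __name__ == "__main__":
    print(expected_new_mutants(POPULATION_SIZE, MUTATION_RATE_PER_GENE))
    print(prob_at_least_one_mutant(POPULATION_SIZE, MUTATION_RATE_PER_GENE))
```

With numbers of this order, every gene would be expected to gain many new variants in every generation without any recombination at all, which is the point of the argument above.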

Calling the accidental ‘parasexual’ mechanisms of transduction and transformation ‘sex’ (Bernstein et al, 1987) is thoroughly unsound. The mechanisms are not homologous and they evolved for different reasons. The rarity of occasional lateral gene transfers by these mechanisms is such that they would have been quite ineffective as a selective force for the origin of the basic cellular recombination machinery compared with the selection every cell generation for faithful DNA replication, including the correction of replication errors and such regular and omnipresent mishaps as stalled replication forks.

Origin of bacterial gene transfer machinery: plasmid conjugation

It is also misleading and confusing to call plasmid conjugation ‘sex’, for its mechanisms are fundamentally different, as are many of its population genetic consequences. It does not combine whole genomes biparentally but typically transfers only a handful of genes unidirectionally. Transmissible bacterial plasmids are sometimes seen as selfish DNA favouring their own evolutionary ends: spreading within the host cell population, sometimes incidentally benefiting the host, eg by adding genes for drug resistance. On such a view viruses and transmissible plasmids use protein assemblies as alternative ways for genetic parasites to get around the prokaryotic world: a viral capsid or a conjugation pilus. The argument that the plasmid transfer machinery evolved for the transfer of plasmid genes rather than to mediate the very much lower frequency transfer of the main bacterial chromosome (Redfield, 2001) is convincing. But that alone does not make it correct to view plasmids as selfish parasites infecting a host, as viruses undoubtedly are. This would be true only if the quantitatively strongest selection acting on them is that for their own horizontal transmission. Present evidence makes this highly questionable (Bergstrom et al, 2000). Unless infection rates in nature are much higher than in laboratory experiments, it would appear that most plasmids are maintained through temporary benefits to the whole cell, not merely by infection. However, although transformation is unlikely to be primarily a mechanism for genetic exchange, the studies of Bergstrom et al (2000) make it premature to draw the same conclusion for plasmid conjugation as Redfield (2001) has. It may be more realistic to regard infectious plasmids as auxiliary chromosomes specialised for the intercellular transfer of small numbers of optional genes: infectious heredity rather than true parasites.
Some plasmids, especially those lacking useful genes, might be more transient parasites analogous to the B-chromosomes of eukaryotes. Irrespective of whether they are now mainly beneficial or not, their origin might have been purely selfish. Non-infectious plasmids are simply small subsidiary chromosomes that may have been present in some bacteria ever since the origin of chromosomes. They are as much a real part of the cell as the main chromosome.

Because infectious plasmids can be transferred across very wide phylogenetic distances it may not be possible to use cladistic reasoning to work out when they originated. Such elements could have originated at any time during the long evolution of bacteria. It is possible that they did so almost as soon as cells began. If they did, they might have played useful roles in the origin of the first autotrophic cell that must rapidly have evolved thousands of novel genes (Cavalier-Smith, 2001).

For 3,000,000,000 years there was relatively little major innovation after the origin of the photosynthetic negibacterial cenancestor. Yet the pace of evolution during the origin of that cenancestor was immensely faster. It is generally true that when a major novelty like the first car, aeroplane, PC, fish, eukaryote or bacterium arises there is an explosion of invention and innovation, after which change is much slower and less fundamentally innovative. Illegitimate recombination must have played a major role in this origin by generating a great diversity of genes. In principle this could have occurred merely by gene duplication and chimaerisation. Neither homologous recombination nor lateral gene transfer need have been involved. However, we should consider the possibility that lateral gene transfer by plasmids was involved in assembling the genes of the cenancestor. Horizontal transfer might have helped add many genes quickly by combining by illegitimate recombination entirely novel ones that originated independently in distantly related cells. Thus virtually from the beginning, plasmids might have been selected for a cellular benefit and, unlike viruses, need not have been totally selfish.

The DNA pump that transfers DNA in single-stranded form from cell to cell is structurally related in one domain not only to RecA but also to ring-shaped DNA helicases (Egelman, 2001) and could partially have evolved from one. Whether its quaternary similarity to F1 ATPase is phylogenetically significant or coincidental is unclear. Initially this conjugal DNA pump might purely selfishly have transferred only its own gene to other cells, but after acquiring useful genes by illegitimate recombination it could have been additionally selected at the cellular level, necessarily in recipients only, which is sufficient explanation for why all the transfer genes are on the plasmid and not the main chromosome: it is fallacious to regard this as evidence for selfishness (Redfield, 2001) since what matters in this respect is whether selection acts on cell reproduction/survival or solely by virtue of the transfer process. However, the fact that the DNA pump works by coupling to the rather complex type IV secretion channel involving pili (Gomis-Ruth and Coll, 2001) suggests that plasmid transfer may not have originated really early in precellular evolution. The fact that type IV secretion is distantly related to type II secretion involved in flagellar development (Patenge et al, 2001) suggests a common origin. Possibly therefore plasmid conjugation and flagella evolved at a similar time. I have argued that one extant bacterial phylum, Eobacteria, may primitively lack flagella and be the most divergent of all (Cavalier-Smith, 2001). If this is so (it might prove not to be), type IV secretion and conjugal plasmids might not have evolved until after the divergence of Eobacteria from the flagellated bacterial phyla. Although some Eobacteria undoubtedly have plasmids (Meima and Lidstrom, 2000) or conjugation (Ramirez-Arcos et al, 1998), their conjugation machinery might have been acquired later by lateral gene transfer.

Changes to the recombination machinery during the neomuran revolution

While all the DNA handling enzymes are essentially similar in all seven eubacterial phyla, most of which probably diverged from each other around 3.5 Gy ago, they are radically different in all neomura. Often the neomuran enzymes are more complex with a larger number of subunits, some apparently novel; in some cases, as in the DNA polymerases, enzymes were lost and replaced by others. The contrast between these dramatic differences and the fundamental similarity of typical metabolic enzymes across the whole living world has been a great puzzle. But it has a simple explanation. All these changes are direct or indirect coadaptive consequences of the origin of core histones in the common ancestor of archaebacteria and eukaryotes (Cavalier-Smith, 2002a). In eukaryotes nuclear DNA is wound around nucleosome core particles of octameric histones in negative supercoils that are maintained by the tight binding of the highly basic histone tails. In eubacteria the DNA is instead continuously actively negatively supercoiled by DNA gyrase and only loosely associated with much sparser DNA binding proteins apparently unrelated to histones. DNA helicases, polymerases and other DNA binding molecules therefore have to work in a very different molecular environment in eukaryotes from eubacteria. It is therefore not at all surprising that they have to be very different to achieve the same function. DNA replication forks move about 50 times more slowly in eukaryotes because the DNA can only be separated into two strands after being partially dissociated from the nucleosomes.
The origin of nucleosomes was the most dramatic change in DNA organisation since cells began, so it is to be expected that it was accompanied by drastic modification to the entire DNA handling machinery; but metabolic enzymes would have been unaffected and thus continued to evolve in their habitual gradualistic fashion through the neomuran revolution, which affected only chromatin structure, cell envelope chemistry, ribosome structure and protein synthesis, assembly and secretion (Cavalier-Smith, 2002a).

Eukaryotic nucleosome cores have four different histones: two copies each of the highly conserved H3 and H4 forming the inner core and two each of the more variable H2a and H2b. Archaebacteria have only two kinds of histones, resembling H3 and H4; as their replication forks travel at eubacterial speeds possibly their nucleosomes are more easily dissociated, perhaps because they are only tetramers and their histones lack tails. Histones have been secondarily lost in eukaryotes in peridinean dinoflagellates. Given the homology of the histone fold between eukaryotes and archaebacteria, the absence of histones from crenarchaeotes and some non-thermophilic euryarchaeotes must also be a secondary loss if eukaryotes are sisters of archaebacteria, not derived from them (Cavalier-Smith, 2002a). I have argued that histones originated in the neomuran ancestor as an adaptation to thermophily to stabilise the chromatin more effectively than by active supercoiling by DNA gyrase involving continual ATP hydrolysis (Cavalier-Smith, 2002a).

The ancestral neomuran therefore lost eubacterial DNA gyrase activity and evolved a novel type II DNA topoisomerase (topoisomerase VI), not found in eubacteria; its eukaryotic homologue is restricted to meiosis, making the double stranded breaks for crossing over. The B subunit of topoisomerase VI is distantly related to the DNA gyrase B subunit, but the A subunit (Spo11 in eukaryotes) is much shorter than that of DNA gyrase. DNA gyrase can be artificially converted to a conventional type II topoisomerase by deleting the C-terminal region responsible for the active wrapping of DNA (Kampranis and Maxwell, 1996). I think that this happened naturally in the ancestral neomuran because of the evolution of passive negative supercoiling by histones; as negative supercoiling by DNA gyrase became redundant, mutational truncation of GyrA ceased to be disadvantageous, so the former gyrase rapidly evolved into topoisomerase VI.

Even larger changes occurred in DNA polymerases. Eubacteria have distinct type C replicative polymerases and less processive type A and B (eg DNA polymerase II of Escherichia coli) repair polymerases. The ancestral neomuran replaced the aphidicolin-resistant type C replicative DNA polymerase (polymerase III) alpha subunit by an aphidicolin-sensitive type B polymerase. It might have proved better than the old replicator for handling DNA wound round histones. The origin of histones probably affected the sliding clamp that ensures processivity by moving along the DNA with the polymerase. In eubacteria the beta subunit of the replicative polymerase is the sliding clamp, but in neomura the clamp is PCNA (proliferating cell nuclear antigen and its archaebacterial homologue), a torus-shaped molecule with three identical subunits and little sequence similarity to the eubacterial clamp. The PCNA sliding clamp may have been substituted because of its direct interactions with B-type polymerases (Bruck and O’Donnell, 2001). Neomuran replication factor C, a heteropentameric complex for which no homologues are known in eubacteria, is responsible for loading the PCNA sliding clamp onto primed DNA. Concomitantly the function of DnaG primase that interacts with the eubacterial replicative DNA helicase (DnaB) in making RNA primers was replaced by a heterodimeric eukaryotic primase that interacts instead with the DNA polymerase. The hexameric DnaB was either replaced by or evolved by drastic modification into the hexameric neomuran replicative helicase Mcm (Matsunaga et al, 2001). A drastic change in the replicative helicase was probably an inevitable concomitant of the origin of nucleosomes.

The neomuran cenancestor also significantly modified its DNA repair machinery, evolving novel Flap endonuclease (FEN-1) and RAD2 DNA repair enzymes. FEN-1 shares an octapeptide that binds the interdomain region of PCNA with neomuran PolB DNA polymerases and several other eukaryotic proteins, and is stimulated by PCNA (Frank et al, 2001). It is involved together with the helicase/endonuclease DNA2 in RNA primer removal as well as excision repair. RadA (archaebacteria)/Rad51 and Dmc (eukaryotes) are the neomuran descendants of RecA, which also became hugely changed when nucleosomes evolved. A neomuran DNA helicase must also have taken over the function of the eubacterial RecG helicase, which is absent from neomura.

Pre-eukaryotes and pre-archaebacteria then diverged. Once the originally thermophilic pre-eukaryote began to evolve phagotrophy and perfect the cytoskeleton and endomembrane system, it reverted rapidly to mesophily, since such environments provide more food and the moderate temperatures would be more compatible with the relatively fluid cell surface that phagocytosis entails (Cavalier-Smith, 2002b). Although histones H3 and H4 probably first evolved in the neomuran cenancestor, the larger eukaryotic histone H1, now essential for folding the nucleosomes into solenoids and compact chromosomes for segregation by mitosis or meiosis, was probably derived from a homologue already present in the actinobacterial ancestors of neomura (Cavalier-Smith, 2002b). The archaebacterial cenancestor took the extra step into hyperthermophily, further modifying its chromatin and losing H1.

Origin of sex: cell fusion and syngamy

‘Conjugation first only occurred under unfavourable conditions, and assisted the species to overcome such difficulties’ (Weismann, 1886, p 294)

For over a century biology and genetics have been profoundly influenced by Weismann's (1886) basic misconceptions about the evolutionary significance of sex. His ideas arose when mutation and the physically unavoidable instability of genes were not understood. Like Darwin (1868) he rejected the common view that organisms have ‘an innate tendency to vary’ irrespective of the environment, asserting that in multicellular organisms the genetic material was so intrinsically stable that sex is vital to provide variation on which selection can act. Because Nägeli's idea that the genetic material (idioplasm to him) has a spontaneous tendency to vary and cause evolution was combined with an antipathy to natural selection, without which adaptation is incomprehensible, Weismann (1886) vigorously opposed it. Instead he implausibly asserted that what we now call mutation (a word once used by Darwin (1868, p 396) for changes affecting his gemmules) never occurs in multicellular organisms and that all their variation and evolution arises by sexual reshuffling of a vast store of pre-existing mutations that had all occurred long ago in their unicellular ancestors. Textbook writers seem unaware of this or that Weismann's denial of the inheritance of acquired characters applied only to multicells; in unicells he saw no distinction between germ line and soma and therefore accepted that mutation could readily occur in them by direct effects of the environment, which Darwin considered the main cause of mutation in both unicells and multicells. Weismann's false belief that multicell germ lines are immutable was part and parcel of his rigid theory of the germ plasm and its mistaken theory of multicell differentiation. He did not realise that the examples of long-term stability and stasis he cited were the result of natural selection (purifying or centripetal selection: Simpson, 1944) continually eliminating variants thrown up by an unstable genetic material. 
Centripetal selection, now usually called stabilising selection (Schmalhausen, 1949), was understood by Maupertuis and Blyth even before the Darwins (Erasmus and Charles) and Wallace established directional selection as the decisive factor in adaptive evolutionary change. Although Weismann was aware of stabilising selection, correctly invoking its removal as the basic cause of degeneration through disuse, its fundamental importance to the long-term continuity of life and the genetic system itself only became fully apparent after the formulation of the idea of mutation/selection equilibria (Haldane, 1927) and the rise and recent flowering of molecular and microbial genetics. Although Darwin was mistaken that use and disuse could cause mutations, and in one assumption of his pangenesis theory designed to explain it, as well as in rejecting spontaneous mutation (like Weismann), his other views on heredity, notably recognising the distinction between mutation and recombination and transmission and developmental genetics and the primary evolutionary importance of mutation in all organisms, were superior to those of Weismann. Darwin's profound contribution to genetics was recognised by de Vries (1889), originator of the idea that individual genes have a pedigree (the very word gene comes via his renaming Darwin's gemmules as pangens) and the great transformer of Darwin's ideas into the modern theory of evolution by mutation and selection (De Vries, 1912).

Belief in the benefits of sex for variability stemmed from evidence in plants of adaptations to outbreeding (Knight, 1799 and Kölreuter, 1809, as cited by Darwin, 1868, p 175). While probably relevant to its maintenance in multicells, this can hardly be the fundamental explanation of its origin, as Weismann (1886) recognised. How do we explain the origin of cell fusion, which halves cell numbers, contrary to the fundamental evolutionary drive to increase them? An early idea was the nutrition theory (Cienkowsky, 1873). In protozoa, where sex began, sex is often induced by starvation, as it is in most of their unicellular fungal and algal descendants. Using the principle that regulation of a biological process may reveal why it evolved (Redfield, 2001), numerous authors have supposed that doubling a protozoan cell's food storage reserves may greatly increase survival of starvation. Under critical famine conditions a mere 10% increase in food store might in principle raise survival rates from zero to 100%; much less improvement than this could make the halving in organismal numbers entailed by fusion an entirely trivial cost.
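The arithmetic behind the nutrition theory can be made explicit with a minimal sketch. The function names and the survival probabilities used below are purely illustrative assumptions, not measured values; the only point is that fusion, which halves cell numbers, pays whenever it more than doubles the chance of surviving starvation.

```python
def survivors_without_fusion(n_cells: int, p_unfused: float) -> float:
    """Expected survivors if every cell faces starvation on its own."""
    return n_cells * p_unfused


def survivors_with_fusion(n_cells: int, p_fused: float) -> float:
    """Expected survivors if cells first fuse in pairs (halving their number)."""
    return (n_cells / 2) * p_fused


def fusion_pays(p_unfused: float, p_fused: float) -> bool:
    """Fusion is favoured when fused cells survive more than twice as often."""
    return p_fused > 2 * p_unfused


# Hypothetical famine: unfused cells survive 25% of the time, fused cells 75%.
print(survivors_without_fusion(1000, 0.25))
print(survivors_with_fusion(1000, 0.75))
print(fusion_pays(0.25, 0.75))
```

In the extreme case given in the text, where unfused survival is zero, any non-zero survival of fused cells satisfies the break-even condition.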

More generally the average doubling in cell volume caused by syngamy and its subsequent quartering by meiosis may be key factors in the timing of these events in the life cycle, since cell size affects many functions related to growth rate, eg nutrient uptake, predation and viability. An alternation between the benefits of larger and smaller cells (Cavalier-Smith, 1978) may be a quantitatively more significant evolutionary determinant of ploidy cycles than factors relating to genetic recombination; however the issues are too complex (Maynard Smith and Szathmáry, 1995) to be discussed in detail here. What is clear is that understanding the origin of sex depends on a better understanding of the life cycles of the first eukaryotes. Two neglected factors must enter any realistic picture: cysts and syncytia. Most protist groups form dormant cysts that are often resistant to drying and often pigmented to reduce radiation damage by UV. As dormancy can be prolonged and the cysts may be wafted into the air or exposed on dry surfaces, resistance to radiation damage may be as important as resistance to starvation; survival would be higher if the cysts were diploid not haploid, since in cells that are not actively replicating and thus have no separate daughter DNAs this would allow recombinational repair of double-strand breaks. But why revert to haploidy when food reappears? Because, at a time when survival is intrinsically high, so that the premium is on numbers not survival, there is an immediate two-fold benefit in cell numbers if excystment yields four cells by meiosis not two by mitosis (Cavalier-Smith, 2002b).
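The two-fold benefit at excystment follows from simple counting, sketched below. The function name and the `mode` labels are hypothetical, and the sketch assumes that every cyst germinates and that each division is completed without loss.

```python
def cells_after_excystment(n_cysts: int, mode: str) -> int:
    """Cells released when diploid cysts germinate.

    'meiosis' releases four haploid cells per cyst;
    'mitosis' releases two diploid cells per cyst.
    """
    per_cyst = {"meiosis": 4, "mitosis": 2}
    if mode not in per_cyst:
        raise ValueError(f"unknown mode: {mode!r}")
    return n_cysts * per_cyst[mode]


# Hypothetical population of 10 surviving cysts.
meiotic = cells_after_excystment(10, "meiosis")
mitotic = cells_after_excystment(10, "mitosis")
print(meiotic, mitotic, meiotic / mitotic)
```

The ratio is independent of the number of cysts, so the advantage holds at any population size under these assumptions.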

Although most writings on the nutritional and repair theories of sex (eg Michod, 1993) are phylogenetically naïve, this is equally true of many that consider only the more conventional role of genetic recombination. My view is that all three factors were probably involved in the origin of sex (Cavalier-Smith, 2002b), even though simulations suggest that faster recombinational removal of recurrent deleterious mutations in haploids and the greater radiation resistance of diploids alone could have driven the origin of a haploid/diploid cycle (Maynard Smith and Szathmáry, 1995). The maintenance of sex in multicells is yet another question: or rather questions, since the phenotypes of animals, vascular plants, bryophytes and their ploidy cycles and tendencies to dioecy or hermaphroditism are so different that the balance of selective forces will differ among them as well as between them and their protozoan ancestors. Also in all cases the nature of the ancestral life cycle, phylogenetic inertia and epigenetic constraints (eg genetic imprinting) affect the ease of losing sex in ways outside the usual population genetics paradigm and thus complicate the issue.

Recent rerooting of the eukaryotic tree near to the Amoebozoa (Cavalier-Smith, 2002b) makes it likely that syngamy and meiosis had already evolved by the time of the eukaryotic cenancestor. Unless the root is actually within the Amoebozoa this is certainly true, since the cenancestor of animals and plants would be one and the same as the cenancestor of all eukaryotes, and there would be no earlier diverging extant eukaryotes. If the root is within Amoebozoa there might be an earlier diverging eukaryote branch of it that might be primitively asexual, but I think it highly probable that sex evolved in the eukaryotic cenancestor almost as soon as the origin of phagotrophy created a flexible cell surface. Absence of a cell wall and the presence of an internal cytoskeleton would have facilitated the membrane fusion and cell merger that constitutes syngamy. Most likely, as conventionally thought, the cenancestral sexual life cycle was an alternation between haploid phagotrophic cells and dormant diploid cysts or zygospores with meiosis during excystation. But as sex is currently unknown in Choanozoa and virtually unstudied in Amoebozoa other than in a few slime moulds, we are far too ignorant of the critical groups to be sure.

A further complication is that cell fusion is relatively widespread among protozoa and more diverse in character than usually appreciated. Some protozoan life cycles involving cell fusion also have nuclear fusion and meiosis, but others apparently do not and are referred to as agamic (Seravin and Goodkov, 1999; Cavalier-Smith, 2002b). The widespread examples of cell fusion apparently not part of sexual cycles are important for understanding the origin of sex since they imply selective advantages for evolving cell fusion in protozoa that have nothing to do with recombination. They also indicate that for early protozoa with a soft naked surface cell fusion is mechanistically easy to evolve and probably did so frequently. It is oversimplified to view organisms as either unicellular or multicellular. Protozoa can be uninucleate or else multinucleate plasmodia or syncytia (the latter distinction is arbitrary). Thus there are three kinds of eukaryote organisms: unicells, multicells and plasmodia, adapted to different broad adaptive zones. We do not know if sex evolved in a unicell or in a plasmodial protozoan. But the trophic advantages of plasmodia are such that even in the prekaryote phase there would probably have been niches for them as well as for uninucleate cells. Syncytia form by cell fusion or by nuclear division without cytokinesis. We may regard primitive eukaryote life cycles as being made up of four processes: cell growth, nuclear division, cytokinesis, and cell fusion. Early protozoan life cycles diversified by coupling or partially uncoupling them. An organism with a syncytial phase must have a more flexible facultative coupling between them than a strictly uninucleate unicell. But even the latter must sometimes accidentally form multinucleate cells by errors in cytokinesis; without a correction mechanism, such errors would accumulate and syncytia evolve. 
The yeast and mammalian cell cycles involve numerous checkpoints to ensure that one does not end up with anucleate or multinucleate daughters. A syncytial life cycle must also have checkpoints.

Whether the cenancestor was unicellular or syncytial, it probably had a walled resting cyst, so karyogamy and meiosis were probably inserted into a relatively complex life cycle adapted to a feast and famine existence, not into a simple binary fission cell cycle always supplied with food. The common association in protists of syngamy and starvation suggests that its pooling of resources into larger cells may increase survival by more than two-fold to counterbalance the halving of cell numbers it entails. When good growth conditions resume, cell multiplication becomes at a premium and the survivors can divide into a larger number of smaller daughters by multiple fission. Thus the timing of syngamy, encystment and excystment became developmentally controlled by food supply.

A failure in mitosis or an accidental nuclear fusion would make polyploid nuclei. As the fundamental and universal function of meiosis is ploidy reduction it probably evolved to repair such mistakes (Cavalier-Smith, 1995). Mathematical modelling shows that ploidy reduction could have been an effective selective force (Kondrashov, 1994). Its advantage would be stronger the more often polyploidy occurred. Did meiosis evolve before or after the nuclear envelope? If it evolved beforehand (Cavalier-Smith, 1975) there was no distinction between multinuclearity and polyploidy and early sex had meiosis and cell fusion but no nuclear fusion. Nuclear fusion need not have evolved till later.

I agree with Redfield (2001) that because of their huge populations it is naïve to argue that protozoa need sex to provide variety any more than do bacteria. Therefore promotion of genetic exchange can hardly be the primary or sole reason why sex evolved so belatedly in eukaryotes alone. The primary reason was the prior evolution of a soft surface and the internal cytoskeleton that made cell fusion mechanistically easy and the various immediate physiological advantages of cell fusion. But if such fusion evolved in the first eukaryote that was simultaneously making thousands of novel genes and radical cellular innovations, it is possible that the success of this first hopeful monster able to devour other cells by phagocytosis was helped by an ability to use cell fusion to combine rare mutations that had never before been advantageous in a synergistic way in a single cell. Sex may have hitchhiked on their success during this ‘orgy of variation’, to use Haldane's (1932, p 104) felicitous phrase. But its loss in bdelloid rotifers refutes Weismann's view that asexual animals cannot mutate and evolve. This secondary loss was probably an indirect consequence of their secondary miniaturisation and consequent immense populations; they are not rare big fierce animals but secondary microorganisms, showing the same cosmopolitanism and insensitivity to evolutionary forces dependent on small populations – in marked contrast to macroorganisms (Finlay, 1998).

Origin of meiosis

Ancestral meiosis was almost certainly two-step. The idea that single-step meiosis was the ancestral state (Cleveland, 1947) is almost certainly incorrect. Although Miozoa (dinoflagellates, sporozoa and protalveolates) were once thought to have single-step meiosis, they actually have a normal two-step one. Whether microsporidia or the amitochondrial flagellates studied by Cleveland (1956) have a single-step meiosis or a normal two-step one is unclear (Cavalier-Smith, 1995), but irrelevant to the origin of meiosis, as we now know that microsporidia are highly derived fungi and that those amitochondrial flagellates are all derived, not early diverging, eukaryotes (Cavalier-Smith, 2002b). Whether single-step meiosis exists at all is doubtful. Meiosis is a modification of a mitotic cell cycle, the controls of which would make it much easier to evolve a two-step meiosis than a single-step one (Cavalier-Smith, 1981).

Ploidy reduction by meiosis requires four things: (1) homology search to enable chromosome pairing; (2) a delay in splitting sister centromeres until the second meiotic division; (3) blocking DNA replication between meiosis I and II; and (4) reorienting sister centromeres towards the same pole. Homology search is mediated by base pairing between homologues – essentially a DNA renaturation process, which is the fundamental determinant of the proportionality between meiotic duration and genome size (Cavalier-Smith, 1995). The key step in the evolution of meiosis was the blocking of centromere separation in meiosis I (Cavalier-Smith, 1981). This would ensure that the second meiotic division proceeds without an intervening round of DNA replication, if centromere splitting is a necessary signal for switching from the mitotic state, where replication is inhibited, to the growth state where it is allowed (Cavalier-Smith, 1981). The molecular basis of this is now partially elucidated. Sister chromatids are held together by cohesin proteins, which cross-link them by binding to Mcm proteins (Hirano, 2000). Mitotic chromosome and centromere splitting are caused by a protease that digests cohesins. The cell cycle switch from mitosis to the growth state (G1 phase) where replication is allowed is also by proteolytic digestion – of cyclin B attached to the cyclin-dependent kinase which phosphorylates the proteins whose activity has to change (Iwabuchi et al, 2000). A direct causal connection between centromere splitting and cyclin digestion has not yet been demonstrated, but if there is one, then meiosis could have originated by a single key change (Cavalier-Smith, 1981): blocking centromeric cohesin digestion in meiosis I.
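The logic of this single-change hypothesis can be illustrated with a toy state machine. It is only a sketch under the conditional assumption stated above, namely that centromere splitting is the signal that returns the cell from the mitotic state to the growth state in which replication is permitted; the function name, state labels and DNA-content bookkeeping (in C units) are illustrative inventions, not a model of any real cell-cycle network.

```python
from typing import List


def dna_content_history(block_split_in_first_division: bool,
                        n_divisions: int = 2,
                        start_c: int = 2) -> List[int]:
    """Track DNA content per cell (in C units) through successive divisions.

    Assumption: a cell re-enters the growth state (and so replicates again)
    only if its centromeres split at the preceding division.
    """
    dna = start_c * 2          # S phase before the first division
    history = [dna]
    state = "M"                # committed to division after S phase
    for i in range(n_divisions):
        if state == "G1":      # growth state: replication is allowed
            dna *= 2
            history.append(dna)
            state = "M"
        dna //= 2              # each division halves DNA content per cell
        history.append(dna)
        split = not (block_split_in_first_division and i == 0)
        state = "G1" if split else "M"
    return history


# With the block, two divisions follow one S phase: 4C -> 2C -> 1C.
print(dna_content_history(True))
# Without it, replication intervenes and ploidy is not reduced: 4C -> 2C -> 4C -> 2C.
print(dna_content_history(False))
```

In this toy, the only difference between the mitotic and meiotic outcomes is the single boolean that blocks centromere splitting at the first division, which is the sense in which one key change could suffice.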

All eukaryotes have homologous cohesins, which must have arisen in their cenancestor during the origin of mitosis. Animals and fungi at least have distinct meiotic cohesins. These necessarily assemble during premeiotic S-phase (Smith et al, 2001), which makes meiosis necessarily two-step (Watanabe et al, 2001). It would have been much easier to insert meiosis-specific cohesins onto sister centromeres by the normal mechanisms used for mitosis than to evolve an entirely novel cell cycle control allowing a reduction division without a preceding S-phase. In meiosis I these cohesins are digested in the chromosome arms, but not at the centromeres, which therefore remain unsplit (Buonomo et al, 2000). Likewise cyclin B remains partially undigested between meiosis I and II (Iwabuchi et al, 2000). Thus this interval is not a true interphase and is incompetent to initiate replication; as assumed (Cavalier-Smith, 1981) the cell remains in M phase until anaphase II when the centromeres split, cyclin B is digested and it reverts to G1. Thus evolution of a block to centromere splitting is necessary and sufficient to account for the ancestral mechanism of meiosis (Cavalier-Smith, 1981) and explains why it is two-step. In baker's yeast centromere reorientation to ensure that sister centromeres bind only spindle fibres attached to the same pole is mediated by the protein monopolin (Toth et al, 2000): as other organisms lack clear homologues, it is unclear whether this is a general mechanism or yet another simplified characteristic of baker's yeast.

Chiasmata and/or a synaptonemal complex are also usually essential for accurate meiotic chromosome disjunction during ploidy reduction. The odd cases where one or both of these is dispensed with seem phylogenetically derived, so both would have been present in the cenancestral meiosis. Chiasmata are initiated by double-strand breaks; they require exonucleases to generate single-stranded DNA, a RecA protein homologue to allow this to invade a homologous duplex to form hybrid DNA, and a Holliday junction resolvase and repair enzymes to complete the crossing over. The double-strand breaks are made by Spo11, part of a heterodimeric enzyme homologous with topoisomerase VI of archaebacteria, which probably evolved from DNA gyrase in the neomuran ancestor (Cavalier-Smith, 2002a). The RecA homologue is Rad51 or Dmc1, which also evolved from a eubacterial enzyme in the neomuran ancestor, as did the characteristic neomuran Holliday junction resolvase. Thus the basic molecular machinery for crossing over was already present in the neomuran ancestor, where it was used for recombinational repair. Although many of these genes are now meiosis-specific and must have special meiosis promoters, this need not have been so initially; they could have been constitutive. The precise role for the synaptonemal complex is unclear. Since ploidy reduction can occur in its absence it is probably there to increase the efficiency and reliability of meiosis, and so could have been added after it began, being apparently less centrally important than cohesins (Pelttari et al, 2001).

It is likely that recombination caused by sex is an incidental consequence of crossing over that evolved primarily to ensure disjunction during ploidy reduction, rather than the major selective advantage for the origin of sex. Although recombination has some advantages they are probably small compared with those of ploidy reduction – whether through correcting the errors in division or accidental fusion. The high frequency of loss of sex in protists is consistent with the theoretical conclusions that recombination has a clear advantage over clonal reproduction only under special conditions (Lenormand and Otto, 2000). It can speed up evolution by combining rare mutations together. Meiosis probably originated as a cell cycle repair mechanism to correct accidental polyploidy when the eukaryotic cell cycle was first evolving. It became temporally coupled to the induction of Spo11 to make double-strand breaks and allow recombinational repair prior to the germination of resting cysts. The incidental recombinational effect of this could itself also have been beneficial if this occurred during the other processes of eukaryogenesis, since mutations in hundreds, perhaps thousands of genes might simultaneously have been subject to directional selection (Cavalier-Smith, 1987a, 2002b). Thus sex could have begun during the largest transformation in cell organisation in the history of life, being swept to fixation by the success of the phagotrophs that its recombinational side effect helped to establish by combining valuable novelties in a single cell and enabling the cost-efficient culling of hopeless monsters. Its wide persistence in protists may have as much to do with epigenetic constraints caused by its causal linking to encystment and excystment as to its recombinational advantages, too slender to prevent frequent losses.

Envoi: continuity and discontinuity in the evolution of recombination

It is now abundantly clear that the recombination machinery arose in an ancestor of the first bacterial cell simply to ensure accurate DNA replication and segregation without significant loss of viability and was rapidly perfected. For 3 Gy it was maintained with little change whilst stasis rather than progressive evolution generally prevailed. Its consequences for genetic exchange among cells and their genetic parasites were incidental side effects – though extensive, relatively minor in overall biological impact. The recombination machinery, through repairing stalled replication forks, played a fundamental role in its own perpetuation without radical change over thousands of millions of years.

Then around 850 million years ago in an orgy of variation, the neomuran revolution markedly changed the molecular details of the recombination machinery to adapt it to the newly formed nucleosomes, but without radically changing its fundamental physical basis, or its biological context in the archaebacterial products of this revolution, which remained fundamentally bacteria in all major respects. Only in their eukaryote sisters was there a really radical upheaval in the cellular environment of the machinery. For the first time, cell fusion and ploidy cycles became a regular part of cell life histories; there was no phylogenetic continuity between the bacterial machinery of plasmid conjugation and eukaryotic syngamy. But the biophysical basis of DNA cutting, homology search, and rejoining remained fundamentally unaltered from that of their ancestors 3 Gy earlier. The major innovations apart from cell and nuclear fusion were in cell cycle controls over replication and segregation and in the mechanochemical basis for segregation and cell division that yielded the mitotic and meiotic cell cycles. Yet despite this dramatic discontinuity in cell biology, the most fundamental in the entire history of life, one can still plausibly reconstruct how a bacterial cell and reproductive cycle could have viably changed stepwise into eukaryotic ones (Cavalier-Smith, 2002b).

Furthermore, despite a novel cell cycle and structure, there was also fundamental continuity in the main selective forces behind the origin of true eukaryotic sex, if (as argued here and elsewhere: Cavalier-Smith, 2002a, 2002b) the primary function of sexual fusion and meiosis in the first protozoa was not to mediate genetic exchange, but to allow a compromise between selection for reproductive efficiency and viability in life cycles dominated by a periodic alternation between actively multiplying predators of other cells and starved dormant cysts, for which viability rather than reproductive rate was all important. Consequences for population genetics, transposon and intron spread (Cavalier-Smith, 1993) and the first origin of ‘biological species’ (Cavalier-Smith, 1991) were epiphenomena independent of, and perhaps largely unnecessary for, the great increases in organismic complexity and diversity that the origin of the eukaryote cytoskeleton and endomembrane system stimulated during the Cambrian explosion of protozoa, animals, fungi, plants and chromists.