Main

Biological sulphate reduction is part of the global sulphur cycle, ubiquitous in the earth's anaerobic environments, and is essential to the basal workings of the biosphere. Growth by sulphate reduction is restricted to relatively few groups of prokaryotes; all but one of these are Eubacteria, the exception being the archaeal sulphate reducers in the Archaeoglobales1,2. These organisms are unique in that they are unrelated to other sulphate reducers, and because they grow at extremely high temperatures3. The known Archaeoglobales are strict anaerobes, most of which are hyperthermophilic marine sulphate reducers found in hydrothermal environments2,4 and in subsurface oil fields5. High-temperature sulphate reduction by Archaeoglobus species contributes to deep subsurface oil-well ‘souring’ by producing iron sulphide, which causes corrosion of iron and steel in oil- and gas-processing systems5.

Archaeoglobus fulgidus VC-16 (refs 2, 4) is the type strain of the Archaeoglobales. Cells are irregular spheres with a glycoprotein envelope and monopolar flagella. Growth occurs between 60 and 95 °C, with optimum growth at 83 °C and a minimum division time of 4 h. The organism grows organoheterotrophically using a variety of carbon and energy sources, but can grow lithoautotrophically on hydrogen, thiosulphate and carbon dioxide6. We sequenced the genome of A. fulgidus strain VC-16 as an example of a sulphur-metabolizing organism and to gain further insight into the Archaea7,8 through genomic comparison with Methanococcus jannaschii9.

General features of the genome

The genome of A. fulgidus consists of a single, circular chromosome of 2,178,400 base pairs (bp) with an average of 48.5% G+C content (Fig. 1). There are three regions with low G+C content (<39%), two rich in genes encoding enzymes for lipopolysaccharide (LPS) biosynthesis, and two regions of high G+C content (>53%), containing genes for large ribosomal RNAs, proteins involved in haem biosynthesis (hemAB), and several transporters (Table 1). Because the origins of replication in Archaea are not characterized, we arbitrarily designated base pair one within a presumed non-coding region upstream of one of three areas containing multiple short repeat elements.
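As an illustration of how low- and high-G+C regions of this kind might be located, the following minimal Python sketch scans a sequence in fixed windows and flags windows below 39% or above 53% G+C. The window and step sizes, the function names and the toy sequence are assumptions made for this example; the text does not state how the regions were actually delimited.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in seq (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq: str, window: int, step: int):
    """Yield (start, gc_fraction) for fixed-size windows along seq.

    The sequence is treated as linear; wrap-around windows of a circular
    chromosome are ignored in this sketch.
    """
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_fraction(seq[start:start + window])


def flag_regions(seq: str, window: int, step: int, low: float = 0.39, high: float = 0.53):
    """Label windows whose G+C is below `low` or above `high`.

    The default thresholds follow the <39% and >53% values quoted in the text.
    """
    flagged = []
    for start, gc in gc_windows(seq, window, step):
        if gc < low:
            flagged.append((start, "low", gc))
        elif gc > high:
            flagged.append((start, "high", gc))
    return flagged


# Toy sequence (not A. fulgidus data): an AT-only block followed by a GC-only block.
toy = "AT" * 10 + "GC" * 10
for start, label, gc in flag_regions(toy, window=10, step=10):
    print(start, label, gc)
```

On a real chromosome the window would be much larger (kilobases rather than ten bases), and adjacent flagged windows would be merged into regions.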

Figure 1: Circular representation of the A. fulgidus genome.

The outer circle shows predicted protein-coding regions on the plus strand classified by function according to the colour code in Fig. 2 (except for unknowns and hypotheticals, which are in black). Second circle shows predicted protein-coding regions on the minus strand. Third and fourth circles show IS elements (red) and other repeats (green) on the plus and minus strand. Fifth and sixth circles show tRNAs (blue), rRNAs (red) and sRNAs (green) on the plus and minus strand, respectively.

Table 1 Genome features

Open reading frames. Two independent coding analysis programs and BLASTX10 searches (see Methods) predicted 2,436 ORFs (Figs 1, 2; Tables 1, 2) covering 92.2% of the genome. The average size of the A. fulgidus ORFs is 822 bp, similar to that of M. jannaschii (856 bp), but smaller than that in the completely sequenced eubacterial genomes (949 bp). All ORFs were searched against a non-redundant protein database, resulting in 1,797 putative identifications that were assigned biological roles within a classification system adapted from ref. 11. Predicted start codons are 76% ATG, 22% GTG and 2% TTG. Unlike M. jannaschii, where 18 inteins were found in coding regions, no inteins were identified in A. fulgidus. Compared with M. jannaschii, A. fulgidus contains a large number of gene duplications, contributing to its larger genome size. The average protein relative molecular mass (Mr) in A. fulgidus is 29,753, ranging from 1,939 to 266,571, similar to that observed in other prokaryotes. The isoelectric point (pI) of predicted proteins among sequenced prokaryotes exhibits a bimodal distribution with peaks at pIs of approximately 5.5 and 10.5. The exceptions to this are Mycoplasma genitalium, in which the distribution is skewed towards high pI (median, 9.8), and A. fulgidus, where the skew is toward low pI (median, 6.3).
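Summary statistics such as the start-codon percentages and the mean ORF size can be tallied directly from a set of predicted ORF sequences. The sketch below shows one way to do so; the function names and the four toy ORFs are assumptions for illustration and are not taken from the annotation pipeline used here.

```python
from collections import Counter


def start_codon_usage(orfs):
    """Return {start codon: percentage of ORFs} for a list of ORF nucleotide strings."""
    counts = Counter(orf[:3].upper() for orf in orfs if len(orf) >= 3)
    total = sum(counts.values())
    return {codon: 100.0 * n / total for codon, n in counts.items()}


def mean_orf_length(orfs):
    """Return the mean ORF length in base pairs."""
    return sum(len(orf) for orf in orfs) / len(orfs)


# Toy ORFs (hypothetical sequences, far shorter than real genes).
toy_orfs = ["ATGAAATAA", "ATGCCCTGA", "GTGAAATAG", "ATGTTTTAA"]
print(start_codon_usage(toy_orfs))
print(mean_orf_length(toy_orfs))
```

Applied to the full ORF set, the same tally would be expected to reproduce the ATG, GTG and TTG proportions reported above.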

Multigene families. In A. fulgidus 719 genes (30% of the total) belong to 242 families with two or more members (Table 1). Of these families, 157 contained genes with biological roles. Most of these families contain genes assigned to the ‘energy metabolism’, ‘transport and binding proteins’, and ‘fatty acid and phospholipid metabolism’ categories (Table 2). The superfamily of ATP-binding subunits of ABC transporters is the largest, containing 40 members. The importance of catabolic degradation and signal recognition systems is reflected by the presence of two large superfamilies: acyl-CoA ligases and signal-transducing histidine kinases. A. fulgidus does not contain a homologue of the large 16-member family found in M. jannaschii9.

Repetitive elements. Three regions of the A. fulgidus genome contain short (<40 bp) direct repeats (Table 1). Two regions (SR-1A and SR-1B) contain 48 and 60 copies, respectively, of an identical 30-bp repeat interspersed with unique sequences averaging 40 bp. The third region (SR-2) contains 42 copies of a 37-bp repeat similar in sequence to the SR-1 repeat and interspersed with unique sequence averaging 41 bp. These repeated sequences are similar to the short repeated sequences found in M. jannaschii.
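The approximate length of each short-repeat region follows from the copy number, repeat length and mean spacer length given above. The sketch below works this out under the assumption (ours, not stated in the text) that each region begins and ends with a repeat, so that n copies are separated by n − 1 unique spacers; the function name is illustrative.

```python
def repeat_array_span(copies: int, repeat_len: int, mean_spacer: float) -> float:
    """Estimate the span (bp) of a repeat array with `copies` repeats.

    Assumes the array starts and ends with a repeat, giving copies - 1 spacers.
    """
    return copies * repeat_len + (copies - 1) * mean_spacer


# Copy numbers, repeat lengths and mean spacer lengths as quoted in the text.
regions = {
    "SR-1A": (48, 30, 40),
    "SR-1B": (60, 30, 40),
    "SR-2": (42, 37, 41),
}

for name, (copies, repeat_len, spacer) in regions.items():
    print(name, repeat_array_span(copies, repeat_len, spacer))
```

Under this assumption each region spans roughly 3 to 4 kb; the true coordinates depend on the actual spacer lengths, of which only the averages are given.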

Nine classes of long (>500 bp) repeated sequences with 95% sequence identity were found (LR-1 to LR-9; Table 1). LR-3 is a novel element with 14-bp inverted repeats and two genes, one of which has weak similarity to a transposase from Halobacterium salinarium. One copy of LR-3 interrupts AF2090, a homologue of a large M. jannaschii gene encoding a protein of unknown function. LR-4 and LR-6 encode putative transposases not identified in M. jannaschii that may represent IS elements. The remaining LR elements are not similar to known IS elements.

Central intermediary and energy metabolism

Sulphur oxide reduction may be the dominant respiratory process in anaerobic marine and freshwater environments, and is an important aspect of the sulphur cycle in anaerobic ecosystems12. In this pathway, sulphate (SO42−) is first activated to adenylylsulphate (adenosine-5′-phosphosulphate; APS), then reduced to sulphite and subsequently to sulphide1,13 (Fig. 3). The most important enzyme in dissimilatory sulphate reduction, adenylylsulphate reductase, reduces the activated sulphate to sulphite, releasing AMP. In A. fulgidus, the APS reductase has a high degree of similarity and identical physiological properties to APS reductases in sulphate-reducing delta proteobacteria14. A desulphoviridin-type sulphite reductase then adds six electrons to sulphite to produce sulphide. As in the Eubacteria, three sulphite-reductase genes, dsrABD, constitute an operon. The genes for adenylylsulphate reductase and sulphate adenylyltransferase reside in a separate operon. In A. fulgidus, sulphate can be replaced as an electron acceptor by both thiosulphate (S2O32−) and sulphite (SO32−), but not by elemental sulphur.

Figure 3: An integrated view of metabolism and solute transport in A. fulgidus.

Biochemical pathways for energy production, biosynthesis of organic compounds, and degradation of amino acids, aldehydes and acids are shown with the central components of A. fulgidus metabolism, sulphate, lactate and acetyl-CoA, highlighted. Pathways or steps for which no enzymes were identified are represented by a red arrow. A question mark is attached to pathways that could not be completely elucidated. Macromolecular biosynthesis of RNA, DNA and ether lipids has been omitted. Membrane-associated reactions that establish the proton-motive force (PMF) and generate ATP (electron transport chain and V1V0-ATPase) are linked to cytosolic pathways for energy production. The oxalate-formate antiporters (oxlT) may also contribute to the PMF by mediating electrogenic anion exchange. Each gene product with a predicted function in ion or solute transport is illustrated. Proteins are grouped by substrate specificity with transporters for cations (green), anions (red), carbohydrates/organic alcohols/acids (yellow), and amino acids/peptides/amines (blue) depicted. Ion-coupled permeases are represented by ovals (mae1, exuT, panF, lctP, arsB, cynX, napA/nhe2, amt, feoB, trkAH, cat and putP encode transporters for malate, hexuronate, pantothenate, lactate, arsenite, cyanate, sodium, ammonium, iron (II), potassium, arginine/lysine and proline, respectively). ATP-binding cassette (ABC) transport systems are shown as composite figures of ovals, diamonds and circles (proVWX, glnHPQ, dppABCDF, potABCD, braCDEFG, hemUV, nrtBC, cysAT, pstABC, rbsAC and rfbAB correspond to gene products for proline, glutamine, dipeptide, spermidine/putrescine, branched-chain amino acids, iron (III), nitrate, sulphate, phosphate, ribose and polysialic acid transport, respectively). All other porters are drawn as rectangles (glpF, glycerol uptake facilitator; copB, copper-transporting ATPase; corA, magnesium and cobalt transporter). Export and import of solutes are designated by arrows.
The number of paralogous genes encoding each protein is indicated in brackets for cytoplasmic enzymes, or within the figure for transporters. Abbreviations: acs, acetyl-CoA synthetase; aor, aldehyde ferredoxin oxidoreductase; aprAB, adenylylsulphate reductase; aspBC, aspartate aminotransferase; cdh, acetyl-CoA decarbonylase/synthase complex; cysC, adenylylsulphate 3-phosphotransferase; dld, D-lactate dehydrogenase; dsrABD, sulphite reductase; eno, enolase; fadA/acaB, 3-ketoacyl-CoA thiolase; fadD, long-chain-fatty-acid-CoA ligase; fad, enoyl-CoA hydratase; fadE (acd), acyl-CoA dehydrogenase; glpA, glycerol-3-phosphate dehydrogenase; glpK, glycerol kinase; gltB, glutamate synthase; hbd, 3-hydroxyacyl-CoA dehydrogenase; ilvE, branched-chain amino-acid aminotransferase; iorAB, indolepyruvate ferredoxin oxidoreductase; korABDG, 2-ketoglutarate ferredoxin oxidoreductase; lldD, L-lactate dehydrogenase; mcmA, methylmalonyl-CoA mutase; mdhA, L-malate dehydrogenase; oadAB, oxaloacetate decarboxylase; orAB, 2-oxoacid ferredoxin oxidoreductase; pflD, pyruvate formate lyase 2; porABDG, pyruvate ferredoxin oxidoreductase; ppsA, phosphoenolpyruvate synthase; prsA, ribose-phosphate pyrophosphokinase; sucAB, 2-ketoglutarate dehydrogenase; sat, sulphate adenylyltransferase; TCA, tricarboxylic acid cycle; vorABDG, 2-ketoisovalerate ferredoxin oxidoreductase.

A. fulgidus VC-16 has been shown to use lactate, pyruvate, methanol, ethanol, 1-propanol and formate as carbon and energy sources2. Glucose has been described as a carbon source1, but neither an uptake-transporter nor a catabolic pathway could be identified. Although it has been reported that A. fulgidus is incapable of growth on acetate6, multiple genes for acetyl-CoA synthetase (which converts acetate to acetyl-CoA) were found. The organism may degrade a variety of hydrocarbons and organic acids because of the presence of 57 β-oxidation enzymes, at least one lipase, and a minimum of five types of ferredoxin-dependent oxidoreductases (Fig. 3). The predicted β-oxidation system is similar to those in Eubacteria and mitochondria, and has not previously been described in the Archaea. Escherichia coli requires both the fadD and fadL gene products to import long-chain fatty acids across the cell envelope into the cytosol15. A. fulgidus has 14 acyl-CoA ligases related to FadD but, as expected given that it has no outer membrane, no FadL. In E. coli, FadB has several metabolic functions, but in A. fulgidus these functions seem to be distributed among separate enzymes. For example, AF0435 encodes an orthologue of enoyl-CoA hydratase and resembles the amino-terminal domain of FadB. This gene is immediately upstream of a gene encoding an orthologue of 3-hydroxyacyl-CoA dehydrogenase that resembles the carboxy-terminal domain of FadB.

Acetyl-CoA is degraded by A. fulgidus through a C1-pathway, not by the citric acid cycle or glyoxylate bypass6,16,17. This degradation is catalysed through the carbon monoxide dehydrogenase (CODH) pathway that consists of a five-subunit acetyl-CoA decarbonylase/synthase complex (ACDS) and five enzymes that are typically involved in methanogenesis18. In A. fulgidus, however, reverse methanogenesis occurs, resulting in CO2 production. All of the enzymes and cofactors of methanogenesis from formylmethanofuran to N5-methyltetrahydromethanopterin are used, but the absence of methyl-CoM reductase eliminates the possibility of methane production by conventional pathways. Production of trace amounts of methane (<0.1 µmol ml−1)19 is probably a result of the reduction of N5-methyltetrahydromethanopterin to methane and tetrahydromethanopterin by carbon monoxide (CO) dehydrogenase.

A. fulgidus also contains genes suggesting it has a second CO dehydrogenase system, homologous to that which enables Rhodospirillum rubrum to grow without light using CO as its sole energy source. Genes were detected for the nickel-containing CO dehydrogenase (CooS), an iron–sulphur redox protein, and a protein associated with the incorporation of nickel in CooS. These represent elements of a system that could catalyse the conversion of CO and H2O to CO2 and H2.

In contrast to M. jannaschii, A. fulgidus contains genes representing multiple catabolic pathways. Systems include CoA-SH-dependent ferredoxin oxidoreductases specific for pyruvate, 2-ketoisovalerate, 2-ketoglutarate and indolepyruvate, as well as a 2-oxoacid ferredoxin oxidoreductase with little substrate specificity20,21. Four genes with similarity to the tungsten-containing aldehyde ferredoxin oxidoreductase were also found22.

Biochemical pathways characteristic of eubacterial metabolism, including the pentose-phosphate pathway, the Entner–Doudoroff pathway, glycolysis and gluconeogenesis, are either completely absent or only partly represented (Fig. 3). A. fulgidus does not have typical eubacterial polysaccharide biosynthesis machinery, yet it has been shown to produce a protein and carbohydrate-containing biofilm23. Nitrogen is obtained by importing inorganic molecules or degrading amino acids (Fig. 3); neither a glutamate dehydrogenase nor a relevant fix or nif gene is present.

The F420H2:quinone oxidoreductase complex24 is recognized as the main generator of proton-motive force. However, our analysis indicates the presence of heterodisulphide reductase and several molybdopterin-binding oxidoreductases, with polysulphide, nitrate, dimethyl sulphoxide, and thiosulphate as potential substrates, which might contribute to energizing the cell membrane. A. fulgidus contains a large number of flavoproteins, iron–sulphur proteins and iron-binding proteins that contribute to the general intracellular flow of electrons (Fig. 3). Detoxification enzymes include a peroxidase/catalase, an alkyl-hydroperoxide reductase, arsenate reductase, and eight NADH oxidases, presumably catalysing the four-electron reduction of molecular oxygen to water, with the concurrent regeneration of NAD.

Transporters

A. fulgidus may synthesize several transporters for the import of carbon-containing compounds, probably contributing to its ability to switch from autotrophic to heterotrophic growth5. Both M. jannaschii and A. fulgidus have branched-chain amino-acid ABC transport systems and a transporter for the uptake of arginine and lysine. A. fulgidus encodes proteins for dipeptide, spermidine/putrescine, proline/glycine-betaine and glutamine uptake, as well as transporters for sugars and acids, rather like the membrane systems described in eubacterial heterotrophs. These compounds provide the necessary substrates for numerous biosynthetic and degradative pathways (Fig. 3).

Many A. fulgidus redox proteins are predicted to require iron. Correspondingly, iron transporters have been identified for the import of both oxidized (Fe3+) and reduced (Fe2+) forms of iron. There are duplications in functional and regulatory genes in both systems. The uptake of Fe3+ may depend on haemin or a haemin-like compound because A. fulgidus has orthologues to the eubacterial hem transport system proteins, HemU and HemV. A. fulgidus may also use the regulatory protein Fur to modulate Fe3+ transport; this protein is not present in M. jannaschii. Fe2+ uptake occurs through a modified Feo system containing FeoB. This is the third example of an isolated feoB gene: M. jannaschii and Helicobacter pylori also appear to lack feoA, implying that FeoA is not essential for iron transport in these organisms.

A complex suite of proteins regulates ionic homeostasis. Ten distinct transporters facilitate the flux of the physiological ions K+, Na+, NH4+, Mg2+, Fe2+, Fe3+, NO3−, SO42− and inorganic phosphate (Pi). Most of these transporters have homologues in M. jannaschii and are therefore likely to be critical for nutrient acquisition during autotrophic growth. A. fulgidus has additional ion transporters for the elimination of toxic compounds including copper, cyanate and arsenite. As in M. jannaschii, the A. fulgidus genome contains two paralogous operons of cobalamin biosynthesis-cobalt transporters, cbiMQO.

Sensory functions and regulation of gene expression

Consistent with its extensive energy-producing metabolism and versatile system for carbon utilization, A. fulgidus has complex sensory and regulatory networks. These networks contain over 55 proteins with presumed regulatory functions, including members of the ArsR, AsnC and Sir2 families, as well as several iron-dependent repressor proteins. There are at least 15 signal-transducing histidine kinases, but only nine response regulators; this difference suggests there is a high degree of cross-talk between kinases and regulators. Only four response regulators appear to be in operons with histidine kinases, including those in the methyl-directed chemotaxis system (Che), which lies adjacent to the flagellar biosynthesis operon. Although rich in regulatory proteins, A. fulgidus apparently lacks regulators for response to amino-acid and carbon starvation as well as to DNA damage. Finally, A. fulgidus contains a homologue of the mammalian mitochondrial benzodiazepine receptor, which functions as a sensor in signal-transduction pathways25. These receptors have been previously identified only in Proteobacteria and Cyanobacteria25.

Replication, repair and cell division

A. fulgidus possesses two family B DNA polymerases, both related to the catalytic subunit of the eukaryal delta polymerase, as previously observed in the Sulfolobales26. It also has a homologue of the proofreading ε subunit of E. coli Pol III, not previously observed in the Archaea. The DNA repair system is more extensive than that found in M. jannaschii, including a homologue of the eukaryal Rad25, a 3-methyladenine DNA glycosylase, and exodeoxynuclease III. As well as reverse gyrase, topoisomerase I (ref. 9), and topoisomerase VI (ref. 27), the genes for the first archaeal DNA gyrase were identified.

A. fulgidus lacks a recognizable type II restriction-modification system, but contains one type I system. In contrast, two type II and three type I systems were identified in M. jannaschii. No homologue of the M. jannaschii thermonuclease was identified.

The cell-division machinery is similar to that of M. jannaschii, with orthologues of eubacterial fts and eukaryal cdc genes. However, several cdc genes found in M. jannaschii, including homologues of cdc23, cdc27, cdc47 and cdc54, appear to be absent in A. fulgidus.

Transcription and translation

A. fulgidus and M. jannaschii have transcriptional and translational systems distinct from their eubacterial and eukaryal counterparts. In both, the RNA polymerase contains the large universal subunits and five smaller subunits found in both Archaea and eukaryotes. Transcription initiation is a simplified version of the eukaryotic mechanism28,29. However, A. fulgidus alone has a homologue of eukaryotic TBP-interacting protein 49, which is not seen in M. jannaschii but is apparently present in Sulfolobus solfataricus.

Translation in A. fulgidus parallels M. jannaschii with a few exceptions. The organism has only one rRNA operon with an Ala-tRNA gene in the spacer and lacks a contiguous 5S rRNA gene. Genes for 46 tRNAs were identified, five of which contain introns in the anticodon region that are presumably removed by the intron excision enzyme EndA. The gene for selenocysteine tRNA (SelC) was not found, nor were the genes for SelA, SelB and SelD. With the exception of Asp-tRNAGTC and Val-tRNACAC, tRNA genes are not linked in the A. fulgidus genome. The RNA component of the tRNA maturation enzyme RNase P is present. Both A. fulgidus and M. jannaschii appear to possess an enzyme that inserts the tRNA-modified nucleoside archaeosine, but only A. fulgidus has the related enzyme that inserts the modified base queuine.

Both A. fulgidus and M. jannaschii lack glutaminyl-tRNA synthetase and asparaginyl-tRNA synthetase; the relevant tRNAs are presumably aminoacylated with glutamic and aspartic acids, respectively. An enzymatic in situ transamidation then converts the amino acid to its amide form, as seen in other Archaea and in Gram-positive Eubacteria30. Indeed, genes for the three subunits of the Glu-tRNA amidotransferase (gatABC) have been identified in A. fulgidus. The Lys aminoacyl-tRNA synthetase in both organisms is a class I-type, not a class II-type31. A. fulgidus possesses a normal tRNA synthetase for both Cys and Ser, unlike M. jannaschii in which the former was not identifiable and the latter was unusual9.

M. jannaschii has a single gene belonging to the TCP-1 chaperonin family, whereas A. fulgidus has two that encode subunits α and β of the thermosome. Phylogenetic analysis of the archaeal TCP-1 family indicates that these A. fulgidus genes arose by a recent species-specific gene duplication, as is the case for the two subunits of the Thermoplasma acidophilum thermosome32 and the Sulfolobus shibatae rosettasome33. As in M. jannaschii, no dnaK gene was identified.

Biosynthesis of essential components

Like most autotrophic microorganisms, A. fulgidus is able to synthesize many essential compounds, including amino acids, cofactors, carriers, purines and pyrimidines. Many of these biosynthetic pathways show a high degree of conservation between A. fulgidus and M. jannaschii. These two Archaea are similar in their biosynthetic pathways for siroheme, cobalamin, molybdopterin, riboflavin, thiamin and nicotinate; the role category with the greatest conservation between the two organisms is amino-acid biosynthesis. Of 78 A. fulgidus genes assigned to amino-acid biosynthetic pathways, at least 73 (94%) have homologues in M. jannaschii. For both archaeal species, amino-acid biosynthetic pathways resemble those of Bacillus subtilis more closely than those of E. coli. For example, in A. fulgidus and M. jannaschii, tryptophan biosynthesis is accomplished by seven enzymes, TrpA, B, C, D, E, F, G, as in B. subtilis, rather than by five enzymes, TrpA, B, C, D, E (including the bifunctional TrpC and TrpD), as found in E. coli.

No biotin biosynthetic genes were identified, yet biotin can be detected in A. fulgidus cell extracts34, and several genes encode a biotin-binding consensus sequence. Similarly, A. fulgidus lacks the genes for pyridoxine biosynthesis although pyridoxine can be found in cell extracts (albeit at lower levels than seen in E. coli and several Archaea34). No gene encoding ferrochelatase, the terminal enzyme in haem biosynthesis, has been identified, although A. fulgidus is known to use cytochromes34. These cofactors may be obtained by mechanisms that we have not recognized. Although all of the enzymes required for pyrimidine biosynthesis appear to be present, three enzymes in the purine pathway (GAR transformylase, AICAR formyltransferase and the ATPase subunit of AIR carboxylase) have not been identified, presumably because they exist as new isoforms.

The Archaea share a unique cell membrane composed of ether lipids containing a glycerophosphate backbone with a 2,3-sn stereochemistry35 for which there are multiple biosynthetic pathways36. In the case of Halobacterium cutirubrum, the backbone is apparently obtained by enantiomeric inversion of sn-glycerol-3-phosphate; in Sulfolobus acidocaldarius and Methanobacterium thermoautotrophicum, sn-glycerol-1-phosphate dehydrogenase builds the backbone from dihydroxyacetonephosphate. An orthologue of sn-glycerol-1-phosphate dehydrogenase has been identified in A. fulgidus, suggesting that the latter pathway is present.

Conclusions

Although A. fulgidus has been studied since its discovery ten years ago1, the completed genome sequence provides a wealth of new information about how this unusual organism exploits its environment. For example, its ability to reduce sulphur oxides has been well characterized, but genome sequence data demonstrate that A. fulgidus has a great diversity of electron transport systems, some of unknown specificity. Similarly, A. fulgidus has been characterized as a scavenger with numerous potential carbon sources, and its gene complement reveals the extent of this capability. A. fulgidus appears to obtain carbon from fatty acids through β-oxidation, from degradation of amino acids, aldehydes and organic acids, and perhaps from CO.

A. fulgidus has extensive gene duplication in comparison with other fully sequenced prokaryotes. For example, in the fatty acid and phospholipid metabolism category, there are 10 copies of 3-hydroxyacyl-CoA dehydrogenase, 12 copies of 3-ketoacyl-CoA thiolase, and 12 of acyl-CoA dehydrogenase. The duplicated proteins are not identical, and their presence suggests considerable metabolic differentiation, particularly with respect to the pathways for decomposing and recycling carbon by scavenging fatty acids. Other categories show similar, albeit less dramatic, gene redundancy. For example, there are six copies of acetyl-CoA synthetase and four aldehyde ferredoxin oxidoreductases for fermentation, as well as four copies of aspartate aminotransferase for amino-acid biosynthesis. These observations, together with the large number of paralogous gene families, suggest that gene duplication has been an important evolutionary mechanism for increasing physiological diversity in the Archaeoglobales.

A comparison of two archaeal genomes is inadequate to assess the diversity of the entire domain. Given this caveat, it is nevertheless possible to draw some preliminary conclusions from the comparison of M. jannaschii and A. fulgidus. A comparison of the gene content of these Archaea reveals that gene conservation varies significantly between role categories, with genes involved in transcription, translation and replication highly conserved; approximately 80% of the A. fulgidus genes in these categories have homologues in M. jannaschii. Biosynthetic pathways are also highly conserved, with approximately 80% of the A. fulgidus biosynthetic genes having homologues in M. jannaschii. In contrast, only 35% of the A. fulgidus central intermediary metabolism genes have homologues, reflecting their minimal metabolic overlap.

Over half of the A. fulgidus ORFs (1,290) have no assigned biological role. Of these, 639 have no database match. The remaining 651, designated ‘conserved hypothetical proteins’, have sequence similarity to hypothetical proteins in other organisms, two-thirds with apparent homologues in M. jannaschii. These shared hypothetical proteins will probably add to our understanding of the genetic repertoire of the Archaea. Analysis of the A. fulgidus and other archaeal and eubacterial genomes will provide the information necessary to begin to define a core set of archaeal genes, as well as to better understand prokaryotic diversity.
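The ORF counts quoted in this paragraph and in the description of the open reading frames can be cross-checked with simple arithmetic. The sketch below uses only numbers given in the text; the variable names, and the reading that the 1,797 putative identifications comprise ORFs with an assigned role plus the conserved hypothetical proteins, are our assumptions rather than statements from the annotation.

```python
TOTAL_ORFS = 2436                # predicted ORFs
NO_ROLE = 1290                   # ORFs with no assigned biological role
NO_MATCH = 639                   # no database match
CONSERVED_HYPOTHETICAL = 651     # similar only to hypothetical proteins
PUTATIVE_IDENTIFICATIONS = 1797  # ORFs with a database-derived identification


def summarize():
    """Derive the implied counts from the figures quoted in the text."""
    with_role = TOTAL_ORFS - NO_ROLE
    return {
        "with_role": with_role,
        "no_role_fraction": NO_ROLE / TOTAL_ORFS,
        "no_role_parts_sum": NO_MATCH + CONSERVED_HYPOTHETICAL,
        # Assumed reading: identifications = assigned role + conserved hypothetical.
        "matched_total": with_role + CONSERVED_HYPOTHETICAL,
    }


for key, value in summarize().items():
    print(key, value)
```

Under this reading the figures are mutually consistent: the two no-role subsets sum to 1,290, which is just over half of the 2,436 ORFs, and the matched total equals the 1,797 identifications.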

Methods

Whole-genome random sequencing procedure. The type strain, A. fulgidus VC-16, was grown from a culture derived from a single cell isolated by optical tweezers37 and provided by K. O. Stetter (University of Regensburg). Cloning, sequencing and assembly were essentially as described previously for genomes sequenced by TIGR9,38,39,40. One small-insert and one medium-insert plasmid library were generated by random mechanical shearing of genomic DNA. One large-insert lambda (λ) library was generated by partial Tsp509I digestion and ligation to λ-DASHII/EcoRI vector (Stratagene). In the initial random sequencing phase, 6.7-fold sequence coverage was achieved with 27,150 sequences from plasmid clones (average read length 500 bases) and 1,850 sequences from λ-clones. Both plasmid and λ-sequences were jointly assembled using TIGR assembler41, resulting in 152 contigs separated by sequence gaps and five groups of contigs separated by physical gaps. Sequences from both ends of 560 λ-clones served as a genome scaffold, verifying the orientation, order and integrity of the contigs. Sequence gaps were closed by editing the ends of sequence traces and/or primer walking on plasmid or λ-clones spanning the respective gap. Physical gaps were closed by combinatorial polymerase chain reaction (PCR) followed by sequencing of the PCR product. At the end of gap closure, 90 regions representing 0.33% of the genome had only single-sequence coverage. These regions were confirmed with terminator reactions to ensure a minimum of 2-fold sequence coverage for the whole genome. The final genome sequence is based on 29,642 sequences, with a 6.8-fold sequence coverage.
The linkage between the terminal sequences of 2,101 clones from the small-insert plasmid library (average size 1,419 bp) and 8,726 clones from the medium-insert plasmid library (average size 2,954 bp) supported the genome scaffold formed by the λ-clones (average size 16,381 bp), with 96.9% of the genome covered by λ-clones. The reported sequence differs in 20 positions from the 14,389 bp of DNA in a total of 11 previously published A. fulgidus genes.
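The sequence coverage figures quoted for the random sequencing phase follow from the usual relation coverage = (number of reads × mean read length) / genome size. The sketch below applies it to the numbers given above; the function names are illustrative, and applying the 500-base average read length to the λ-clone reads as well as the plasmid reads is an assumption, since the text gives the average only for plasmid clones.

```python
GENOME_BP = 2_178_400  # chromosome size reported for A. fulgidus


def fold_coverage(n_reads: int, mean_read_len: float, genome_bp: int = GENOME_BP) -> float:
    """Return the average fold coverage for n_reads of a given mean length."""
    return n_reads * mean_read_len / genome_bp


def implied_read_length(n_reads: int, coverage: float, genome_bp: int = GENOME_BP) -> float:
    """Return the mean read length implied by a read count and a fold coverage."""
    return coverage * genome_bp / n_reads


# Random phase: 27,150 plasmid reads plus 1,850 lambda reads, assumed 500 bases each.
random_phase = fold_coverage(27_150 + 1_850, 500)

# Final assembly: 29,642 sequences at 6.8-fold coverage.
final_read_len = implied_read_length(29_642, 6.8)

print(round(random_phase, 1))
print(round(final_read_len))
```

The random-phase estimate rounds to the 6.7-fold coverage reported, and the final figures imply a mean read length close to 500 bases, in line with the stated average.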

ORF prediction and gene family identification. Coding regions (ORFs) were identified using a combination strategy based on two programs. Initial sets of ORFs were derived with GeneSmith (H.O.S., unpublished), a program that evaluates ORF length, separation and overlap between ORFs, and with CRITICA (J.H.B. & G.J.O., unpublished), a coding region identification tool using comparative analysis. The two largely overlapping sets of ORFs were merged into one joint set containing all members of both initial sets. ORFs were searched against a non-redundant protein database using BLASTX10 and those shorter than 30 codons ‘coding’ for proteins without a database match were eliminated. Frameshifts were detected and corrected where appropriate as described previously40. Remaining frameshifts are considered authentic and corresponding regions were annotated as ‘authentic frameshift’. In total, 527 hidden Markov models, based upon conserved protein families (PFAM version 2.0), were searched with HMMER to determine ORF membership in families and superfamilies42. Families of paralogous genes were constructed as described previously40. TopPred43 was used to identify membrane-spanning domains in proteins.
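The merge-and-filter step described above can be sketched as follows: take the union of the two predicted ORF sets, then discard ORFs shorter than 30 codons that have no database match. This is a minimal illustration, not the GeneSmith or CRITICA code; the `Orf` class, the coordinate convention (1-based, inclusive, stop codon counted) and the representation of BLASTX hits as a set of ORFs are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    """A predicted ORF with 1-based inclusive coordinates (hypothetical convention)."""
    start: int
    end: int
    strand: str

    @property
    def codons(self) -> int:
        # Number of whole codons spanned, counting the stop codon (an assumption).
        return (self.end - self.start + 1) // 3


def merge_and_filter(set_a, set_b, with_match, min_codons: int = 30):
    """Merge two ORF predictions and drop short ORFs lacking a database match.

    with_match is the set of ORFs that had a database hit; here it stands in
    for the BLASTX search results.
    """
    merged = set(set_a) | set(set_b)  # joint set containing all members of both
    kept = [o for o in merged if o.codons >= min_codons or o in with_match]
    return sorted(kept, key=lambda o: (o.start, o.end, o.strand))


# Toy predictions (hypothetical coordinates).
pred_a = {Orf(1, 300, "+"), Orf(400, 450, "-")}
pred_b = {Orf(1, 300, "+"), Orf(500, 560, "+")}
hits = {Orf(500, 560, "+")}

for orf in merge_and_filter(pred_a, pred_b, hits):
    print(orf)
```

In this toy case the 17-codon ORF without a match is removed, while the 20-codon ORF is retained because it has a database hit.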