Main

Ralstonia eutropha H16 is a Gram-negative lithoautotrophic bacterium belonging to the β-subclass of the Proteobacteria. It is a ubiquitous inhabitant of soil and freshwater biotopes and is well adapted to life in habitats subject to transient anoxia1. One of the keys to the organism's lifestyle is its ability to use, alternatively or concomitantly, both organic compounds and molecular H2 as sources of energy (Fig. 1). R. eutropha H16 can make use of the transiently available supplies of H2 arising, for example, from the metabolic activity of N2-fixing microbes, because it is equipped with two energy-conserving hydrogenases2. These NiFe metalloproteins catalyze the oxidation of H2, providing the organism with energy and reductant. In the absence of environmental O2, R. eutropha H16 can switch to anaerobic respiration; a complete denitrification pathway allows the organism to exploit alternative electron acceptors such as NO3− and NO2−. Correlated with the flexible bioenergetics of the organism is its capacity to shift between heterotrophic and autotrophic growth modes. It fixes CO2 via the Calvin-Benson-Bassham (CBB) cycle3. In addition, the bacterium can stockpile organic carbon in the form of poly[R-(–)-3-hydroxybutyrate] (PHB) in specialized storage granules4. This represents an adaptation to fluctuating O2 levels, because PHB granules are formed whenever an abundance of carbon is available, but other factors such as O2, bound nitrogen or phosphate are growth-limiting.

Figure 1: The main growth modes of R. eutropha H16.

Schematic representation illustrating the key aspects of lithoautotrophic and heterotrophic metabolism. The yellow circles represent the processes of central metabolism, whereas the yellow/green circle is the Calvin-Benson-Bassham cycle. The red squares symbolize the two energy-conserving hydrogenases. The gray circles indicate polyhydroxyalkanoate (PHA) storage granules.

Aside from its biological significance, this versatile, nonpathogenic organism is also of biotechnological interest. Perhaps the best known application of R. eutropha strains is the commercial production of the biodegradable thermoplastic Biopol5,6,7 (http://www.metabolix.com). Production of biomolecules labeled with stable isotopes has been carried out via lithoautotrophic fermentation of R. eutropha H16 (ref. 8). Whether cultivated under lithoautotrophic or heterotrophic conditions, R. eutropha H16 reaches high cell densities (200 g dry weight per liter)6,9. Very recently an investigation demonstrated the potential application of the O2-tolerant, CO-resistant, membrane-bound hydrogenase of R. eutropha H16 for the construction of biological fuel cells10,11 and for designing light-driven H2 production complexes12. Another promising experimental approach documented the usefulness of R. eutropha H16 hydrogenase for the construction of an H2-sensing device13.

As a basis for future studies we undertook the sequencing and annotation of the tripartite R. eutropha H16 genome. The sequence of the smallest of the three replicons, the megaplasmid pHG1, has been reported14. Here we present the sequence analysis of the two main chromosomes.

Results

Organization and general features of the genome

The genome of R. eutropha H16 consists of three circular replicons: chromosome 1 (4,052,032 bp), chromosome 2 (2,912,490 bp) and megaplasmid pHG1 (452,156 bp), adding up to a total size of 7,416,678 bp (Supplementary Fig. 1 online). The number and size of the genomic replicons are in agreement with the physical mapping data obtained by pulsed-field gel electrophoresis15. General features of the three replicons are listed in Table 1. The two chromosomes have an almost identical G+C content and nearly the same proportion of coding regions. Both parameters differ from the corresponding values for pHG1. A cumulative GC skew analysis of chromosome 1 pointed to a region with a typical 'backbone' configuration suggestive of an origin of replication: a dnaA ortholog with an adjacent consensus DnaA-binding site is flanked by the genes dnaN and gyrB. A cumulative GC skew analysis of chromosome 2 indicated an inflection point in the vicinity of a repA gene. Eight direct repeats are present in the 5′-flanking region of repA, representing typical iteron-like repeats that may be involved in RepA binding. Immediately adjacent to repA are orthologs of parA and parB for a plasmidlike partitioning system. Thus, the putative chromosome 2 origin has plasmidlike characteristics (for details see Supplementary Fig. 1). The R. eutropha H16 genome contains 59 transfer RNA (tRNA) genes. Of these 59 genes, 51, a complete set representing all possible codons, are carried on chromosome 1. Seven duplicates are located on chromosome 2 and one on pHG1. Five complete ribosomal RNA (rRNA) operons were identified in the sequence. Three of these operons are on chromosome 1 and two on chromosome 2 (Table 1).
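The cumulative GC skew analysis mentioned above can be illustrated with a minimal sketch. It assumes the common definition of skew as (G − C)/(G + C) per window, summed along the sequence; the window size, the function names and the toy sequence are illustrative assumptions and do not reproduce the GenomeViz-based analysis used in this study. In many bacterial replicons the extremes of the cumulative curve fall near the origin and terminus of replication, which is why such a plot can point to a candidate origin.

```python
from itertools import accumulate


def cumulative_gc_skew(seq: str, window: int = 1000) -> list[float]:
    """Running sum of (G - C) / (G + C) over consecutive, non-overlapping windows."""
    seq = seq.upper()
    per_window = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        # Windows without any G or C contribute zero skew.
        per_window.append((g - c) / (g + c) if (g + c) else 0.0)
    return list(accumulate(per_window))


def skew_extremes(cumulative: list[float]) -> tuple[int, int]:
    """Indices of the windows where the cumulative skew is lowest and highest."""
    i_min = min(range(len(cumulative)), key=cumulative.__getitem__)
    i_max = max(range(len(cumulative)), key=cumulative.__getitem__)
    return i_min, i_max


# Toy sequence (hypothetical): a C-rich half followed by a G-rich half.
toy = "ACCC" * 10 + "AGGG" * 10
curve = cumulative_gc_skew(toy, window=8)
print(curve)
print(skew_extremes(curve))
```

On a real replicon the sequence would be read from the deposited EMBL entry and a window of several kilobases would be more appropriate; the sketch treats the sequence as linear and does not rotate the circular replicon.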

Table 1 General features of the Ralstonia eutropha H16 genome

A minimum of 32 coding sequences (CDSs) on the two chromosomes resemble transposase genes. Some of these are contained in complete insertion sequence elements or in remnants thereof. The genome of R. eutropha H16 contains 14 superficially intact insertion elements. Seven of these are located on megaplasmid pHG1. The latter also carries an extensive 'junkyard' region encompassing 39 remnants of transposases and phage-type integrases14. No comparable region was found on the chromosomes. Three copies of the insertion sequence ISAE1 are located on chromosome 1 and the other two on pHG1 (ref. 16). One of the plasmid-borne copies of ISAE1 and one chromosomal copy create in-frame fusions of ORF2 of the mobile element and a CDS at the target site. In the case of the chromosome 1 insertion, this is a leucyl-tRNA synthetase gene (H16_A3139). Since additional leucyl-tRNA synthetase genes are not found in the genome, a putative fusion protein (with >400 extra amino acid residues at the N terminus) must have leucyl-tRNA–charging activity.

Analysis of the distribution of genes representing major functional categories (Supplementary Fig. 1) revealed that chromosome 1 encodes most key functions of DNA replication, transcription and translation, including the ribosomal proteins. Chromosome 2 harbors genes for central steps of the 2-keto-3-deoxy-6-phosphogluconate (KDPG) pathway of sugar and sugar acid catabolism, the decomposition of aromatic compounds and the utilization of alternative nitrogen sources.

Genes for the synthesis of cell appendages and for chemotaxis proteins are located on all three replicons of R. eutropha H16. Biosynthesis of flagella and the corresponding functions for chemotaxis are encoded by 72 genes arranged in four clusters on chromosome 2. Genes for 10 methyl-accepting sensor proteins were found on chromosome 2, including one gene for a putative aerotaxis sensor. Three additional putative sensor proteins are encoded on chromosome 1. Synthesis and secretion of type IV pili, necessary for twitching motility, are encoded by 41 genes on chromosome 1 and pHG1. Genes for Flp-like pili, which are responsible for tight nonspecific adherence to surfaces, were identified in two clusters on chromosome 1 and one cluster on chromosome 2 (Supplementary Table 1 online).

Lithoautotrophic and organoautotrophic metabolism

Autotrophic fixation of CO2 in R. eutropha H16 proceeds via the CBB cycle. Ribulose-1,5-bisphosphate carboxylase/oxygenase and the other enzymes of this pathway are encoded in duplicate cbb operons3. One set of cbb genes (PHG416-428, cbbp) maps on the megaplasmid pHG1, and the second (H16_B1384-1396, cbbc) is located on chromosome 2. A regulatory gene cbbR, encoding a transcriptional activator, belongs to the chromosomal locus. A plasmid-borne copy of cbbR is present but defective, so the chromosomal copy controls the coordinate expression of both operons17. Two substrates supply energy for autotrophic CO2 fixation. H2 is oxidized by two NiFe hydrogenases that are encoded together with accessory and regulatory functions on pHG1 (ref. 14). Alternatively, formate is used as an energy source for organoautotrophic growth18. The R. eutropha H16 genome encodes at least four different formate dehydrogenases (Supplementary Fig. 2 online). A soluble, NAD+-reducing, molybdenum-containing enzyme has been characterized19. This molybdoenzyme is encoded by the fds cluster on chromosome 1 (H16_A0639-0644). The genomic sequence revealed determinants for three other, membrane-bound formate dehydrogenases. Indeed, three distinct membrane-bound activities have been detected in R. eutropha H16 (ref. 20). The three additional formate dehydrogenases are exported across the cytoplasmic membrane by the twin-arginine-translocator (TAT) system and are bound to the periplasmic side of the membrane. They are encoded by the fdh genes (H16_A2932-2937) on chromosome 1 and the fdo (H16_B1452-1455) and fdw genes (H16_B1700-1701), respectively, on chromosome 2. The latter enzyme is likely to be a tungsten-containing formate dehydrogenase.

Heterotrophic carbon metabolism

As a facultative lithoautotroph R. eutropha is able to use various organic carbon and energy sources for heterotrophic growth. Typical substrates permitting high specific growth rates include tricarboxylic acid cycle intermediates, sugar acids like gluconic acid, fatty acids or other organic acids and amino acids; various alcohols and polyols also support growth21. The genes coding for organic acid metabolism are located on chromosome 1. The ability of strain H16 to metabolize sugars is restricted to fructose and N-acetylglucosamine21. Fructose is probably imported by an ABC-type transporter (H16_B1498-1500), and catabolized via the Entner-Doudoroff (KDPG) pathway. The corresponding genes are located on chromosome 2 within three gene clusters coding for enzymes for the degradation of fructose, glucose, 2-ketogluconate and glucosaminate (see Supplementary Fig. 2 for details). Previous biochemical analyses have failed to detect activities of the key enzymes of the Embden-Meyerhof-Parnas and the oxidative pentose phosphate pathways, phosphofructokinase and 6-phosphogluconate dehydrogenase, respectively, in strain H16 (ref. 22). In agreement with these findings the respective genes were not identified in the R. eutropha H16 genome. Nevertheless genes for an anabolically operating Embden-Meyerhof-Parnas pathway (gluconeogenesis) are present in R. eutropha H16 and scattered on chromosome 1 (Supplementary Fig. 2). A large set of genes located on chromosome 1 determines the utilization of the amino sugar N-acetylglucosamine, whose uptake is probably mediated by a phosphotransferase-type transport system (H16_A0311-0316).

Aerobic and anaerobic respiration

As expected for a strictly respiratory organism dwelling in an environment with fluctuating O2 levels, R. eutropha H16 can draw on an extensive inventory of genes for respiratory chain components. Key determinants of aerobic energy metabolism include a cluster of 14 genes located on chromosome 1 (H16_A1050-1063) encoding a typical NADH dehydrogenase. All of these genes are highly conserved and share similarities to the closest relatives of R. eutropha H16 (R. solanacearum GMI1000 (ref. 23), Burkholderia mallei24, Burkholderia pseudomallei25, R. eutropha JMP134 and R. metallidurans CH34; for the latter two genomic sequences see http://www.jgi.doe.gov). An alternative type 2 NADH dehydrogenase (H16_A2740) might play a role in supplying electrons to the respiratory chain during growth of the cells on highly reduced substrates under aerobic conditions. The genes for succinate dehydrogenase (H16_A2629-2632) and the cytochrome bc1 complex (H16_A3396-3398) were identified on the same replicon. Strain H16 is equipped with genes coding for eight distinct terminal oxidases (details given in Supplementary Fig. 2). Two gene clusters for cytochrome c oxidases of the heme-copper oxidase superfamily are located on chromosome 1 and chromosome 2, respectively. The gene cluster on chromosome 1 includes genes for heme o and heme a synthases, suggesting that an aa3-type oxidase is formed. Chromosome 1 also carries genes for a cbb3-type cytochrome oxidase (H16_A2315-2320) and two bo3-type quinol oxidases (H16_A1071-1074, H16_A1640-1643). An additional bo3 quinol oxidase as well as two alternative bd-type quinol oxidases are encoded on chromosome 2.

Under anoxic conditions R. eutropha H16 forms a complete denitrification pathway. Context analyses of the relevant gene regions revealed a unique distribution of genes for key enzymes and transcriptional regulators of the denitrification pathway and alternative enzymes involved in heme and deoxynucleotide synthesis under anaerobic conditions (Supplementary Fig. 2). Nearly identical copies of nitrate reductase and nitric oxide reductase genes are present on chromosome 2 and pHG1 (refs. 26,27). Three putative transcriptional regulators (Fnr, DnrD, NarL), which may be involved in the control of anaerobic metabolism, are encoded in the pHG1-borne nitrate reductase cluster. These genes are missing in the chromosomal nitrate reductase gene cluster. On the other hand, cytochrome cd1 nitrite reductase and nitrous oxide reductase are encoded only on pHG1. In addition to the earlier reported flavohemoglobin gene located on pHG1 (ref. 14), a second flavohemoglobin gene (hmp, H16_A3533) was identified in the R. eutropha H16 genome on chromosome 1. The latter is probably regulated by a colocalized NsrR-like transcriptional regulator (Supplementary Fig. 2).

In the context of anaerobic metabolism, the genes for ribonucleotide reductase (RNR) deserve mention. RNRs provide deoxyribonucleotides by reduction of ribonucleotides and thus are essential for DNA synthesis. Three classes of RNRs are known, roughly classifiable as 'aerobic (NrdAB or NrdEF)', 'coenzyme B12–dependent (NrdJ)' and 'anaerobic (NrdDG)' enzymes. Comparison with the ribonucleotide reductase database (http://rnrdb.molbio.su.se) revealed that strain H16 is one of only a few prokaryotes that encode copies of all three classes of RNR: one aerobic class I ribonucleotide reductase (nrdAB) on chromosome 1, one anaerobic class III enzyme (nrdG and nrdD) on pHG1 (ref. 28) and one coenzyme B12–dependent class II enzyme (nrdJ) on chromosome 1. Three other coenzyme B12–dependent enzymes are encoded in the H16 genome: ethanolamine ammonia-lyase (H16_B0096-0097) on chromosome 2, methylmalonyl-CoA mutase (H16_B1841-1842) on chromosome 2 and methionine synthase (H16_A0151) on chromosome 1. R. eutropha H16 is unable to synthesize coenzyme B12 de novo. Nevertheless, the cells are able to assimilate cobalamin and its precursors. The 12 genes necessary for uptake and assimilation of cobalamin are clustered on chromosome 1 (H16_A2961-H16_A2972; Supplementary Fig. 2). R. eutropha H16 is able to bypass coenzyme B12–dependent reactions. It catabolizes, for example, propionyl-CoA via the methylcitric acid cycle instead of the methylmalonyl-CoA pathway29.

Aside from coenzyme B12, R. eutropha H16 contains genes for the de novo synthesis of all essential cofactors. The reconstructed pathways for the biosynthesis of biotin, thiamine, riboflavin and NAD+, together with the genes for the assembly of Fe-S clusters, are presented in Supplementary Figure 2.

Transport

The R. eutropha H16 genome encodes a multitude of different transport systems (Supplementary Table 2 online). In this context the numerous representatives of the extracytoplasmic solute-binding proteins deserve special mention. Extracytoplasmic solute-binding proteins were first described as periplasmic binding proteins of a new family of transporters called tripartite tricarboxylate transporters (TC 2.A.80). Subsequently, numerous orthologs, known as 'Bug receptors', have been identified in the genomes of various β-subclass proteobacteria, for example, Bordetella pertussis30. Notably, the majority of these genes are not linked to transporter genes.

A total of 154 representatives of this family were identified in the R. eutropha H16 genome (see Supplementary Table 3 online). With two exceptions, all members of this group appear to be periplasmic proteins. About one-fifth (21.4%) have a putative TAT leader sequence predicted by the TatP algorithm (http://www.cbs.dtu.dk). Context analyses showed that a majority (100 CDSs; 64.1%) of the tripartite tricarboxylate transporter extracytoplasmic solute-binding proteins in the H16 genome are located immediately adjacent to (42.3%) or one to two CDSs away from (21.8%) the genes for transcriptional regulators. This suggests a regulatory role for the extracytoplasmic solute-binding proteins. In the case of the B. pertussis Bug protein BctC, it has been shown that citrate-liganded BctC regulates expression of citrate uptake genes by interacting with a two-component regulatory system31.
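The kind of context analysis described here, counting how many solute-binding protein genes lie next to or within one to two CDSs of a regulator gene, can be sketched as follows. This is a simplified illustration and not the procedure used in the study: the gene identifiers are hypothetical, the replicon is treated as a linear list of CDSs in genomic order, and strand orientation is ignored.

```python
from collections import Counter


def classify_regulator_context(genes, targets, regulators, max_between=2):
    """Label each target gene by its distance to the nearest regulator gene.

    genes       -- gene identifiers in genomic order along one replicon
    targets     -- set of genes of interest (e.g. solute-binding protein genes)
    regulators  -- set of transcriptional regulator genes
    max_between -- largest number of intervening CDSs still counted as 'near'
    """
    reg_positions = [i for i, g in enumerate(genes) if g in regulators]
    labels = {}
    for i, gene in enumerate(genes):
        if gene not in targets:
            continue
        if not reg_positions:
            labels[gene] = "none"
            continue
        # Number of CDSs lying between the target and its closest regulator.
        between = min(abs(i - j) for j in reg_positions) - 1
        if between == 0:
            labels[gene] = "adjacent"
        elif between <= max_between:
            labels[gene] = "near"
        else:
            labels[gene] = "none"
    return labels


# Hypothetical gene order; the names are placeholders, not real locus tags.
order = ["bug1", "regA", "x1", "x2", "bug2", "x3", "x4", "x5", "x6",
         "bug3", "x7", "x8", "x9", "regB"]
labels = classify_regulator_context(order, {"bug1", "bug2", "bug3"}, {"regA", "regB"})
counts = Counter(labels.values())
print(labels)
print({k: round(100 * v / len(labels), 1) for k, v in counts.items()})
```

Applied per replicon to an annotated gene list, the resulting counts give percentages of the kind quoted in the text; a circular replicon would additionally require wrapping the distance calculation around the end of the list.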

Polyester synthesis

Like many bacteria, R. eutropha H16 can accumulate polyhydroxyalkanoates (PHAs) such as PHB in intracellular storage granules. Since the initial publication by Schlegel and co-workers describing enzymes involved in the synthesis of PHB in R. eutropha H16 (ref. 32), a tremendous amount of work has been done on this subject, making R. eutropha the model organism for the investigation of microbial polyoxoester production. Many key components of PHA metabolism have been identified in previous studies. The analysis of the R. eutropha H16 genome has added to the list of known genes, providing a comprehensive view of PHA metabolism in R. eutropha (Fig. 2). The key players in PHB synthesis are (i) PHB synthase (phaC1), which catalyzes the polymerization of R-(–)-3-hydroxybutyryl-CoA, (ii) β-ketothiolase (phaA), which condenses two molecules of acetyl-CoA to acetoacetyl-CoA and (iii) NADPH-dependent acetoacetyl-CoA reductase (phaB1, phaB2 and phaB3), which reduces acetoacetyl-CoA to R-(–)-3-hydroxybutyryl-CoA. A surprising finding of the genome analysis was the existence of a second gene for a PHB synthase, phaC2. Isologs of phaA (for example, bktB), which encode alternative β-ketothiolases with differing substrate spectra, have been reported33. The genomic sequence of R. eutropha H16 revealed 37 additional phaA isologs. In addition, 15 phaB isologs were identified. The contribution of the respective gene products to PHB synthesis remains to be investigated. Four phasin genes have been identified: phaP1, phaP2, phaP3 and phaP4, all of which are expressed34. The phasin proteins build up a layer at the surface of the PHB granules, inhibiting their coalescence and thus regulating their number and size. Phasins may also play a role in the mobilization of the storage polymer.

Figure 2: Overview of polyhydroxybutyrate (PHB) metabolism in R. eutropha H16.

The light gray field represents a PHB storage granule inside a bacterial cell. Arrows indicate the major enzymatic reactions of PHB synthesis and degradation. The boxes, circles and other shapes symbolize the enzymes and proteins involved directly in PHB metabolism, whereas stacked symbols indicate the existence of multiple alleles in the genome. Symbols outlined in blue represent proteins known to be associated with the PHB granule. See text for details.

Mobilization of PHB during carbon starvation requires specific enzymes for the depolymerization of PHB35. Seven genes in the R. eutropha H16 genome encode PHB depolymerase isoenzymes. The genes phaZ1 and phaZ2 are located on chromosome 1, phaZ3, phaZ5, phaZ6 and phaZ7 on chromosome 2 and phaZ4 on pHG1. In addition, two PHB-oligomer hydrolases, which degrade the trimeric products of the PHB depolymerases, are encoded on chromosome 2 (phaY1 and phaY2)36. The foregoing data and the results of systematic protein studies sketch a fascinating picture of the PHB granule as a complex and dynamic organelle34.

Biodegradation

Traditionally, strains of the genus Ralstonia have played a key role in research on microbial degradation of aromatic compounds. Although R. eutropha strain JMP134 was used in several of these studies, it has been known for many years that R. eutropha H16 can grow on a similar spectrum of aromatic compounds37. It is therefore not surprising that the genomic sequence contains an impressive array of genes related to the metabolism of aromatics. Details are given in Supplementary Figure 3 online.

Toxin genes

A number of potential toxin genes are present in the genome (Supplementary Table 1). H16_B1353 might encode an insecticidal toxin similar to those found in Photorhabdus luminescens38. This putative toxin (3,406 amino acid residues, 374.2 kDa) is encoded by the largest CDS in the R. eutropha H16 genome. Two gene clusters for repeat-in-toxin type toxins are present on chromosome 2. The tolC gene, which is essential for toxin secretion, is present in both clusters but is defective in one. Two additional tolC-like genes are located on chromosome 1 (H16_A2296, H16_A3731). A putative hemagglutinin/adhesin gene cluster (fhaBC; H16_B0247-0248) was identified on chromosome 2. R. eutropha H16 has never been recognized as a human, animal or plant pathogen, and therefore has good potential for use in production processes.

Discussion

With a total size of 7,416,678 bp, R. eutropha H16 possesses one of the larger bacterial genomes sequenced to date. The genomes of the other sequenced members of the family Burkholderiaceae range between 5.8 and 8.6 Mbp (refs. 23,24,25). Comparisons of the genomes of R. eutropha H16 and its close relatives reveal some important organizational similarities. A common feature of these organisms is a bipartite genome consisting of two chromosome-sized replicons. In addition, one or more plasmids may be present. Analysis of synteny for the large chromosomes showed a considerable degree of conservation of gene order (Supplementary Fig. 4 online). In contrast, gene order was much less conserved among the small chromosomes. Reciprocal BLAST analyses revealed that chromosome 1 of R. eutropha H16 shares 76% and 54% orthologous genes with the corresponding replicons of R. eutropha JMP134 and B. pseudomallei, respectively. Chromosomes 2 of the two R. eutropha strains still have 58% orthologs in common, whereas the small chromosome of B. pseudomallei only shares 29% orthologs with that of R. eutropha H16.
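The ortholog percentages above come from reciprocal BLAST comparisons. A minimal sketch of the underlying idea, reciprocal best hits, is given here. The scoring criterion (highest bit score), the absence of identity or coverage cut-offs and all protein names are assumptions made for illustration; the thresholds actually applied in the study are not stated in the text.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits -- dict: query id -> list of (subject id, bit score) tuples
    """
    return {
        query: max(subject_scores, key=lambda pair: pair[1])[0]
        for query, subject_scores in hits.items()
        if subject_scores  # queries without any hit are skipped
    }


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


# Hypothetical hit tables for two replicons A and B (placeholder ids and scores).
a_vs_b = {
    "a1": [("b1", 200.0), ("b2", 50.0)],
    "a2": [("b2", 120.0)],
    "a3": [("b1", 90.0)],
    "a4": [],
}
b_vs_a = {
    "b1": [("a1", 198.0), ("a3", 88.0)],
    "b2": [("a2", 119.0)],
    "b3": [("a4", 30.0)],
}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
shared_percent = 100 * len(pairs) / len(a_vs_b)
print(sorted(pairs), shared_percent)
```

Dividing the number of reciprocal pairs by the number of CDSs on the query replicon gives a shared-ortholog percentage comparable in kind to the 76%, 54%, 58% and 29% figures reported here.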

Genes for H2 oxidation and CO2 fixation are present in R. eutropha H16 but not in R. eutropha JMP134 and R. solanacearum GMI1000. The latter two organisms do not grow litho- or organoautotrophically39. In addition, genes for a nonribosomal polypeptide synthetase and putative repeat-in-toxin type toxin gene clusters are present in R. eutropha H16 but not in the other Ralstonia strains.

In several respects the R. eutropha H16 genome is typical of free-living bacteria. It contains 690 genes for potential regulatory proteins and 83 genes for additional signaling proteins (Supplementary Table 4 online). This represents 11.7% of its genes and is comparable to the statistics for Pseudomonas putida KT2440 and Streptomyces coelicolor A3(2). A large repertory of regulatory genes is not surprising in an organism that must respond to the changing conditions of a highly variable habitat. Another characteristic of the soil biotope is its complex content of organic substances. A typical adaptation to this spectrum of diverse substrates is the large inventory of transport systems found in soil bacteria. A total of 832 CDSs (12.3%) in the R. eutropha genome encode transport proteins (Supplementary Table 2). This is significantly higher than the average found in Eubacteria (9.2%) or Archaea (6.7%)40. A notable feature of the R. eutropha H16 genome is the scattering of genetic information. Genes for many metabolic processes (for example, denitrification) are distributed on all three replicons.

The enzymatic machinery for the synthesis of PHAs is robust and tunable; the composition and, hence, the physicochemical properties of the resulting polymer can be modified over a wide range by varying the carbon substrates in the feedstock7. The commercially produced Biopol, a 3-hydroxybutyrate/3-hydroxyvalerate copolymer, is one of >140 different PHAs described thus far. Recently it was discovered that R. eutropha H16 can, if fed with 3-mercaptopropionate or other organic thiochemicals, produce polythioesters, a novel class of bioplastics41,42. The variety of polymers that can be produced by this system is dependent on, among other things, the substrate spectra of the biosynthetic enzymes. The genome sequence revealed the presence of 38 candidate genes for β-ketothiolases and 15 candidate genes for acetoacetyl-CoA reductases, pointing to a potential for the production of many other polymer types. This information may lead to novel PHAs containing substantial amounts of unusual building blocks such as 3-hydroxypropionic acid, 4-hydroxybutyric acid, 3-hydroxyvaleric acid, 4-hydroxyvaleric acid, 2-methyl-3-hydroxyvaleric acid, and 3-hydroxy-4-pentenoic acid. This is just one of the areas in which the biotechnological potential of R. eutropha remains to be further exploited. Moreover, the availability of the R. eutropha H16 genome sequence opens new perspectives for H2-based bioprocesses.

Methods

The R. eutropha H16 (DSM 428, ATCC 17699) genome sequence was determined by a whole-genome shotgun approach. Shotgun sequencing was done by Integrated Genomics. A plasmid library of 67,200 clones containing inserts of approximately 2.5 kilobases was generated in the vector pGEM-3Z. The vector Lorist6 was used to generate a cosmid library of 2,688 clones with insert sizes of 25–45 kilobases. All clones were sequenced using dye-terminator chemistry on ABI PRISM 3700 (Applied Biosystems) and MegaBACE 1000 (GE Healthcare) analyzers. Then, 101,060 reads were processed with Phred and assembled into contigs using the Phrap assembly tool (P. Green, University of Washington, Seattle; http://www.phrap.org). Assembly was guided by separate sequence data sets obtained from preparations of chromosome 1 and chromosome 2 isolated via pulsed-field gel electrophoresis as follows: cassettes containing a homing site for the intron-encoded endonuclease I-SceI and an antibiotic resistance gene (kanamycin, gentamicin) were introduced into the structural genes for oxygen-independent coproporphyrinogen III oxidase (hemN) on chromosome 1 and nitric oxide reductase (norB2) on chromosome 2 as described15. Agarose-embedded genomic DNA was digested with I-SceI, subsequently separated by contour-clamped homogeneous field (CHEF; BioRad) electrophoresis and eluted from CHEF gels. The DNA was amplified with the GenomiPhi DNA Amplification Kit (GE Healthcare) and sequenced. Primer walking and PCR in conjunction with a cosmid library were used to close remaining gaps and to resolve assembly errors due to repetitive sequences. Sequence editing was done in GAP4 (ref. 43). Automatic CDS prediction was done using YACOP44 and manually compared with a FrameD prediction45. The ERGO software suite46 (Integrated Genomics) was used in the first phase of this study to obtain a preliminary automated annotation. All annotations were edited manually using the GeneSOAP annotation workbench14 written by one of us (R.C.) and custom Perl scripts written by another of us (F.R.). GeneSOAP is a proprietary, Windows-based front-end software, which integrates a panel of established search and analysis tools including BLAST, FrameD, SMART and KEGG. The GenBank nonredundant protein database was the main database used for annotation. Annotations were checked by searches against curated databases including PFAM, PROSITE, PRODOM and COGS. Significant database hits were added to the note section of the GenBank entry of the corresponding CDS. Specialized databases such as the TCDB (http://www.tcdb.org) were used to classify functional groups of proteins. Context analysis and pathway reconstruction were done using the SEED47 and KEGG resources. Putative frameshifts were checked and corrected manually. Genomic comparisons were carried out by bidirectional BLAST comparisons of whole-genome protein databases. Insertion sequences were located by computer-based and visual inspections. The flanking sequences of CDSs with similarity to transposases were analyzed for direct and inverted repeats. Global searches were done with the help of the online tool IS FINDER (http://www-is.biotoul.fr). Circular GC skew plots were created using the program GenomeViz48.
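The search for inverted repeats flanking transposase-like CDSs can be illustrated with a minimal sketch. It only detects exact terminal inverted repeats on a candidate element whose boundaries are already known; real insertion sequences often carry imperfect repeats, so the exact-match criterion, the function names and the toy sequence are simplifying assumptions and do not represent the inspection procedure used in this study.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A, C, G and T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def terminal_inverted_repeat_length(element: str) -> int:
    """Length of the exact inverted repeat shared by the two ends of an element.

    The 5' end is compared base by base with the reverse complement of the
    3' end; counting stops at the first mismatch or at half the element length.
    """
    element = element.upper()
    rc = reverse_complement(element)
    length = 0
    while length < len(element) // 2 and element[length] == rc[length]:
        length += 1
    return length


# Toy element (hypothetical): a 7-bp left end, a spacer, and the matching right end.
left_end = "GGCTAAC"
toy_element = left_end + "TTTTTTTT" + reverse_complement(left_end)
print(toy_element, terminal_inverted_repeat_length(toy_element))
```

Direct repeats generated by target-site duplication could be checked in a similar way by comparing the bases immediately upstream and downstream of the element without reverse complementing.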

The complete genome sequence of R. eutropha H16 is available under EMBL accession numbers AM260479 (chromosome 1), AM260480 (chromosome 2) and AY305378 (megaplasmid pHG1).

Note: Supplementary information is available on the Nature Biotechnology website.