Main

Growing environmental concerns over the use and depletion of nonrenewable energy resources, together with the recent price increases and instabilities in the international oil markets, have stimulated increasing interest in the use of fermentation processes for the large-scale production of alternative fuels such as ethanol. As such, ethanol-producing microorganisms, such as the Gram-negative bacterium Zymomonas mobilis, have potential for the production of fuel ethanol.

Z. mobilis, which is used in the tropics to produce pulque and alcoholic palm wines, uses the Entner-Doudoroff (ED) pathway to metabolize glucose, which results in only 1 mole of ATP being produced per mole of glucose1. The potential advantages of using Z. mobilis for ethanol production include: (i) its high specific rates of sugar uptake and ethanol production, (ii) its production of ethanol at yields close to the theoretical maximum with relatively low biomass formation, (iii) its high ethanol tolerance of up to 16% (vol/vol) and (iv) its facility for genetic manipulation2,3,4,5,6. However, wild strains of Z. mobilis can use only glucose, fructose and sucrose as carbon substrates, so recent research has focused on the development of recombinant strains capable of using pentose sugars7,8 for the conversion of cheaper lignocellulosic hydrolysates to ethanol. Improved mutants9,10,11 as well as the application of metabolic flux analysis, site-directed mutagenesis, specific gene deletion/insertion and metabolic engineering for strain development12,13 have also been reported. A physical map of the Z. mobilis ZM4 genome and the ribosomal transcriptional unit have been previously reported14,15. In the current paper, the features of the complete sequence of the Z. mobilis ZM4 genome are presented and genomic characteristics are compared with those of another Z. mobilis strain, ZM1.

Results

General features

The complete genome of Z. mobilis ZM4 consists of a single circular chromosome of 2,056,416 bp with an average G+C content of 46.33% (Table 1 and Supplementary Table 1 online). The 1,998 predicted coding ORFs cover 87% of the genome, and each ORF has an average length of 898 bp. Among these, 1,346 (67.4%) could be assigned putative functions, 258 (12.9%) were matched to conserved hypothetical coding sequences of unknown function and the remaining 394 (19.7%) showed no similarities to known genes. The functions of the predicted ORFs were categorized by comparison with the COG database (Table 2).
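The summary figures above can be cross-checked with simple arithmetic. The following is a minimal sketch that uses only the numbers quoted in this paragraph; the variable and function names are illustrative, and the coding density is approximated as ORF count times mean ORF length, which ignores any overlap between ORFs.

```python
# Values quoted in the text for Z. mobilis ZM4.
GENOME_BP = 2_056_416
N_ORFS = 1_998
MEAN_ORF_BP = 898

# ORF annotation classes as reported (counts, not percentages).
orf_classes = {
    "putative function": 1_346,
    "conserved hypothetical": 258,
    "no similarity": 394,
}


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# The three classes should account for every predicted ORF.
assert sum(orf_classes.values()) == N_ORFS

# Approximate coding density: ORF count times mean ORF length over genome size.
coding_density = percent(N_ORFS * MEAN_ORF_BP, GENOME_BP)

for name, count in orf_classes.items():
    print(f"{name}: {count} ORFs ({percent(count, N_ORFS)}%)")
print(f"approximate coding density: {coding_density}%")
```

Under these assumptions the class percentages reproduce the reported 67.4%, 12.9% and 19.7%, and the approximate coding density agrees with the stated 87%.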

Table 1 General features of the Z. mobilis genome
Table 2 Functional categories of predicted genes in the Z. mobilis genome

Of the 0.84% of the genome that encodes stable RNA, 51 genes encode transfer RNAs, corresponding to 42 different isoacceptor-tRNA species. Of the three ribosomal RNA transcriptional units, rrnA is located at coordinate 140,000, rrnB at 360,000 and rrnC at 520,000, and all three are transcribed in the predicted direction of replication.

The replication origin predicted by calculating GC-skew values, (G−C)/(G+C) (ref. 16; Fig. 1), closely coincided with a 656-bp region containing one copy of a likely site (5′-GATCTNTTNTTTT-3′) for initial DNA unwinding, and eight copies of probable sites (5′-TTATNCANA-3′) for DnaA binding. We also found that genes such as parA and parB, which are involved in chromosome partitioning, and gidA and gidB, the glucose-inhibited division genes, were located near the origin, as has often been observed in other bacterial genomes17.
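Scanning a sequence for a degenerate site such as the DnaA-binding consensus 5′-TTATNCANA-3′ can be illustrated with a regular expression. This is a minimal sketch, not the procedure used for the ZM4 analysis: the function names are hypothetical, only the given strand is searched (a real scan would also search the reverse complement), and the toy sequence is invented for illustration.

```python
import re

# Degenerate motif from the text; N stands for any base.
DNAA_BOX = "TTATNCANA"


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    """Compile a motif with N wildcards; the lookahead lets matches overlap."""
    body = "".join("[ACGT]" if base == "N" else base for base in motif)
    return re.compile(f"(?=({body}))")


def find_sites(sequence: str, motif: str) -> list[int]:
    """Return 0-based start positions of the motif on the given strand."""
    pattern = motif_to_regex(motif)
    return [m.start() for m in pattern.finditer(sequence.upper())]


# Toy sequence (made up for illustration, not taken from the ZM4 genome).
toy = "GGTTATCCACAGGTTATACAAATT"
print(find_sites(toy, DNAA_BOX))  # [2, 13]
```

The same helper accepts the 13-bp unwinding-site consensus unchanged, because N is the only degenerate symbol that appears in either motif.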

Figure 1: Overall features of the Z. mobilis ZM4 genome.

The putative origin of replication is around 0 kb. The outer scale indicates the coordinates in base pairs. The distribution of genes is shown on the first two rings within the scale according to the direction of the reading frame. The locations of rRNA and tRNA genes are shown by green dots and black dots, respectively. Putative transposases are shown by red dots. The next circle shows GC content values. Cyan and green colors indicate positive and negative signs, respectively. The central circle shows GC-skew values, (G−C)/(G+C), of the third bases of codons measured over the genome. Yellow and purple colors denote positive and negative signs, respectively. The window size was 10,000 nucleotides and the step size was 1,000 nucleotides.
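A sliding-window GC-skew calculation with the window and step sizes given in the legend can be sketched as follows. Several points here are assumptions rather than details from the paper: the sketch counts all bases in each window instead of only third codon positions, treats the sequence as linear rather than circular, and locates a candidate origin with the cumulative-skew minimum, a commonly used heuristic. All function names are illustrative.

```python
def gc_skew(window: str) -> float:
    """Return (G - C) / (G + C) for one window; 0.0 if it has no G or C."""
    g = window.count("G")
    c = window.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_gc_skew(seq: str, window: int = 10_000, step: int = 1_000):
    """Yield (window_start, skew) pairs along a linear sequence."""
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        yield start, gc_skew(seq[start:start + window])


def cumulative_minimum(skews):
    """Return the window start where the running sum of skew is lowest."""
    total, best_total, best_start = 0.0, float("inf"), None
    for start, value in skews:
        total += value
        if total < best_total:
            best_total, best_start = total, start
    return best_start


# Toy input: a C-rich half followed by a G-rich half (illustrative only).
toy = "C" * 40 + "G" * 40
print(cumulative_minimum(sliding_gc_skew(toy, window=10, step=10)))  # 30
```

In the toy input the skew changes sign at position 40, and the running sum is lowest in the last window before that switch.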

Comparison with other sequenced genomes

Comparison of the Z. mobilis ZM4 ORFs (amino acid sequences) with those of other organisms revealed that 768 out of 1,668 ORFs listed in the COG database have the closest similarity to the corresponding ORFs of Novosphingobium aromaticivorans (Supplementary Table 2 online), in line with a previous phylogenetic study on Z. mobilis ZM4 based on the 16S ribosomal RNA sequence, which found that Z. mobilis ZM4 belongs to the Sphingomonas spp. group15. In particular, the ORFs classified into COG category J (translation, ribosomal structure and biogenesis) and category D (cell division and chromosome partitioning) showed high similarities to N. aromaticivorans. In contrast, only 2 out of 40 total ORFs classified into the COG category N (cell motility) and 5 out of 25 in category V (defense mechanisms) matched ORFs of N. aromaticivorans.

General metabolism

Z. mobilis uses glucose, fructose and sucrose anaerobically through the ED pathway, leading to the production of ethanol and CO2 (ref. 1). Analysis of the Z. mobilis genome sequence revealed the determinants of hexose-metabolizing enzymes such as invertase (ZMO0375, ZMO0942), levansucrase (ZMO0374), glucokinase (ZMO0369), glucose-6-phosphate isomerase (ZMO1212) and glucose-fructose oxidoreductase (ZMO0689) that would enable Z. mobilis to use sucrose, fructose and glucose as well as probably mannose, raffinose and sorbitol. However, there are no obvious genes for using lactose, maltose or cellobiose.

In the ED pathway, glucose-6-phosphate dehydrogenase (zwf, ZMO0367) oxidizes glucose-6-phosphate to 6-phosphogluconolactone. The lactone is hydrolyzed to 6-phosphogluconate by lactonase (ZMO1478). 6-Phosphogluconate is dehydrated by 6-phosphogluconate dehydratase (edd, ZMO0368) to yield 2-keto-3-deoxy-6-phosphogluconate (KDPG). KDPG aldolase (eda, ZMO0997) cleaves KDPG to form pyruvate and glyceraldehyde-3-phosphate (Fig. 2). Glyceraldehyde-3-phosphate is then metabolized via the triose phosphate steps common to the Embden-Meyerhof-Parnas (EMP) pathway to yield ethanol and carbon dioxide. Genes for all of the enzymes of the EMP pathway except 6-phosphofructokinase are present in Z. mobilis (Fig. 2). The zwf and edd genes are clustered with glf (ZMO0366; encodes the facilitated diffusion protein for glucose) and glk (ZMO0369; glucokinase), whereas eda is separately located. This contrasts with Escherichia coli, in which zwf, edd and eda are closely linked although regulation of the zwf and edd-eda operon is independent17. By using the ED pathway instead of the EMP pathway, Z. mobilis yields only 1 mole of ATP per mole of fermented hexose, and produces ethanol at a theoretical yield of 2 moles/mole of substrate. Rapid production and high yield of ethanol as the only sugar fermentation product can be attributed to the presence of pyruvate decarboxylase (ZMO1360), an enzyme not frequently observed in bacteria, and two highly specific alcohol dehydrogenases (ZMO1236, ZMO1596).
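The stoichiometry stated above translates directly into a mass yield and an ATP comparison. The sketch below is illustrative only: the molar masses are standard rounded values, the net EMP yield of 2 ATP per glucose is the textbook figure rather than a number from this paper, and the names are hypothetical.

```python
# Molar masses in g/mol (standard values, rounded).
M_GLUCOSE = 180.16
M_ETHANOL = 46.07
M_CO2 = 44.01

# Overall fermentation: glucose -> 2 ethanol + 2 CO2.
ETHANOL_PER_GLUCOSE = 2

# Net ATP per glucose: 1 via ED (as stated in the text),
# 2 via EMP (textbook value, an assumption here).
ATP_PER_GLUCOSE = {"ED": 1, "EMP": 2}


def theoretical_ethanol_yield() -> float:
    """Grams of ethanol per gram of glucose at 2 mol/mol."""
    return ETHANOL_PER_GLUCOSE * M_ETHANOL / M_GLUCOSE


def glucose_needed(atp_mol: float, pathway: str) -> float:
    """Moles of glucose fermented to obtain a given amount of ATP."""
    return atp_mol / ATP_PER_GLUCOSE[pathway]


print(f"theoretical yield: {theoretical_ethanol_yield():.3f} g ethanol / g glucose")
print(f"glucose for 10 mol ATP via ED:  {glucose_needed(10, 'ED')} mol")
print(f"glucose for 10 mol ATP via EMP: {glucose_needed(10, 'EMP')} mol")
```

A yield of 2 moles/mole corresponds to roughly 0.51 g of ethanol per gram of glucose, and under the assumed EMP value the ED pathway needs twice the glucose flux for the same ATP.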

Figure 2: Central metabolic pathways of sugars.

Enzymes missing from Z. mobilis are represented by red dotted arrows.

The genes encoding two enzymes in the tricarboxylic acid cycle—the 2-oxoglutarate dehydrogenase complex and malate dehydrogenase—were not found. However, all the key building blocks, including oxaloacetate, malate, fumarate and succinate have been detected by means of high-performance liquid chromatography, and Z. mobilis is known to be able to synthesize all essential amino acids except for lysine and methionine. These results strongly indicate that other metabolic pathways are involved in producing oxaloacetate, malate, fumarate and succinate. Oxaloacetate can be produced from phosphoenolpyruvate and CO2 by phosphoenolpyruvate carboxylase (ZMO1496) or citrate lyase (ZMO0487: citrate ↔ oxaloacetate + acetate). Malate can be synthesized by pyruvate carboxylation with malic enzyme (ZMO1955). Fumarate can be produced by fumarate dehydratase (ZMO1307). However, evidence for an alternative metabolic pathway for succinate production, such as the glyoxylate cycle, has not yet been found.

Although most genes for the pentose phosphate pathway are missing, all genes encoding enzymes necessary for the synthesis of phosphoribosyl-pyrophosphate, a precursor for purine/pyrimidine metabolism, are present. We also identified all genes required for the de novo biosynthesis of RNA and DNA. Z. mobilis possesses a complete set of genes for the sulfate reduction pathway as well as all the genes required for the synthesis of all amino acids, except for one gene in the lysine (yfdZ) and one gene in the methionine (metB) pathways. For vitamins, all genes for riboflavin and folate synthesis and most genes for thiamin, ubiquinone, NAD+ and pyridoxal are present. The absence of genes for pantothenate and biotin biosynthesis is in accordance with the known nutritional requirement of Z. mobilis for these vitamins.

Transport systems and motility

We recognized 180 genes encoding transport-related membrane proteins, on the basis of a search of the Transport Protein Database (http://tcdb.ucsd.edu/index.php). The largest number (83) of these proteins were electrochemical potential-driven transporters (class 2), and included 20 involved in iron metabolism, 13 multi-drug resistance exporters, three members of the resistance nodulation cell-division family, eight permeases of the major facilitator superfamily, seven cation transporters, seven amino acid transporters, three nucleoside permeases and four sugar transporters. There are several ORFs for the sec-independent protein secretion pathway and others for the TonB-ExbB-ExbD/TolA-TolQ-TolR (TonB) family of auxiliary proteins for energization of outer membrane receptor–mediated active transport systems. The second most numerous class (55) contained primary active transporters (class 3), including 41 members of the ATP-binding cassette (ABC) transporter superfamily. There were five ORFs for the sec-dependent general secretory pathway, two for type III secretory pathway proteins and four for the type IV secretory pathway. The third largest class (14 members) was the channels/pores (class 1), consisting of five capsule polysaccharide export proteins and two carbohydrate (glucose)-facilitated diffusion proteins. The four remaining classes were group translocators (class 4; 4 ORFs), transport electron carriers (class 5; 3 ORFs), accessory factors involved in transport (class 8; 1 ORF) and incompletely characterized transport systems (class 9; 20 ORFs).
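The class counts listed above can be tallied to confirm that they account for all 180 transport-related genes. This is a small bookkeeping sketch with illustrative names; the class 2 subfamily counts quoted in the text sum to fewer than 83, which suggests the list of examples is not exhaustive.

```python
# Transport classes and ORF counts as quoted in the text.
transport_classes = {
    "class 2: electrochemical potential-driven transporters": 83,
    "class 3: primary active transporters": 55,
    "class 1: channels/pores": 14,
    "class 4: group translocators": 4,
    "class 5: transport electron carriers": 3,
    "class 8: accessory factors involved in transport": 1,
    "class 9: incompletely characterized transport systems": 20,
}

# Class 2 subfamilies named in the text (examples, not a full breakdown).
class2_examples = {
    "iron metabolism": 20,
    "multi-drug resistance exporters": 13,
    "resistance nodulation cell-division family": 3,
    "major facilitator superfamily permeases": 8,
    "cation transporters": 7,
    "amino acid transporters": 7,
    "nucleoside permeases": 3,
    "sugar transporters": 4,
}

total = sum(transport_classes.values())
listed_class2 = sum(class2_examples.values())

print(f"total transport-related genes: {total}")
print(f"class 2 members named explicitly: {listed_class2} of 83")
```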

The flagellar cluster consists of 32 ORFs (ZMO0602–ZMO0652: flgABCDEFGHIJKL, flhAB, fliDEFGHIKLMNPQRS, motAB) encoding flagellar structure proteins, motor proteins and biosynthesis proteins. Classical chemotaxis signal transduction genes (cheABDRWY) and methyl-accepting chemotaxis genes (mcpAJ), similar to those in E. coli, were present.

Oxidative stress and respiration

Z. mobilis is not an obligate anaerobe but a facultative one, implying that there must be a defense mechanism against oxidative stress. The best-known reduction-oxidation cycling machinery is the glutathione system. Both glutathione reductase (ZMO1211) and glutathione synthase (ZMO1913) are present, as well as a γ-glutamylcysteine synthetase (ZMO1556). Genes encoding a catalase (ZMO0928), an iron-dependent superoxide dismutase (Fe-SOD; ZMO1060) and two kinds of peroxidases (ZMO1136, ZMO1573), which are thought to be responsible for protection from the toxic effects of superoxide and hydrogen peroxide in most aerobic organisms, are also present.

In addition to the genes that respond to oxidative stress, the genome contained several genes related to the electron transport system such as the Fe-S-cluster redox enzyme (ZMO1032), cytochrome b (ZMO0957), cytochrome c1 (ZMO0958), cytochrome c-type biogenesis proteins (ZMO1252–1256), electron transfer flavoprotein (ZMO1479, ZMO1480) and ubiquinone biosynthesis proteins (ZMO1189, ZMO1669). Genes for electron donor and acceptor modules such as NADH dehydrogenase (ZMO1113), NADH:flavin oxidoreductase (ZMO1885), NADH:ubiquinone oxidoreductase complex (ZMO1809–1814), nitroreductase (ZMO0678) and fumarate reductase (ZMO0569) were also found. However, genes for cytochrome o and cytochrome d, which use oxygen as a final electron acceptor, appeared to be absent.

It was reported that Z. mobilis has a respiratory electron transport chain19 and that it shows elevated molar growth yield during exponential aerobic growth20. Relative to anaerobic conditions, this leads to a decrease in the yield of ethanol and an accumulation of other less reduced metabolites such as acetaldehyde, acetone and acetate21,22. These results indicate that some NADH is oxidized in the respiratory chain with the simultaneous participation of the alcohol dehydrogenase reaction in aerobic culture conditions.

Stress adaptation

Protein denaturation and aggregation, resulting from exposure to heat or other stresses such as ethanol, are severe problems for cells, and are combated by induction of highly conserved heat shock proteins, whose function is to remove or refold the damaged cellular proteins23. Z. mobilis, an efficient ethanol producer, exhibits very high ethanol tolerance3. The Z. mobilis genome contains ORFs for the complete sets of heat shock–responsive molecular chaperones, such as DnaK (ZMO0660), DnaJ (ZMO0661, ZMO1069, ZMO1545, ZMO1546, ZMO1690) and GrpE (ZMO0016) of the HSP-70 chaperone complex, GroES (ZMO1928; HSP-10), GroEL (ZMO1929; HSP-60) and HSP-33 (ZMO0410). ATP-dependent heat shock–responsive proteases, such as HslVU (ZMO0246, ZMO0247) and Clp (ZMO0948, ZMO0949, ZMO1424), were also found. As in the well-known E. coli system23, genes for alternative sigma factors, sigma-32 (σ32; ZMO0749) and sigma-E (σE; ZMO1404), for the pertinent responses against various stresses are present. It is known that sigma-32 of E. coli induces a 'classic' set of chaperones, proteases and other heat shock proteins, thereby playing a central role in heat shock responses, whereas sigma-E induces periplasmic proteases, chaperones and sigma-32 in response to specific extracytoplasmic stress. It is also well known that the induction of sigma-32 is turned on when E. coli cells grown at 30 °C are shifted to 42 °C, whereas proteins encoded by the sigma-E regulon are rapidly induced when E. coli cells are exposed to a more extreme temperature (e.g., 50 °C) or 10% ethanol23. We hypothesize that sigma-E plays a key role in resisting high ethanol conditions in Z. mobilis. We also found genes for a sigma-E positive regulator (ZMO1842) and a transcriptional regulator of heat shock genes (ZMO0015), two tight regulators of heat shock gene expression.

The appropriate controls of gene expression are carried out by a combination of basic transcriptional machineries, including RNA polymerase and sigma factors. Genes for other sigma factors, σ70 (rpoD; ZMO1623), σ54 (rpoN; ZMO0274), and σ28 (fliA; ZMO0626) were also found in the genome of Z. mobilis. We also identified 54 transcriptional activators and repressors.

Higher G+C-content genes found only in strain ZM4

To compare the Z. mobilis ZM4 genome with the unsequenced type strain (ZM1: ATCC10988) of Z. mobilis, labeled ZM1 and ZM4 genomic DNA were cohybridized with DNA microarrays containing probes for all the ORFs of Z. mobilis ZM4. Most of the probes on the microarray hybridized equally with both labeled genomic DNAs (Fig. 3a). In addition, the two strains showed similar patterns of gene expression in microarray analysis of cultures sampled at various growth stages (data not shown). The overall genome structures of ZM1 and ZM4 are therefore probably very similar.

Figure 3: Comparison of genome structure and expression profiles between Z. mobilis ZM1 and ZM4.

(a) Cohybridization of Cy3-labeled ZM1 genomic DNA (green) and Cy5-labeled ZM4 genomic DNA (red) on a Z. mobilis microarray. Arrows indicate extra sequences in strain ZM4. (b) Cohybridization of Cy3-labeled ZM1 cDNA (green) and Cy5-labeled ZM4 cDNA (red) on a Z. mobilis microarray. Most ORFs in the extra sequences of strain ZM4 (same locations that arrows indicate in panel a) were actively expressed (red spots). RNAs were isolated at exponential growth phase.

However, it is interesting to note that strain ZM4 contains sequences that are absent from ZM1. These sequences consist of 54 genes that are clustered separately in five regions. The products of the 54 ORFs include four kinds of membrane transport proteins, four kinds of proteins involved in a type IV secretory system, an oxidoreductase related to short-chain alcohol dehydrogenase and several transcriptional regulators (Table 3). Two genes, bcbG (ZMO1299) and bcbE (ZMO1300), encoding capsular polysaccharide biosynthesis proteins, were also peculiar to strain ZM4. One of the five clusters, spanning from 1,984,100 nt to 2,009,434 nt (25.3 kb), contains 25 ORFs and shows a higher G+C content (61.0%) (Fig. 1) than the average (46.3%) for the full genome of ZM4. The 25.3-kb sequence contains some interesting ORFs: ZMO1930 for phage-related integrase, ZMO1941 for conjugal transfer TraF protein, ZMO1954 for conjugal transfer TrbL protein, and ZMO1933 and ZMO1934 for type I restriction-modification enzyme S and M subunits, respectively.

Table 3 Complete list of additional 54 ORFs in ZM4

Most of the additional 54 ORFs in ZM4 were actively transcribed during the exponential growth phase, when ethanol is vigorously produced (Fig. 3b). Global expression profiles of the ZM1 and ZM4 strains were analyzed in a sample taken when half of the glucose (50 g/l) in the medium had been consumed. The data showed that a total of 294 ORFs were upregulated more than twofold in ZM4 compared to ZM1, whereas 153 ORFs were expressed more than twofold higher in ZM1 (Supplementary Tables 3 and 4 online).
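The twofold criterion used for this comparison can be expressed as a symmetric cutoff on the log2 ratio of the two channel intensities. The sketch below is an assumption-laden illustration: the function name and the intensity values are hypothetical, and normalization and background correction, which any real two-color analysis requires, are omitted.

```python
import math


def classify(zm4: float, zm1: float, threshold: float = 2.0) -> str:
    """Label an ORF by its ZM4/ZM1 intensity ratio with a symmetric cutoff.

    Both intensities must be positive. A ratio exactly equal to the
    threshold is not counted, matching "more than twofold".
    """
    log_ratio = math.log2(zm4 / zm1)
    cutoff = math.log2(threshold)
    if log_ratio > cutoff:
        return "higher in ZM4"
    if log_ratio < -cutoff:
        return "higher in ZM1"
    return "unchanged"


# Hypothetical Cy5 (ZM4) and Cy3 (ZM1) intensities for three spots.
spots = {"spot_a": (900.0, 300.0), "spot_b": (100.0, 450.0), "spot_c": (200.0, 150.0)}
for name, (cy5, cy3) in spots.items():
    print(name, classify(cy5, cy3))
```

Working on the log2 scale keeps the two directions symmetric, so a threefold increase and a threefold decrease lie equally far from zero.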

It has been reported that strain ZM4 is more tolerant of higher alcohol concentration than the type strain ZM1 and that ZM4 shows higher specific rates for growth, ethanol production and glucose uptake5,24. Perhaps some of the genes peculiar to ZM4 and actively expressed at the higher glucose concentration will prove to be good target genes for constructing recombinant strains that ferment ethanol with higher productivity.

Discussion

Analysis of the complete sequence of the Z. mobilis ZM4 genome reveals why this is one of the most powerful ethanol-producing microbes described, and suggests potential means to improve the yield and rate of ethanol production. Because Z. mobilis produces only one mole of ATP per mole of glucose via the ED pathway, Z. mobilis requires almost twice as much glucose as microbes that use the EMP pathway to produce equivalent amounts of ATP. The higher rates of glucose utilization and ethanol production are also supported by the fact that pyruvate decarboxylase and alcohol dehydrogenases are very highly expressed in Z. mobilis.

The absence of 6-phosphofructokinase and the consequent dependence of Z. mobilis on the ED pathway raise interesting questions about the evolution of carbohydrate metabolism. The ED pathway is active in most Gram-negative bacteria and many other microorganisms including some archaebacteria25. The ubiquity of the ED pathway suggests that it is of far greater importance in nature than was previously recognized, and indeed an essay on the evolution of glycolytic pathways suggested that the ED pathway predates the EMP pathway26. Although it is also possible that Z. mobilis is not able to use the EMP pathway as a result of the loss of the gene encoding 6-phosphofructokinase, considering the genome size and relatively simple metabolic pathways present in Z. mobilis, it is more likely that the EMP pathway present in other microorganisms is the result of acquiring the 6-phosphofructokinase gene.

The absence of genes for two components of the tricarboxylic acid cycle, the 2-oxoglutarate dehydrogenase complex and malate dehydrogenase, suggests the existence of alternative pathways to the tricarboxylic acid cycle. Because essential metabolites for cell growth are provided from the tricarboxylic acid cycle, this provides an explanation for the low biomass formation of Z. mobilis compared with other microorganisms in which the tricarboxylic acid cycle is actively operating5.

The observation that Z. mobilis ZM4 contains extra DNA sequences encoding a total of 54 ORFs, compared to the genome of the type strain ZM1, raises questions about the origin as well as the role of these ORFs. Given that 25 ORFs in these high G+C-content DNA sequences show very high identity with some genes found in phages, and that there is little sequence homology with genes from other bacteria, it is possible that the additional DNA sequences with higher G+C content were horizontally transferred from phages. Plasmid exchange is another possible route, because a 3-kb sequence in the additional DNA exhibits substantial homology with the sequence of Ralstonia solanacearum that encodes conjugal proteins TraF and TraL. Transposon-mediated gene transfer is also a possibility considering that the sequences encoding TraF and TraL are also homologous with Ralstonia oxalatica transposon Tn4371.

Among the 54 predicted ORFs, four ORFs that encode transport proteins or permeases, and two genes for NAD(P)H:quinone oxidoreductase (ZMO1949) and oxidoreductase (short-chain alcohol dehydrogenases; ZMO1946) were found to be very highly expressed. It is quite likely that these genes contribute to the higher rates of glucose uptake and ethanol production in the ZM4 strain. Two genes encoding capsular carbohydrate synthesis enzymes were also found to be actively expressed in ZM4, and it is possible that they contribute to resistance against the osmotic pressure imposed by the high glucose concentration of the medium and the ethanol produced during fermentation. Thus, it is plausible that several of the characteristics of ZM4 that make it attractive as an ethanol producer may be attributable to DNA acquired comparatively recently.

Methods

Sequencing and assembly.

Genomic DNA from Z. mobilis ZM4 strain ATCC 31821 was sequenced using whole genome random shotgun methods27. Mechanically sheared 2-kb and 10-kb DNA fragments were isolated, inserted into pUC18 and cloned. Template preparation reactions were done using standard protocols. DNA sequencing reactions were carried out using PE BigDye Terminator chemistry, and sequencing ladders were analyzed on PE 3700 automated DNA sequencers. Approximately 40,000 reads with PHRED scores ≥20 were generated, providing 14-fold genome coverage. These sequences were assembled by using the PHRED/PHRAP/CONSED software package28 (http://www.phrap.org/). Both ends of 292 fosmid clones with an average insert size of 40 kb were also sequenced, providing a validation check of the final assembly. Sequencing gaps were closed by primer walking on gap-spanning clones and combinatorial PCR-assisted contig extension29.
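The relationship between read count, read length and coverage (coverage = reads × mean read length / genome size) gives a quick consistency check on these figures. The sketch below is illustrative: the mean read length is implied by the rounded numbers quoted here and is not a reported value, and the uncovered fraction uses the idealized Lander-Waterman assumption of uniformly random reads. Function names are hypothetical.

```python
import math

GENOME_BP = 2_056_416   # genome size from the Results
N_READS = 40_000        # "approximately 40,000 reads"
COVERAGE = 14           # "14-fold genome coverage"


def implied_mean_read_length(coverage: float, genome_bp: int, n_reads: int) -> float:
    """Mean read length implied by coverage = n_reads * length / genome_bp."""
    return coverage * genome_bp / n_reads


def expected_uncovered_fraction(coverage: float) -> float:
    """Probability that a base is missed by all reads under random sampling."""
    return math.exp(-coverage)


length = implied_mean_read_length(COVERAGE, GENOME_BP, N_READS)
missed = expected_uncovered_fraction(COVERAGE) * GENOME_BP

print(f"implied mean read length: about {length:.0f} bp")
print(f"expected uncovered bases under the idealized model: {missed:.1f}")
```

In practice cloning bias and repeats leave far more gaps than the idealized model predicts, which is consistent with the gap-closure steps described above.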

Genome annotation.

ORFs were predicted with the Glimmer software30, and functional annotation of predicted ORFs was carried out with an alignment search tool (Blastx) against a nonredundant database. Further analyses with Pfam and COG (Clusters of Orthologous Groups)31 were carried out to find homologous protein domains and compare protein sequences between species.

Designing and spotting of oligonucleotides for microarrays.

We designed 50-mer oligonucleotide probes representing each Z. mobilis ORF, as follows: melting temperatures were normalized within 2 °C; the G+C content of designed oligonucleotide probes was restricted to 46 ± 2%, matching the 46.33% G+C content of Z. mobilis; sequence homology to other regions of the genome was restricted to a maximum of 35 bp, with no exact sequence matches of more than 15 bp32. The 2,112 oligonucleotide probes and 48 control probes, whose concentrations were normalized to 50 μM (pmole/μl) in 50% DMSO, were spotted on CMT-GAP aminosilane-coated glass slides according to the order of the ORFs in the genome.
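The G+C criterion for candidate probes can be sketched as a simple filter. This is an illustration of that one criterion only: the function names and example sequences are hypothetical, and the melting-temperature normalization and cross-homology limits described above are not implemented.

```python
PROBE_LENGTH = 50
TARGET_GC = 46.0      # percent, from the text
GC_TOLERANCE = 2.0    # percent, from the text


def gc_percent(seq: str) -> float:
    """Return the G+C content of a sequence as a percentage."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def passes_gc_filter(probe: str) -> bool:
    """Accept a probe of the required length whose G+C lies within tolerance."""
    if len(probe) != PROBE_LENGTH:
        return False
    return abs(gc_percent(probe) - TARGET_GC) <= GC_TOLERANCE


# Artificial candidates (not real probe sequences).
candidates = {
    "gc_46": "GC" * 11 + "G" + "AT" * 13 + "A",   # 23 G/C of 50
    "gc_60": "GC" * 15 + "AT" * 10,               # 30 G/C of 50
    "short": "GCAT" * 10,                         # 40 nt, wrong length
}
for name, seq in candidates.items():
    print(name, len(seq), f"{gc_percent(seq):.0f}%", passes_gc_filter(seq))
```

For a 50-mer the window of 46 ± 2% admits exactly 22, 23 or 24 G or C bases.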

Labeling of genomic DNA and RNA.

Genomic DNAs isolated from Z. mobilis strains ZM1 and ZM4 were fluorescently labeled with random hexamers and either Cy3-labeled or Cy5-labeled dCTPs, respectively, using the Klenow fragment of DNA polymerase. Total RNA was extracted using an RNeasy kit (Qiagen) with the RNA stabilizing solution, RNAlater (Ambion). We labeled 50 μg of total RNA from strain ZM1 with Cy3-labeled dCTPs and 50 μg of total RNA from strain ZM4 with Cy5-labeled dCTPs, using reverse transcriptase (Superscript II; Invitrogen) with random hexamers33.

Nucleotide sequence accession number.

The sequence reported in this paper has been deposited in GenBank with accession number AE008692.

Microarray data.

Raw data files of the microarray experiments are available at http://www.macrogen.com/zymomonas/microarray and in the EBI ArrayExpress database with accession number E-MEXP-217.

Note: Supplementary information is available on the Nature Biotechnology website.