Introduction

Clostridium pasteurianum is an obligately anaerobic, endospore-forming soil bacterium that is emerging as an attractive industrial host owing to its unique fermentative metabolism1,2,3 and newfound capacity to directly consume electric current4. Growth of C. pasteurianum on conventional sugars, such as glucose or sucrose, yields a butyric acid fermentation characteristic of the clostridia5,6. Conversely, growth on glycerol leads to a marked shift in metabolism distinguished by an alcohologenic profile comprised of 1,3-propanediol and butanol5,7,8. The glycerol fermentation carried out by C. pasteurianum has drawn significant attention to the organism in light of the recent growth in global biodiesel production, which has generated an abundance of crude glycerol, now considered a waste-stream, rather than a valued co-product9,10,11,12. Crude glycerol is present at approximately 10% (w/w) of the final biodiesel preparation, causing its value to sharply decline in accordance with expansion of the biodiesel industry. Consequently, abundant and inexpensive waste glycerol has found application in various processes, including animal-feeding, composting, anaerobic digestion, and a range of other thermochemical and biological conversions12,13. Owing to the vast metabolic diversity found in nature, the biotechnological route of waste glycerol valorization is often regarded as the most promising10. Fermentation of glycerol, naturally carried out by species of Klebsiella, Citrobacter, and Clostridium14, in addition to engineered E. coli15, offers an array of value-added bioproducts, including ethanol, butanol, 1,2- and 1,3-propanediol, and 2,3-butanediol. While propanediol and butanediol are important chemical building blocks, butanol serves as a prospective biofuel that is superior to ethanol in both physicochemical and fuel properties16. In nature, C. pasteurianum is the only organism known to convert glycerol as a sole carbon and energy source into butanol1. 
Despite its promise, however, the C. pasteurianum glycerol fermentation is currently one of the most poorly understood glycerol-to-biofuel processes.

Bacteria defend against bacteriophages (phages), plasmids, and other invading nucleic acids through the use of primitive cellular immune systems. The chief prokaryotic defence mechanisms are restriction-modification (RM) and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) systems, both of which utilize endonuclease-mediated attack and afford host cells self versus non-self discrimination17,18. Restriction endonucleases target DNA for attack by recognizing short, typically palindromic sequences, whereby self-restriction is blocked by host methylation18. On the other hand, CRISPR systems provide adaptive immunity through the use of host-encoded DNA sequence tags specific to an invading element19. These sequence tags, or spacers, are flanked by short partially palindromic repeats (24–47 bp) and provide the basis for immunity against future invasions19. CRISPR arrays are dynamic in nature, as new spacers are rapidly acquired in response to predation and unused spacers are excised from the host genome. CRISPR-associated (Cas) proteins, involved in both acquisition of new spacer sequences and subsequent attack of invading elements, are often found in close proximity to CRISPR arrays within bacterial and archaeal genomes20. Gene mining tools and comprehensive online databases, such as REBASE21 and CRISPRdb22, enable simple identification of putative RM and CRISPR systems within bacterial genomes. Next-generation sequencing also offers an avenue for discovery of novel RM systems through sensitive detection of host-modified nucleotides23. As phage attack is often implicated as a key factor in the historic failure of large-scale clostridial acetone and butanol (AB) production24, identification and characterization of clostridial defence systems could provide a means of engineering immunity against phage predation and associated culture deterioration25.
In this context, genomic analysis provides an opportunity to assess phage content of bacterial genomes26, including the presence of intact and active prophages27, which constitutes an important yet largely unexamined facet of clostridial biology.

A small number of studies have investigated the potential of C. pasteurianum to produce butanol from crude glycerol2,3,28,29,30. To complement such efforts, techniques have recently been developed allowing high-level electrotransformation and chromosomal gene disruption and deletion in C. pasteurianum31,32,33, thus paving the way for rational metabolic engineering and strain optimization efforts34. A full genome sequence is available for an environmental isolate of C. pasteurianum (strain BC1) (unpublished data) and two completed genome sequences have recently been announced for the type strain [ATCC 6013 (DSM 525)]35,36. In addition to an expanding repertoire of genetic tools and genome sequencing data, it is clear that a better understanding of the central fermentative pathways of C. pasteurianum is paramount to the advancement of this organism for biotechnological valorization of crude glycerol2. In this study, we present a concurrent genome assembly for C. pasteurianum type strain ATCC 6013 (DSM 525) and provide detailed analysis of the organism’s unique fermentative metabolism. We show that the organism exhibits a highly flexible, branched central metabolism, where product distribution varies considerably between carbon sources and is dictated chiefly by redox characteristics of the fermentation substrate. To stimulate a more thorough understanding of C. pasteurianum genetics and general biology, we also provide insight into the organism’s defence mechanisms through analysis of the restriction-modification methylome and identification of a chief Type I-B CRISPR system. Finally, we provide evidence that the genome of C. pasteurianum encodes an intact prophage that is spontaneously excised under standard growth conditions and induced using mitomycin C. The detailed genomic analysis of C. pasteurianum presented herein will contribute to our understanding of substrate utilization and biofuel production by this promising organism, as well as provide a genetic and metabolic framework for rational strain engineering.

Results and Discussion

C. pasteurianum ATCC 6013 (DSM 525) genome closing and detection of an extrachromosomal circular bacteriophage excision product

We recently reported a draft genome sequence of C. pasteurianum comprised of 12 contigs37. To join contigs, we employed an additional round of SMRT sequencing using a size-selected large insert library and the RS II analyzer (Pacific Biosciences; Menlo Park, CA), resulting in a draft genome sequence comprised of two contigs of sizes 4.37 Mbp (contig 1) and 13.2 kb (contig 2). Contig 2 was analyzed and found to be comprised of two regions (approximately 6.2 kb and 7.0 kb) that were also identified within contig 1. These regions possess 28 bp of overlap within contig 2, yet are separated by approximately 29.0 kb in contig 1 (see Supplementary Fig. S1). We analyzed the 29.0 kb intervening sequence between the 6.2 kb and 7.0 kb regions of contig 1 and identified a number of genes encoding putative phage and prophage gene products. PHAST analysis26 of the C. pasteurianum genome and a concurrent genome sequencing effort33 predicted an intact prophage within this region. Hence, we hypothesized that genome sequencing reads corresponding to contig 2 could have arisen from spontaneous excision of a circular, extrachromosomal phage or phage-like product from the genome of C. pasteurianum. It is possible that contig 2 (13.2 kb) corresponds to sequence generated directly from the excised phage (i.e. the phage genome) or phage-like product. Further, the 28 bp overlap sequence between the 6.2 kb and 7.0 kb regions in contig 2 was found to be preserved within the 5′ terminus of the 6.2 kb region and the 3′ terminus of the 7.0 kb region of contig 1 (see Supplementary Fig. S1). This sequence arrangement is consistent with a chromosomal excision event, where the 28 bp overlap sequence of contig 2 represents the phage attachment site following excision (attP) and the corresponding 28 bp sites within the 6.2 kb and 7.0 kb regions of contig 1 are the respective left (attL) and right (attR) phage attachment sites within the chromosome of C. pasteurianum (i.e. prior to phage excision) (Fig. 1a). Chromosomal phage integration presumably occurred through homology between the free phage attP site and the bacterial chromosome (attB), whereas subsequent excision of the prophage proceeded via recombination between attL and attR. Based on this hypothesis, phage excision should result in a single chromosomal “scar” site (attB) within the phage-less C. pasteurianum genome. We confirmed this hypothesis by successfully amplifying the phage-less bacterial chromosome region (attB), as well as products corresponding to the unexcised prophage (attL and attR) and the excised circular phage product (attP) (Fig. 1b). All four PCR products were of the expected sizes based on the proposed phage excision event and Sanger DNA sequencing revealed the expected nucleotide sequences (data not shown). In the absence of excision, the attB PCR primer set is expected to generate a 43.3 kb product, which is beyond the amplification limits of PCR. We were also successful in PCR-amplifying overlapping 22.7 kb and 22.8 kb products, which both span the attP attachment site and together comprise the full-length excised phage genome (Fig. 1a,b). Furthermore, the orientation of primers utilized to generate these products confirms circularity of the excised phage product.
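
The size reasoning behind this PCR screen can be checked with simple arithmetic. The sketch below is illustrative only: it uses the product sizes and phage genome length reported in the Fig. 1 legend, and the function and variable names are our own rather than part of the original analysis. It assumes that the integrated prophage adds exactly one phage-genome length between the attB primers, since the attachment site is present once in attB and attP but twice (attL and attR) in the lysogen.

```python
# Sizes in bp, as reported in the Fig. 1 legend.
PHAGE_GENOME = 42_250    # circular excised phi6013 genome
ATTB_PRODUCT = 1_076     # attB amplicon from the phage-less chromosome
LONG_RANGE_5P = 22_756   # 5' long-range phi6013 product
LONG_RANGE_3P = 22_678   # 3' long-range phi6013 product


def unexcised_attb_amplicon(attb_product: int, prophage_length: int) -> int:
    """Expected attB-primer amplicon when the prophage is still integrated.

    Assumption: integration inserts one full phage-genome length between
    the two attB primers.
    """
    return attb_product + prophage_length


def long_range_overlap(product_a: int, product_b: int, circle_length: int) -> int:
    """Total overlap of two amplicons that together cover a circular template."""
    return product_a + product_b - circle_length


print(unexcised_attb_amplicon(ATTB_PRODUCT, PHAGE_GENOME))             # 43326
print(long_range_overlap(LONG_RANGE_5P, LONG_RANGE_3P, PHAGE_GENOME))  # 3184
```

The first value reproduces the approximately 43.3 kb product expected in the absence of excision. The second shows that the two long-range products together exceed the circular genome length, as expected for overlapping amplicons that close a circle.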

Figure 1: Identification and excision of phage φ6013 from the genome of C. pasteurianum.

(a) Predicted excision mechanism of phage φ6013 from the genome of C. pasteurianum. Phage excision was induced by exposing exponential phase cultures of C. pasteurianum to 5 μg ml−1 mitomycin C, leading to recombination between attL and attR sites. Sequences corresponding to the core attL and attR φ6013 recombination sites are shown in uppercase. The resulting attP sequence of phage φ6013 is compared to the similar 12 nt core attP site of phage φ3626 from C. perfringens. Prophage excision leads to a circular 42,250 bp phage genome and a single attB scar site within the genome of C. pasteurianum. PCR primers for screening attL, attR, attP, and attB recombination sites are shown, as well as screening primers for long range PCR of the circular excised φ6013 genome. Genomes, genomic regions, and PCR primers are not depicted to scale. (b) PCR verification of phage φ6013 excision from the C. pasteurianum chromosome. Orientation and arrangement of PCR primers are depicted in Fig. 1a. Lane 1: marker; lane 2: 904 bp attL product (attLB.S + attL.AS); lane 3: 872 bp attR product (attRP.S + attRB.AS); lane 4: 3,154 bp attP product (attRP.S + attP.AS); lane 5: 1,076 bp attB product (attLB.S + attRB.AS); lane 6: long range PCR marker; lane 7: 22,756 bp 5′ φ6013 product (φ6013.S + attP.AS); lane 8: 22,678 bp 3′ φ6013 product (attRP.S + φ6013.AS). (c) Transmission electron microscopy image of phage φ6013 visualized at 245,000× magnification. (d) Genomic arrangement of phage φ6013 (42,250 bp). All 52 predicted genes, including some functional assignments, are depicted and are numbered consecutively. Genes in black and grey depict different directions of transcription. The predicted phage attachment site (attP) described in the main text is shown. All genes and intergenic regions are depicted to scale.

Although activation of the C. pasteurianum prophage occurred spontaneously in the aforementioned analysis, prophage excision can be artificially induced using ultraviolet irradiation or mitomycin C, a potent antibiotic38,39. To assess phage induction, we exposed growing cultures of C. pasteurianum to varying concentrations of mitomycin C (0, 1, 2.5, 5 and 10 μg ml−1) and monitored culture turbidity for signs of phage release and cell lysis, characterized by a dramatic decline in OD60027. The culture containing 5 μg ml−1 mitomycin C exhibited a dramatic decrease in OD600 approximately four hours following induction. Imaging via TEM of the 5.0 μg ml−1 mitomycin C phage lysates revealed an abundance of phages with long tails possessing short terminal fibers (Fig. 1c). We measured tail size of seven distinct, well-resolved phages, resulting in an average length of 242 ± 11 nm. Based on these observations, we hypothesize that the excised phage, which we designate φ6013, belongs to the Siphoviridae family of bacteriophages27. The excised φ6013 phage possesses a 42,250 bp circular genome with a GC content of 33.2%, which is greater than the GC content of the C. pasteurianum chromosome (29.9%). A total of 52 protein-coding genes, possessing homology to both phage and bacterial genes, are annotated in the phage genome, 48 of which are transcribed in the same direction (Fig. 1d; see Supplementary Table S1). Phage gene products can be grouped into packaging and structural proteins (capsid, tail, terminase, and portal proteins), lysis proteins (holin and endolysin), lysogeny proteins (integrase and repressor), and DNA modification, replication, and gene expression proteins (DNA polymerase and helicase). Since proteins for both cell lysis (a holin and autolysin) and lysogeny (an integrase and cI-like transcriptional repressor) could be identified, φ6013 clearly embodies a temperate phage. 
PHAST analysis26 of φ6013 gene products revealed a total of 39 proteins possessing significant protein identity to other phages. Of particular relevance are the C. perfringens φCP5138 (18 protein matches) and φ3626 phages39 (5 protein matches), both of which are part of the Siphoviridae family. Interestingly, φ6013, in addition to φCP51 and φ362638,39, encodes a sporulation-specific transcriptional regulator (spoIIID) and RNA polymerase sigma factor (sigE), indicating a potential relationship between φ6013 and sporulation of C. pasteurianum. Spore titers have been reported to differ between seemingly identical strains of C. pasteurianum obtained from different culture collections (ATCC and DSM)36, which could reflect variation in levels of excision of φ6013. Genome sequences previously reported for the ATCC and DSM type strains both contain the intact φ6013 prophage, as it is presumed that the authors did not generate sequencing reads corresponding to the excised phage. Since we observed relatively high-levels of spontaneous phage excision in this study, corresponding to a sequencing coverage of 35–60×, it would be advantageous to determine which set of conditions, if any, are responsible for activation of phage φ6013.

Following elucidation of the φ6013 excision mechanism, a single unclosed 4.37 Mbp contig remained (i.e. contig 1), which was found to be approximately 22 kb larger than previously reported C. pasteurianum genome sequences35,36. Based on extensive long range PCR analyses, we provide evidence that the C. pasteurianum contig gap is part of a large 93.5 kb chromosomal duplication (see Supplementary Note and Supplementary Fig. S2). This duplication, which is flanked by genes encoding transposable elements, is not present in previous C. pasteurianum genome sequences and its large size prevented closing of our draft assembly using traditional PCR methods. Hence, our current genome assembly is comprised of a single unclosed contig.

General C. pasteurianum genome characteristics

Based on the hypothesis outlined above, the expected size of the C. pasteurianum genome is 4,444,510 bp. Our current draft genome is comprised of a single 4,373,654 bp contig possessing a GC content of 29.9% (Fig. 2). No plasmids could be identified. The size of the genome is within the range of most clostridia (3–5 Mbp34). Based on 16S rRNA phylogeny, C. pasteurianum is most closely related to (from most related to least related) C. acidisoli, C. akagii, C. arbusti, and C. carboxidivorans40, yet complete genome sequences are not available for these species. C. acetobutylicum, C. botulinum, C. autoethanogenum, and C. ljungdahlii are the closest relatives with fully-sequenced genomes (see Supplementary Table S2)40. Currently, six C. pasteurianum genome sequencing projects are underway or have been completed35,36,37,41,42, highlighting the recent emergence of C. pasteurianum as a promising industrial producer of butanol. These genome efforts encompass three finished genome sequences and three draft sequences, collectively covering three distinct strains of the species. As outlined above, one concurrent effort recently reported two closed genome sequences of the C. pasteurianum type strain from two culture collections (ATCC 6013 and DSM 525)35,36. A brief comparison of the genome sequence reported in this study with other C. pasteurianum sequencing projects is provided in Table 1. The C. pasteurianum genome is predicted to possess 3,803 protein-coding genes, 81 tRNA genes, and 30 rRNA genes (Fig. 2). Approximately 75% of genes in the genome could be assigned a function based on Clusters of Orthologous Groups (COGs), while 1,006 genes (approximately 25% of genes) were annotated as general function only (COG function R) or function unknown (COG function S).
Aside from poorly characterized genes, the largest COGs, collectively comprising approximately one third of all protein-coding genes (32.3%), were ones involved in amino acid transport/metabolism (COG function E; 12% of genes), energy production/conversion (COG function C; 10% of genes), and carbohydrate transport/metabolism (COG function G; 10% of genes). The large proportion of genes involved in metabolism and energy production is testament to the exceptional metabolic flexibility exhibited by C. pasteurianum.

Figure 2: The chromosome of C. pasteurianum ATCC 6013.

Contig 1 (4,373,654 bp) is depicted as a circular chromosome and shows the approximate location of key genomic features discussed in this study. The two outermost circles indicate locations of gene coding regions (blue) in plus (circle one) and minus (circle two) strands. Genes encoding tRNAs and rRNAs are shown in fuchsia and lavender, respectively. Circle three shows G + C content (deviation from average) and circle four depicts G + C skew in plus (green) and minus (purple) strands. Genome scale is indicated in Mbp on the innermost circle. The CGView Server120 was used to construct the genome map.
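
For readers unfamiliar with the two inner tracks, the following minimal sketch shows how G + C content deviation and G + C skew are conventionally computed over sliding windows. It is not the CGView Server implementation: the window and step sizes, the function names, and the toy sequence are assumptions made for illustration, and skew is taken as the standard (G − C)/(G + C).

```python
from typing import Iterator, List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def gc_skew(seq: str) -> float:
    """Standard GC skew, (G - C) / (G + C); 0.0 if the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_windows(seq: str, window: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) for each full-length window."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


def gc_profile(seq: str, window: int, step: int) -> List[Tuple[int, float, float]]:
    """Per-window (start, GC deviation from the sequence mean, GC skew)."""
    mean_gc = gc_content(seq)
    return [
        (start, gc_content(sub) - mean_gc, gc_skew(sub))
        for start, sub in sliding_windows(seq, window, step)
    ]


# Toy sequence for illustration only, not C. pasteurianum data.
toy = "GGGGAAAACCCCTTTT"
for start, deviation, skew in gc_profile(toy, window=4, step=4):
    print(start, deviation, skew)
```

On a real chromosome, a sign change in the skew track is commonly used as a rough indicator of the replication origin and terminus, which is why the plus and minus skew are coloured separately in the map.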

Table 1 Overview of C. pasteurianum genome sequencing projects completed or currently underway.

C. pasteurianum is a flagellated bacterium and two large (approximately 25.7 kb and 14.5 kb) flagella loci were identified in the genome and together encode all core flagellar structural genes, including genes involved in filament [fliC (CP6013_1370, CP6013_1401)], hook [flgE (CP6013_1358)], and rod [flgB (CP6013_1369), flgC (CP6013_1368), and flgG (CP6013_1343, CP6013_1344)] formation. A collection of chemotaxis genes are located immediately downstream of the flagellar loci, while motA (CP6013_3023) and motB (CP6013_3022) chemotaxis genes are encoded at a distant location in the genome. C. pasteurianum is believed to be the first isolated nitrogen-fixing organism and nitrogen fixation from cell-free lysates was first observed using C. pasteurianum43. Cell-free extracts convert atmospheric N2 into ammonia using ferredoxin as electron donor and ATP to drive the reaction44. A core cluster of key nitrogen fixation genes (nif), including genes encoding the MoFe dinitrogenase (nifDK; CP6013_1738 and CP6013_1737) and Fe dinitrogenase reductase (nifH1; CP6013_1739) protein components45, is present within a 29.9 kb region of the genome (CP6013_1731–1754) and has been described extensively in C. pasteurianum46 and other nitrogen-fixers47. This region also encodes nifE (CP6013_1736), nifN-B (CP6013_1735), nifC (CP6013_1733), nifV1 (CP6013_1731), and nifV2 (CP6013_1732) genes involved in nitrogenase assembly and Mo/Fe insertion. It has been reported that C. pasteurianum possesses a total of six nifH and nifH-like genes48,49, which we confirmed with BLAST analysis using NifH1 as a protein query. Four of the five NifH-like amino acid sequences [NifH2 (CP6013_1740), NifH4 (CP6013_3825), NifH5 (CP6013_2037), and NifH6 (CP6013_1749)] possess substantial sequence identity to NifH1 (92–99%), while NifH3 (CP6013_3385) was found to contain 64% of NifH1 amino acid identities. 
NifH5 (CP6013_2037) was not found to be associated with other nitrogenase components, while NifH4 (CP6013_3825) was identified within a smaller nitrogen-fixing cluster (CP6013_3825–3832) containing putative nifK (CP6013_3827) and nifE (CP6013_3826 and CP6013_3832) genes, as well as two nifB or nifB-like genes (CP6013_3829 and CP6013_3830). Interestingly, three nifH-like genes, nifH5 (CP6013_2037), nifH3 (CP6013_3385), and nifH4 (CP6013_3825), are found adjacent to genes encoding putative transposases (CP6013_2035, CP6013_3386, and CP6013_3824, respectively). NifH3 (CP6013_3385) is the only NifH-like protein that is not transcribed under nitrogen fixation conditions, as it exemplifies a Mo-independent Fe-nitrogenase50. Genes encoding the Fe-nitrogenase component (anfDGK; CP6013_3387–3389) are close to the Fe dinitrogenase reductase gene (nifH3/anfH; CP6013_3385). In addition to Mo-dependent (Nif) and Mo-independent (Anf) nitrogenases, it has been reported that C. pasteurianum harbors a vanadium-dependent nitrogen-fixing (Vnf) system51. We identified a putative vanadium-dependent nitrogenase locus, represented by vnfD (CP6013_1748), vnfG (CP6013_1747), and vnfK (CP6013_1746), which are positioned adjacent to nifH6 (CP6013_1749).

Analysis of the C. pasteurianum methylome and restriction-modification systems

To gain further insight into the cellular defence mechanisms of C. pasteurianum and guide future genetic work with this organism, we analyzed the organism’s methylome using SMRT sequencing data37. Only m6A modifications could be detected in this study, as sequence coverage was insufficient for m5C detection. Methylome analysis unveiled four distinct m6A methylation motifs (Table 2). Two such motifs are associated with experimentally-verified restriction activities. The first is CpaI18, a Type II system with a recognition sequence of 5′-GATC-3′ that is common within the Clostridium genus52. We previously identified and characterized restriction activity corresponding to the other m6A system detected using methylome analysis32. This Type I restriction-methylation-sensitivity (RMS) system, with a predicted recognition sequence of 5′-AAGNNNNNCTCC-3′ (N = any nucleotide; A, C, G, or T), was designated CpaAII. Interestingly, only 37.8% of CpaI recognition sites within the genome were found to be methylated, while 92.9% of CpaAII sites were modified. Two new m6A-specific methylation motifs were also detected in this study, with recognition sequences of 5′-GRTAAAG-3′ and 5′-CAAAAAR-3′ (R = purine; A or G). In addition to the m6A-specific RM activities outlined above, a third experimentally-verified RM system has been elucidated in the type strain of C. pasteurianum. This system, designated CpaAI (5′-CGCG-3′)52, is m5C-specific, and thus was not detected in this study.
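
Counting recognition sites for degenerate motifs such as those above amounts to a pattern search on both DNA strands. The sketch below is a minimal illustration of that idea and not the SMRT methylome pipeline used in this study: the helper names and the toy sequence are hypothetical, and only the IUPAC codes appearing in the reported motifs (R and N, plus Y for reverse complements) are handled.

```python
import re
from typing import List, Tuple

# IUPAC codes needed for the motifs discussed in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYN", "TGCAYRN")


def reverse_complement(motif: str) -> str:
    """Reverse complement of a motif, preserving degenerate codes."""
    return motif.translate(COMPLEMENT)[::-1]


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    """Compile a motif to a regex; the lookahead allows overlapping sites."""
    return re.compile("(?=(" + "".join(IUPAC[base] for base in motif) + "))")


def find_sites(genome: str, motif: str) -> List[Tuple[int, str]]:
    """Return sorted (0-based start, strand) pairs for every motif occurrence."""
    genome = genome.upper()
    hits = [(m.start(), "+") for m in motif_to_regex(motif).finditer(genome)]
    rc = reverse_complement(motif)
    if rc != motif:  # palindromic motifs (e.g. GATC) are reported once
        hits += [(m.start(), "-") for m in motif_to_regex(rc).finditer(genome)]
    return sorted(hits)


# Toy sequence for illustration only.
toy = "CCGATCTTAAGACGTACTCCGG"
print(find_sites(toy, "GATC"))
print(find_sites(toy, "AAGNNNNNCTCC"))
```

Given such a site list and a set of detected modified positions, the methylated fraction of a motif is simply the number of modified sites divided by the total number of sites.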

Table 2 Overview of the C. pasteurianum methylome and RM systems.

Based on analysis by REBASE21, the genome of C. pasteurianum is predicted to encode a total of eight methyltransferase genes, which ostensibly includes the single Type I RMS system, two Type II RM systems, a single Type II protein with dual R + M activities, and two lone Type II M proteins lacking associated R activity. The CpaAII Type I RMS system (5′-AAGNNNNNCTCC-3′) is encoded by genes CP6013_0336, CP6013_0337, and CP6013_0338 (R, M, and S, respectively), while genes CP6013_2557 and CP6013_2558 represent the R and M proteins, respectively, of the CpaAI (5′-CGCG-3′) Type II RM system. Whereas gene CP6013_0098 corresponds to the R protein of the Type II CpaI system (5′-GATC-3′), three adjacent genes (CP6013_0095–0097) could putatively encode the associated M activity. Based on REBASE21 and our methylome data, it is predicted that gene CP6013_1459 encodes a lone M protein with a recognition sequence of 5′-CAAAAAR-3′, gene CP6013_0727 encodes the dual RM protein possessing 5′-GRTAAAG-3′ recognition, and gene CP6013_0738 codes for the remaining lone M protein, with an unknown recognition sequence. Note that methylation by the CP6013_0738-encoded methyltransferase was not detected in this study, although the gene shares similarity with a silent non-specific methyltransferase gene often found within prophages. Such prophage genes can only be activated upon cloning into a plasmid. Finally, it is noteworthy that C. pasteurianum appears to restrict DNA substrates possessing CpG (5′-CG-3′) or GpC (5′-GC-3′) methylation based on plasmid transformation assays31, suggesting the presence of a methylation-dependent restriction endonuclease in C. pasteurianum. While wild-type E. coli cleaves DNA substrates containing m5C53, Type IV methylation-dependent restriction endonucleases are widespread in bacteria21. However, the genetic basis corresponding to the putative methylation-dependent restriction activity observed in C. pasteurianum could not be deciphered based on the genomic analysis performed in this study.

Identification of putative CRISPR systems

Approximately 45% of bacterial genomes encode CRISPR-associated (Cas) proteins and putative CRISPR arrays comprised of repetitive repeat-spacer units22. Surprisingly, 20 of 27 (74%) clostridial genomes are predicted to encode CRISPR-Cas systems22,25, compared to only 44% of Firmicutes54. Hence, we analyzed the genome of C. pasteurianum for putative CRISPR arrays and Cas-encoding genes using CRISPRfinder55. Two putative CRISPR arrays were identified possessing 8 and 37 unique spacer sequences ranging in length from 34 to 41 bp (Fig. 3). Although the arrays are separated by 2.1 Mbp within the C. pasteurianum genome, the 30 bp direct repeat sequences between the two CRISPR loci are identical, suggesting that the same set of Cas proteins are employed for spacer acquisition and interference. The 37-spacer locus was found to be associated with several cas genes (CP6013_0534–0541), while no such genes could be identified in proximity to the 8-spacer array. Based on the proposed classification of CRISPR-Cas systems54, the C. pasteurianum CRISPR system belongs to the Type I-B subtype owing to the presence of the signature Type I cas3 gene (CP6013_0538) and Type I-B cas8b (csh1) gene (CP6013_0535). Furthermore, the C. pasteurianum CRISPR-Cas system possesses the same cas gene arrangement (cas6-cas8b-cas7-cas5-cas3-cas4-cas1-cas2) found in other prokaryotic Type I-B systems54. All eight cas genes are transcribed in the same direction and are located downstream of the 37-spacer CRISPR array. Analysis of similar Type I-B systems suggests that transcription occurs in the same direction as the cas genes, indicating existence of a CRISPR leader sequence possessing an active transcriptional promoter immediately upstream of the 37-spacer CRISPR array. We were unable to identify sequences at the 5′ or 3′ ends of the 8-spacer CRISPR array possessing homology to the presumed 37-spacer CRISPR leader. 
Accordingly, it is unclear if the 8-spacer CRISPR locus possesses a leader sequence, and, therefore, functionality of this CRISPR array is uncertain.
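
Because both arrays share an identical direct repeat, their spacers can in principle be recovered by locating successive copies of the repeat and reading the sequence between them. The sketch below illustrates only that step and is not the CRISPRfinder algorithm: it assumes exact repeat copies, and the short repeat and spacer sequences are hypothetical stand-ins, as the actual 30 bp repeat is shown only in Fig. 3.

```python
from typing import List


def extract_spacers(array_seq: str, repeat: str) -> List[str]:
    """Return sequences lying between consecutive exact copies of `repeat`."""
    positions = []
    start = array_seq.find(repeat)
    while start != -1:
        positions.append(start)
        # Resume the search after the current repeat copy.
        start = array_seq.find(repeat, start + len(repeat))
    return [
        array_seq[left + len(repeat):right]
        for left, right in zip(positions, positions[1:])
    ]


# Hypothetical sequences for illustration only.
repeat = "GTTGAACC"
spacer_1 = "ACGTACGTACGT"
spacer_2 = "TTTCCCGGGAAA"
array = "AAAA" + repeat + spacer_1 + repeat + spacer_2 + repeat + "TTTT"

spacers = extract_spacers(array, repeat)
print(len(spacers), [len(s) for s in spacers])
```

An array with n + 1 repeat copies yields n spacers, so the 8-spacer and 37-spacer arrays described above would contain 9 and 38 repeat copies, respectively, if every spacer is flanked on both sides.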

Figure 3: Genomic analysis of the central Type I-B CRISPR system of C. pasteurianum.

Structure and orientation of CRISPR arrays and cas genes within the genome of C. pasteurianum are shown. Numbers below genes specify locus tags (CP6013 prefix is omitted). Three genes, encoding a putative histidine kinase (CP6013_0531), transposase (CP6013_0532), and a hypothetical protein (CP6013_0533), are located between the 37-spacer CRISPR array and cas genes. Genes encoding the Type I-B Cas proteins are located adjacent to a 37-spacer CRISPR array (spacers are depicted as dark gray boxes). A second 8-spacer CRISPR array (spacers are depicted as light gray boxes) possessing the same 30 nt direct repeat sequence (diamonds) was found elsewhere in the C. pasteurianum chromosome, separated from the cas genes by approximately 2.1 Mbp. The common 30 nt direct repeat sequence is shown in the direction of transcription, which is opposite for the two arrays. A predicted RNA folded structure of the 30 nt direct repeat is shown and compared to the 8 nt 5′ tag of mature crRNA from the C. thermocellum Type I-B system. A putative leader sequence is depicted upstream of the 37-spacer array, while the presence of a similar element within the 8-spacer array is not clear.

Processing of CRISPR RNAs (crRNAs) and subsequent binding to specific Cas proteins differ significantly between Type I and Type II CRISPR systems54. In both Type I and Type II systems, crRNAs are first transcribed as a single large precursor RNA transcript (pre-crRNA), which is then cleaved into individual mature CRISPR RNAs by the ubiquitous RNase III enzyme in Type II systems56 and Cas6 in Type I systems57. crRNA processing involves a trans-activating RNA (tracrRNA) in Type II systems56, whereas Cas6 recognizes distinct RNA hairpin structures formed by the direct repeat sequence of Type I systems57. Mature Type I crRNAs possess a unique spacer sequence flanked by an 8 nt 5′ tag and a variable 3′ tag, both of which are derived from the CRISPR repeat sequence following processing57,58. The 5′ and 3′ tags of the crRNA are responsible for recognition by specific Cas proteins, while the unique internal spacer base-pairs to the target invading DNA and triggers endonuclease attack. Compared to Type I CRISPR-Cas machinery, interference against invading genetic elements is markedly simpler in Type II systems, which require only the Cas9 protein for endonucleolytic attack54,59. Type I systems are characterized by Cas3-mediated cleavage of invading targets60, which involves a multiprotein complex called Cascade (Cas complex for antiviral defence), comprised of Cas5, Cas6, Cas7, and Cas861. The precise crRNA processing mechanisms utilized by the Type I-B CRISPR systems from C. thermocellum and Methanococcus maripaludis have recently been elucidated62. Whereas 3′ ends of mature crRNAs were short and variable, indicating a lack of specificity in 3′ trimming by Cas6, 5′ ends possessed the trademark 8 nt tag of Type I crRNAs. Using Mfold63, we analyzed the direct repeat sequence from the Type I-B CRISPR system of C. pasteurianum and, as expected, identified a putative hairpin secondary structure (Fig. 3). Moreover, the 3′ end of the C. pasteurianum direct repeat sequence possesses 6/8 nucleotides in common with the 3′ terminus of the C. thermocellum repeat sequence corresponding to the 5′ tag of mature crRNA, suggesting a similar mechanism of processing between the two Type I-B systems62. In fact, the universal 8 nt 5′ tag, and CRISPR repeats in general, are often highly conserved between related bacteria64. Analysis of the C. pasteurianum direct repeat sequence by CRISPRfinder55 revealed CRISPR systems from a range of organisms with repeats possessing fewer than five mismatches to the C. pasteurianum query. C. tetani was the only organism identified that employs a repeat sequence identical to that of C. pasteurianum, suggesting that horizontal gene transfer potentially occurred between C. tetani and C. pasteurianum. However, no homology could be identified between spacers from the respective species. Other organisms with similar direct repeats include a range of clostridia (e.g., C. botulinum, C. kluyveri, and C. autoethanogenum), as well as Bacillus coagulans and Eubacterium limosum. Most CRISPR arrays harbored by these organisms specify only a small number of spacers (<7), compared to 37 in C. pasteurianum. It is likely that C. pasteurianum has been subjected to a greater degree of phage predation compared to other organisms employing similar CRISPR systems, leading to extensive acquisition of novel spacer sequences by C. pasteurianum. Although BLAST analysis of the 45 C. pasteurianum CRISPR spacers provided no perfect matches to potential protospacer sequences, a number of spacers returned protospacer hits with five or fewer mismatches, which could be sufficient to confer immunity, since spacer-protospacer matches are often imperfect65. We are currently assessing the activity and functionality of the C. pasteurianum Type I-B CRISPR-Cas machinery against plasmid-borne protospacer sequences.
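
The notion of a protospacer hit with five or fewer mismatches can be illustrated with a brute-force scan that compares a spacer against every equal-length window of a target sequence. This is a simplified stand-in for the BLAST search described above, not a reimplementation of it: it considers only ungapped matches on the given strand, ignores any protospacer-adjacent motif requirement, and uses hypothetical sequences and function names.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def protospacer_hits(
    spacer: str, target: str, max_mismatches: int = 5
) -> List[Tuple[int, int]]:
    """Return (0-based start, mismatches) for each window within the threshold."""
    spacer, target = spacer.upper(), target.upper()
    n = len(spacer)
    hits = []
    for start in range(len(target) - n + 1):
        mismatches = hamming(spacer, target[start:start + n])
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Hypothetical spacer and targets for illustration only.
spacer = "ACGTACGTAC"
print(protospacer_hits(spacer, "TTTACGTACGTACTTT", max_mismatches=0))  # [(3, 0)]
print(protospacer_hits(spacer, "GGGACGTTCGTACGGG", max_mismatches=1))  # [(3, 1)]
```

For spacers of 34 to 41 bp, a threshold of five mismatches still corresponds to at least roughly 85% identity, which is why such imperfect hits remain plausible protospacer candidates. A complete search would also scan the reverse complement of each target.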

Overview of central fermentative metabolism

The central fermentative metabolism of C. pasteurianum is unprecedented in nature, as the organism combines metabolic pathways found independently in other clostridia (Fig. 4). C. pasteurianum possesses the clostridial butyrate and butanol formation pathways, a characteristic of AB producers such as C. acetobutylicum and C. beijerinckii16,24,66. In contrast, the organism does not typically produce acetone, the chief co-product of the historic AB fermentative process. Furthermore, under certain culture conditions C. pasteurianum expresses a highly active 1,3-propanediol pathway in a manner similar to C. butyricum, which lacks an active butanol formation pathway7. Owing to this unique metabolic diversity, the central metabolism of C. pasteurianum is complex and highly substrate-dependent.

Figure 4: Overview of the central metabolic pathways of C. pasteurianum based on genomic analysis.

Prevalent metabolic pathways leading to production of acids (green), alcohols (blue), and gases (red) are shown derived from commonly employed growth substrates. Many arrows represent multiple enzymatic conversions. The acetone formation pathway is depicted using dashed lines since acetone is not a common product of C. pasteurianum fermentations. The incomplete citrate cycle and other intermediary metabolic pathways are not depicted. Electron bifurcation by the Bcd-EtfAB enzyme complex is shown using 2NADH as reductant. Electron transfer via the EtfAB complex is not shown. Refer to main text for further discussion on central metabolic pathway enzymes and reactions. Abbreviations: EMPP, Embden-Meyerhof-Parnas pathway; N-O PPP, non-oxidative pentose phosphate pathway; MEDP, modified Entner-Doudoroff pathway; CFP, central fermentative pathways; PTS, phosphotransferase system; PMF, proton motive force; GFPC, glycerol facilitator protein channel; Glc, glucose; Suc, sucrose; Fru, fructose; DHA, dihydroxyacetone; 3-HPA, 3-hydroxypropionaldehyde; 1,3-PDO, 1,3-propanediol; GA, glyceraldehyde; FDOX, oxidized ferredoxin; FDRED, reduced ferredoxin.

Primary upstream pathways

C. pasteurianum oxidizes sugars to pyruvate through the ubiquitous Embden-Meyerhof-Parnas (EMP) pathway (Fig. 4). The organization of EMP pathway genes in C. pasteurianum is similar to that found in C. ljungdahlii67 and other clostridia, where two gene clusters (CP6013_0364–0368: glyceraldehyde-3-phosphate dehydrogenase, phosphoglycerate kinase, triosephosphate isomerase, phosphoglycerate mutase, and enolase; and CP6013_0418 and CP6013_0419: 6-phosphofructokinase and pyruvate kinase) comprise the bulk of the EMP pathway genes. Unlike C. ljungdahlii, however, C. pasteurianum lacks the oxidative phase of the pentose phosphate pathway (PPP). A complete complement of genes corresponding to the non-oxidative phase of the PPP, including multiple copies of ribulose-5-phosphate isomerase (CP6013_0382, CP6013_2877, CP6013_3967), transketolase (CP6013_0396, CP6013_2326, CP6013_2327), and transaldolase (CP6013_2291, CP6013_2325, CP6013_2340), were identified in the genome. Like most Gram-positive bacteria, C. pasteurianum also lacks a full Entner-Doudoroff (ED) pathway. Instead, gluconate is catabolized via a modified ED pathway through conversion into 2-keto-3-deoxy-6-phosphogluconate, which is subsequently siphoned into glycolysis via glyceraldehyde-3-phosphate and pyruvate (see section below on gluconate fermentation)68. In line with other clostridia (e.g., C. acetobutylicum, C. ljungdahlii, C. autoethanogenum)25,67,69, C. pasteurianum possesses a non-cyclic, or branched, citrate “cycle”. Cell-free extracts have been shown to generate glutamate from oxaloacetate70, representing one branch of the pathway, which is comprised of citrate synthase, aconitase (CP6013_2146), and isocitrate dehydrogenase (CP6013_1709). Although we were unable to locate a gene corresponding to citrate synthase within the genome of C. pasteurianum, it is assumed that glutamate formation from oxaloacetate proceeds via the aforementioned pathway. 
The second branch of the citrate cycle is exemplified by malate dehydrogenase (CP6013_0066 and CP6013_0670) and fumarate hydratase (CP6013_3554 and CP6013_3555). However, genes corresponding to α-ketoglutarate dehydrogenase, succinate dehydrogenase, and succinyl-CoA synthetase could not be identified in the genome of C. pasteurianum.

Glycerol catabolism and 1,3-propanediol-formation pathway

Compared to other fermentations carried out by C. pasteurianum, the fermentation of glycerol is unique owing to production of 1,3-propanediol, a signature product of glycerol metabolism71. Glycerol is catabolized by one of two divergent pathways, deemed the reductive and oxidative routes. The former pathway involves direct reduction of glycerol to 1,3-propanediol. Glycerol dehydratase (dhaBCE; CP6013_0898–0900) converts glycerol into the toxic intermediate, 3-hydroxypropionaldehyde, which is then reduced to 1,3-propanediol by 1,3-propanediol dehydrogenase (dhaT; CP6013_0905)72. In the C. pasteurianum genome, both enzymes are encoded within a 7 kb reductive glycerol regulon, the structure and organization of which is distinct from that of other organisms capable of growth on glycerol, such as Citrobacter freundii73 (Fig. 5). Both dhaBCE and dhaT genes from C. pasteurianum have been cloned and characterized72,74 and the corresponding proteins are highly conserved between C. pasteurianum and various species of Klebsiella and Citrobacter (65–81% identity). Further, DhaT from C. pasteurianum was found to share 86% of amino acid identities with the same enzyme from C. butyricum. The reductive 1,3-propanediol pathway requires NADH, in addition to vitamin B1275, and offers a non-glycolytic route for consuming excess reducing equivalents76. Glycerol is more reduced than biomass, and thus, leads to a net production of NADH when biomass is formed from glycerol under anaerobic conditions1. Whereas flux through glycolytic pathways, such as the butanol and ethanol formation routes, results in redox balance, the NADH-consuming 1,3-propanediol pathway is the only metabolic route that affords the cell a means of oxidizing the surplus of reducing equivalents derived from biomass formation. In fact, the ability to ferment glycerol as a sole source of energy is a metabolic feature exclusive to anaerobic organisms possessing an active 1,3- or 1,2-propanediol-producing pathway77. 
The alternative oxidative route of glycerol catabolism in C. pasteurianum involves conversion of glycerol into the glycolytic intermediate dihydroxyacetone phosphate by the concerted action of glycerol dehydrogenase (DhaD) and dihydroxyacetone kinase (DhaK). Dihydroxyacetone phosphate is then further oxidized to pyruvate via the standard glycolytic pathway. The genome of C. pasteurianum encodes at least five putative dhaD genes (CP6013_0378, CP6013_1584, CP6013_1937, CP6013_3371, and CP6013_3819) and one potential dhaK gene (CP6013_1936), whereby the chief dhaDK regulon (CP6013_1936 and CP6013_1937) precedes a glpF gene (CP6013_1935) encoding the glycerol uptake facilitator protein (Fig. 5). An abundance of glycerol-catabolizing enzymes presumably enables C. pasteurianum to tolerate exceptionally high concentrations of the substrate (up to 170 g L−1) without detectable growth inhibition5.

Figure 5: Genomic arrangement of key genes and operons involved in the central fermentative metabolism of C. pasteurianum.

C. pasteurianum genes and operons (left) are compared with corresponding regulons from related species or key bacteria possessing similar metabolic pathways (right). Select additional copies of C. pasteurianum genes and operons are also depicted (bottom). Locus tags are provided for C. pasteurianum genes (CP6013 prefix is omitted). Metabolic functions of gene products are discussed in detail in the main text. Genes in black and grey depict different directions of transcription. All genes and intergenic regions are depicted to scale.

Lactate- and hydrogen-formation pathways

Although the preferred outcome of pyruvate catabolism in C. pasteurianum involves oxidation to acetyl-CoA, certain culture conditions can result in significant accumulation of lactate via direct reduction of pyruvate by NADH, catalyzed by lactate dehydrogenase1,2,5. Alcohol dehydrogenases involved in production of ethanol and butanol contain iron, and therefore, conditions of iron limitation have been shown to impede alcohol production and trigger lactate formation5. In this sense, the lactate pathway operates as a backup valve to relieve the cell of excess reductant when preferred routes of NADH oxidation, such as ethanol and butanol production, are blocked. Like C. cellulolyticum78, the genome of C. pasteurianum harbors two l-lactate dehydrogenase genes (CP6013_1427 and CP6013_0421), which share 43% of amino acid identities. Under standard growth conditions where iron is not limiting, pyruvate is oxidized to acetyl-CoA through pyruvate:ferredoxin oxidoreductase, referred to as the phosphoroclastic reaction owing to the formation of ATP and acetate from acetyl-CoA during growth on glucose79. The genome of C. pasteurianum contains three putative pyruvate:ferredoxin oxidoreductase genes (CP6013_1431, CP6013_2634, CP6013_1432). The organism also harbors a potential alternative route of pyruvate oxidation via pyruvate formate lyase80, for which three genes are present in the genome (CP6013_3048, CP6013_2343, CP6013_2339). It has been suggested, however, that the pyruvate formate lyase reaction is predominantly anabolic in Clostridium81, rendering pyruvate:ferredoxin oxidoreductase the prevalent pathway of pyruvate oxidation. Electrons generated from the phosphoroclastic system are utilized to reduce ferredoxin, which is primarily oxidized by hydrogenase with coupled evolution of molecular hydrogen5. Ferredoxin (CP6013_3660) was first discovered in C. pasteurianum82 and has since served as a model electron transfer protein. 
A total of three ferredoxin iron hydrogenase-encoding genes (CP6013_3094, CP6013_3784, CP6013_3422) can be identified in the genome of C. pasteurianum, including the bidirectional hydrogenase I (CP6013_3094)83 and the H2-oxidizing, uptake hydrogenase (CP6013_3784 or CP6013_3422)84.

Acetate-formation pathway

Acetyl-CoA embodies the central branch point of clostridial fermentations for direct conversion into acetate and ethanol, or condensation to yield butyrate and butanol16,66. As discussed in detail below (refer to section on substrate redox considerations), the fate of acetyl-CoA in C. pasteurianum is largely substrate-dependent and dictated by redox. The cell relies on the analogous acetate- and butyrate-formation pathways as the major source of ATP synthesis, since production of either metabolite results in substrate-level phosphorylation (Fig. 4). C. pasteurianum harbors a single acetate-formation operon comprised of pta (CP6013_1096) and ackA (CP6013_1097) encoding phosphoacetyltransferase and acetate kinase, respectively. Interestingly, the coding sequences corresponding to pta and ackA in C. pasteurianum are separated by 134 bp, whereas the analogous genes in C. acetobutylicum are separated by only 11 bp (Fig. 5). Since a putative promoter could be identified in the 134 bp intergenic region, these two genes may not exist in an operon structure, which contrasts with the genetic arrangement found in most clostridia85. This finding may extend to other strains of C. pasteurianum, as the pta and ackA genes of strain BC1 also possess a relatively large spacer region of 103 bp.

Ethanol-formation pathway

When C. pasteurianum is grown on reduced substrates, the organism relies at least partially on the ethanol formation pathway to maintain redox balance. Ethanol production has been shown to correlate with pH between values of 6.5 and 7.5 during batch fermentations of glycerol1. Overall, however, ethanol production plays only a minor role in most fermentations carried out by C. pasteurianum, since the organism prefers the butyrate- and butanol-formation pathways for regeneration of NAD+. The genome of C. pasteurianum harbors an array of genes encoding aldehyde dehydrogenases (CP6013_0292, CP6013_1611, CP6013_1661, and CP6013_2575) and alcohol dehydrogenases (CP6013_0781, CP6013_1579, CP6013_2048, CP6013_2062, CP6013_3785, and CP6013_2711) for reduction of acetyl-CoA to ethanol via acetaldehyde. C. acetobutylicum harbors two bifunctional aldehyde-alcohol dehydrogenases, encoded by adhE (aad) and adhE2, that play major roles in the production of butanol and, to a lesser extent, ethanol86,87,88. Four protein products encoded in the genome of C. pasteurianum (CP6013_0292, CP6013_1611, CP6013_1661, and CP6013_2575) were found to possess substantial similarity (62–81%) to both AdhE and AdhE2 from C. acetobutylicum. It is probable that genes encoding these enzymes are involved in the production of ethanol and butanol in C. pasteurianum.

C4 trunk pathway

Butyrate- and butanol-forming clostridia produce C4 metabolites through the condensation of two molecules of acetyl-CoA16. This complex transformation involves the sequential action of four enzymes: thiolase (acetyl-CoA acetyltransferase; thl), 3-hydroxybutyryl-CoA dehydrogenase (hbd), crotonase (3-hydroxybutyryl-CoA dehydratase; crt), and butyryl-CoA dehydrogenase (bcd), and results in the generation of butyryl-CoA and oxidation of two moles of NADH per mole of butyryl-CoA formed. Thiolase, the first enzyme of this trunk pathway, condenses two molecules of acetyl-CoA, yielding acetoacetyl-CoA. C. pasteurianum harbors two putative thiolase genes (CP6013_2289 and CP6013_3617), one of which (CP6013_3617) has been cloned and characterized89. The two thiolase protein sequences share 90% of amino acid identities. Genes involved in the conversion of acetoacetyl-CoA to butyryl-CoA are organized in an operon in C. acetobutylicum, referred to as the butyryl-CoA synthesis (bcs) operon (Fig. 5). In addition to crt, bcd, and hbd, the operon also encodes both electron-transfer flavoprotein (Etf) subunits, etfA and etfB, required for reduction of crotonyl-CoA by NADH in the enzymatic step catalyzed by Bcd90. The full five-gene operon (crt-bcd-etfB-etfA-hbd; CP6013_0322–0326) was found to be highly conserved between C. pasteurianum and C. acetobutylicum (80% nucleotide identity across the entire 4.8 kb operon), as well as most other clostridia. Based on amino acid identities, the proteins of the bcs operon are most similar to those from C. arbusti, C. acetobutylicum, C. tetani, and C. botulinum (69–94% amino acid identity). In particular, 92–94% of amino acids of the bcs enzymes are common between C. pasteurianum and C. arbusti. Beyond the bcs operon, additional copies of crt (CP6013_2054), bcd (CP6013_2052 and CP6013_2324), etfB (CP6013_1682), etfA (CP6013_1681, CP6013_1657, and CP6013_2324), and hbd (CP6013_1378 and CP6013_1968) could be identified in the genome of C. pasteurianum (Fig. 5). Multiple copies of the bcs operon genes have been reported in other solventogenic clostridia, including C. carboxidivorans and C. beijerinckii91. In addition to the bcs operon, we also identified the rex gene (CP6013_0321) encoding a putative redox-sensing transcriptional regulator upstream of the bcs operon in C. pasteurianum. The C. pasteurianum Rex protein was found to possess 76% identity to the corresponding protein from C. acetobutylicum92. Accordingly, it appears that Rex-associated regulation of the bcs operon is similar between these organisms and is dictated by the cellular NADH/NAD+ ratio.

Growing cultures of C. pasteurianum generate reductant in the form of NADH and reduced ferredoxin93. Theoretically, electrons can be shuttled between these two species via ferredoxin:NAD+ oxidoreductase/NADH:ferredoxin oxidoreductase, which catalyzes the reversible reduction of NAD+ by reduced ferredoxin93,94,95. Electron flow from ferredoxin to NAD+ is evident under certain non-standard culture conditions, such as inhibition of hydrogenase by carbon monoxide5 or methyl viologen96. In these instances, abundant NADH is utilized to drive production of reduced end products, typically butyrate and butanol5. However, the ferredoxin:NAD+ oxidoreductase reaction is inhibited by low levels of NADH93,95, rendering the NADH:ferredoxin oxidoreductase pathway the presumed direction of electron flux in clostridial fermentations. Still, electron flow from NADH (E0′ = −320 mV) to ferredoxin (E0′ = −400 mV) is highly unfavorable, spawning considerable skepticism surrounding the thermodynamic feasibility of this pathway in vivo93. Despite this uncertainty, it has been observed that glucose-grown cultures of C. pasteurianum evolve more molecular hydrogen than can be accounted for by the phosphoroclastic reaction (determined by the combined amount of acetate and butyrate formed), indicating that under certain conditions NADH serves as reductant through operation of the unfavorable NADH:ferredoxin oxidoreductase reaction7. Likewise, it has been shown that cell-free extracts of C. pasteurianum produce hydrogen gas from acetyl-CoA and NADH, again implying electron transfer from NADH to ferredoxin94. This thermodynamic mystery remained unresolved for more than 35 years, until Hermann et al.97 proposed that ferredoxin reduction by NADH proceeds via coupling to the exergonic reduction of crotonyl-CoA to butyryl-CoA by NADH. This theory opened the door to a novel mode of energy conservation through electron bifurcation by the Bcd-EtfAB enzyme complex in C. pasteurianum, C. kluyveri, and possibly other solventogenic clostridia98. EtfAB has been implicated as the key enzyme complex responsible for electron bifurcation97, whereby one electron of NADH is utilized for the exergonic reduction of crotonyl-CoA to butyryl-CoA and the free enthalpy change is harnessed to drive reduction of ferredoxin using the remaining electron from NADH. Repeating this process consumes two moles of NADH and generates one mole each of butyryl-CoA and reduced ferredoxin97. The resulting electrons from ferredoxin are then used to drive production of molecular hydrogen by the hydrogenase enzyme, at last providing an explanation for earlier biochemical data obtained using cell-free extracts of C. pasteurianum94. Interestingly, gene CP6013_2324 within the genome of C. pasteurianum was found to possess similarity to both bcd (CP6013_0323) and etfA (CP6013_0325), suggesting presence of a Bcd-EtfA fusion protein. It has been suggested that redox partners evolve into a single fusion protein to promote more efficient conversion of unstable intermediates and rapid transfer of electrons99. A similar bcd-etfA fusion ortholog could only be identified in C. kluyveri (76% nucleotide identity), which could provide insight into the recently-proposed electron bifurcation mechanism of the Bcd-EtfAB enzyme complex in C. pasteurianum and C. kluyveri97,98.
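
To put a number on how unfavorable direct electron transfer from NADH to ferredoxin is, the sketch below applies the standard relation ΔG°′ = −nFΔE°′ to the two potentials quoted above. The function name and the choice of n = 2 electrons (one NADH fully oxidized) are assumptions for illustration; actual values in vivo depend on the intracellular concentrations of the redox couples.

```python
# Faraday constant expressed in kJ per volt per mole of electrons.
FARADAY_KJ_PER_V_MOL = 96.485


def delta_g_prime(e_donor_mv: float, e_acceptor_mv: float, n_electrons: int = 2) -> float:
    """Standard Gibbs energy change (kJ/mol) for electron transfer,
    from dG = -n * F * (E_acceptor - E_donor)."""
    delta_e_volts = (e_acceptor_mv - e_donor_mv) / 1000.0
    return -n_electrons * FARADAY_KJ_PER_V_MOL * delta_e_volts


# Potentials quoted in the text (mV).
E_NADH_MV = -320.0
E_FERREDOXIN_MV = -400.0

# Assumes two electrons transferred per NADH oxidized.
dg_nadh_to_fd = delta_g_prime(E_NADH_MV, E_FERREDOXIN_MV)
print(f"NADH -> ferredoxin: {dg_nadh_to_fd:+.1f} kJ/mol")  # +15.4 kJ/mol
```

The positive sign indicates an endergonic reaction under standard conditions, which is why coupling to an exergonic partner reaction is needed. The crotonyl-CoA/butyryl-CoA potential is not given in this section, so the coupled reaction is not evaluated here.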

Acetone-formation pathway

C. pasteurianum harbors a full acetone-formation pathway consisting of CoA transferase subunits A and B (encoded by ctfAB) and an acetoacetate decarboxylase (encoded by adc) for conversion of acetoacetyl-CoA to acetone via acetoacetate37,42 (Fig. 5). The structure and arrangement of this classical acetone-forming sol operon are identical to those of C. acetobutylicum100, whereby the ctfAB genes (CP6013_1660 and CP6013_1659, respectively) are preceded by a putative adhE (aad) gene (CP6013_1661). Additional copies of the ctfAB genes (CP6013_2266 and CP6013_2267, respectively) and two genes encoding putative CtfAB fusion proteins (CP6013_2053 and CP6013_3216) were also identified in the genome (Fig. 5). Analysis of genes CP6013_2053 and CP6013_3216 identified similar ctfAB fusion genes in a number of clostridia, including C. beijerinckii, C. carboxidivorans, and C. saccharobutylicum. In C. pasteurianum, the sol operon is positioned adjacent to a reverse-orientation adc gene (CP6013_1658), as found in C. acetobutylicum100 (Fig. 5). The C. pasteurianum CtfAB and Adc enzymes possess a high degree of similarity (71–84%) to the corresponding proteins of C. acetobutylicum, a significant acetone-producer. Despite these similarities, acetone is not a common metabolite of C. pasteurianum101. Production of acetone, as well as ethanol and butanol, is inherently linked to acetate and butyrate uptake in C. acetobutylicum as a means of preventing acid crash under low pH conditions102. Consequently, a lack of acetone production could be the result of an inability of C. pasteurianum to take up and reassimilate acids, as acid levels generally increase throughout the course of fermentation1,103 and do not exhibit the characteristic drop associated with acetate and butyrate assimilation by C. acetobutylicum. It is also possible that the acetone pathway remains inactive in C. pasteurianum due to a lack of pathway induction under standard growth conditions, poor enzymatic activities, or lack of a functional transcriptional promoter to drive expression of the sol operon or adc gene. Induction of acetone production in C. acetobutylicum has been studied extensively and inducers include low pH and elevated concentrations of acetate and butyrate104.

Butyrate- and butanol-formation pathways

Similar to acetyl-CoA, butyryl-CoA serves as a major branch point in the central metabolism of C. pasteurianum. Butyryl-CoA can be converted into butyrate or further reduced to butanol in pathways that mimic the C2 fermentative pathways leading to production of acetate and ethanol. C. pasteurianum harbors a single butyrate-formation operon, consisting of phosphotransbutyrylase (ptb; CP6013_3580) upstream of butyrate kinase (buk; CP6013_3581) (Fig. 5). Ptb and Buk from C. pasteurianum possess a high degree of similarity (80% and 73%, respectively) to the corresponding enzymes from C. acetobutylicum. Unlike in C. acetobutylicum, however, we were unable to identify a second copy of buk [i.e. buk2105] within the genome of C. pasteurianum. The other pathway from butyryl-CoA is the reductive butanol formation route, where consecutive dehydrogenase-catalyzed reduction steps convert butyryl-CoA first to butyraldehyde, then to butanol. In addition to the aforementioned adhE (aad) and adhE2 genes, two butanol dehydrogenases, encoded by bdhA and bdhB, have been implicated in butanol formation in C. acetobutylicum106. BLAST analysis of the C. pasteurianum genome using BdhA and BdhB protein queries returned a large array of alcohol dehydrogenases (CP6013_2711, CP6013_1579, CP6013_2048, CP6013_0905, CP6013_2062, CP6013_1661, CP6013_3785, CP6013_0292, CP6013_0781, CP6013_2575, and CP6013_1611) possessing similarity to the C. acetobutylicum isozymes. Notably, protein products corresponding to genes CP6013_2711 and CP6013_1579 showed the highest degree of identity to both BdhA (72% and 41%, respectively) and BdhB (67% and 39%, respectively) from C. acetobutylicum. Although bdhA and bdhB occur in tandem within the chromosome of C. acetobutylicum107, alcohol dehydrogenases possessing a similar genetic arrangement could not be identified in the genome of C. pasteurianum. Surprisingly, disruption of bdhA or bdhB in C. acetobutylicum had no effect on solvent formation, while disruption of adhE nearly abolished production of solvents86. Based on these findings, the genes whose products possess the greatest identity to AdhE from C. acetobutylicum, specifically CP6013_1661, CP6013_2575, CP6013_0292, and CP6013_1611, are likely to be the greatest contributors to butanol, as well as ethanol, formation in C. pasteurianum.

Effect of substrate reductance on fermentation end product distribution

C. pasteurianum readily utilizes glucose, fructose, mannitol, sorbitol, and sucrose, among other substrates108 (Fig. 4). Substrates that are metabolized at a reduced rate include arabinose, galactose, lactose, starch, and xylose. C. pasteurianum genes encoding putative phosphotransferase system (PTS) components could be identified for most of these fermentable substrates109,110, while galactose and gluconate have been shown to be taken up using a proton motive force (PMF)111. Owing to the immense substrate range exhibited by C. pasteurianum, product distribution varies dramatically and is dictated foremost by the degree of reductance of the substrate112. Such an effect has been documented for C. pasteurianum, where fermentation of glucose generates a predominantly acidogenic metabolism, while fermentation of mannitol or glycerol yields almost exclusively alcohols5,7. Based on our genome sequencing data, we further probed this model by comparing product distribution of C. pasteurianum grown on substrates of varying degrees of reductance, thereby allowing manipulation of the intracellular NADH/NAD+ ratio113. We selected gluconate, glucose, mannitol, and glycerol and show below that catabolism of these substrates leads to four distinct fermentation profiles ranging from entirely acidogenic to primarily alcohologenic (Fig. 6). In each case, cell growth and product distribution were assessed by analyzing 60–90 h fermentation samples from anaerobic static flask cultures at pH 6.0–6.2 containing 40 g L−1 of substrate (sodium gluconate, glucose, mannitol, or glycerol).
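
The degree of reductance referred to above can be computed from elemental composition. The sketch below uses the common per-carbon convention for C/H/O compounds, γ = (4C + H − 2O)/C, which is assumed here rather than stated in the text; the function name and the treatment of gluconate as gluconic acid (C6H12O7) are likewise illustrative choices. Under these assumptions the results agree with the values quoted in this section for the four substrates tested.

```python
def degree_of_reductance(c: int, h: int, o: int) -> float:
    """Available electrons per carbon atom for a C/H/O compound,
    using the weights C = +4, H = +1, O = -2."""
    return (4 * c + h - 2 * o) / c


# (C, H, O) atom counts; gluconate is entered as gluconic acid, C6H12O7.
SUBSTRATES = {
    "gluconate": (6, 12, 7),
    "glucose": (6, 12, 6),
    "mannitol": (6, 14, 6),
    "glycerol": (3, 8, 3),
}

reductance = {name: degree_of_reductance(*atoms) for name, atoms in SUBSTRATES.items()}

for name, gamma in reductance.items():
    print(f"{name:10s} {gamma:.2f}")
# gluconate  3.67
# glucose    4.00
# mannitol   4.33
# glycerol   4.67
```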

Figure 6: Effect of substrate degree of reductance on the fermentation product profile of C. pasteurianum.

Active metabolic pathways employed by the cell are shown during growth on a range of substrates possessing varied degrees of reductance. General catabolic equations are provided and show the number of moles of reducing equivalents generated (in bold) per two moles of pyruvate formed. Substrates and pathway intermediates are depicted as black and blue diamonds, respectively, while acid and alcohol products are shown as blue circles and squares, respectively. Trace products (<1 g L−1) are shown in red. Product titers are provided and discussed within the main text. Lactate and acetone were not detected and gaseous products were not measured. Abbreviations are defined in Fig. 4.

With a degree of reductance of 3.67, gluconate is the most oxidized substrate fermented by C. pasteurianum. The substrate is first dehydrated to 2-keto-3-deoxy-gluconate (KDG) by gluconate dehydratase (CP6013_2550)114, followed by phosphorylation to 2-keto-3-deoxy-6-phosphogluconate by 2-keto-3-deoxygluconokinase (CP6013_3201). The resulting product is then cleaved by 2-keto-3-deoxy-6-phosphogluconate aldolase (CP6013_3200 and CP6013_2554)68, yielding glyceraldehyde-3-phosphate and pyruvate (Fig. 4). Catabolism of one mole of gluconate generates two moles of pyruvate, yet only one mole of NADH (Fig. 6). C. pasteurianum static flask cultures grown on sodium gluconate exhibited an entirely acidogenic fermentative metabolism, yielding 8.5 ± 0.6 g L−1 acetate and 5.9 ± 0.3 g L−1 butyrate, as the butyrate pathway alone was sufficient to oxidize NADH. The NADH-consuming ethanol and butanol pathways were not induced during gluconate catabolism, presumably due to a low intracellular NADH/NAD+ ratio. With a degree of reductance of 4115, glucose is less oxidized than gluconate, resulting in the production of two moles of NADH per two moles of pyruvate generated (Fig. 6). Fermentation of glucose often yields exclusively acetate and butyrate, yet some studies have reported notable butanol production101. We observed butanol (3.4 ± 1.2 g L−1) as the predominant fermentation product, with equal quantities of acetate (2.4 ± 0.4 g L−1) and butyrate (2.4 ± 0.8 g L−1). The relatively high levels of butanol detected in this study may be explained by growth medium formulation, as we utilized a medium optimized for production of butanol from glycerol103. Mannitol and sorbitol, both six-carbon sugar alcohols that are readily fermented by C. pasteurianum112, possess degrees of reductance of 4.33115, and therefore, are more reduced than glucose. Both substrates enter the cell using the same PEP-dependent PTS where they are phosphorylated and converted into fructose-6-phosphate (Fig. 4) by mannitol-1-phosphate dehydrogenase (CP6013_0304 and CP6013_2639) or sorbitol-6-phosphate dehydrogenase (CP6013_0284 and CP6013_0306)109,110,116. Oxidation of mannitol or sorbitol generates a total of three moles of NADH per two moles of pyruvate formed (Fig. 6). Fermentation of mannitol by C. pasteurianum leads to a product profile characterized by high butanol selectivity, as cultures produced 6.0 ± 1.7 g L−1 butanol and only trace amounts of acetate (0.4 ± 0.1 g L−1), butyrate (0.5 ± 0.3 g L−1), and ethanol (0.7 ± 0.4 g L−1), indicating that the cell relies almost exclusively on the butanol pathway for oxidation of NADH. Similar products have been detected from the fermentation of mannitol by C. pasteurianum in continuous culture112. Since glycerol possesses a degree of reductance of 4.67115, glycerol produces four moles of NADH per two moles of pyruvate formed (Fig. 4), compared to only three moles of NADH from mannitol or sorbitol (Fig. 6). Glycerol is taken up by C. pasteurianum using a unique glycerol facilitator protein channel (CP6013_1935), where it is then converted into 1,3-propanediol or siphoned into glycolysis via dihydroxyacetone phosphate. Owing to its high degree of reduction, glycerol catabolism by C. pasteurianum leads to substantial quantities of reduced end products, specifically butanol and 1,3-propanediol1,5. Medium formulation and cultivation conditions can be manipulated to favor production of either product103,117. Under the conditions employed in this study, butanol titer (7.0 ± 0.2 g L−1) surpassed that of 1,3-propanediol (5.2 ± 1.8 g L−1), while ethanol (1.3 ± 0.4 g L−1), acetate (0.7 ± 0.2 g L−1), and butyrate (0.2 ± 0.1 g L−1) represented minor co-products. The highly reduced product profile of C. pasteurianum during growth on glycerol underscores the immense industrial potential of this organism in producing butanol from crude glycerol. 
Since substantial quantities of 1,3-propanediol are produced along with butanol, the 1,3-propanediol pathway represents a key target of rational metabolic engineering. Note that fundamental genetic engineering technologies have only recently been developed for this organism31,32,33, whereas previous efforts have focused on random mutagenesis28,118 and bioprocessing approaches28, such as separation of butanol and 1,3-propanediol product streams via in situ removal of butanol.
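
The trend across the four substrates can be summarized by tabulating the NADH yield per two moles of pyruvate against the share of alcohols among the measured products. The sketch below uses only the mean titers and NADH yields reported above; the alcohol mass fraction is a crude index introduced here for illustration (it ignores molar and carbon yields, gases, and replicate variation), and the data structure and function name are hypothetical.

```python
# NADH yield (mol per 2 mol pyruvate) and mean titers (g/L) as reported in the text.
FERMENTATIONS = {
    "gluconate": {
        "nadh": 1,
        "acids": {"acetate": 8.5, "butyrate": 5.9},
        "alcohols": {},
    },
    "glucose": {
        "nadh": 2,
        "acids": {"acetate": 2.4, "butyrate": 2.4},
        "alcohols": {"butanol": 3.4},
    },
    "mannitol": {
        "nadh": 3,
        "acids": {"acetate": 0.4, "butyrate": 0.5},
        "alcohols": {"butanol": 6.0, "ethanol": 0.7},
    },
    "glycerol": {
        "nadh": 4,
        "acids": {"acetate": 0.7, "butyrate": 0.2},
        "alcohols": {"butanol": 7.0, "1,3-propanediol": 5.2, "ethanol": 1.3},
    },
}


def alcohol_mass_fraction(entry: dict) -> float:
    """Alcohol titer divided by total measured acid plus alcohol titer."""
    alcohols = sum(entry["alcohols"].values())
    acids = sum(entry["acids"].values())
    return alcohols / (alcohols + acids)


fractions = {name: alcohol_mass_fraction(e) for name, e in FERMENTATIONS.items()}

for name, entry in FERMENTATIONS.items():
    print(f"{name:10s} NADH per 2 pyruvate: {entry['nadh']}  "
          f"alcohol fraction: {fractions[name]:.2f}")
```

On this index the alcohol share rises from zero on gluconate to roughly 0.41 on glucose, 0.88 on mannitol, and 0.94 on glycerol, in step with the NADH yield.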

While the redox state of the fermentation substrate represents the chief factor governing product distribution in C. pasteurianum, carbon and electron flow can be manipulated using a number of strategies. The effect of carbon monoxide on anaerobic fermentations has been widely documented in Clostridium119, where controlled gassing leads to potent inhibition of the hydrogenase enzyme. With hydrogen production shut down, cells are forced to utilize the ferredoxin:NAD+ oxidoreductase reaction to oxidize ferredoxin, resulting in NADH formation and subsequent production of butanol and ethanol5. In addition to carbon monoxide, redox dyes, such as methyl viologen, can be employed to induce solvent production in the clostridia, also through inhibition of the hydrogenase enzyme96. Finally, it has recently been shown that C. pasteurianum is able to utilize electrons derived directly from a supplied electric current4. Whereas other electroactive organisms require an exogenous mediator to facilitate electron transfer, C. pasteurianum is a rare exception capable of taking up electrons directly from a cathode. Moreover, cells were found to utilize substrate and exogenous electrons concomitantly, adding to the biotechnological potential of C. pasteurianum. On the other hand, utilization of externally-supplied electrons manifested in increased titers of 1,3-propanediol, rather than butanol, which is in line with the role of the 1,3-propanediol pathway in maintaining redox poise76. As butanol is the most promising end product of C. pasteurianum metabolism, electrosynthesis of butanol represents an important and challenging target of future strain engineering efforts. In this context, it is anticipated that the genomic analysis presented herein, as well as previous studies of C. pasteurianum metabolism1,5 and still-developing genetic technologies34, will lead to productive metabolic engineering outcomes and robust mutant strains for industrial conversion of waste glycerol to butanol using C. pasteurianum.

Materials and Methods

Strain, oligonucleotides, and growth conditions

C. pasteurianum type strain ATCC 6013 (DSM 525) was obtained from the American Type Culture Collection (ATCC). Oligonucleotides (see Supplementary Table S3) were synthesized by Integrated DNA Technologies (IDT; Coralville, IA) at the 25 nanomole scale using standard desalting. All chemicals were purchased from Sigma-Aldrich (St. Louis, MO). Strain ATCC 6013 was grown under strictly anaerobic conditions in a semi-defined medium1,103 containing per liter: 22 g KH2PO4, 6.68 g K2HPO4, 7.35 g (NH4)2SO4, 5.08 g Bacto yeast extract, 0.2 g MgSO4 · 7H2O, 0.02 g CaCl2 · 2H2O, 0.06 g FeSO4 · 7H2O, 1 mg resazurin, and 2 ml trace element solution SL 7. The initial pH of the medium was 6.0–6.1 prior to sterilization. Carbon sources (sodium gluconate, glucose, mannitol, and glycerol) were sterilized separately as 100 g L−1 stock solutions and added to culture flasks to achieve a final concentration of 40 g L−1. Cysteine-HCl (0.5 g L−1) was used to reduce the growth medium prior to inoculation. Static cultures were grown in 125 ml Erlenmeyer flasks containing 50 ml medium within an anaerobic containment chamber (Plas-Labs, Inc.; Lansing, MI) containing an atmosphere of 85% N2, 10% H2, and 5% CO2. Seed cultures were prepared by heat-shocking single sporulated agar plate colonies at 80 °C for 10 minutes in 10 ml 2×YTG medium, pH 6.4 (16 g L−1 Bacto tryptone, 10 g L−1 Bacto yeast extract, 5 g L−1 glucose, and 4 g L−1 NaCl) as described previously31,32.
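
The carbon source addition described above is a simple dilution of the form C_stock × V_stock = C_final × V_final. The following is a minimal sketch of that arithmetic, assuming that the 50 ml culture volume refers to the final volume after the stock solution has been added (the text does not state this explicitly). The function name is illustrative only.

```python
def stock_volume_ml(final_conc_g_per_l, stock_conc_g_per_l, final_volume_ml):
    """Return the stock volume (ml) satisfying C_stock * V_stock = C_final * V_final."""
    if stock_conc_g_per_l <= 0:
        raise ValueError("stock concentration must be positive")
    if final_conc_g_per_l > stock_conc_g_per_l:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_conc_g_per_l * final_volume_ml / stock_conc_g_per_l


# Values from the text: 100 g/L stock, 40 g/L final concentration.
# Assumption: 50 ml is the final culture volume, including the added stock.
final_volume_ml = 50.0
carbon_stock_ml = stock_volume_ml(40.0, 100.0, final_volume_ml)
base_medium_ml = final_volume_ml - carbon_stock_ml

print(f"carbon source stock: {carbon_stock_ml:.1f} ml")
print(f"remaining medium:    {base_medium_ml:.1f} ml")
```

Under this assumption, the stock solution makes up 40% of the final culture volume, so the remaining medium components would need to be prepared with that dilution in mind.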

Analytical methods

Cell growth was monitored by measuring optical density at 600 nm (OD600). Culture supernatants were analyzed for metabolite production 60–90 h after inoculation. Product concentrations were determined by HPLC using an LC-10AT system (Shimadzu; Kyoto, Japan) equipped with a RID-10A refractive index detector (Shimadzu; Kyoto, Japan) and an Aminex HPX-87H column (Bio-Rad Laboratories; Richmond, CA). Column temperature was maintained at 65 °C. The mobile phase consisted of 5 mM H2SO4 (pH 2.0) at a flow rate of 0.6 ml min−1. RID signal data processing was performed using Clarity Lite (DataApex; Prague, Czech Republic). End product titers reported represent the average of two or three biological replicates.
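
The averaging of two or three biological replicates described above can be sketched as follows. The titer values are placeholders for illustration only and are not data from this study; the function name is hypothetical.

```python
from statistics import mean


def mean_titer(replicate_titers_g_per_l):
    """Average end product titers (g/L) over two or three biological replicates."""
    n = len(replicate_titers_g_per_l)
    if n not in (2, 3):
        raise ValueError("expected two or three biological replicates")
    return mean(replicate_titers_g_per_l)


# Placeholder values, NOT measurements from the study.
example_replicates = [10.0, 11.0, 12.0]
print(f"mean titer: {mean_titer(example_replicates):.2f} g/L")
```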

Phage induction and transmission electron microscopy

Phage excision and transmission electron microscopy (TEM) were performed in a manner similar to previous methods27. For mitomycin C induction, a single sporulated colony of C. pasteurianum ATCC 6013 was heat-shocked, grown to exponential phase, and used to inoculate 200 ml of fresh 2×YTG medium, pH 6.4. The resulting culture was grown to early exponential phase (OD600 0.2–0.3) and divided into six 25 ml cultures. Mitomycin C was added to a final concentration of 0, 0.5, 1, 2.5, 5, or 10 μg ml−1 and OD600 was monitored until a sharp decline in turbidity was observed approximately 4 h post-induction. One ml of the resulting phage lysates was centrifuged at 10,000 × g for 10 minutes and the supernatants were filtered through a 0.45 μm filter. Following washing of phage particles twice with 0.1 M ammonium acetate, pH 7.5, five μl of lysate was pipetted onto 200-mesh Formvar/carbon-coated copper grids and incubated for approximately five minutes. Excess lysate was blotted with Whatman filter paper and grids were allowed to dry overnight. Grids were then stained for 10 minutes using a saturated uranyl acetate solution, followed by washing with 50% ethanol and drying in air for approximately three hours. Imaging was performed at 60 kV using a Philips CM10 transmission electron microscope equipped with a digital camera. Phage images were captured at 245,000× magnification.
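
The mitomycin C series above can be translated into pipetting volumes once a stock concentration is chosen. The following is a minimal sketch, assuming a hypothetical 1 mg ml−1 stock (the stock concentration is not given in the text) and neglecting the small increase in culture volume caused by the addition. The function and variable names are illustrative.

```python
def mitomycin_volume_ul(final_ug_per_ml, culture_ml, stock_ug_per_ml):
    """Stock volume (μl) giving the target final concentration.

    Approximation: the volume added is ignored when computing the final volume.
    """
    if stock_ug_per_ml <= 0:
        raise ValueError("stock concentration must be positive")
    return final_ug_per_ml * culture_ml * 1000.0 / stock_ug_per_ml


CULTURE_ML = 25.0          # from the text: six 25 ml cultures
STOCK_UG_PER_ML = 1000.0   # ASSUMPTION: hypothetical 1 mg/ml stock, not from the text
FINAL_CONCS = [0, 0.5, 1, 2.5, 5, 10]  # μg/ml, from the text

for conc in FINAL_CONCS:
    vol = mitomycin_volume_ul(conc, CULTURE_ML, STOCK_UG_PER_ML)
    print(f"{conc:>4} μg/ml -> {vol:6.1f} μl stock")
```

A different stock concentration only rescales the volumes proportionally; the final concentrations in the series are unchanged.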

DNA isolation, sequencing, and analysis

Total DNA was isolated from C. pasteurianum ATCC 6013 according to a previous method32 using a Qiagen (Valencia, CA) DNeasy Blood and Tissue Kit. The genome of C. pasteurianum ATCC 6013 was sequenced, assembled, and annotated as described previously37. An additional two single molecule real-time (SMRT) cells were sequenced from a size-selected large insert library using the RS II analyzer (Pacific Biosciences; Menlo Park, CA). Methylome analysis was performed by Pacific Biosciences and the Genomic Resource Center at the Institute for Genome Sciences (University of Maryland School of Medicine; Baltimore, MD) using raw RS II sequencing reads and the C. pasteurianum draft genome37 as a reference.

Long range PCR (15–35 kb) was performed using LongAmp Taq DNA Polymerase (New England Biolabs; Ipswich, MA). Large PCR products and intact genomic DNA were separated using 0.3–0.5% agarose gels and low voltage (12–15 V) electrophoresis for 12–18 h. Restriction endonucleases were obtained from New England Biolabs (Ipswich, MA) and utilized according to the manufacturer’s guidelines.

Additional Information

Accession numbers: This Whole Genome Shotgun project has been deposited at GenBank under the accession JPGY00000000. The version described in this paper is version JPGY02000000.

How to cite this article: Pyne, M. E. et al. Genome-directed analysis of prophage excision, host defence systems, and central fermentative metabolism in Clostridium pasteurianum. Sci. Rep. 6, 26228; doi: 10.1038/srep26228 (2016).