Transcriptome and translatome of CO2 fixing acetogens under heterotrophic and autotrophic conditions

Acetogens are anaerobic bacteria that utilise gaseous feedstocks such as carbon monoxide (CO) and carbon dioxide (CO2) to synthesise biomass and various metabolites via the energetically efficient Wood-Ljungdahl pathway. Because of this pathway, acetogens have been considered as a novel platform to produce biochemicals from gaseous feedstocks, potentially replacing conventional thermochemical processes. Despite these advantages, a lack of systematic understanding of transcriptional and translational regulation in acetogens during autotrophic growth limits rational strain design for producing the desired products. To address this problem, we present RNA sequencing and ribosome profiling data of four acetogens cultivated under heterotrophic and autotrophic conditions, providing data on the genome-scale transcriptional and translational responses of acetogens during CO2 fixation. These data facilitate the discovery of regulatory elements embedded in their genomes, which could be utilised to engineer strains with better growth and productivity. We anticipate that these data will expand our understanding of the processes of CO2 fixation and will aid the design of strains for producing desired biochemicals.

the Fd:H+ oxidoreductase (Ech) complexes to translocate ions across the membrane to create an ion gradient 13,14. The established gradient drives the ion back into the cell through the ATP synthase complex, generating the ATP needed for the cell. Along with the ATP synthesis system, electron bifurcation, which oxidises one electron donor and transfers electrons to two different electron acceptors, helps to overcome energetic barriers by reducing low reduction potential Fd via oxidisation of a relatively higher reduction potential hydrogen molecule, which can then reduce CO2 and the ion translocating complex 10,15,16.
Along with the WLP and the energy conservation system, acetogens, like any other organism, contain intricate regulatory networks that control gene expression under different conditions. To date, the genomes of a large number of acetogens have been sequenced to identify their genomic features; the next requirement is a systematic understanding of their transcriptional and translational regulatory processes. In contrast to genomic studies, relatively few studies on acetogens' transcriptomes and translatomes under autotrophic growth conditions have been published. The lack of uniformly generated RNA sequencing (RNA-Seq) and ribosome profiling (Ribo-Seq) data for acetogens has limited not only our knowledge of their cellular responses but also the expansion of potential genetic tools for strain engineering. RNA-Seq and Ribo-Seq can determine the strength of promoters and of Shine-Dalgarno (SD) sequences in the 5′-untranslated regions, which regulate transcription and translation, respectively.
In this study, we determined changes in the transcriptional and translational responses of acetogens under autotrophic growth conditions compared with heterotrophic growth conditions, using RNA-Seq and Ribo-Seq. RNA-Seq and Ribo-Seq were performed on four acetogen species, Acetobacterium woodii, Clostridium aceticum, Clostridium drakei, and Eubacterium limosum, cultured under the two conditions. Although studies on E. limosum and a transcriptomic study on C. drakei have been reported previously by our group, this study provides a uniformly generated and processed dataset that includes additional model acetogens, which allows comparative analysis of the transcriptome and translatome of CO2-fixing acetogens 17,18. The presented RNA-Seq and Ribo-Seq results will provide a fundamental understanding of the responses of acetogens to autotrophic conditions, and thereby widen the set of genetic tools for engineering strains that produce biochemicals using CO2 as a carbon building block.

Methods
Bacterial culture conditions. For this study, A. woodii DSM 1030, C. aceticum DSM 1496, C. drakei DSM 12750, and E. limosum DSM 20543 were obtained from the Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures (DSMZ, Braunschweig, Germany). A. woodii, C. aceticum, and C. drakei were cultured under strict anaerobic conditions at 30 °C, and E. limosum was cultured under anaerobic conditions at 37 °C, in 150 mL serum bottles filled with 100 mL DSMZ 135 medium (pH 7.0), which is composed of 1 g L−1 NH4Cl, 2 g L−1 yeast extract, 10 g L−1 NaHCO3, 0. aceticum, and C. drakei were cultivated in fructose-supplemented (5 g L−1) media and E. limosum was cultured in glucose-supplemented (5 g L−1) media. For autotrophic growth, all of the strains were cultivated in DSMZ 135 media containing H2/CO2 (80:20) at a pressure of 200 kPa in the headspace (50 mL). The medium used for culturing A. woodii was supplemented with 2 g L−1 NaCl. For the main culture, the precultured cells were harvested via anaerobic centrifugation, washed three times with basal DSMZ 135 media, and inoculated into 100 mL fresh DSMZ 135 media supplemented with the corresponding carbon source. All of the strains were cultured in biological duplicates.
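
For orientation, the headspace composition stated above can be converted into partial pressures and approximate gas amounts. The Python sketch below assumes ideal-gas behaviour, treats 200 kPa as an absolute pressure at the 30 °C culture temperature, and ignores gas dissolved in the medium; none of these assumptions is stated in the text, and the function name is hypothetical.

```python
R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def headspace_gas(total_kpa, volume_ml, temp_c, fractions):
    """Return {gas: (partial pressure in kPa, amount in mmol)}.

    Assumes an ideal gas mixture and an absolute total pressure.
    """
    pressure_pa = total_kpa * 1e3
    volume_m3 = volume_ml * 1e-6
    temp_k = temp_c + 273.15
    # Ideal gas law: n = PV / (RT), converted from mol to mmol.
    n_total_mmol = pressure_pa * volume_m3 / (R * temp_k) * 1e3
    return {
        gas: (frac * total_kpa, frac * n_total_mmol)
        for gas, frac in fractions.items()
    }


# H2/CO2 (80:20) at 200 kPa in a 50 mL headspace, 30 °C (assumed).
gas = headspace_gas(200.0, 50.0, 30.0, {"H2": 0.8, "CO2": 0.2})
for name, (p_kpa, n_mmol) in gas.items():
    print(f"{name}: {p_kpa:.0f} kPa, {n_mmol:.2f} mmol")
```

If the stated pressure were a gauge value, or for E. limosum cultured at 37 °C, the inputs would need to be adjusted accordingly.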

RNA-Seq library preparation.
Duplicate samples were harvested at the mid-exponential phase by centrifugation at 4,000 × g for 15 min at 4 °C. The collected cells were resuspended anaerobically in 500 µL of lysis buffer, comprising 20 mM Tris-HCl (pH 7.4), 140 mM NaCl, 5 mM MgCl2, and 1% Triton X-100. The samples were frozen in liquid nitrogen and then ground using a mortar and pestle. Lysed cells were thawed on ice, and the debris was removed by centrifugation at 4,000 × g for 15 min at 4 °C. Subsequently, the total RNA was isolated using TRIzol (Thermo Scientific, Waltham, MA, USA) according to the manufacturer's instructions. To remove the remaining genomic DNA (gDNA), the RNA was treated with 4 U of rDNase I (Ambion, Austin, TX, USA) for 1 h at 37 °C, then incubated at 75 °C for 10 min to deactivate the enzyme. To remove ribosomal RNAs (rRNA) from the gDNA-depleted RNA, the Ribo-Zero™ rRNA Removal Kit for Meta-bacteria (Epicentre, Madison, WI, USA) was used according to the manufacturer's instructions. The quality of the rRNA-depleted RNA was checked using an Agilent 2200 TapeStation system (Agilent Technologies, Santa Clara, CA, USA). To construct the libraries for RNA-Seq, the TruSeq Stranded mRNA Library Prep Kit (Illumina, San Diego, CA, USA) was used on the quality-confirmed RNA. The libraries were sequenced using the 150 bp read recipe on an Illumina MiSeq™ system.

Ribo-Seq library preparation.
For Ribo-Seq, 100 µM chloramphenicol (CM) was added to the cultures, which were then incubated for a further 10 min at 30 °C or 37 °C, matching the culture temperature of each strain. The CM-treated cells were subsequently washed using 500 µL polysome buffer composed of 20 mM Tris-HCl (pH 7.4), 140 mM NaCl, 5 mM MgCl2, and 100 µM CM, and resuspended in lysis buffer consisting of 20 mM Tris-HCl (pH 7.4), 140 mM NaCl, 5 mM MgCl2, 100 µM CM, and 1% Triton X-100. The resuspended cells were frozen in liquid nitrogen and ground with a pestle and mortar. The powdered cells were recovered by centrifugation at 4,000 × g for 15 min at 4 °C, and the resultant supernatant was additionally centrifuged at 16,000 × g for 10 min at 4 °C. To degrade RNA unprotected by ribosomes, 400 U MNase (NEB, Ipswich, MA, USA), 2 µL bovine serum albumin (1 mg mL−1), and 20 µL of 10× MNase buffer were added, and samples were incubated at 37 °C for 2 h with gentle rotation. To stop the reaction, 10 µL of 0.5 M EGTA (Sigma-Aldrich, St. Louis, MO, USA) was added to the sample. The monosome fraction was recovered using Microspin S-400 HR columns (GE Healthcare Life Sciences, Marlborough, MA, USA). The recovered ribosome-bound RNA was isolated using TRIzol, and the remaining rRNAs were removed with the Ribo-Zero™ rRNA Removal Kit for Meta-bacteria. For the phosphorylation reaction, samples were denatured at 80 °C for 90 s, equilibrated to 37 °C, and incubated at 37 °C for 1 h with 5 µL 10× T4 PNK buffer (NEB), 20 U SUPERase-In RNase Inhibitor, and 10 U T4 PNK (NEB). After purification of the RNA samples using an RNeasy MinElute Column (Qiagen, Hilden, Germany), the concentration of purified RNA was measured using the Qubit RNA HS assay kit (Invitrogen, Carlsbad, CA, USA). For library construction, the small RNA library prep kit for Illumina (NEB) was used, and the constructed library was sequenced using the 50 bp read recipe on an Illumina HiSeq 2500.
Data processing. For RNA-Seq, adapter sequences and bases with a Phred quality score below 20 were trimmed from the sequenced reads. Trimmed reads shorter than 20 bp were discarded to improve the accuracy of the mapping result. Using CLC Genomics Workbench (CLC Bio, Aarhus, Denmark), the trimmed reads were mapped onto the A. woodii (NC_016894), C. aceticum (NZ_CP009687), C. drakei (NZ_CP020953), and E. limosum (NZ_CP019962) genomes using default parameters (mismatch cost = 2, deletion cost = 3, insertion cost = 3, length fraction = 0.9, and similarity fraction = 0.9), and only the uniquely mapped reads were retained. Gene expression was calculated from the mapped read count statistics using the DESeq2 package in R with default parameters. For Ribo-Seq, adapter sequences and bases with a Phred quality score below 20 were removed from the generated reads using the same trimming parameters applied for RNA-Seq, and reads shorter than 20 bp were again discarded. The reads were mapped onto the A. woodii (NC_016894), C. aceticum (NZ_CP009687), C. drakei (NZ_CP020953), and E. limosum (NZ_CP019962) genomes using the default parameters (mismatch cost = 2, deletion cost = 3, insertion cost = 3, length fraction = 0.9, and similarity fraction = 0.9) and only the uniquely mapped reads were
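
The trimming and length-filtering step described above can be illustrated with a minimal Python sketch. It trims bases with a Phred score below 20 from both read ends and discards reads shorter than 20 bp. This is not the CLC Genomics Workbench trimming algorithm; the simple end-trimming strategy, the function names, and the Phred+33 quality encoding are assumptions made for illustration only.

```python
from typing import Optional, Tuple

PHRED_OFFSET = 33   # assumption: Phred+33 FASTQ quality encoding
MIN_QUALITY = 20    # Phred threshold stated in the text
MIN_LENGTH = 20     # minimum read length (bp) stated in the text


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred score Q to a base-call error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)


def trim_read(seq: str, qual: str,
              min_quality: int = MIN_QUALITY,
              min_length: int = MIN_LENGTH) -> Optional[Tuple[str, str]]:
    """Trim low-quality bases from both ends; return None if the read is too short."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    scores = [ord(c) - PHRED_OFFSET for c in qual]
    start, end = 0, len(scores)
    # Advance past low-quality bases at the 5' end.
    while start < end and scores[start] < min_quality:
        start += 1
    # Step back past low-quality bases at the 3' end.
    while end > start and scores[end - 1] < min_quality:
        end -= 1
    if end - start < min_length:
        return None  # discarded: too short after trimming
    return seq[start:end], qual[start:end]


# Hypothetical read: 25 high-quality bases followed by 2 low-quality bases.
print(trim_read("A" * 25 + "CC", "I" * 25 + "##"))
```

A read that returns None here corresponds to one removed before mapping; internal low-quality bases are left untouched in this simplified version.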

Technical Validation
Acetogens have drawn much attention due to their ability to fix CO2 using the efficient WLP and the energy conservation system. To systematically understand this metabolism, four acetogen species, A. woodii, C. aceticum, C. drakei, and E. limosum, were cultivated under heterotrophic or autotrophic conditions, then sampled at the corresponding mid-exponential point (Fig. 1). RNA-Seq and Ribo-Seq libraries were created and sequenced using the Illumina platforms. For RNA-Seq, total bases of 1,989,686,915 nt, 3,309,840,659 nt, 3,061,597,892 nt, and 4,122,893,029 nt for A. woodii, C. aceticum, C. drakei, and E. limosum, respectively, were generated as raw data (Table 1). After obtaining the raw data, the adaptor sequences and poor-quality reads with lower than 99.9% accuracy were removed, resulting in 1,925,462,358 nt, 3,192,358,774 nt, 2,898,139,861 nt, and 4,033,415,284 nt,