Bacteria respond to their environment by regulating mRNA synthesis, often by altering the genomic sites at which RNA polymerase initiates transcription. Here, we investigate genome-wide changes in transcription start site (TSS) usage by Clostridium phytofermentans, a model bacterium for fermentation of lignocellulosic biomass. We quantify expression of nearly 10,000 TSS at single base resolution by Capp-Switch sequencing, which combines capture of synthetically capped 5′ mRNA fragments with template-switching reverse transcription. We find the locations and expression levels of TSS for hundreds of genes change during metabolism of different plant substrates. We show that TSS reveals riboswitches, non-coding RNA and novel transcription units. We identify sequence motifs associated with carbon source-specific TSS and use them for regulon discovery, implicating a LacI/GalR protein in control of pectin metabolism. We discuss how the high resolution and specificity of Capp-Switch enables study of condition-specific changes in transcription initiation in bacteria.
Bacteria translate environmental signals into cellular responses using a network of regulatory RNA and proteins that control genome-wide transcription patterns. Many of these regulators affect where RNA polymerase initiates messenger RNA (mRNA) synthesis at transcription start sites (TSS). As such, locating and quantifying changes in TSS usage is an important step to understand bacterial gene regulation. Here, we investigate TSS architecture in Clostridium phytofermentans ISDg, a soil bacterium that ferments plant biomass into ethanol, H2 and acetate1, and belongs to the Lachnospiraceae family that includes gut commensals with important roles in host nutrition2,3. This anaerobic mesophile metabolizes diverse plant components including cellulose, hemicellulose and pectin by tailoring expression of many carbohydrate-active enzymes (CAZymes) and other metabolic enzymes to the available substrate4,5. C. phytofermentans has a 4.8 Mb genome with 3,926 predicted protein-encoding genes3, and its ability to alter gene expression in response to carbon sources and other environmental cues is mediated by over 300 transcription regulator proteins6 and numerous non-coding RNA including metabolite-sensing riboswitches7.
We investigate genome-wide patterns of C. phytofermentans transcription initiation on heterogeneous plant substrates by demonstrating an approach called Capp-Switch sequencing. The initiating nucleotide of nascent mRNA is distinguished by a 5′ triphosphate (5′-PPP), which has been exploited for genome-wide TSS identification with dRNA-seq8 by depleting rRNA and other monophosphorylated transcripts using terminal exonuclease (TEX). dRNA-seq has been applied to diverse bacteria9,10,11,12,13, but incomplete and non-specific degradation of processed RNA requires TSS identification to be based on statistical comparison of read coverage in +TEX and −TEX samples. Capp-Switch avoids these problems by capturing and purifying 5′ mRNA fragments, which are reverse transcribed with template-switching to tagged cDNA for high-throughput sequencing (Fig. 1). The 5′-PPP of mRNA are modified by vaccinia capping enzyme (VCE) to bear a biotinylated guanosine cap that facilitates their capture and purification using streptavidin magnetic beads. Recently, TSS were identified by Cappable-Seq14 using VCE to add a desthiobiotin cap for bead-based capture of 5′ mRNA, which were then eluted from the beads and de-capped to ligate adapters for reverse transcription to tagged cDNA. Capp-Switch streamlines this approach by reverse transcribing the 5′ mRNA fragments using template-switching by Moloney murine leukemia virus reverse (MMLV) transcriptase15. Template-switching avoids adapter ligation and enables synthesis of 5′-tagged cDNA without releasing RNA from the beads, permitting use of an irreversible, biotinylated cap to increase RNA capture affinity. In all, we show Capp-Switch is a robust method that yields a genome-wide, strand-specific, quantitative map of TSS at single nucleotide resolution.
We apply Capp-Switch sequencing to define a genome-wide map of 9,457 TSS during C. phytofermentans growth on raw biomass, heterogeneous polysaccharides (cellulose, hemicellulose and pectin) and their constituent sugars. We use this TSS map to investigate features controlling gene regulation, such as RNA polymerase binding sites, 5′ untranslated region (UTR) structure, alternative promoters, operons and non-standard (leaderless and antisense) transcription. We identify sequence motifs associated with groups of TSS that are differentially expressed on specific carbon sources and show these motifs can be used to reconstruct transcription factor regulons. By integrating Capp-Switch data with an updated genome annotation, RNA-seq and proteomics, we discover novel transcriptional units (TU) and protein-encoding genes. Finally, we discuss how Capp-Switch sequencing can be applied as a general approach to explore transcription regulation in prokaryotes.
General transcriptome features
Capp-Switch sequencing quantified TSS with high reproducibility between duplicate model substrate (Fig. 2a) and raw biomass (Fig. 2b) cultures. We identified 9,457 TSS across treatments (Supplementary Data 1), one-third of which were expressed in both sugar and polysaccharide cultures (Fig. 2c). Most reads (74%) contribute to InterS TSS (Fig. 2d), which we observed upstream of 898 genes. Among these, 687 genes (77%) are predicted to start operons16 (Supplementary Data 2), supporting these operon predictions and the existence of many sub-operons. The 5′ UTR, spanning from the primary TSS to the start codon, is less than 100 bp for most genes, but there is no correlation between 5′ UTR length and TSS strength (Fig. 2e). Studies in other bacteria report many leaderless mRNA without 5′ UTR and ribosome binding sites (RBS)11. Four per cent of InterS TSS are potentially leaderless in C. phytofermentans, but these genes generally have another upstream TSS and retain a typical RBS similar to highly expressed C. phytofermentans genes (Supplementary Fig. 1).
Most genes were expressed from a single, primary TSS on all substrates (Fig. 2f), but 191 (21%) genes altered their primary TSS in response to carbon source. Further, genes with substrate-specific InterS TSS are often differentially expressed on that carbon source (χ2 test, P<0.01 for all substrates relative to glucose) (Fig. 2g), supporting that changing TSS is a widespread means of transcription regulation. In total, more than a thousand TSS are specific to each polysaccharide (Supplementary Fig. 2A). Xylan-specific (Supplementary Fig. 2B) and pectin-specific (Supplementary Fig. 2C) TSS are primarily associated with carbohydrate metabolism genes, while the most abundant functional category of cellulose-specific TSS is prophage genes (Supplementary Fig. 2D). The C. phytofermentans genome includes a large prophage island that is not predicted to encode a viable phage3, but whose transcription is up-regulated on cellulose and biomass (Supplementary Fig. 3). This burst of transcriptional initiation at viral genes could indicate prophage excision was triggered on cellulosic substrates, that is, by low carbon stress, or that viral proteins contribute to bacterial fitness17.
Sequences upstream of primary TSS generally contain the sigma-A-type consensus −35 and −10 hexamers recognized by RNA polymerase (RNAP) and associated elements that likely contribute to promoter function in this organism. An A-rich region upstream of the -35 hexamer (TTGACA) (Fig. 2h) resembles the ‘UP element’ that stimulates transcription initiation by interacting with the RNAP alpha subunit18. Also, the Pribnow hexamer (TATAAT) has an upstream TG di-nucleotide (Fig. 2i), which enhances transcription in certain other bacteria19,20,21 by interacting with the RNAP sigma-A subunit22. In contrast, searching upstream of IntraS TSS identified an AT-rich stretch ∼10 bp upstream of the TSS lacking RNAP binding sites (Supplementary Fig. 4A), suggesting IntraS TSS often result from promiscuous initiation at AT-rich sequences. We observed IntraS TSS comprised that more than 50% of TSS (Fig. 2d), albeit with fewer reads per site than InterS TSS. dRNA-seq studies have rationalized similarly abundant intragenic TSS as resulting from incomplete TEX degradation12, but our data support these TSS bear 5′-PPP indicative of transcription initiation. IntraS TSS are preferentially found in the 5′ end of genes (Supplementary Fig. 4B), supporting they are under selective pressure and may have roles including expression of alternative protein isoforms or as mimicry molecules to sequester other RNA and ribonucleases from their mRNA targets9.
Capp-Switch reads (Fig. 3a–d) start at specific positions with respect to known genes showing TSS at single base resolution, whereas RNA-seq reads begin throughout genes (Fig. 3e–h). We observed four common TSS situations: genes with a single upstream TSS, genes with both upstream and intragenic TSS, genes with multiple TSS on a single substrate and genes with substrate-specific TSS. For example, the glyceraldehyde 3-phosphate dehydrogenase (gadph) gene is constitutively transcribed from a single TSS (Fig. 3a). The pyruvate ferredoxin oxidoreductase (pfor) gene is transcribed from a single, upstream TSS and another, weaker TSS in the coding sequence (Fig. 3b). The cel5A cellulase gene23 is simultaneously transcribed from multiple TSS on cellulose (Fig. 3c), as are other cellulases (Supplementary Fig. 5). CAZyme expression in C. phytofermentans is controlled by carbon source24,25 and our data supports their regulation involves multiple promoters. The cphy1510 gene encoding the most active xylanase5 is transcribed from three TSS on xylan and a different, upstream TSS on pectin (Fig. 3d). Similarly, genes for other CAZymes including three cellulases, one other xylanase, four pectinases and two glycosyl transferases changed their primary TSS as a function of carbon source. We confirmed the positions of the primary TSS identified by Capp-Switch for gadph, pfor (IntraS and primary TSS), cphy2243 and cphy1510 (xylan and pectin) using 5′ RACE (Supplementary Fig. 6).
Motifs associated with TSS clusters
We clustered TSS based on expression across carbon sources and searched sequences surrounding TSS for overrepresented motifs (Supplementary Fig. 7; Supplementary Data 3), revealing TSS clusters that share motifs with potential regulatory functions (Fig. 4). For example, the TSS cluster up-regulated on galacturonic acid and homogalacturonan (HG) (Fig. 4c) has a palindromic motif resembling the cre operator (TGAAAGCGCTTTCA) bound by B. subtilis CcpA26,27, a LacI/GalR regulator of numerous carbon metabolism genes. LacI/GalR genes often have upstream copies of their operators to auto-repress transcription28, and we found three copies of the galacturonic acid cluster motif in the 5′ UTR of cphy2742, a LacI/GalR gene specifically up-regulated on galacturonic acid (Fig. 5a). Further, three of the six LacI/GalR genes with detected primary TSS have upstream variants of the cre operator that are conserved in their orthologs from related species (Fig. 5b–d), leading us to propose C. phytofermentans LacI/GalR regulators recognize related, but distinct, operators to control separate regulons. Supportingly, the putative Cphy2742 operator (Fig. 5b) is upstream of 22 genes in the C. phytofermentans genome (Supplementary Table 1) including 3 CAZymes (PL9 pectin lyases) that degrade HG to galacturonic acid5 and transcription units containing all genes needed to assimilate galacturonic acid29 (Supplementary Fig. 8).
The putative Cphy2742 operator sites are co-located with or downstream of TSS for HG degradation and galacturonic acid metabolism genes (Fig. 5e), supporting Cphy2742 binds these sites to block transcription. Transcription of the pl9 genes cphy2919 and cphy3869 switches to upstream primary TSS on galacturonic acid relative to HG, but all TSS are close enough to be potentially regulated by Cphy2742 operators. The pta-ackA (cphy1326-7) acetate synthesis operon also has a Cphy2742 operator and both pta-ackA expression and acetate formation are elevated on galacturonic acid (Supplementary Fig. 9). While B. subtilis CcpA represses most of its targets, it activates pta and ackA transcription30,31 by binding upstream of their promoters32. The Cphy2742 operator is also upstream of the pta gene TSS, suggesting Cphy2742 may similarly activate transcription of the pta-ackA operon as well as the glycolytic gene ppdK and the hydrolase gene cphy0367. Collectively, we propose Cphy2742 represses a comprehensive set of pectin fermentation genes by binding a conserved palindrome at or downstream of their TSS to block transcription. In response to a galacturonic acid-based signal, Cphy2742 de-represses itself and its targets, and may activate transcription of acetate synthesis and other aspects of carbon metabolism by binding upstream of TSS.
Antisense and novel transcripts
Recent studies found 30–40% of TSS are antisense in other bacteria8,9,13. However, antisense transcription appears rare in C. phytofermentans: <1% of TSS were antisense either between (InterA) or within genes (IntraA) (Fig. 2d). To further investigate whether diffuse antisense transcription was underestimated by our TSS thresholds, we classified all mapped read starts, including those not meeting TSS thresholds. Even then, InterA and IntraA classes together comprise <4% reads. This dearth of antisense transcription may relate to the early evolutionary divergence of the Clostridiales33. Alternatively, we would not detect antisense transcripts that were processed to remove 5′-PPP or that are below the 200 bp size threshold of our cDNA libraries, but studies in other bacteria using larger size thresholds found antisense TSS in ∼35% of genes10. While comparatively rare, antisense transcription appears to have important cellular functions. For example, we observed an antisense TSS in the 5′ UTR of the sporulation regulator spoOA (cphy2497) that also opposes transcription of the spoIVB peptidase (cphy2498) (Fig. 6a). This TSS was expressed on all sugars, but not polysaccharides, supporting antisense transcription has a role in repressing sporulation during log growth in sugar-replete conditions.
TSS reveal novel transcriptional features such as a TU downstream of the glycoside hydrolase cphy2658 that is up-regulated to have the strongest initiation site in the genome on cellulose and corn stover (Fig. 6b). This region contains a hypothetical open-reading frame (ORF) in the MaGe annotation (clops3132) that has no similar sequences in Genbank, but the ORF lacks an ribosome binding site (RBS), and we did not detect any expressed peptides from this region by mass spectrometry, suggesting it is a non-coding RNA. The most highly expressed ABC transporter on glucose is a putative operon (cphy2241-3) with a single TSS (Supplementary Fig. 5C,F). On all other carbon sources, we observed repression of cphy2241-3 along with appearance of an upstream, antisense TU (Fig. 6c) that has no mapped peptides or predicted ORF. Non-coding RNA are often associated with ABC transporters in clostridia34, and they may also regulate ABC transport in this organism.
The C. phytofermentans genome may encode significantly more genes than in the NCBI Genbank annotation. Classifying TSS using the MaGe annotation showed 735 (7%) TSS map to MaGe-specific clops genes of unknown function (Supplementary Data 4), including 64 clops genes with InterS TSS. We examined which of these novel TU encode proteins by mapping C. phytofermentans MS/MS peptide spectra to the genome translated in all frames, identifying peptides outside the predicted proteome in 21 InterS, 13 IntraS, 5 InterA and 25 IntraA regions (Supplementary Data 5). The combination of TSS and expressed peptides supports ORFs with N-terminal extensions such as cphy0891 (Supplementary Fig. 10A) and the existence of novel ORFs. For example, clops3461, which overlaps with cphy2929 on the opposite strand (Fig. 6d), and an antisense overlapping ORF in cphy1953 encoding the ComEA competence protein (Supplementary Fig. 10B).
TSS also show mechanisms of RNA-mediated gene regulation. Comparative genomics with other clostridia detected a putative T-box upstream of the C. phytofermentans trp operon34. In low tryptophan conditions, the T-box promotes antitermination of the trp operon by base pairing with uncharged tRNAtrp (ref. 35). We observed transcription halted abruptly in the 5′ UTR of the trp operon in glucose cultures (Fig. 6e), consistent with T-box-mediated repression. In cellulose cultures, antitermination in the T-box enabled trp operon mRNA expression, potentially enabling translation of the trytophan-rich carbohydrate binding modules in cellulases and other CAZymes. TSS also support riboswitches associated with genes for metabolism of flavin mononucleotide (FMN), cobalamin, thiamine pyrophosphate (TPP) and lysine (Supplementary Data 6). For example, C. phytofermentans is auxotrophic for thiamine, which it uptakes by a thiamine transporter, Cphy0729 (ref. 36). The cphy0729 gene has a single, constitutive TSS with an extended 5′ UTR containing a putative TPP-sensing riboswitch (Fig. 6f) that could regulate transporter expression in response to intracellular TPP levels37.
The strategy presented here to quantify condition-specific changes in transcription initiation by Capp-Switch sequencing could be generally applied to dissect the regulation of complex bacterial phenotypes. In this study, we explored the transcriptional programme enabling C. phytofermentans to ferment the cellulosic, hemicellulosic and pectic components of plant biomass. We found that growth on these different carbon sources entailed widespread TSS changes, including use of substrate-specific TSS for genes encoding biomass-degrading enzymes such as cellulases, xylanases and pectinases. Substrate-specific TSS could enable tuning of expression by changing promoters or the regulatory properties (that is, binding sites or secondary structure) of the 5′ UTR. We observed that genes encoding cellulases and other enzymes are simultaneously expressed from more than one TSS. Multiple regulators may control transcription of these genes, reflecting the numerous transcription factors encoded by this organism (Supplementary Data 7). Genes for biomass-degrading enzymes in other Clostridiales are regulated by various transcription factors including a two-component system for hemicellulases38, a LacI/GalR protein for β-1-3 glucanases39 and alternative sigma factors for cellulases40. We defined TSS clusters that were differentially expressed on specific carbon sources and used them to guide the discovery of sequence motifs with potential regulatory function, leading us to identify the LacI/GalR Cphy2742 as a putative regulator of pectin metabolism. Combining TSS mapping with motif searching could be broadly applied to LacI/GalR regulators and other types of transcription factors. For example, each of the 4 TetR regulators for which we detected TSS also have conserved, TSS-associated palindromes that resemble operator sites (Supplementary Fig. 11).
We also gained insight into regulatory mechanisms such as antisense transcription, leaderless transcription and non-coding RNA. We observed that antisense and leaderless transcription are much rarer than reported in other bacteria and it will be interesting to see if they are similarly uncommon in closely-related bacteria. We also show that integration of Capp-Switch TSS mapping with RNA-seq and proteomics enables discovery of novel transcription units and protein-encoding genes. Transcription initiation is a complex and important component of gene regulation for which most of the underlying mechanisms in C. phytofermentans are yet unknown. Further, these results illustrate how little we know about gene regulation in plant-fermenting clostridia, a group of bacteria with important roles in soil and gut microbiomes that have significant potential to serve as biocatalysts for industrial transformation of plant biomass.
C. phytofermentans ISDg (ATCC 700394) was cultured anaerobically at 30 °C in GS2 medium41 containing 5 g l−1 of either D-(+)-glucose (Sigma G5767), D-(+)-xylose (Sigma X3877), D-galacturonic acid sodium salt (Sigma 73960), regenerated amorphous cellulose (RAC) from Avicel PH-101 (Sigma 11365), birchwood xylan (Sigma X0502), apple pectin (HG) (Sigma P8471) or raw corn stover (Qteros Inc) cut in 0.5 × 3.0 cm strips. RAC was prepared by phosphoric acid treatment42. Duplicate cultures were sampled in mid-log phase or after 2 days (RAC) or 3 days (stover). Fermentation products were quantified by HPLC43.
Capp-Switch library preparation
Total RNA was extracted from duplicate cultures for each treatment using TRI reagent (Sigma 93289) and treated with Turbo DNase (Ambion AM2238) at 0.2 U μg−1 RNA for 30 min at 37 °C. RNA was purified by Zymo Concentrator-5 (Zymo Research R1015) (>200 bp capture) into 15 μl water. RNA was 5′ capped using VCE (NEB M2080) at 3 U μg−1 RNA with 0.1 mM SAM and 0.5 mM 3′ biotin-GTP (NEB N0760) for 30 min at 37 °C and purified by Zymo Concentrator-5 (>200 bp capture) with two additional washes into 45 μl water. RNA was fragmented for 30 s at 94 °C using NEBNext Magnesium-based RNA fragmentation buffer (NEB E6101) and purified by Zymo Concentrator-5 (total RNA capture) into 100 μl water. Streptavidin magnetic beads (NEB S1421S) were pre-washed twice with low-salt buffer (10 mM Tris, 50 mM NaCl, 1 mM EDTA), twice with binding buffer (10 mM Tris, 500 mM NaCl, 1 mM EDTA) and resuspended at 4 mg ml−1 beads in binding buffer. Capped RNA fragments were bound to streptavidin beads for 20 min at room temperature and magnetically separated from other RNA by washing twice with binding buffer and twice with low-salt buffer to elute non-bound RNA. Beads were washed once with 1 mM Tris–HCl pH 7.5 and resuspended in 1 mM Tris–HCl pH 7.5.
RNA was converted to single-strand cDNA by SMARTscribe MMLV reverse transcriptase (Clontech 634836) at 10 U μl−1 with 2.5 mM DTT, 1 mM dNTP, 1.2 μM SMARTer stranded oligo and 0.6 μM SMART stranded N6 primer (Clontech 634836) by incubating 90 min at 42 °C and 10 min at 70 °C. Beads were collected and the supernatant was combined with the liquid fraction after the beads were washed with 30 μl 1 mM Tris pH 7.5. The cDNA was twice purified using 1 volume of solid phase reversible immobilization (SPRI) beads (Beckman Coulter A63880). cDNA was left on beads after the second purification and double-stranded cDNA was synthesized by 18 cycles PCR using SeqAmp DNA polymerase (Clontech 638504) with 0.25 μM primers (Universal Forward PCR primer and indexed Reverse PCR primer) and then SPRI purified with 1 volume of beads. DNA was sequenced on Illumina MiSeq with 150 bp paired-end reads chemistry.
TSS identification and classification
Sequencing reads were quality filtered44 and the 3 bp MMLV reverse transcriptase 3′ non-template extension was removed from the 5′ end of forward (R1) reads. Reads were mapped to the C. phytofermentans ISDg genome (NCBI NC_010001.1) using Bowtie 2 (version 2.2.4)45. Alignments showed 87–98% of reads mapped to unique positions in the C. phytofermentans genome, yielding between 0.4 million (corn stover) and 3.4 million (glucose) reads per culture (Supplementary Table 2). TSS were identified using R1 reads by calculating the number of reads starting at each genomic position, clustering read counts within a 5 bp sliding window, and retaining the position with the greatest number of reads. TSS were defined as genome positions with greater than 10 read starts per million reads in both duplicate cultures. Capp-switch TSS were confirmed by 5′ RACE (Sigma 03353621001) using primers in Supplementary Table 3 to amplify PCR products, which were resolved by electrophoresis, excised and sequenced.
Genes in the NCBI and MicroScope (MaGe) annotations46 were used to divide TSS into four categories: InterS (intergenic TSS with downstream gene in same orientation), InterA (intergenic TSS with downstream gene opposite orientation), IntraS (intragenic TSS in gene with same orientation) or IntraA (intragenic TSS in gene with opposite orientation). The InterS TSS with the most reads for each gene was defined as the primary TSS. Capp-Switch results were compared with strand-specific (dUTP) RNA-seq of C. phytofermentans grown in the same culture conditions5. RNA-seq gene expression was calculated as RPKM using the Bioconductor47 package ‘easyRNASeq’ and differential expression was defined as a DESeq48 (version 1.22.1) P-value <0.05 adjusted for multiple testing of the 3,902 genes in C. phytofermentans genome by Bonferroni correction. Peptides corresponding to novel ORFs were identified by mapping peptide MS/MS spectra from glucose, xylan and cellulose cultures4 to the genome translated in all six frames. Peptides were identified from spectra using SEQUEST and filtered to a 5% false discovery rate using a target-decoy approach49,50 including a target database and a decoy of the reversed sequences.
Sequence motifs were identified using MEME51 with a background model of di-nucleotide frequencies in the C. phytofermentans genome. Searches for RNA polymerase binding site motifs included positions 25–50 bp (−35 motif) and 5–20 bp (−10 motif) upstream of all primary TSS expressed on the three sugars and polysaccharides. The top palindromic motifs associated with LacI/GalR and TetR regulators were found by searching sequences from −250 (upstream) to +50 bp (downstream) relative to the start codon of C. phytofermentans genes and their putative orthologs from related genomes identified by top reciprocal BLAST searches (Supplementary Table 4). These motifs were used for genome-wide scans from −250 to +50 bp within all C. phytofermentans genes using MAST52. To cluster TSS by expression, the 1,188 TSS with at least a 30-fold change in read counts between two conditions were log2-transformed and each TSS was normalized to have a median value of 0 across conditions and scaled so the sum of the squared expression levels is 1. TSS were separated into 24 clusters by K-means using the city-block similarity metric. Significant motifs (e<0.001) associated with individual K-means clusters were identified by searching −100 to +10 bp with respect to each TSS.
The authors confirm that all data underlying the findings are fully available without restriction. RNA sequencing files in FASTQ format are available in the European Nucleotide Archive under study accession PRJEB13063.
How to cite this article: Boutard, M. et al. Global repositioning of transcription start sites in a plant-fermenting bacterium. Nat. Commun. 7, 13783 doi: 10.1038/ncomms13783 (2016).
Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work was funded by a CNRS Chaire d'Excellence to A.C.T. and the Genoscope-CEA. We thank NEB for providing reagents (biotin-GTP, vaccinia capping enzyme and streptavidin beads), the Genoscope-CEA sequencing platform for RNA sequencing and the LABGeM group for supporting the MicroScope (MaGe) annotation resource.
Classification of 9,457 TSS based on the C. phytofermentans ISDg NCBI annotation (NC_010001.1). TSS were categorized as either InterS (3,519 TSS), IntraS (5,721 TSS), InterA (67 TSS), or IntraA (150 TSS). Three highly-expressed TSS associated with hypothetical genes not in the NCBI annotation are included: cphy2658a (WP_041703693), cphy0846a (WP_049762298), and cphy0846b (WP_041703193). Table includes TSS position, strand, class, downstream gene, start position of downstream gene, distance to gene, gene annotation, and average reads between duplicate cultures normalized to one million total reads per carbon source. Distances are from TSS to start for InterS, IntraS, IntraA and from TSS to gene stop for InterA.
Computational prediction of 1,829 C. phytofermentans operons that include 3,630 genes using MicrobesOnline. Table rows show transcription direction and gene members for each operon
TSS members of K-means clusters. Table includes TSS position, strand, and cluster name for clusters shown in Supplementary Fig 7
Classification of 9,457 TSS based on the MaGe annotation. TSS were categorized as either InterS (3,266 TSS), IntraS (5,874 TSS), InterA (55 TSS), or IntraA (262 TSS). Table includes TSS position, strand, class, downstream gene, start position of downstream gene, distance to gene, gene annotation, and average reads between duplicate cultures normalized to one million total reads per carbon source. Distances are from TSS to start codon for InterS, IntraS, IntraA and TSS to stop codon for InterA.
Peptides detected by mass spectrometry mapping outside of predicted proteome. Table shows peptide sequence, translation frame, genome location, number of observances, surrounding genes, and proposed annotation. Peptides were categorized similar to TSS as either intergenic sense, intergenic antisense, intragenic sense, or intragenic antisense
TSS support expression of non-coding RNA. C. phytofermentans (RFAM taxon CP000885.1) encodes 134 non-coding RNA according to the RFAM 12.0 database. Page 1: TSS associated with non-coding RNA: riboswitches, RNase, sRNA sau-50, SRP subunits, and group II introns. Page 2: Translation RNAs predicted by RFAM including 8 rRNA copies (5S, 16S, 23S), a 6S RNA, 61 tRNA, a tmRNA, 15 T-box leaders, and an L10 leader
C. phytofermentans transcription regulator proteins in the InterPro database. Table shows NCBI gene number, Uniprot ID, predicted function, protein length, and InterPro classifications. Pages 1 shows all 309 transcription factors and subsequent pages are classes with more than 10 members: two-component regulators (66), AraC (61), Sigma factors (26), LacI (25), XRE (22), GntR (20), TetR (19), MerR (14), LysR (12), MarR (12).