An expanded transcriptome atlas for Bacteroides thetaiotaomicron reveals a small RNA that modulates tetracycline sensitivity

Ryan, Daniel; Bornet, Elise; Prezza, Gianluca; Alampalli, Shuba Varshini; Franco de Carvalho, Taís; Felchle, Hannah; Ebbecke, Titus; Hayward, Regan J.; Deutschbauer, Adam M.; Barquist, Lars; Westermann, Alexander J.

doi:10.1038/s41564-024-01642-9

Resource
Open access
Published: 25 March 2024

Nature Microbiology volume 9, pages 1130–1144 (2024)

Abstract

Plasticity in gene expression allows bacteria to adapt to diverse environments. This is particularly relevant in the dynamic niche of the human intestinal tract; however, transcriptional networks remain largely unknown for gut-resident bacteria. Here we apply differential RNA sequencing (RNA-seq) and conventional RNA-seq to the model gut bacterium Bacteroides thetaiotaomicron to map transcriptional units and profile their expression levels across 15 in vivo-relevant growth conditions. We infer stress- and carbon source-specific transcriptional regulons and expand the annotation of small RNAs (sRNAs). Integrating this expression atlas with published transposon mutant fitness data, we predict conditionally important sRNAs. These include MasB, which downregulates tetracycline tolerance. Using MS2 affinity purification and RNA-seq, we identify a putative MasB target and assess its role in the context of the MasB-associated phenotype. These data—publicly available through the Theta-Base web browser (http://micromix.helmholtz-hiri.de/bacteroides/)—constitute a valuable resource for the microbiome community.

Main

Bacteria of the Gram-negative Bacteroides genus are universal members of the gut microbiota of healthy human adults¹. These bacteria occupy a hub position in the distal colon, influencing both host physiology and incoming enteric pathogens², and serve as reservoirs of antibiotic resistance genes within the gastrointestinal tract³. Consequently, knowledge of the regulatory mechanisms underlying Bacteroides gene expression can help in the conception of microbiota-centric interventions to correct intestinal disorders.

Our current understanding of transcriptional control mechanisms in Bacteroides species mostly derives from studying their metabolic potential. Encoded on distinct clusters of neighbouring genes, these bacteria harbour numerous polysaccharide utilization loci (PULs)⁴, which allow them to feed on dietary fibre, as well as on host glycans⁵. PULs typically comprise regulatory systems—specific transcriptional regulators and sigma factors encoded within the same locus—that spur PUL transcription when the corresponding carbon source is sensed^6,7,8,9,10. On a higher hierarchical level, a conserved global transcription regulator termed Cur¹¹ coordinates Bacteroides carbohydrate utilization with other cellular processes^12,13.

Complementing protein-mediated transcriptional control, bacteria universally employ small RNAs (sRNAs) that post-transcriptionally modulate gene expression via binding to complementary sequences within target messenger RNAs (mRNAs)¹⁴. While individual members of the Bacteroides genus are known to encode hundreds of sRNAs^15,16, a primary bottleneck is that the vast majority of them do not yet have a known molecular function. Previously, we established Theta-Base¹⁶, a transcriptome database for Bacteroides thetaiotaomicron, which features the growth phase-dependent expression of sRNAs. However, these data were solely derived from experiments in nutrient-rich laboratory medium that falls short of reflecting in vivo-relevant conditions, composed of diverse stresses and nutritional variation. Besides, genome-wide phenotypic screens in Bacteroides species have so far been restricted to the analysis of mutations within coding genes^17,18,19, yet knowledge of fitness-contributing noncoding genes could help to prioritize RNAs for functional studies in these health-relevant bacteria. To date, only a few Bacteroides sRNAs have been partially characterized^15,16,20, yet inactivation of none of them has been associated with a robust fitness phenotype, obscuring their importance for Bacteroides’ physiology.

In this Resource, we dissect global gene expression signatures in B. thetaiotaomicron type strain VPI-5482 under a range of host niche-related stresses and during growth on defined carbon sources. From the resulting transcriptomic compendium, we infer stress- and carbon source-specific gene expression patterns and identify noncoding RNAs. In an integrative approach, we use gene expression and mutant fitness data to link individual sRNAs to specific cellular processes. To demonstrate the value of our combined transcriptomics and functional genomics data, we focus on the previously uncharacterized sRNA MasB (BTnc201). Our findings assign MasB to the Cur regulon and suggest that this sRNA is a post-transcriptional regulator of a conserved tetratricopeptide protein, with phenotypic consequences when B. thetaiotaomicron is exposed to translation-blocking compounds.

Results

B. thetaiotaomicron transcriptome annotation

To expand the transcriptome annotation of B. thetaiotaomicron, we compiled a suite of in vitro conditions that mimic specific aspects of this bacterium’s host niche (Fig. 1a and Supplementary Table 1). The large intestine exerts selective pressure on colonizing bacteria in the form of fluctuating pH levels, heterogeneous oxygen tension and the presence of secreted antimicrobial peptides and bile salts^21,22,23,24. Consequently, our suite of stress conditions included moderate acidic pH, aerobic shaking, exposure to hydrogen peroxide, bile acids (deoxycholate or a bile salt mixture) and the antibiotic gentamicin (to which Bacteroides species are naturally resistant) and increased or decreased temperature (Extended Data Fig. 1a–d). To reflect metabolic fluctuations associated with the gastrointestinal tract, bacteria were grown in minimal medium supplemented with defined simple sugars (glucose, arabinose, xylose, maltose, N-acetyl-d-glucosamine (GlcNAc)) or porcine mucin glycans (Extended Data Fig. 1e,f), or they were nutrient deprived in minimal medium lacking a carbon source. Total RNA was extracted from the respective cultures and either pooled and analysed via differential RNA sequencing (dRNA-seq) for comprehensive transcription start site (TSS) mapping²⁵ or sequenced separately via conventional RNA-seq to profile conditional gene expression. In all cases, library preparation was generic, resulting in the detection of both protein-coding and noncoding transcripts.

**Fig. 1: Comprehensive transcriptome annotation and gene expression profiling of *B. thetaiotaomicron*.**

The pooled complementary DNA (cDNA) sample was sequenced to ~40 million reads (that is, twice the depth that was previously considered to be sufficient to annotate the transcriptome of Salmonella enterica to saturation²⁶). We analysed the resulting data using the ANNOgesic pipeline²⁷ and collectively mapped the position of 4,123 TSSs across the B. thetaiotaomicron chromosome and plasmid. Comparing these results with our previously mapped TSSs, when B. thetaiotaomicron grew in rich TYG medium in the early or mid-exponential and stationary phase¹⁶, we found 252 unique TSS annotations contributed by the 15-condition pool (Fig. 1b, Extended Data Fig. 2a and Supplementary Table 2). Likewise, the number of transcription termination sites—predicted by a combination of read coverage drop and likelihood to fold into an intrinsic terminator hairpin (see Methods)—increased by 86 (Extended Data Fig. 2b). We also updated the annotation of operon structure predictions (Extended Data Fig. 2c). To better interpret Bacteroides transcriptomic features, we integrated our refined transcript boundary annotations with a map of invertible DNA regions (invertons) obtained by application of the PhaseFinder software²⁸ to the B. thetaiotaomicron genome. Of the resulting 1,997 inverted repeats, 569 contained potential promoters (that is, involved sequences within a 50-base pair (bp) window upstream of a mapped TSS) and may represent sites contributing to Bacteroides phase-variable transcription initiation.
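The promoter criterion for invertons described above can be sketched in a few lines. This is a minimal illustration, assuming that "involved sequences within a 50-bp window upstream of a mapped TSS" means any overlap between an inverted-repeat region and that window; the function names, coordinate convention (1-based, closed intervals) and example coordinates are hypothetical and are not taken from PhaseFinder or ANNOgesic.

```python
def upstream_window(tss, strand, size=50):
    """Closed interval covering `size` bp immediately upstream of a TSS."""
    if strand == "+":
        return (tss - size, tss - 1)
    if strand == "-":
        return (tss + 1, tss + size)
    raise ValueError("strand must be '+' or '-'")


def overlaps(a, b):
    """True if closed intervals a and b share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def invertons_with_promoters(invertons, tss_list, size=50):
    """Return invertons that overlap the upstream window of any TSS.

    invertons: list of (start, end) tuples (hypothetical coordinates)
    tss_list:  list of (position, strand) tuples
    """
    hits = []
    for inv in invertons:
        if any(overlaps(inv, upstream_window(pos, strand, size))
               for pos, strand in tss_list):
            hits.append(inv)
    return hits


# Hypothetical example: only the first inverton touches a promoter window.
example_invertons = [(100, 150), (500, 560)]
example_tss = [(160, "+"), (400, "-")]
promoter_invertons = invertons_with_promoters(example_invertons, example_tss)
```

A stricter reading (for example, requiring the whole window to lie inside the inverton) would only change the `overlaps` test.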

Bacteroides stress response signatures

The conventional RNA-seq libraries were sequenced to 10–15 million reads per sample, as per general guidelines for bacterial differential expression analysis²⁹. Of the 5,442 coding sequences in the B. thetaiotaomicron genome, 5,137 (94.4%) were expressed (more than ten reads per sample) under at least one of the 15 experimental conditions. Biological replicate samples clustered closely (Fig. 1c), indicating the absence of major batch effects. To aid in interpretation of the gene expression data, we compiled functional information by merging Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology term annotations with manually collated gene sets and regulons retrieved from literature research (see Methods and Supplementary Table 3).
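The expression criterion above can be made concrete with a short sketch. It assumes one plausible reading of "more than ten reads per sample under at least one condition", namely that every replicate of at least one condition exceeds ten reads; the data layout and function name are hypothetical. The last line simply rechecks the reported fraction.

```python
def is_expressed(counts_by_condition, min_reads=10):
    """True if, in at least one condition, every replicate has > min_reads reads.

    counts_by_condition: dict mapping a condition name to a list of
    per-replicate read counts for one gene (hypothetical layout).
    """
    return any(
        bool(reps) and all(count > min_reads for count in reps)
        for reps in counts_by_condition.values()
    )


# Fraction of coding sequences reported as expressed: 5,137 of 5,442.
expressed_percent = round(100 * 5137 / 5442, 1)
```

Under a looser reading (any single sample above the cutoff), `all` would become `any`.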

Pairwise comparisons of brief stress exposures to an unstressed control sample (Supplementary Table 4) and an ensuing gene set enrichment analysis (Extended Data Fig. 3a and Supplementary Table 5) revealed Bacteroides transcriptomic responses to environmental cues. Heat, mild acidic pH and aerobic exposure triggered only a few, yet specific expression changes (Fig. 2a and Extended Data Fig. 5b–d), whereas substantial transcriptomic reprogramming was observed when bacteria faced cold, bile or sub-lethal antibiotic stress (Fig. 2a,c and Extended Data Fig. 5a,f). Brief exposure to hydrogen peroxide did not induce any significant expression changes, probably because the selected concentration (480 µM) was relatively low. Generally, stress-specific marker genes inferred from the literature showed the anticipated alterations (Supplementary Text). In the case of the BT_2792–BT_2795 operon that encodes a bile salt tolerance-conveying efflux pump¹⁹, we observed a TSS and alternative start codon 60 nucleotides downstream of the annotated one (red and black ATG sequence in Fig. 2d). Since the previously annotated amino terminus was not supported by any sequencing reads (Fig. 2d, bottom), we re-annotated BT_2795 accordingly.

**Fig. 2: Stress response- and carbon source-specific expression of *B. thetaiotaomicron* genes.**

Bacteroides genomes harbour multiple capsular polysaccharide (CPS) loci, allowing these bacteria to alter their surface structure with consequences for the evasion of host immunity and phage attack³⁰. Invertible promoters (included in our inverton map) result in phase-variable expression of certain CPS loci³¹. Of the eight CPSs of B. thetaiotaomicron, CPS4 and, to a lesser extent, CPS3 were dominant during in vitro growth (Extended Data Fig. 4), recapitulating previous findings³². The relative CPS dominance changed (from CPS4 to CPS3) when bacteria were exposed to the secondary bile acid deoxycholate (Fig. 2b and Extended Data Fig. 6a). Exposure to a bile salt mixture, gentamicin or the cold led to a partial induction of CPS3. Interestingly, the updated Theta-Base annotation features four predicted cps3 sub-operons, only the first of which was upregulated under these conditions (operon 65 in Extended Data Fig. 6b), which we interpret as a snapshot during a gradual shift from CPS4 to CPS3 expression, as opposed to an outright CPS switch, during certain stresses.

Carbon source-specific gene expression patterns

To dissect B. thetaiotaomicron metabolic programs, we calculated differential gene expression upon growth in the different carbon sources relative to cultures feeding on glucose (Extended Data Fig. 5 and Supplementary Table 4) and again determined enriched gene sets (Extended Data Fig. 3b and Supplementary Table 5). The number of significantly differentially expressed genes (log₂[fold change (FC)] < −2 or > +2 and false discovery rate (FDR) < 0.05) tended to increase with the complexity of the carbon sources (Fig. 2a). Generally, PUL expression responded to known substrates, adding further confidence to our dataset (Fig. 2b and Supplementary Text).
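The significance thresholds stated above translate directly into a filter. The sketch below assumes a table of per-gene log₂ fold changes and FDR values is already available from a differential expression tool; the gene labels and numbers are hypothetical placeholders, not results from the study.

```python
def is_significant(log2_fc, fdr, lfc_cutoff=2.0, fdr_cutoff=0.05):
    """Apply the thresholds log2[FC] < -2 or > +2 and FDR < 0.05."""
    return abs(log2_fc) > lfc_cutoff and fdr < fdr_cutoff


def count_significant(results):
    """Count significant genes in a dict of gene -> (log2_fc, fdr)."""
    return sum(is_significant(lfc, fdr) for lfc, fdr in results.values())


# Hypothetical values for illustration only.
example_results = {
    "gene_a": (5.0, 0.001),   # strongly induced, significant
    "gene_b": (-2.5, 0.01),   # repressed, significant
    "gene_c": (2.0, 0.001),   # exactly at the cutoff, not counted
    "gene_d": (3.0, 0.20),    # large change but FDR too high
}
n_significant = count_significant(example_results)
```

Both comparisons are strict, so a gene sitting exactly on a threshold is excluded, matching the inequalities given in the text.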

When polymeric mucin was the sole carbon source, the high mannose mammalian N-glycan utilization system PUL72 (average log₂[FC] = 5; ref. ³³) and host-derived mucin O-glycan-processing systems PUL14, PUL78 and PUL80 (mean log₂[FC] = 7, 5.2 and 10.5, respectively^7,34,35) were strongly induced (Fig. 2b,e). PUL62, whose inducer is currently not known, was also upregulated (average log₂[FC] = 2.5; Fig. 2b), suggesting that this PUL responds to mucin-derived glycans. Overall, a substantial fraction of the mucin-activated genes identified here overlapped with genes previously found to be upregulated in B. thetaiotaomicron growing in vitro on a glycan mixture prepared from the porcine gastric mucosa⁷ or colonizing the outer mucous layer of C57BL/6 mice³⁶ (Fig. 2f and Supplementary Table 6). In addition, we noticed an overlap between the set of mucin-regulated genes and genes differentially expressed in the presence of its constituent monosaccharide, GlcNAc (Extended Data Fig. 6c,d). Induction of the 83 common PUL-associated genes was generally more pronounced in mucin than in GlcNAc, whereas expression of the 81 common non-PUL genes was at rather similar levels (Extended Data Fig. 6e,f and Supplementary Table 7). This would be in line with a model wherein GlcNAc is less repressive towards basal mucin PUL expression than other, unrelated simple sugars, leading to comparably small PUL expression changes during growth on GlcNAc, while growth on polymeric mucin induces larger changes in specific PULs.

Inspection of known members of the regulon governed by the transcriptional master regulator of carbohydrate utilization, Cur¹³, indicated major gene expression changes, particularly when bacteria consumed mucin or were starved (Extended Data Figs. 4 and 5l,m), in accordance with previous reports^12,13. For instance, expression of fusA2 (BT_2167), which is an established Cur target¹³ and encodes the alternative translation elongation factor G2 (EF-G2), was induced in mucin and peaked during carbon deprivation (Fig. 2b). The inverse expression pattern was observed for fusA (BT_2729) (Fig. 2b), encoding the canonical EF-G and not belonging to the Cur regulon¹³. This corroborates previous reports^12,13,37 and further supports that B. thetaiotaomicron utilizes a distinct protein synthesis machinery during colonization of a sugar-deprived host niche. In summary, the combined data corroborate former reports, but also extend our collective knowledge of conditional gene expression in B. thetaiotaomicron.

Conditional expression of noncoding genes

Unlike the situation for B. thetaiotaomicron protein-coding genes, there is hardly any information in the literature with respect to stress- and metabolism-related expression of noncoding genes of this bacterium. Interestingly, and in contrast with the relatively small fraction of additional TSSs and transcription termination sites gained from the pooled dRNA-seq experiment (Fig. 1b and Extended Data Fig. 2b), the extended dataset resulted in a substantial increase in the number of B. thetaiotaomicron noncoding RNA candidates (Fig. 3a). For example, following manual curation (see Methods), we confidently predict 135 intergenic sRNAs, 44 of which were identified in this study.

**Fig. 3: *Bacteroides* fitness-influencing sRNAs.**

Generally, the basal abundance levels of sRNAs varied substantially and we observed several differentially expressed candidates across our conditional dataset (Fig. 3b and Extended Data Fig. 7a). Expression of the established sRNA GibS, for example, was upregulated in GlcNAc (as previously reported¹⁶), but peaked when bacteria fed on mucin or were starved. Using northern blotting, we validated the expression of seven of the here-predicted sRNAs (Fig. 3c). This included the acid-induced expression of BTnc207, the starvation-specific accumulation of BTnc302 and the downregulation of BTnc311 and BTnc325 during bile stress.

A previous study discovered a family of antisense RNAs (asRNAs) divergently encoded to PUL operons¹⁵. The observed anti-correlation in expression between several of these antisense–sense pairs suggested that asRNAs repress their cognate PUL, a mechanism validated exemplarily for one PUL-associated asRNA in Bacteroides fragilis¹⁵. In B. thetaiotaomicron, we recently annotated ten PUL-associated asRNAs¹⁶; the present data further increased this number to 32 (Fig. 3a) and we validated two of the predicted candidates by northern blotting (Extended Data Fig. 7b). What is more, the comprehensive metabolic expression data now allowed us to probe this anti-correlation phenomenon on a more global scale. We again found examples where an asRNA’s expression inversely mirrored that of its cognate PUL operon, but also observed counter-examples of positive correlation in individual asRNA–PUL pairs (Extended Data Fig. 7c). In other words, the extended transcriptomic data revealed a more nuanced picture of PUL-associated asRNAs than was anticipated and further underscore the need for functional characterization of this specialized class of noncoding RNAs.

Phenotypes associated with Bacteroides sRNA inactivation

To provide support for the involvement of individual noncoding RNAs in specific cellular processes, we reanalysed an existing high-throughput transposon insertion sequencing (TIS) dataset from B. thetaiotaomicron grown under 490 defined experimental conditions¹⁹. For practical reasons, we focus here on standalone sRNA genes, encoded as independent transcriptional units and not overlapping with other genetic features. Of these 135 intergenic sRNAs, 81 were represented in the transposon mutant library (Fig. 3d and Extended Data Fig. 8a). Mutants of 28 sRNAs exhibited a statistically significant fitness change (|t| > 4) compared with the other mutants in the pool in at least one successful experiment (as defined in ref. ¹⁹; that is, an experiment in which a gene is represented by a sufficient number of barcode counts) (Fig. 3d,e and Supplementary Table 8). The majority of intergenic sRNAs affecting fitness showed a condition-specific phenotype when disrupted (Fig. 3d). However, in the case of a handful of sRNAs, disruption resulted in broader competitive fitness changes. In the following, we focused on the MasB sRNA, whose inactivation led to the highest predicted number of significant fitness phenotypes among all intergenic sRNA mutants.
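The fitness criterion described above (|t| > 4 in at least one successful experiment) can be sketched as follows. The data layout is an assumption: each sRNA maps to per-experiment t-like scores, with `None` marking experiments that did not pass the success filter of ref. 19. The sRNA and experiment names are hypothetical.

```python
def significant_experiments(t_by_experiment, cutoff=4.0):
    """List experiments in which a mutant shows |t| > cutoff.

    Experiments with a score of None (not successful) are skipped.
    """
    return [
        experiment
        for experiment, t in t_by_experiment.items()
        if t is not None and abs(t) > cutoff
    ]


def srnas_with_phenotypes(fitness, cutoff=4.0):
    """Map each sRNA with at least one significant experiment to its hits."""
    summary = {}
    for srna, scores in fitness.items():
        hits = significant_experiments(scores, cutoff)
        if hits:
            summary[srna] = hits
    return summary


# Hypothetical scores for illustration only.
example_fitness = {
    "sRNA_A": {"doxycycline": 6.2, "oxytetracycline": 5.1, "glucose": 0.3},
    "sRNA_B": {"doxycycline": -4.5, "glucose": None},
    "sRNA_C": {"doxycycline": 1.0, "glucose": -3.9},
}
phenotypes = srnas_with_phenotypes(example_fitness)
```

Counting the hits per sRNA separates condition-specific candidates (one hit) from those with broader fitness effects (many hits).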

MasB confers antibiotics susceptibility

MasB (previously BTnc201; renamed here for reasons to follow) is a roughly 100-nucleotide-long, narrowly conserved sRNA³⁸. Relative to the transcripts from its flanking genes, MasB accumulated to high steady-state levels under all of the experimental conditions tested here, but peaked when bacteria were starved (Fig. 4a). The TIS analysis suggested that masB disruption promotes growth upon exposure to diverse antibiotics and antimicrobials (Fig. 3e). This included enhanced fitness of bacteria with mutated masB during exposure to tetracycline derivatives (oxytetracycline and doxycycline hyclate). We confirmed this phenotype using a clean deletion mutant of this sRNA and increasing concentrations of doxycycline, both during growth in liquid medium (Fig. 4b and Extended Data Fig. 8b) and on solid agar (Fig. 4c). Similar effects were observed when exposing the strains to conventional tetracycline (Fig. 4b,c and Extended Data Fig. 8c). MasB expression did not respond to antibiotic exposure (Extended Data Fig. 8d), yet the associated fitness effects were stress specific, as the mutant grew indistinguishably from an isogenic wild-type strain in vehicle-treated control cultures (Fig. 4b). Based on these results, we concluded that MasB confers Bacteroides sensitivity to ribosome-targeting antibiotics of the tetracycline family. We hence name this sRNA MasB, for modulator of antibiotics susceptibility in Bacteroides.

**Fig. 4: Genetic depletion of *masB* confers *B. thetaiotaomicron* enhanced tolerance of doxycycline and tetracycline.**

Assignment of MasB to the Cur regulon

Our transcriptomics dataset lends itself to co-expression analysis to obtain insight into cellular regulatory circuits. Here, as an illustrative use case, we performed co-expression analysis for MasB (Fig. 5a), which grouped the sRNA among genes that are activated by Cur, the transcriptional master regulator of carbohydrate utilization in B. thetaiotaomicron^12,13. This prompted us to explore the hypothesis that transcription of MasB might also be governed by Cur. Indeed, closer inspection of publicly available chromatin immunoprecipitation and sequencing data revealed that one of the most significant Cur binding sites in the B. thetaiotaomicron chromosome is located upstream of the masB gene (peak ID #598 in ref. ¹³). To assess the impact of this transcription factor on MasB expression, we deleted cur from the chromosome. During carbon source deprivation (that is, the condition when Cur activation is maximal (Fig. 5a)), the level of MasB was more than twofold decreased in the ∆cur strain compared with the wild-type strain (Fig. 5b). This effect could be complemented in trans. We conclude from these experiments that Cur acts as a transcriptional activator of the MasB sRNA.
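As a rough illustration of how a co-expression ranking of this kind can be computed, the sketch below correlates a query expression profile against other genes across conditions. Pearson correlation is an assumption here (the Discussion reports a Pearson's r for a related analysis); the exact method used for Fig. 5a is not specified in this section, and all names and values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


def rank_coexpressed(query_profile, profiles):
    """Rank genes by correlation with the query, most similar first."""
    scores = {gene: pearson_r(query_profile, p) for gene, p in profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical normalized expression across four conditions.
query = [1.0, 2.0, 3.0, 4.0]
others = {
    "gene_up": [2.0, 4.0, 6.0, 8.0],    # rises with the query
    "gene_down": [4.0, 3.0, 2.0, 1.0],  # falls as the query rises
}
ranking = rank_coexpressed(query, others)
```

Genes at the top of such a ranking are candidates for sharing a regulator with the query, which is the reasoning that led from MasB to Cur above.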

**Fig. 5: MasB is assigned to the Cur regulon and targets *BT_1675* mRNA.**

MAPS predicts BT_1675 as a direct MasB target

To search for MasB targets, we established MS2 affinity purification and sequencing (MAPS) technology³⁹ in B. thetaiotaomicron (Fig. 5c, Extended Data Fig. 9 and Supplementary Text). The top-enriched transcripts in the MS2-MasB co-purifications relative to the untagged background control were BT_1675, encoding a conserved tetratricopeptide domain protein, and the fusA2 mRNA that encodes the alternative ribosomal factor EF-G2 (Fig. 5d and Extended Data Fig. 10a). To narrow in on potential targeting regions, we applied in silico prediction of partially complementary sequences between the sRNA and its presumed targets using the IntaRNA algorithm^40,41. For fusA2, no convincing binding site was found; however, in silico prediction suggested that the BT_1675 mRNA might be targeted ~40 nucleotides downstream of its translation start codon (Extended Data Fig. 10b). Electrophoretic mobility shift assays (EMSAs) confirmed MasB binding to the 5′ region of BT_1675 mRNA in vitro (Fig. 5e). Notably, sequence mutations in the predicted interaction sites of MasB (positions 41–50 and 69–74 relative to the TSS) abrogated target binding, yet the interaction was partially restored with a compensatory mutant of BT_1675 (Fig. 5e).

To test for an effect of MasB on the steady-state levels of BT_1675 mRNA, we grew B. thetaiotaomicron wild-type, ∆masB and masB⁺ (a corresponding trans-complementation strain) cultures in TYG or starved them for 2 h in minimal medium. We then extracted total RNA and subjected the samples to northern blot and quantitative reverse-transcription PCR (qRT-PCR) analyses (Extended Data Fig. 10c). The BT_1675 mRNA level was around fivefold de-repressed in the absence of MasB, yet exclusively so in mid- and late exponentially growing bacteria. This suggests that the exponential phase is the critical time window when MasB exerts a negative effect on BT_1675 and reflects the observed growth phenotype of the sRNA deletion mutant (Fig. 4b). Trans-complementation of MasB reverted BT_1675 expression to near wild-type levels (Extended Data Fig. 10c). To support these findings, we harnessed a dual-plasmid fluorescence reporter assay^42,43. The 5′ region of BT_1675 (encompassing the predicted MasB binding site) was translationally fused to the coding sequence of superfolder green fluorescent protein (GFP) and the resulting construct transformed into Escherichia coli as a heterologous host. Co-expression of the wild-type (but not mutated; Fig. 5e, top) MasB led to a decrease of GFP intensity to ~70% compared with an unrelated control RNA (Fig. 5f and Extended Data Fig. 10d). In contrast, only the MasB variant harbouring the respective compensatory mutations was able to repress the mutated BT_1675-GFP variant in the same assay. Based on these data, we conclude that MasB inhibits the expression of the conserved tetratricopeptide domain protein BT_1675, probably through direct binding to the 5′ region of the corresponding mRNA.

Antibiotic phenotype in light of the MasB regulatory axis

Lastly, we set out to evaluate the MasB-associated antibiotic phenotype in the context of its identified regulator (Cur) and target (BT_1675). We constructed a deletion mutant of BT_1675 and—to test for epistatic effects—combined the single masB and single cur deletions (∆cur∆masB). Importantly, none of the mutants had a growth defect in rich medium (Extended Data Fig. 10e). When subjected to doxycycline susceptibility testing (Fig. 5g and Supplementary Table 9), ∆cur bacteria tended to phenocopy the enhanced antibiotic tolerance observed for the ∆masB mutant, whereas trans-complementation of Cur reverted the susceptibility to that of the isogenic wild type. Surprisingly, the ∆cur∆masB double mutant was severely affected—yet in the opposite direction to the respective single deletions—and failed to grow on doxycycline-containing plates altogether, which warrants further investigation. Deletion of the MasB target BT_1675 had only a subtle influence on doxycycline susceptibility. Collectively, this work illustrates the synergy between transcriptomics and functional genomics for the discovery of phenotypes associated with bacterial noncoding RNAs.

Discussion

Transcriptomics has proven invaluable for our understanding of the molecular basis of Bacteroides’ activities in the mammalian intestine^36,44,45,46. Pinpointing the precise triggering factors that induce the expression of certain gene sets is important to understand the underlying regulatory networks and to obtain a molecular handle to rationally interfere with these processes for the benefit of the human host. However, disentangling in vivo gene expression patterns becomes complicated by the multitude of overlapping stimuli that the microbes are simultaneously exposed to in their natural habitat. To decompose host-adapted bacterial gene expression, we here reconstituted specific responses to defined environmental cues in vitro. Complementation of the corresponding transcriptomic data with functional sRNA genomics suggested specific phenotypes for 28 B. thetaiotaomicron sRNAs as starting points for targeted follow-up studies.

The predictive power of this integrative approach is exemplified by our findings on the previously uncharacterized sRNA MasB. MasB is transcriptionally activated by the master regulator Cur and, in turn, post-transcriptionally represses the mRNA for the hypothetical tetratricopeptide repeat protein BT_1675, whose function is currently unknown (Fig. 5h). A co-expression analysis approach revealed a strong positive correlation (Pearson’s r = 0.91) between BT_1675 and the Gene Ontology term ‘unfolded protein binding’, comprising protein chaperones such as DnaK and GroEL (Extended Data Fig. 10f). Of interest in the present context, certain protein chaperones have the ability to stabilize resistance-conferring amino acid substitutions in drug targets⁴⁷, suggesting that BT_1675 could play a role in the maintenance of antimicrobial resistance.

In summary, our data highlight the relevance of MasB for antibiotic sensitivity in a major human gut commensal. More generally, our study emphasizes the power of combining bacterial expression atlases with additional data modalities. Building on a state-of-the-art visualization tool, Theta-Base 2.0 allows easy and intuitive interaction with our diverse datasets and constitutes a much-needed resource for the microbiome research community.

Methods

Bacterial culture conditions

Bacteroides strains were routinely cultured in an anaerobic chamber (Coy Laboratory Products) with an anaerobic gas mix (85% N₂, 10% CO₂ and 5% H₂) at 37 °C. Routine cultivation of all strains was performed in TYG medium and on Brain Heart Infusion Supplemented (BHIS) plates. For a detailed description of media composition and culture conditions for the RNA-seq analysis, refer to Supplementary Table 1.

Growth assays in the presence of diverse carbon sources were carried out in minimal medium supplemented with 0.5% of a suitable carbon source, as follows. A single colony of wild-type B. thetaiotaomicron VPI-5482 (AWS-001) was inoculated into 5 ml minimal medium–glucose and incubated anaerobically for 24 h. Then, 1 ml of this culture was centrifuged (2,000g for 3 min) to pellet bacterial cells that were resuspended in an equal volume of minimal medium (without a carbon source). This was subsequently used to inoculate (1:100 dilution) minimal medium containing an appropriate carbon source and incubated for the indicated time, following which aliquots (optical density equivalents of ~4) were collected for RNA extraction.

Stress response assays were performed in TYG medium as indicated below. A single colony of AWS-001 was inoculated into 5 ml TYG medium and incubated anaerobically overnight. The next day, it was sub-cultured into TYG medium and grown to the mid-exponential phase (~7 h; OD₆₀₀ = 2.0). This culture was sub-divided into 5 ml fractions corresponding to each stress condition and centrifuged to pellet the bacterial cells, as before. The pellet was resuspended in an equal volume of TYG medium containing the indicated concentration of a stressor and incubated for a further 2 h, following which samples were collected for RNA extraction.

Bacterial genetics

A detailed list of the strains, plasmids and oligonucleotides used in this study can be found in Supplementary Table 10. To create ∆masB, we employed a previously established method⁴⁸. To this end, we assembled 1-kilobase sequences flanking the deletion site into the pExchange-tdk suicide vector and introduced this construct into E. coli S17-1 λpir. The resulting transformants were mated with B. thetaiotaomicron Δtdk (AWS-003) and the resulting conjugants were selected on 5-fluoro-2′-deoxyuridine plates. Single recombinants were isolated on BHIS agar with 200 μg ml⁻¹ gentamicin and 25 μg ml⁻¹ erythromycin (BHIS^gent/erm). Double recombinants, leading to either scarless deletion mutants or wild-type revertants, were identified by their ability to grow on BHIS agar with 200 μg ml⁻¹ 5-fluoro-2′-deoxyuridine while being unable to grow on BHIS agar with 25 μg ml⁻¹ erythromycin. The masB complementation strain (masB⁺) was assembled using a version of the pNBU2 vector system, as previously described⁴⁸. The complete masB gene sequence was integrated into pNBU2 using Gibson Assembly (New England Biolabs (NEB)) to ensure transcription from the native TSS. This construct was conjugated into the ∆masB strain via E. coli S17-1 λpir, as described above.

The other deletion mutants (∆cur, ∆1675 and ∆cur∆masB) were generated using the pSIE1 plasmid system⁴⁹. Briefly, 750-nucleotide flanking regions around the deletion site were Gibson assembled (NEB) into the linearized pSIE1 plasmid (SpeI and BamHI digested). The assembled plasmid was subsequently introduced into B. thetaiotaomicron via conjugation with E. coli S17-1 λpir and the conjugants were streaked onto BHIS^gent/erm plates. Resistant colonies were cultured overnight in TYG medium without antibiotics, and dilutions (10⁻¹ to 10⁻³) were plated onto BHIS agar with 100 ng ml⁻¹ anhydrotetracycline (aTC). Colony PCR and Sanger sequencing were used to confirm the intended deletions. The cur⁺ complementation strain resulted from Gibson Assembly (NEB) of full-length cur with pWW3452 (AWP-015) such that transcription was under the control of the phage promoter on the plasmid. It was ensured that the 3′ end of the transcript maintained a reading frame with the downstream FLAG and His tags. The construct was conjugated into the ∆cur background as described above.

Total RNA purification and removal of genomic DNA

All of the RNA-seq samples were collected as biological duplicates. Total RNA was isolated by the hot-phenol method, as follows. Briefly, bacterial cultures containing a total of ~4 OD equivalents of cells were collected and a one-fifth volume of stop mix was added (5% vol vol⁻¹ water-saturated phenol; pH > 7.0; 95% vol vol⁻¹ ethanol)⁵⁰. Cell lysis was achieved by incubation with lysozyme (600 µl; 0.5 mg ml⁻¹) and sodium dodecyl sulfate (60 µl of a 10% solution) for 2 min at 64 °C with the subsequent addition of NaOAc (66 µl of a 3 M solution). Extraction with 750 μl phenol (ROTIAqua-Phenol) was carried out at 64 °C for 6 min, followed by the addition of 750 μl chloroform. Precipitation of RNA from the aqueous phase was achieved with twice the volume of ethanol and 3 M NaOAc (30:1) mix and incubated at −80 °C overnight. The samples were then centrifuged and the pellets washed with ethanol (75% vol vol⁻¹), followed by resuspension in 50 µl RNase-free water. Traces of genomic DNA were removed by treating ~40 µg total RNA with 5 U DNase I (Fermentas) and 0.5 µl SUPERase·In RNase Inhibitor (Ambion) in a reaction volume of 50 µl. Samples for dRNA-seq were prepared by pooling equimolar amounts (each 100 ng) of total RNA from each condition.

cDNA library preparation and sequencing

For dRNA-seq, samples were treated according to a previous protocol¹⁶. Before synthesizing cDNA, pooled total RNA was fragmented by ultrasound (four pulses of 30 s each at 4 °C) and then treated with T4 Polynucleotide Kinase (NEB). The RNA sample was then split evenly, one half of which was treated with Terminator Exonuclease to enrich for primary transcripts, whereas the other half remained untreated. The samples were then poly(A)-tailed using poly(A) polymerase and the 5′-PPP removed using 5′ polyphosphatase (Epicentre Biotechnologies). RNA adaptors were ligated and the synthesis of first-strand cDNA was performed using M-MLV reverse-transcriptase and oligo(dT) primers. The cDNA was subsequently amplified to a concentration of ~10–20 ng µl⁻¹, purified (Agencourt AMPure XP kit; Beckman Coulter Genomics) and fractionated in a size range of 200–600 bp. The libraries were deep-sequenced on an Illumina NextSeq 500 system using 75 bp read length at Vertis Biotechnologie.

For conventional RNA-seq, samples were first depleted of ribosomal RNA (rRNA) using the Pan-Prokaryote riboPOOL kit (siTOOLs Biotech). This involved incubation of 1 µg total RNA with 100 pmol rRNA-specific biotinylated DNA probes at 68 °C for 10 min, followed by a shift to 37 °C for 30 min in 0.25 mM ethylenediaminetetraacetic acid (EDTA), 2.5 mM Tris-HCl (pH 7.5) and 500 mM NaCl. Depletion of rRNA–DNA hybrids was achieved by two 15-min incubation periods with streptavidin-coated magnetic Dynabeads MyOne Streptavidin C1 beads (0.45 mg; Thermo Fisher Scientific) in 0.25 mM EDTA, 2.5 mM Tris-HCl (pH 7.5) and 1 M NaCl at 37 °C. The samples were then purified using the Zymo RNA Clean & Concentrator kit along with DNase I treatment (Zymo Research). Libraries were prepared with the NEBNext Multiplex Small RNA Library Prep kit for Illumina (NEB) according to the manufacturer’s instructions, with the following modifications. Samples were fragmented at 94 °C for 2.75 min, per the NEBNext Magnesium RNA Fragmentation Module (NEB) with subsequent RNA purification using the Zymo RNA Clean & Concentrator kit. The fragmented RNA was then 3′ dephosphorylated, 5′ phosphorylated and decapped with 10 U T4 Polynucleotide Kinase ± 40 nmol ATP and 5 U RNA 5′ Pyrophosphohydrolase (NEB). After each step, RNA was purified as mentioned above. The fragmented RNA was then ligated to adapters (3′ SR and 5′ SR, pre-diluted 1:3 in nuclease-free water) and the cDNA was amplified for 14 cycles. These barcoded libraries were purified using MagSi-NGSPREP Plus beads (AMSBIO) at a 1.8:1 ratio of beads to sample volume. Libraries were checked for quantity and quality using a Qubit 3.0 Fluorometer (Thermo Fisher Scientific) and a 2100 Bioanalyzer with the High Sensitivity DNA kit (Agilent). Pooled libraries were sequenced on the NextSeq 500 platform (Illumina) at the Core Unit SysMed of the University of Würzburg.

Read processing and mapping

Generated reads were quality checked using FastQC (version 0.11.8) and adapters were trimmed using Cutadapt (version 1.16) with Python (version 3.6.6), using the following parameters: -j 6 -a Illumina Read 1 adapter=AAGATCGGAAGAGCACACGTCTGAACTCCAGTCA -a Poly A=AAAAAAAAAAA --output=out1.fq.gz --error-rate=0.1 --times=1 --overlap=3 --minimum-length=20 --nextseq-trim=20 3_1. For both sequencing data types (dRNA-seq and conventional RNA-seq), READemption⁵¹ (version 0.4.5) was used to map reads to the B. thetaiotaomicron VPI-5482 reference genome (NC_004663.1) and plasmid (NC_004703.1). Details of the alignment statistics can be found in Supplementary Table 11.
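For orientation, the trimming parameters listed above can be assembled into an argument list as in the following minimal sketch. This is not the script used in this study: the function name, the input file name and the underscore-joined adapter labels (Cutadapt adapter names passed on a command line cannot contain unquoted spaces) are assumptions made for illustration, and the command is only constructed, not executed.

```python
import shlex


def build_cutadapt_args(input_fastq, output_fastq="out1.fq.gz", cores=6):
    """Return a Cutadapt argument list mirroring the parameters given in the text.

    The adapter labels (text before '=') are illustrative; only the sequences
    and option values are taken from the Methods.
    """
    return [
        "cutadapt",
        "-j", str(cores),
        "-a", "Illumina_Read_1_adapter=AAGATCGGAAGAGCACACGTCTGAACTCCAGTCA",
        "-a", "Poly_A=AAAAAAAAAAA",
        f"--output={output_fastq}",
        "--error-rate=0.1",
        "--times=1",
        "--overlap=3",
        "--minimum-length=20",
        "--nextseq-trim=20",
        input_fastq,  # hypothetical input file name supplied by the caller
    ]


if __name__ == "__main__":
    # Print the command as a single shell-safe string for inspection.
    print(shlex.join(build_cutadapt_args("sample.fq.gz")))
```

Keeping the arguments as a list (rather than one string) avoids shell-quoting problems if the command is later passed to a process launcher.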

Transcriptome annotation

We used the ANNOgesic pipeline²⁷ (version 0.7.33) to update the annotations of TSSs, terminators, operons and noncoding RNAs, as previously described¹⁶. In short, TSSs were identified using the TSSpredator function, which compares the relative enrichment of reads between Terminator Exonuclease-treated and untreated libraries to identify enriched peaks that are characteristic of the protected 5′ ends of primary transcripts. TSSs were classified on the basis of this enrichment and distance, relative to a coding gene. Primary TSSs were identified as having the highest enrichment within a 300-bp region upstream of an open reading frame. All other TSSs within this region were classified as secondary TSSs. Internal TSSs were defined as originating on the sense strand within a coding sequence, whereas antisense TSSs were those that originated on the antisense strand and overlapped with or within 100-bp flanks of a sense gene. All remaining TSSs were classified as orphan TSSs. To predict terminators, ANNOgesic utilizes two heuristic algorithms, one of which scans the genome for Rho-independent terminators (TransTermHP) and the other predicts terminators by detecting a decrease in read coverage between two adjacent genes. Leveraging the wealth of our diverse conditional datasets, we additionally utilized the operon detection function of ANNOgesic (default settings) to predict both operons and sub-operons based on the other detected features; namely, TSSs, transcripts and genes supplied as general feature format (GFF) files.
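The TSS classes described above can be illustrated with the following simplified sketch. It is not the TSSpredator or ANNOgesic implementation: the data structures, the 1-based inclusive coordinates and the rule that each TSS receives a single label (primary or secondary taking precedence over internal, antisense and orphan) are assumptions made for illustration.

```python
from collections import namedtuple

Gene = namedtuple("Gene", "start end strand")    # 1-based, inclusive coordinates
TSS = namedtuple("TSS", "pos strand enrichment")

UPSTREAM = 300  # bp window upstream of an ORF (primary/secondary TSSs)
FLANK = 100     # bp flank around a gene (antisense TSSs)


def upstream_distance(tss, gene):
    """Distance from a TSS to the start of a same-strand gene, or None if outside the window."""
    if tss.strand != gene.strand:
        return None
    d = gene.start - tss.pos if gene.strand == "+" else tss.pos - gene.end
    return d if 0 < d <= UPSTREAM else None


def classify(tss_list, genes):
    """Assign one label per TSS: primary, secondary, internal, antisense or orphan."""
    labels = {}
    for gene in genes:
        ups = [t for t in tss_list if upstream_distance(t, gene) is not None]
        if not ups:
            continue
        best = max(ups, key=lambda t: t.enrichment)  # highest enrichment is primary
        for t in ups:
            if labels.get(t) != "primary":  # never demote a TSS already called primary
                labels[t] = "primary" if t is best else "secondary"
    for t in tss_list:
        if t in labels:
            continue
        if any(g.strand == t.strand and g.start <= t.pos <= g.end for g in genes):
            labels[t] = "internal"
        elif any(g.strand != t.strand and g.start - FLANK <= t.pos <= g.end + FLANK
                 for g in genes):
            labels[t] = "antisense"
        else:
            labels[t] = "orphan"
    return labels
```

In practice a single TSS may carry several class labels (for example, internal to one gene and primary for the next); the single-label simplification here is only meant to make the distance and enrichment rules concrete.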

The automated prediction of noncoding RNAs was done using the srna function of ANNOgesic, which first compares predicted transcripts with known RNAs within the sRNA database (all sequences were downloaded from BSRD⁵²) and the non-redundant protein database (ftp://ftp.ncbi.nih.gov/blast/db/FASTA/). Candidates that were contained in the sRNA database were retained, while those contained in the non-redundant protein database were excluded from further analysis. All remaining transcripts were classified as intergenic sRNA candidates if they possessed a defined TSS, stable secondary structure (RNAfold normalized folding energy < −0.05) and length within 30–500 nucleotides and did not overlap with any other genetic feature in either sense or antisense orientation. The prediction of cis-asRNAs was based on the same criteria along with the presence of an annotated gene on the opposing strand. Untranslated region (UTR)-derived sRNAs were classified as 5′ if they shared the TSS with an mRNA and were associated with a read coverage drop or processing site in front of the coding sequence. Similarly, 3′ UTR-derived sRNAs were predicted by either a TSS or processing site in the 3′ UTR and either a processing site or terminator shared with an mRNA. Intra-operonic sRNAs were associated with a TSS or processing site at the 5′ end and a read coverage drop or processing site at the 3′ end.
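The criteria for intergenic sRNA candidates can be summarized in a short filter, sketched below. The record type and field names are hypothetical and do not correspond to ANNOgesic's internal representation; only the thresholds (normalized folding energy < −0.05, length of 30–500 nucleotides, defined TSS, no overlap with other features) are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    has_tss: bool                # a defined TSS was detected
    norm_folding_energy: float   # RNAfold normalized folding energy
    length_nt: int               # transcript length in nucleotides
    overlaps_feature: bool       # overlaps another feature (sense or antisense)


def is_intergenic_srna_candidate(t: Transcript) -> bool:
    """Apply the intergenic sRNA criteria stated in the text."""
    return (
        t.has_tss
        and t.norm_folding_energy < -0.05
        and 30 <= t.length_nt <= 500
        and not t.overlaps_feature
    )
```

For example, a 120-nucleotide transcript with a TSS, a normalized folding energy of −0.2 and no overlapping feature would pass, whereas the same transcript at 700 nucleotides would not.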

As with all computationally automated predictions, manual curation was necessary to ensure the accuracy of the global annotations. Manual confirmation of sRNA predictions was guided by the following criteria for excluding putative sRNA annotations: (1) lack of an identifiable promoter or processing site up to 50 bp upstream of a predicted sRNA’s 5′ end; (2) complete overlap with an annotated mRNA in the most recent genome update on the National Center for Biotechnology Information’s Nucleotide database (NC_004663.1; 21 August 2022); (3) complete overlap with an annotated terminator sequence; (4) overlap with cis-regulatory elements such as riboswitches or RNA thermometers; and (5) no evident change in read coverage relative to flanking regions. As a result, we report a total of 135 intergenic sRNAs in B. thetaiotaomicron. This includes an overlap of 91 sRNAs previously identified¹⁶, as well as 44 novel sRNA candidates. In total, 20 candidates previously annotated as intergenic sRNAs¹⁶ were reassigned here to different sub-classes (Supplementary Table 12): 11 were re-annotated as asRNAs (as a divergently encoded genetic feature was discovered in the present data), three were re-annotated as 3′ UTR-derived sRNAs (due to their overlap with the 3′ end of an mRNA) and six were re-annotated as 5′ UTR-derived sRNAs (as the present dataset allowed us to refine TSS annotations). Additionally, 14 previously annotated sRNA candidates¹⁶ were eliminated as they were very weakly expressed in the former dataset and their existence was not further supported by the present dataset, despite a greater sequencing depth.

Updated annotations of the TSSs, terminators, operons and noncoding RNAs can be accessed via Theta-Base 2.0 (http://micromix.helmholtz-hiri.de/bacteroides/). Coding genes and sRNAs can be interrogated using either their ID (BT_xxxx or BTncxxx) or—when applicable—their trivial name (for example, MasB).

Prediction of invertible DNA regions

Invertible DNA regions were predicted using the PhaseFinder -locate pipeline as described in ref. ²⁸. Briefly, inverted repeats were determined by allowing no mismatches for repeats of a maximum of 11 bp, one mismatch for repeats up to 13 bp and two mismatches for repeats with lengths exceeding 19 bp. Homopolymeric inverted repeats were removed and only inverted repeats with a GC content between 15 and 85% were retained.
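The two sequence-based filters mentioned above (removal of homopolymeric repeats and the 15–85% GC window) can be sketched as follows. This is an illustration rather than PhaseFinder code; the function names are hypothetical and treating the GC bounds as inclusive is an assumption.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def is_homopolymer(seq):
    """True if the sequence consists of a single repeated base."""
    return len(set(seq.upper())) == 1


def keep_inverted_repeat(seq, gc_min=15.0, gc_max=85.0):
    """Keep a repeat arm only if it is not homopolymeric and its GC content is in range."""
    return (not is_homopolymer(seq)) and gc_min <= gc_percent(seq) <= gc_max
```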

Differential gene expression analysis

Differential gene expression analysis was performed using the R package edgeR (version 3.38.2)⁵³,⁵⁴. Using the filterByExpr function, the genes with a counts per million (CPM) value of >0.6635 (equivalent to around ten reads per sample) across all replicates under each growth condition (median library size of ~15 million reads) were retained for differential analysis. While calling for contrasts (using the makeContrasts function), the analysis was sub-divided into two groups based on the respective control condition: all conditions with varying carbon sources (including starvation) were compared with glucose as a control, whereas all stress conditions were compared with the condition immediately before stress induction (that is, a mid-exponential phase culture in TYG medium). Differential gene expression data across all conditions relative to their respective control condition can be found in Supplementary Table 4.
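The relationship between the CPM cutoff, the minimum read count and the median library size can be made explicit with a small calculation. The sketch below assumes that the cutoff equals the minimum count divided by the median library size in millions, which is our understanding of how filterByExpr derives it; the function names are illustrative.

```python
def cpm_cutoff(min_count, median_library_size):
    """CPM value that corresponds to `min_count` reads at the median library size."""
    return min_count / (median_library_size / 1e6)


def implied_library_size(min_count, cutoff_cpm):
    """Median library size (in reads) implied by a CPM cutoff and a minimum count."""
    return min_count / cutoff_cpm * 1e6


if __name__ == "__main__":
    # Ten reads at ~15 million reads per library is roughly 0.67 CPM.
    print(round(cpm_cutoff(10, 15_000_000), 4))
    # A cutoff of 0.6635 CPM with ten reads implies ~15.07 million reads.
    print(round(implied_library_size(10, 0.6635)))
```

Under this assumption, the reported cutoff of 0.6635 CPM and the stated median library size of ~15 million reads are mutually consistent.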

Gene set annotation and enrichment analyses

We assembled a list of functionally annotated gene sets from the literature. We recovered annotations of PULs 01–96 from the Polysaccharide Utilization Loci Database⁴; CPSs and conjugative transposons from ref. ⁵⁵; the genes transcribed from promoter motifs PM1 and PM2 from ref. ¹⁶; regulons from RegPrecise version 3.2 (ref. ⁵⁶); the Cur regulon from ref. ¹³; annotated KEGG pathways and modules from the KEGG database (accessed on 1 December 2022); Gene Ontology terms from UniProt (accessed on 25 November 2021); and predicted KEGG modules and pathways and Gene Ontology terms from an eggNOG version 5.0 (ref. ⁵⁷) annotation of the B. thetaiotaomicron genome. Gene set enrichment analysis was performed with the fgsea R package over all gene sets with more than nine genes, except for PULs, which were retained irrespective of their gene number. Genes were ranked based on the −log₁₀[P value] × sign[FC] metric.
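The ranking metric used for gene set enrichment can be written out as a short sketch. It assumes that FC is supplied as a log-scale fold change, so that its sign indicates the direction of regulation, and that P values are strictly positive; the function and gene names are hypothetical.

```python
import math


def rank_metric(p_value, log_fc):
    """Signed significance score: -log10(P) multiplied by the sign of the fold change."""
    sign = (log_fc > 0) - (log_fc < 0)  # +1, -1 or 0
    return -math.log10(p_value) * sign


def rank_genes(stats):
    """Sort genes from most up-regulated to most down-regulated.

    `stats` maps a gene identifier to a (p_value, log_fc) tuple.
    """
    return sorted(stats, key=lambda g: rank_metric(*stats[g]), reverse=True)
```

With this metric, strongly significant up-regulated genes appear at the top of the ranked list, strongly significant down-regulated genes at the bottom and non-significant genes near the middle, which is the ordering that a pre-ranked enrichment test operates on.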

Northern blot

Northern blotting was performed as described previously¹⁶. In short, total RNA (2.5–10 µg) was electrophoretically resolved on a 6% (vol vol⁻¹) polyacrylamide (PAA) gel containing 7 M urea and electro-blotted onto a membrane (Amersham Hybond-XL) at 50 V and 4 °C for 1 h. The blots were probed with gene-specific ³²P-labelled oligonucleotides in Hybri-Quick buffer (Carl Roth) at 42 °C and subsequently exposed to a phosphor screen as required. Images were visualized using a phosphorimager (FLA-3000 Series; Fuji).

Reanalysis of TIS data

We reanalysed fitness data from a comprehensive B. thetaiotaomicron transposon mutant library that probed a suite of hundreds of different conditions, including 48 different carbon sources and 56 stress-inducing compounds¹⁹, in the context of our extensive noncoding RNA annotation (see above). This was done with the primary objective of identifying and possibly correlating gene expression from our transcriptomic dataset with mutant fitness data and thereby allowing us to draw biologically meaningful conclusions. To further streamline our analysis, we focused exclusively on independently encoded intergenic sRNAs since phenotypes pertaining to such mutants would probably not involve polar effects. Consequently, of the 135 intergenic sRNAs identified in B. thetaiotaomicron, we obtained fitness data for 81 sRNAs (60%), of which 28 were associated with statistically significant effects (|t| > 4) in at least one successful experiment (Supplementary Table 8). A successful experiment requires that each gene is represented by a sufficient number of barcode counts⁵⁸. The fitness of a gene is defined as the average log₂[change in relative abundance of its mutants (|fit|)]. Negative and positive values mean that the sRNA mutants were less or more fit, respectively, than the average strain in the pool. Experiments with ‘jackpot’ effects, whereby the disruption of an sRNA resulted in a large competitive advantage versus the other mutants in the pool, were retained, but specifically labelled as strong phenotypes (|fit| > 2 and |t| > 5) (Supplementary Table 8). A third category, namely ‘combined’, comprised those phenotypes that were both strong and significant, per the above criteria.
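The phenotype categories defined above can be expressed as a small labelling function. This is a sketch using only the thresholds stated in the text (|t| > 4 for significant; |fit| > 2 and |t| > 5 for strong); returning a set of non-exclusive labels is a design choice made here for illustration and not necessarily how Supplementary Table 8 was compiled.

```python
def phenotype_labels(fit, t):
    """Return the set of category labels for one sRNA in one experiment."""
    labels = set()
    significant = abs(t) > 4
    strong = abs(fit) > 2 and abs(t) > 5
    if significant:
        labels.add("significant")
    if strong:
        labels.add("strong")
    if significant and strong:
        labels.add("combined")
    return labels


def direction(fit):
    """Negative fitness: mutants less fit than the pool average; positive: more fit."""
    if fit < 0:
        return "less fit"
    if fit > 0:
        return "more fit"
    return "neutral"
```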

Launch of Theta-Base 2.0

Theta-Base 2.0 (http://micromix.helmholtz-hiri.de/bacteroides/) was created using Micromix (https://github.com/BarquistLab/Micromix)⁵⁹, which relies on Flask⁶⁰ (back end) and Vue.js (front end), storing underlying visualization and expression data using MongoDB. The Clustergrammer plugin uses the API from the Ma’ayan laboratory⁶¹, while the heat map plugin follows the same front- and back-end architecture as the main site. Gene set annotations (Gene Ontology terms, KEGG pathways and modules, PULs, CPSs, conjugative transposons, promoter motifs and known regulons) were prepared as described in the section ‘Gene set annotation and enrichment analyses’ and can be found in Supplementary Table 3. The sRNA fitness dataset was adapted from Supplementary Table 8. Deployment of the back and front ends uses Gunicorn (https://readthedocs.org/projects/gunicorn-docs) and Nginx⁶².

B. thetaiotaomicron datasets can be manually selected by first clicking the Bacteroides Theta tab, followed by providing a title and then selecting an appropriate dataset using the dropdown menu. Users can select from a choice of expression data (that is, normalized in CPM or log₂[FC] (compared with control conditions)) or between the entire dataset or specific sRNA fitness data. As an option, columns of interest can be further customized using the ‘Select columns’ box and by subsequently clicking the ‘Add’ tab. Users also have the option to add their own data, as outlined in additional tabs, such as by uploading a delimited file.

The resulting data tables are displayed in the browser and can be filtered or transformed. For example, the ‘Filter’ button allows data tables to be filtered using keywords with prompts to make the search process seamless. The ‘Functional annotation’ button permits the user to select from a large number of preset manually curated gene lists, such as ‘GO term’, ‘KEGG pathway’, ‘PUL’, ‘CPS’ and ‘CTn’, to name a few. A third button, ‘ncRNA’ permits selection of manually curated noncoding RNA gene sets (for example, ‘High-confidence intergenic sRNAs’, ‘Intergenic sRNAs’ and ‘Cis-antisense RNAs’). Once the desired genes have been filtered, they may be transformed by clicking the ‘Transform data’ button and performing operations such as rounding values, log conversion or calculating transcripts per million. Once datasets have been loaded by the user, they can be further examined using three visualization modes; namely, ‘Heatmap’, ‘Clustergrammer’ and ‘JBrowse’. Two- and three-dimensional heat maps can be generated using the Heatmap function. Note that the heat map defaults to a three-dimensional option, but users can manually switch to the two-dimensional option. Heat maps can be customized using the menu on the left that permits changes to the colour gradients and overall structure. Customized heat maps can be downloaded in SVG or PNG formats using the download tab. Alternatively, for clustering according to genes or conditions, ‘Clustergrammer’ is recommended. Selecting this tab generates a two-dimensional dynamic heat map of the data that can be further investigated using the menu on the left. Currently, the tool only permits a maximum of 200 rows to be loaded and users will be notified if more rows are selected. Customized heat maps and data tables can be downloaded using the ‘Take snapshot’ and ‘Download matrix’ buttons, respectively. 
For a detailed view of normalized coverage plots for the investigated conditions, in addition to those published in the first iteration of Theta-Base¹⁶, users can select the ‘JBrowse’ button⁶³. Users are free to select from a range of updated annotations displaying high-resolution maps for noncoding RNAs (ncRNA), TSSs (TSSv3), terminators (term_v2), operons (Operon_structure), a transposon insertion map related to the fitness data (Tn_insertions) and invertons (Inverted_repeats).

On the top right of the website there are four buttons. The ‘Padlock’ button locks the current state of the site, allowing users to copy their URL and share with colleagues. The next (‘Download’) button allows users to download the currently selected dataset as an Excel or a delimited file (such as .csv). The ‘New document’ button will re-load the website so users can select another dataset. The ‘Help’ button—when clicked—will provide pop-over text explaining various features of the site.

Antibiotics growth curve analyses and agar strip assays

Bacterial growth curves were determined by inoculating a single colony each of AWS-003 (Δtdk; referred to as the wild-type in Fig. 4) and AWS-029 (∆masB) into 5 ml TYG medium and incubating overnight under anaerobic conditions. These cultures were sub-cultured (1:100 dilution) in 2 ml TYG medium containing the indicated final concentrations of doxycycline (Sigma-Aldrich), tetracycline (AppliChem) or a water control. The samples (200 µl volume) were incubated in a 96-well flat-bottom plate (Nunclon) at 37 °C (doxycycline) or 40 °C (tetracycline) with continuous shaking (double orbital) in a microplate spectrophotometer (BioTek Epoch 2). Optical densities were recorded every 20 min. The assay was performed in three biological replicates, each comprising technical duplicates.

Antibiotics strip assays were performed by dipping a sterile cotton swab into overnight TYG cultures of AWS-003 or AWS-029 and streaking on BHIS agar plates containing strips for doxycycline (EM103 (HiMedia; in Fig. 4c) or 92156 (Liofilchem; in Fig. 5g)) or tetracycline (EM056; HiMedia). The plates were incubated anaerobically for 48 h at 37 °C and images were taken. The minimal inhibitory concentrations were derived from the positions where the inhibition ellipses intersected the strips.

Gene co-expression analyses

Correlation of the expression of all B. thetaiotaomicron genes across all of the profiled carbon source and stress conditions was calculated by generating a correlation matrix (Pearson’s correlation score) of the z scores of the CPM values of each gene. To identify the correlation in expression between our gene sets (see ‘Gene set annotation and enrichment analyses’) and a given gene of interest (MasB in Fig. 5a and BT_1675 in Extended Data Fig. 10f), the median of the correlation values between all genes within a gene set and the gene of interest was calculated. Gene sets composed of fewer than ten operons were excluded from this analysis.
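The co-expression score of a gene set relative to a gene of interest can be sketched as below: CPM values are converted to z scores per gene, Pearson's correlation is computed against the gene of interest, and the median over the gene set is reported. The function names and the toy input layout (a dictionary of per-condition CPM lists) are assumptions; genes with zero variance across conditions would need to be removed beforehand.

```python
from statistics import mean, median, pstdev


def zscores(values):
    """Standardize one gene's CPM values across conditions."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def gene_set_correlation(cpm, gene_of_interest, gene_set):
    """Median correlation between a gene of interest and the members of a gene set."""
    z = {gene: zscores(values) for gene, values in cpm.items()}
    return median(
        pearson(z[gene], z[gene_of_interest])
        for gene in gene_set
        if gene != gene_of_interest
    )
```

Because Pearson's correlation is invariant to per-gene centring and scaling, the z-score step does not change the coefficient itself; it is retained here to mirror the description in the text.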

MS2 affinity purification and sequencing

A B. thetaiotaomicron ΔmasB mutant complemented with either MS2-MasB (AWS-062) or untagged MasB (AWS-036) was diluted 1:100 in TYG medium from an overnight culture grown anaerobically at 37 °C. At an OD₆₀₀ of 2.0, expression of MS2-tagged MasB and untagged MasB was induced by the addition of 200 ng ml⁻¹ aTC. After another 2 h of growth at 37 °C, 90 OD equivalents of the cultures were collected, centrifuged for 20 min at 2,000g and 4 °C and snap-frozen in liquid nitrogen. MS2 pulldown and RNA purification was performed as described in ref. ⁶⁴, but with slight modifications to adapt the protocol to Bacteroides. Specifically, the column was washed only six (instead of eight) times with buffer A before elution. Elution itself was then induced with 600 µl (rather than 300 µl) elution buffer.

For library preparation (at Vertis Biotechnologie), the RNA samples were first fragmented using ultrasound (four pulses of 30 s each at 4 °C). Then, an oligonucleotide adapter was ligated to the 3′ ends of the RNA molecules. First-strand cDNA synthesis was performed using M-MLV reverse-transcriptase and the 3′ adapter as a primer. The first-strand cDNA was purified and the 5′ Illumina TruSeq sequencing adapter was ligated to the 3′ end of the antisense cDNA. The resulting cDNA was PCR-amplified to ~10–20 ng μl⁻¹ using a high-fidelity DNA polymerase. The cDNA was purified using the Agencourt AMPure XP kit (Beckman Coulter Genomics) and analysed by capillary electrophoresis. For Illumina sequencing, the samples were pooled in approximately equimolar amounts. To deplete sequences derived from 5S rRNA, the cDNA pool was digested using probes specific for bacterial 5S and Cas9 endonuclease. Afterwards, the cDNA pool was fractionated in the size range of 200–600 bp using a preparative agarose gel. An aliquot of the size-fractionated pool was analysed by capillary electrophoresis. The cDNA pool was sequenced on an Illumina NextSeq 500 system using a read length of 2 × 150 bp.

Generated reads were quality-checked using FastQC (version 0.11.8) and adapters were trimmed using BBDuk with the following parameters: qtrim=r trimq=10 ktrim=r ref=bbmap/ressources/adapters.fa k=23 mink=11 hdist=1 tpe tbo. BBmap was used to map reads to the B. thetaiotaomicron VPI-5482 reference genome (NC_004663.1) and plasmid (NC_004703.1), as well as to the MS2-MasB sequence. Read quantification was performed using featureCounts (2.0.1). Differential abundance analysis between the MS2-MasB and untagged samples was conducted using edgeR⁶⁵ in combination with RUVSeq⁶⁶ to estimate the factor of unwanted variation from replicate samples, with the number of factors set to k = 1.

IntaRNA prediction of sRNA–mRNA interactions

In silico interaction prediction between MasB and its putative mRNA targets fusA2 and BT_1675 was performed with the help of IntaRNA⁴⁰,⁴¹ using the Vienna RNA package (2.4.14 and boost 1.7) at default settings along with the output flag (--out=pMinE:FILE.csv) to generate minimal energy values for intermolecular index pairs. For visualization, the resulting values were plotted in the form of a heat map in R (version 4.2).

In vitro transcription and radiolabelling of RNA

DNA templates for in vitro transcription were amplified using genomic DNA and primer pairs carrying a T7 promoter (Supplementary Table 10). The in vitro transcription reaction was performed using the MEGAscript T7 kit (Thermo Fisher Scientific) followed by DNase I digestion (1 U; 37 °C; 15 min). RNA products were then excised from a 6% (vol vol⁻¹) PAA-7M urea gel by comparison with a Low Range RNA ladder (Thermo Fisher Scientific) and eluted overnight in elution buffer (0.1 M NaOAc, 0.1% sodium dodecyl sulfate and 10 mM EDTA) on a thermoblock at 8 °C and 1,400 r.p.m. The next day, the RNA was precipitated in an ethanol:NaOAc (30:1) mix, washed with 75% ethanol and resuspended in 20 µl water (at 65 °C for 5 min).

Radioactive labelling of the in vitro-transcribed RNA was carried out by dephosphorylating 50 pmol RNA with 25 U calf intestine alkaline phosphatase (NEB) in a 50 µl reaction and incubating at 37 °C for 1 h. The dephosphorylated RNA was extracted using phenol:chloroform:isoamyl alcohol (25:24:1) and precipitated as described above. Next, 20 pmol of this RNA was 5′ end-labelled (20 µCi ³²P-γATP) using 1 U polynucleotide kinase (NEB) at 37 °C for 1 h in a 20 µl reaction. The labelled RNA was purified using a G-50 column (GE Healthcare) and extracted from a PAA gel as described above.

EMSA

EMSA was carried out in a reaction volume of 10 μl, containing 1× RNA structure buffer (Ambion), 1 μg yeast RNA (~4 μM final concentration), 5′ end-labelled MasB RNA (4 nM final concentration) and an mRNA segment of 137 nucleotides in length, spanning the predicted MasB target site within BT_1675 (see Fig. 5e) at final concentrations of 0, 8, 16, 32, 64, 128, 256, 512 and 1,024 nM. Following incubation at 37 °C for 1 h, 3 μl of 5× native loading dye (0.2% bromophenol blue, 0.5× TBE and 50% glycerol) was added to each tube. All of the samples were loaded on a native 6% (vol vol⁻¹) PAA gel in 0.5× TBE buffer and run at 300 V and 4 °C for 3 h. The gel was dried, exposed and visualized using a phosphorimager (FLA‐3000 Series; Fuji). The experiment was repeated three times and quantified using ImageJ version 1.52s⁶⁷ and GraphPad Prism version 9 for Windows (GraphPad Software; www.graphpad.com). The dissociation constant (K_d) was calculated via the one site-specific binding formula:

$$Y=B_{{\rm{max}}} \times X/({K_{{\rm{d}}}}+X)$$

where Y is the specific binding; X the concentration of radioligand; B_max the maximum binding in the same unit as Y; and K_d the dissociation constant in the same unit as X.
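To illustrate how K_d and B_max follow from this model, the sketch below fits the one-site equation by least squares on synthetic, noise-free data generated at the concentrations used in the EMSA. It is not the GraphPad Prism procedure used in this study: the grid search over K_d, the closed-form B_max for each candidate K_d and the example parameter values (B_max = 1, K_d = 100 nM) are assumptions chosen for demonstration only.

```python
def one_site(x, bmax, kd):
    """One-site specific binding: Y = Bmax * X / (Kd + X)."""
    return bmax * x / (kd + x)


def fit_one_site(xs, ys, kd_grid):
    """Least-squares fit of Bmax and Kd.

    For each candidate Kd the optimal Bmax has a closed form
    (linear regression through the origin on f = X / (Kd + X));
    the Kd with the smallest sum of squared errors is returned.
    """
    best = None
    for kd in kd_grid:
        f = [x / (kd + x) for x in xs]
        bmax = sum(y * v for y, v in zip(ys, f)) / sum(v * v for v in f)
        sse = sum((y - bmax * v) ** 2 for y, v in zip(ys, f))
        if best is None or sse < best[0]:
            best = (sse, bmax, kd)
    return best[1], best[2]


# Concentrations (nM) from the EMSA titration; responses are synthetic.
CONCENTRATIONS = [0, 8, 16, 32, 64, 128, 256, 512, 1024]
SYNTHETIC_Y = [one_site(x, 1.0, 100) for x in CONCENTRATIONS]

if __name__ == "__main__":
    bmax, kd = fit_one_site(CONCENTRATIONS, SYNTHETIC_Y, range(1, 1025))
    print(bmax, kd)
```

With real, noisy band intensities a continuous optimizer would normally replace the integer grid, but the grid keeps the example dependency-free and makes the shape of the error surface easy to inspect.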

qRT-PCR analysis

For the qRT-PCR assays, Δtdk B. thetaiotaomicron (AWS-003; referred to as the wild-type in Fig. 5), ΔmasB (AWS-029) and masB⁺ (AWS-036) were grown anaerobically overnight at 37 °C in 5 ml TYG medium, then sub-cultured 1:100 in TYG and induced with 200 ng ml⁻¹ aTC. Around four optical density equivalents of samples were collected at the early exponential phase (OD₆₀₀ = 0.3), mid-exponential phase (OD₆₀₀ = 2.0) and late exponential phase (OD₆₀₀ = 3.7) for RNA extraction, as described above. For the starvation condition, the same strains were grown anaerobically for 24 h at 37 °C in 5 ml minimal medium supplemented with 0.5% glucose, and then sub-cultured 1:100 in 0.5% glucose-containing minimal medium supplemented with 200 ng ml⁻¹ aTC. At OD₆₀₀ = 2.0, the cultures were centrifuged, the supernatant was discarded and the pellet was resuspended in minimal media without a carbon source and incubated anaerobically at 37 °C for another 2 h. Around four optical density equivalents of the samples were collected for RNA extraction. qRT-PCR reactions were performed as described in ref. ¹⁶. A minimum of three biological replicates were pipetted and plates were analysed on a QuantStudio 5 instrument (Thermo Fisher Scientific).

Dual-plasmid fluorescence reporter assay

Strains of E. coli TOP10, which were engineered to carry translational fusions of superfolder GFP⁴³ to different variants of the 5′ part of the BT_1675 coding sequence, were cultured in Lysogeny Broth medium supplemented with chloramphenicol (20 μg ml⁻¹) and carbenicillin (100 μg ml⁻¹) until an OD₆₀₀ of 0.5 was reached. Subsequently, 100 μl of the cultures was collected and subjected to three washes with 1× phosphate-buffered saline before fixation with a 4% paraformaldehyde solution. The fluorescence intensity of GFP was measured in phosphate-buffered saline using flow cytometry (NovoCyte Quanteon; Agilent).

Statistics and reproducibility

Conventional RNA-seq of diverse growth conditions, dRNA-seq of pooled conditions and MAPS were performed in biological duplicates. Testing for differential expression or enrichment was performed using the generalized linear model likelihood ratio test implemented in edgeR⁵³. Northern blots were performed in two biological replicates. EMSAs were performed in technical triplicates. qRT-PCR analysis was performed in a minimum of three biological replicates, each comprising technical duplicates, and a Mann–Whitney test was used to call significant comparisons. Growth curve experiments were performed in a minimum of three biological replicates, unless explicitly mentioned otherwise. Two-plasmid GFP reporter assays were performed in biological triplicates and Tukey’s multiple comparisons test was used to test for statistically significant differences. Minimal inhibitory concentration strip assays were performed in a minimum of five biological replicates for doxycycline and three biological replicates for tetracycline. No statistical method was used to predetermine the sample size. Instead, sample sizes were chosen based on previous experience and studies¹⁶. No data were excluded from the analyses. The experiments were not randomized. The investigators were not blinded to allocation during the experiments and outcome assessment.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The raw sequencing data are available from the National Center for Biotechnology Information’s Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo) under the accession number GSE234958. Our analysed sequencing data are accessible at http://micromix.helmholtz-hiri.de/bacteroides/. Source data are provided with this paper.

Code availability

Core software central to the conclusions drawn in this study is publicly available and its usage parameters described in the appropriate sections above. Code for the Micromix data integration platform is available at https://github.com/BarquistLab/Micromix (ref. ⁵⁹).

References

Wexler, A. G. & Goodman, A. L. An insider’s perspective: Bacteroides as a window into the microbiome. Nat. Microbiol. 2, 17026 (2017).
Bornet, E. & Westermann, A. J. The ambivalent role of Bacteroides in enteric infections. Trends Microbiol. 30, 104–108 (2022).
Whittle, G., Shoemaker, N. B. & Salyers, A. A. The role of Bacteroides conjugative transposons in the dissemination of antibiotic resistance genes. Cell. Mol. Life Sci. 59, 2044–2054 (2002).
Terrapon, N. et al. PULDB: the expanded database of polysaccharide utilization loci. Nucleic Acids Res. 46, D677–D683 (2018).
Grondin, J. M., Tamura, K., Dejean, G., Abbott, D. W. & Brumer, H. Polysaccharide utilization loci: fueling microbial communities. J. Bacteriol. 199, e00860-16 (2017).
Martens, E. C., Roth, R., Heuser, J. E. & Gordon, J. I. Coordinate regulation of glycan degradation and polysaccharide capsule biosynthesis by a prominent human gut symbiont. J. Biol. Chem. 284, 18445–18457 (2009).
Martens, E. C., Chiang, H. C. & Gordon, J. I. Mucosal glycan foraging enhances fitness and transmission of a saccharolytic human gut bacterial symbiont. Cell Host Microbe 4, 447–457 (2008).
D’Elia, J. N. & Salyers, A. A. Effect of regulatory protein levels on utilization of starch by Bacteroides thetaiotaomicron. J. Bacteriol. 178, 7180–7186 (1996).
Sonnenburg, E. D. et al. A hybrid two-component system protein of a prominent human gut symbiont couples glycan sensing in vivo to carbohydrate metabolism. Proc. Natl Acad. Sci. USA 103, 8834–8839 (2006).
Sonnenburg, E. D. et al. Specificity of polysaccharide use in intestinal Bacteroides species determines diet-induced microbiota alterations. Cell 141, 1241–1252 (2010).
Pearce, V. H., Groisman, E. A. & Townsend, G. E. II. Dietary sugars silence the master regulator of carbohydrate utilization in human gut Bacteroides species. Gut Microbes 15, 2221484 (2023).
Schwalm, N. D. III, Townsend, G. E. II & Groisman, E. A. Multiple signals govern utilization of a polysaccharide in the gut bacterium Bacteroides thetaiotaomicron. mBio 7, e01342-16 (2016).
Townsend, G. E. II et al. A master regulator of Bacteroides thetaiotaomicron gut colonization controls carbohydrate utilization and an alternative protein synthesis factor. mBio 11, e03221-19 (2020).
Wagner, E. G. H. & Romby, P. Small RNAs in bacteria and archaea: who they are, what they do, and how they do it. Adv. Genet. 90, 133–208 (2015).
Cao, Y., Forstner, K. U., Vogel, J. & Smith, C. J. Cis-encoded small RNAs, a conserved mechanism for repression of polysaccharide utilization in Bacteroides. J. Bacteriol. 198, 2410–2418 (2016).
Ryan, D., Jenniches, L., Reichardt, S., Barquist, L. & Westermann, A. J. A high-resolution transcriptome map identifies small RNA regulation of metabolism in the gut microbe Bacteroides thetaiotaomicron. Nat. Commun. 11, 3557 (2020).
Goodman, A. L. et al. Identifying genetic determinants needed to establish a human gut symbiont in its habitat. Cell Host Microbe 6, 279–289 (2009).
Wu, M. et al. Genetic determinants of in vivo fitness and diet responsiveness in multiple human gut Bacteroides. Science 350, aac5992 (2015).
Liu, H. et al. Functional genetics of human gut commensal Bacteroides thetaiotaomicron reveals metabolic requirements for growth across environments. Cell Rep. 34, 108789 (2021).
Waters, J. L. & Salyers, A. A. The small RNA RteR inhibits transfer of the Bacteroides conjugative transposon CTnDOT. J. Bacteriol. 194, 5228–5236 (2012).
Ryan, D., Prezza, G. & Westermann, A. J. An RNA-centric view on gut Bacteroidetes. Biol. Chem. 402, 55–72 (2020).
Yao, L. et al. A selective gut bacterial bile salt hydrolase alters host metabolism. eLife 7, e37182 (2018).
Singhal, R. & Shah, Y. M. Oxygen battle in the gut: hypoxia and hypoxia-inducible factors in metabolic and inflammatory responses in the intestine. J. Biol. Chem. 295, 10493–10505 (2020).
Nugent, S. G., Kumar, D., Rampton, D. S. & Evans, D. F. Intestinal luminal pH in inflammatory bowel disease: possible determinants and implications for therapy with aminosalicylates and other drugs. Gut 48, 571–577 (2001).
Sharma, C. M. & Vogel, J. Differential RNA-seq: the approach behind and the biological insight gained. Curr. Opin. Microbiol. 19, 97–105 (2014).
Kroger, C. et al. An infection-relevant transcriptomic compendium for Salmonella enterica serovar Typhimurium. Cell Host Microbe 14, 683–695 (2013).
Yu, S. H., Vogel, J. & Forstner, K. U. ANNOgesic: a Swiss army knife for the RNA-seq based annotation of bacterial/archaeal genomes. GigaScience 7, giy096 (2018).
Jiang, X. et al. Invertible promoters mediate bacterial phase variation, antibiotic resistance, and host adaptation in the gut. Science 363, 181–187 (2019).
Haas, B. J., Chin, M., Nusbaum, C., Birren, B. W. & Livny, J. How deep is deep enough for RNA-seq profiling of bacterial transcriptomes? BMC Genomics 13, 734 (2012).
Porter, N. T. & Martens, E. C. The critical roles of polysaccharides in gut microbial ecology and physiology. Annu. Rev. Microbiol. 71, 349–369 (2017).
Porter, N. T. et al. Phase-variable capsular polysaccharides and lipoproteins modify bacteriophage susceptibility in Bacteroides thetaiotaomicron. Nat. Microbiol. 5, 1170–1181 (2020).
Porter, N. T., Canales, P., Peterson, D. A. & Martens, E. C. A subset of polysaccharide capsules in the human symbiont Bacteroides thetaiotaomicron promote increased competitive fitness in the mouse gut. Cell Host Microbe 22, 494–506.e8 (2017).
Cuskin, F. et al. Human gut Bacteroidetes can utilize yeast mannan through a selfish mechanism. Nature 517, 165–169 (2015).
Martens, E. C. et al. Recognition and degradation of plant cell wall polysaccharides by two human gut symbionts. PLoS Biol. 9, e1001221 (2011).
Briliute, J. et al. Complex N-glycan breakdown by gut Bacteroides involves an extensive enzymatic apparatus encoded by multiple co-regulated genetic loci. Nat. Microbiol. 4, 1571–1581 (2019).
Li, H. et al. The outer mucus layer hosts a distinct intestinal microbial niche. Nat. Commun. 6, 8292 (2015).
Han, W. et al. Gut colonization by Bacteroides requires translation by an EF-G paralog lacking GTPase activity. EMBO J. 42, e112372 (2022).
Prezza, G. et al. Comparative genomics provides structural and functional insights into Bacteroides RNA biology. Mol. Microbiol. 117, 67–85 (2022).
Lalaouna, D., Prévost, K., Eyraud, A. & Massé, E. Identification of unknown RNA partners using MAPS. Methods 117, 28–34 (2017).
Wright, P. R. et al. CopraRNA and IntaRNA: predicting small RNA targets, networks and interaction domains. Nucleic Acids Res. 42, W119–W123 (2014).
Mann, M., Wright, P. R. & Backofen, R. IntaRNA 2.0: enhanced and customizable prediction of RNA–RNA interactions. Nucleic Acids Res. 45, W435–W439 (2017).
Urban, J. H. & Vogel, J. Translational control and target recognition by Escherichia coli small RNAs in vivo. Nucleic Acids Res. 35, 1018–1037 (2007).
Corcoran, C. P. et al. Superfolder GFP reporters validate diverse new mRNA targets of the classic porin regulator, MicF RNA. Mol. Microbiol. 84, 428–445 (2012).
Donaldson, G. P. et al. Spatially distinct physiology of Bacteroides fragilis within the proximal colon of gnotobiotic mice. Nat. Microbiol. 5, 746–756 (2020).
Becattini, S. et al. Rapid transcriptional and metabolic adaptation of intestinal microbes to host immune activation. Cell Host Microbe 29, 378–393.e5 (2021).
Kennedy, M. S. et al. Dynamic genetic adaptation of Bacteroides thetaiotaomicron during murine gut colonization. Cell Rep. 42, 113009 (2023).
Neckers, L. & Tatu, U. Molecular chaperones in pathogen virulence: emerging new targets for therapy. Cell Host Microbe 4, 519–527 (2008).
Koropatkin, N. M., Martens, E. C., Gordon, J. I. & Smith, T. J. Starch catabolism by a prominent human gut symbiont is directed by the recognition of amylose helices. Structure 16, 1105–1115 (2008).
Bencivenga-Barry, N. A., Lim, B., Herrera, C. M., Trent, M. S. & Goodman, A. L. Genetic manipulation of wild human gut Bacteroides. J. Bacteriol. 202, e00544-19 (2020).
Eriksson, S., Lucchini, S., Thompson, A., Rhen, M. & Hinton, J. C. Unravelling the biology of macrophage infection by gene expression profiling of intracellular Salmonella enterica. Mol. Microbiol. 47, 103–118 (2003).
Förstner, K. U., Vogel, J. & Sharma, C. M. READemption—a tool for the computational analysis of deep-sequencing-based transcriptome data. Bioinformatics 30, 3421–3423 (2014).
Li, L. et al. BSRD: a repository for bacterial small regulatory RNA. Nucleic Acids Res. 41, D233–D238 (2013).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
McCarthy, D. J., Chen, Y. & Smyth, G. K. Differential expression analysis of multifactor RNA-seq experiments with respect to biological variation. Nucleic Acids Res. 40, 4288–4297 (2012).
Xu, J. et al. A genomic view of the human–Bacteroides thetaiotaomicron symbiosis. Science 299, 2074–2076 (2003).
Novichkov, P. S. et al. RegPrecise 3.0—a resource for genome-scale exploration of transcriptional regulation in bacteria. BMC Genomics 14, 745 (2013).
Huerta-Cepas, J. et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 47, D309–D314 (2019).
Wetmore, K. M. et al. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons. mBio 6, e00306–e00315 (2015).
BarquistLab/Micromix. GitHub https://github.com/BarquistLab/Micromix (2024).
Grinberg, M. Flask Web Development: Developing Web Applications with Python (O’Reilly Media, 2018).
Fernandez, N. F. et al. Clustergrammer, a web-based heatmap visualization and analysis tool for high-dimensional biological data. Sci. Data 4, 170151 (2017).
Reese, W. Nginx: the high-performance web server and reverse proxy. Linux J. 2008, 2 (2008).
Diesh, C. et al. JBrowse 2: a modular genome browser with views of synteny and structural variation. Genome Biol. 24, 74 (2023).
Correia Santos, S., Bischler, T., Westermann, A. J. & Vogel, J. MAPS integrates regulation of actin-targeting effector SteC into the virulence control network of Salmonella small RNA PinT. Cell Rep. 34, 108722 (2021).
Chen, Y., Lun, A. T. & Smyth, G. K. From reads to genes to pathways: differential expression analysis of RNA-seq experiments using Rsubread and the edgeR quasi-likelihood pipeline. F1000Res 5, 1438 (2016).
Risso, D., Ngai, J., Speed, T. P. & Dudoit, S. Normalization of RNA-seq data using factor analysis of control genes or samples. Nat. Biotechnol. 32, 896–902 (2014).
Schneider, C. A., Rasband, W. S. & Eliceiri, K. W. NIH Image to ImageJ: 25 years of image analysis. Nat. Methods 9, 671–675 (2012).
McMillan, A. S., Foley, M. H., Perkins, C. E. & Theriot, C. M. Loss of Bacteroides thetaiotaomicron bile acid altering enzymes impact bacterial fitness and the global metabolic transcriptome. Preprint at bioRxiv https://doi.org/10.1101/2023.06.27.546749 (2023).
Wegmann, U., Horn, N. & Carding, S. R. Defining the Bacteroides ribosomal binding site. Appl. Environ. Microbiol. 79, 1980–1989 (2013).
Luis, A. S. et al. Sulfated glycan recognition by carbohydrate sulfatases of the human gut microbiota. Nat. Chem. Biol. 18, 841–849 (2022).
Bechon, N. et al. Bacteroides thetaiotaomicron uses a widespread extracellular DNase to promote bile-dependent biofilm formation. Proc. Natl Acad. Sci. USA 119, e2111228119 (2022).
Cho, K. H., Cho, D., Wang, G. R. & Salyers, A. A. New regulatory gene that contributes to control of Bacteroides thetaiotaomicron starch utilization genes. J. Bacteriol. 183, 7198–7205 (2001).
Reeves, A. R., Wang, G. R. & Salyers, A. A. Characterization of four outer membrane proteins that play a role in utilization of starch by Bacteroides thetaiotaomicron. J. Bacteriol. 179, 643–649 (1997).
Schofield, W. B., Zimmermann-Kogadeeva, M., Zimmermann, M., Barry, N. A. & Goodman, A. L. The stringent response determines the ability of a commensal bacterium to survive starvation and to persist in the gut. Cell Host Microbe 24, 120–132.e6 (2018).
Lorenz, R. et al. ViennaRNA package 2.0. Algorithms Mol. Biol. 6, 26 (2011).


Acknowledgements

We thank S. Reichardt for technical support and A.-R. Brochado (University of Tübingen) for sharing antibiotics. We are grateful to J. Vogel and A. Sparmann for constructive feedback on this manuscript, as well as to all members of A.J.W.’s laboratory for fruitful discussions. We also thank M. Price (Lawrence Berkeley National Laboratory) for calculating gene fitness scores, M. Kütt for help when setting up the Theta-Base 2.0 website, L. Jenniches for help with JBrowse and S. C. Santos for helpful discussions about MAPS. E.B. and T.F.d.C. were recipients of fellowships from the Helmholtz Institute for RNA-based Infection Research graduate training programme ‘RNA and Infection’. A.M.D. acknowledges support from US National Institutes of Health grant RM1 GM135102. Research in A.J.W.’s laboratory is supported by the German Research Foundation (Individual Research Grant We6689/1-1) and by the European Research Council (ERC Starting Grant #101040214 ‘GUT-CHECK’).

Funding

Open access funding provided by Helmholtz-Zentrum für Infektionsforschung GmbH (HZI).

Author information

Hannah Felchle
Present address: Department of Radiation Oncology, Technical University of Munich, School of Medicine, Klinikum rechts der Isar, Munich, Germany

Authors and Affiliations

Helmholtz Institute for RNA-based Infection Research, Helmholtz Centre for Infection Research, Würzburg, Germany
Daniel Ryan, Elise Bornet, Gianluca Prezza, Shuba Varshini Alampalli, Taís Franco de Carvalho, Hannah Felchle, Titus Ebbecke, Regan J. Hayward, Lars Barquist & Alexander J. Westermann
Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Adam M. Deutschbauer
Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA, USA
Adam M. Deutschbauer
Faculty of Medicine, University of Würzburg, Würzburg, Germany
Lars Barquist
Department of Biology, University of Toronto Mississauga, Mississauga, Ontario, Canada
Lars Barquist
Institute of Molecular Infection Biology, University of Würzburg, Würzburg, Germany
Alexander J. Westermann
Department of Microbiology, Biocentre, University of Würzburg, Würzburg, Germany
Alexander J. Westermann

Authors

Daniel Ryan
Elise Bornet
Gianluca Prezza
Shuba Varshini Alampalli
Taís Franco de Carvalho
Hannah Felchle
Titus Ebbecke
Regan J. Hayward
Adam M. Deutschbauer
Lars Barquist
Alexander J. Westermann

Contributions

D.R. and A.J.W. planned the project. D.R., E.B., G.P., T.F.d.C. and H.F. performed the experimental work. G.P., T.E., R.J.H. and L.B. developed the software. D.R., E.B., G.P., S.V.A. and T.F.d.C. analysed the data. D.R. and A.J.W. wrote the original draft of the manuscript. E.B., G.P., S.V.A., T.F.d.C., H.F., T.E., R.J.H., A.M.D. and L.B. reviewed and edited the manuscript. A.J.W. acquired the funding.

Corresponding author

Correspondence to Alexander J. Westermann.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Microbiology thanks Kerry Brown, Eric Martens, Eric Masse, Joseph Wade and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 B. thetaiotaomicron growth curves over the profiled in vitro conditions.

a–d, Growth in rich TYG medium upon constant exposure to the indicated environmental stresses. e, f, Growth in minimal medium with the indicated carbohydrates, namely simple sugars (e) and porcine mucin (f), added as sole carbon sources. In each case, growth curves and error bars denote the mean ± SD of three biological replicates. The dashed vertical lines in panels e and f denote the time points of sampling.


Extended Data Fig. 2 Updated transcription start site annotations, intrinsic terminator annotations and operon structures of B. thetaiotaomicron.

a, Venn diagram showing the updated numbers of transcription start site categories. b, Refined transcription termination site (TTS) annotations. c, Operon structure prediction. Operons encompassing more than 10 genes are labeled by name.

Extended Data Fig. 3 Gene set enrichment analyses.

Top 10 enriched (normalized enrichment score >0) and depleted (normalized enrichment score <0) gene sets in the stress (a) and carbon source conditions (b). Gene sets derive from the custom gene set list described in the Methods section under ‘Gene set annotation and enrichment analyses’. Each gene set is represented by a circle whose size is proportional to the number of included genes (gene set size). Known inducers of PULs are included in brackets after the PUL name.

Extended Data Fig. 4 Transcript levels of B. thetaiotaomicron PUL systems, Cur regulon members, capsular polysaccharides, and conjugative transposons.

For each condition and each replicate, the read counts per million (CPM) mapped to individual PUL genes (as inferred from PULDB⁴), to individual members of the Cur regulon (as inferred from Supplementary Table S3B in Ref. ¹³), to CPS genes, and to CTn loci are plotted. Selected, strongly affected PULs and CPSs are labeled and re-plotted in Fig. 2b.
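For readers unfamiliar with the CPM unit used in this legend, the following is a minimal sketch of plain library-size scaling. It is an illustration only: the gene names and counts are hypothetical, and the study's own pipeline may apply additional normalization that is not reproduced here.

```python
def counts_per_million(counts):
    """Scale the raw read counts of one library to counts per million (CPM)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no mapped reads")
    return {gene: 1e6 * n / total for gene, n in counts.items()}


# Hypothetical raw counts for a single replicate (illustrative values only).
raw = {"geneA": 250, "geneB": 750, "geneC": 0}
cpm = counts_per_million(raw)
```

Because each library is scaled by its own total, CPM values from different replicates can be plotted on a common axis even when sequencing depth differs.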

Extended Data Fig. 5 Pair-wise comparison of stress- and carbon source-specific gene expression changes.

a–f, Volcano plots report differential expression of B. thetaiotaomicron exposed to the indicated stresses (x-axis) over significance (y-axis). The generalized linear model likelihood ratio test implemented in edgeR was used to test for differential expression. a, Cold vs. TYG control. b, Heat vs. TYG control. c, Acidic vs. TYG control. d, Aerobic shaking vs. TYG control. The genes for cytochrome C peroxidase (BT_1606) and thioredoxin (BT_1456) are labeled. e, Deoxycholate vs. TYG control. Genes belonging to the BT_2792-BT_2795 operon¹⁹ or encoding bile acid-altering enzymes⁶⁸ are labeled. f, Gentamicin vs. TYG control. g, Venn diagrams display the overlap of bile salts-specific gene expression (significantly upregulated) between our dataset and that obtained in Ref. ⁷¹. The associated heat map depicts highly upregulated operons including components of two efflux systems (BT_2793-BT_2795, BT_2685-BT_2689) as well as an outer membrane protein and a calcineurin superfamily phosphohydrolase (BT_0691-BT_0692). h–l, Volcano plots report differential expression of B. thetaiotaomicron feeding on the indicated carbon sources (x-axis) over significance (y-axis). h, Arabinose vs. glucose. Genes belonging to the arabinose utilization operon BT_0348-BT_0369 (Ref. ¹²) are indicated. i, Xylose vs. glucose. Genes belonging to the xylose utilization operon BT_0791-BT_0794 (Ref. ¹³) are indicated. j, Maltose vs. glucose. Genes belonging to the starch utilization system (sus) operon BT_3704-BT_3698 (Refs. ⁷²,⁷³) are labeled. k, N-acetyl-D-glucosamine vs. glucose. PUL80 genes are labeled. l, Starvation vs. TYG. The genes for the aldose 1-epimerase precursor AraM and for arabinose-utilizing PUL07 are marked. m, Venn diagram denotes the overlap of starvation-induced up- or downregulations observed here and in previous studies¹³,⁷⁴.


Extended Data Fig. 6 A shift in CPS expression and the overlap between GlcNAc- and mucin-responsive genes.

a, Expression of CPS3 (left) and CPS4 (right) genes across selected stress conditions, namely 28 °C, deoxycholate, bile salts and gentamicin in comparison to the TYG control. Genes were grouped into bins (width=100) based on average CPM values per condition. b, Theta-Base screenshot of the RNA-seq read coverages across the cps3 locus. Sub-operon #65, to which the main text refers, is boxed. Color-coding is the same as for panel a. c, Venn diagram denotes the overlap in the regulated genes between N-acetyl-D-glucosamine- and mucin-consuming bacteria. d, The violin plots depict the fold-change in expression of individual PUL-associated genes induced in the presence of GlcNAc (n = 85) and polymeric mucin (n = 458) relative to the reference sugar glucose. Each dot refers to a single gene and the solid and dashed lines indicate the median and quartiles, respectively. e, f, Expression profiles of commonly induced PUL genes (n = 83) (e) and of non-PUL genes (n = 81) (f) in the presence of GlcNAc or mucin relative to that of glucose.


Extended Data Fig. 7 Expression of B. thetaiotaomicron cis-encoded antisense RNAs.

a, Heat map showing the expression of annotated cis-asRNAs across the set of experimental conditions. Growth phase-dependent expression data stem from Ref. ¹⁶. EEP, early exponential phase; MEP, mid-exponential phase; CPM, counts per million. b, Northern blot-based validation of newly predicted asRNA candidates. Shown are representative northern blots of two biological replicates. c, The heat map shows z-scores of log₂FC values of PUL-associated antisense RNAs and their respectively overlapping PUL gene (susC homologue). This reveals patterns of correlation and anti-correlation, as indicated by Pearson’s r values at the right.
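The two statistics named in panel c can be sketched as follows. This is a minimal illustration under stated assumptions: the log₂FC profiles are invented, the function names are ours, and the sample standard deviation is assumed for the z-score, which the legend does not specify.

```python
import math
import statistics


def z_scores(values):
    """Centre a profile on its mean and scale by its sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # assumption: sample (n - 1) standard deviation
    return [(v - mean) / sd for v in values]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical log2 fold changes of an antisense RNA and its overlapping
# susC homologue across four conditions (illustrative values only).
asrna_log2fc = [1.0, 2.0, 3.0, 4.0]
susc_log2fc = [-1.0, -2.0, -3.0, -4.0]

asrna_z = z_scores(asrna_log2fc)
r = pearson_r(asrna_log2fc, susc_log2fc)  # perfectly anti-correlated toy pair
```

Pearson's r is unchanged by z-scoring, so the coefficient can be computed on either the raw log₂FC values or the scaled values shown in a heat map.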


Extended Data Fig. 8 TIS screen for phenotypes associated with B. thetaiotaomicron sRNAs and role of MasB in antibiotic tolerance.

a, Average number of transposon insertions per ncRNA gene in the mutant library across the range of different sample conditions. Intergenic B. thetaiotaomicron sRNAs are shown in blue, and 5′- or 3′-derived, or intra-operonic sRNA candidates in black. In this study, we focused on only the intergenic sRNA mutants. b, c, Growth curves of B. thetaiotaomicron isogenic wild-type (upper) and ∆masB (lower) in TYG supplemented with increasing concentrations of doxycycline (b) or tetracycline (c). Plotted are the means ± SD of three biological replicate experiments, each comprising technical duplicates. Indicated with dotted lines are the times to reach an OD₆₀₀ of 0.5 for each strain and treatment. d, Antibiotics exposure does not majorly influence MasB steady-state levels. Northern blot on total RNA samples derived from B. thetaiotaomicron cultures exposed for 2 h to increasing concentrations of either doxycycline or tetracycline, relative to RNA from vehicle (water)-treated control cultures. 5S rRNA was the loading control. Shown are representative northern blots of two biological replicates.


Extended Data Fig. 9 Establishment of MAPS for B. thetaiotaomicron sRNA MasB.

a, In-silico prediction of the secondary structure of a fusion of the 5′ end of MasB to the MS2 aptamer using the RNAfold WebServer⁷⁵. b, Plasmid map of AWP-020 containing the anhydrotetracycline-inducible MS2-MasB construct. c, Growth curves of the indicated strains in TYG medium in the absence (solid lines) or presence (dotted lines) of anhydrotetracycline (aTC; 200 ng/mL) as an inducer of MasB expression. Plotted values are the means of three biological replicates with error bars indicating the standard deviation. d, Northern blot to probe MS2-MasB in the strains used for MAPS grown in TYG to mid-exponential phase (MEP, ~7 h) or stationary phase (~10 h) in the absence or presence of anhydrotetracycline induction (for 2 h). The blot is representative of two biological replicates. e, Northern blot of the input (I) and eluate (E) fractions of the MAPS experiment, probed for MasB or MS2. 5S rRNA served as the loading control. An enrichment of MS2-MasB in the eluate compared to the input and to the samples derived from the untagged control strain demonstrates efficient capture and pull-down of MasB. The blot is representative of two biological replicates. f, Principal component analysis plot of the sequencing data revealed a segregation between the two biological replicates derived from the pull-down of MS2-MasB (blue) and those from the control strain (grey).


Extended Data Fig. 10 Characterization of the MasB target BT_1675.

a, MAPS coverage plots across the fusA2 and BT_1675 loci. R#1 and #2 are biological replicates. b, In-silico prediction of putative MasB interaction sites within MAPS-derived target candidates. The heat maps display the position-wise minimal energy (E) profiles of fusA2 or BT_1675 mRNAs (x-axes), respectively, with MasB (y-axis) as retrieved from IntaRNA⁴¹. Full-length sequences from TSS to TTS have been queried and the positions are relative to the TSS in each case. c, The indicated strains were grown in TYG to early (EEP), mid- (MEP), or late exponential phase (LEP), and starved for 2 h in minimal media devoid of a carbon source. Total RNA samples were collected and analyzed via northern blotting (upper) and qRT-PCR (lower; bars denote the means of five biological replicates). 5S rRNA or 16S rRNA served as loading control or reference gene, respectively. ‘**’ refers to a p-value of 0.0022 (non-parametric Mann-Whitney test); all other comparisons were not statistically significant (p ≥ 0.05). d, Graphical illustration of the gating strategy applied to the fluorescent reporter assay, whose results are plotted in Fig. 5f. Left: gating for intact bacterial cells in the forward/sideward scatter. Right: fluorescence intensity histogram (detected in the FITC channel). e, Growth curves of B. thetaiotaomicron mutant strains in TYG. The graphs show the means of three biological replicates. f, Coexpression analysis of BT_1675. Left: violin plots of Pearson’s correlation coefficients between BT_1675 expression and that of annotated Bacteroides gene sets (≥10 individual transcriptional units) sorted from left (lowest r) to right (highest r). Out of all 381 gene sets included in this analysis, the 5 top positively and negatively correlated (highest and lowest median r, respectively) are shown. Gene sets with |median r| > 0.5 are named below the plot. The GO term ‘Unfolded protein binding’ ranked first in absolute correlation. Right: heat map showing the expression of BT_1675 and ‘Unfolded protein binding’ genes across the set of 15 different experimental conditions.
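The ranking logic described for panel f (median Pearson's r per gene set, sorted by absolute value) can be sketched as below. This is a toy illustration, not the authors' code: gene names, gene sets and expression values are hypothetical, and the sets here hold two members each rather than the ≥10 transcriptional units required in the actual analysis.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def median_r_per_gene_set(target, expression, gene_sets):
    """Median correlation between a target profile and the members of each set."""
    return {
        name: statistics.median(pearson_r(target, expression[g]) for g in genes)
        for name, genes in gene_sets.items()
    }


# Hypothetical expression profiles across four conditions.
target = [1.0, 2.0, 3.0, 4.0]  # stands in for the gene of interest
expression = {
    "g1": [2.0, 4.0, 6.0, 8.0],
    "g2": [1.0, 2.0, 3.0, 5.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 4.0, 2.0],
}
gene_sets = {"set_up": ["g1", "g2"], "set_down": ["g3", "g4"]}

medians = median_r_per_gene_set(target, expression, gene_sets)
# Rank gene sets by absolute median correlation, strongest first.
ranked = sorted(medians, key=lambda name: abs(medians[name]), reverse=True)
```

Using the median rather than the mean keeps a single strongly correlated or anti-correlated member from dominating the score of a gene set.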


Supplementary information

Supplementary Information

Supplementary text.

Reporting Summary

Supplementary Tables

Supplementary Tables 1–12.

Source data

Source Data Fig. 2

Expression profiling data for Fig. 2a–c,e,f.

Source Data Fig. 3

Expression data for Fig. 3b.

Source Data Fig. 3

Unmodified gels for Fig. 3b.

Source Data Fig. 4

Growth curve data for Fig. 4b.

Source Data Fig. 5

Source data for Fig. 5a,d–g.

Source Data Fig. 5

Unmodified blots for Fig. 5b,e.

Source Data Extended Data Fig. 1

Growth curve data for Extended Data Fig. 1a–f.

Source Data Extended Data Fig. 5

Expression profiling data for Extended Data Fig. 5a–m.

Source Data Extended Data Fig. 6

Expression profiling data for Extended Data Fig. 6a,c,d.

Source Data Extended Data Fig. 7

Expression profiling and statistical data for Extended Data Fig. 7a,c.

Source Data Extended Data Fig. 7

Unmodified blots for Extended Data Fig. 7b.

Source Data Extended Data Fig. 8

Growth curve data for Extended Data Fig. 8b,c.

Source Data Extended Data Fig. 8

Unmodified blots for Extended Data Fig. 8d.

Source Data Extended Data Fig. 9

Growth curve data for Extended Data Fig. 9c.

Source Data Extended Data Fig. 9

Unmodified blots for Extended Data Fig. 9d,e.

Source Data Extended Data Fig. 10

qRT-PCR data for Extended Data Fig. 10c, growth curve data for Extended Data Fig. 10e and gene set enrichment data for Extended Data Fig. 10f.

Source Data Extended Data Fig. 10

Unmodified blots for Extended Data Fig. 10c.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Ryan, D., Bornet, E., Prezza, G. et al. An expanded transcriptome atlas for Bacteroides thetaiotaomicron reveals a small RNA that modulates tetracycline sensitivity. Nat Microbiol 9, 1130–1144 (2024). https://doi.org/10.1038/s41564-024-01642-9


Received: 17 February 2023
Accepted: 07 February 2024
Published: 25 March 2024
Issue Date: April 2024
DOI: https://doi.org/10.1038/s41564-024-01642-9
