Main

The gastrointestinal tract harbours a dense and complex microbial community termed the gut microbiota. At the phylum level, the composition of the microbiota tends to remain relatively constant over time1. However, when examined at the level of strain diversity, the gut proves to be a highly dynamic environment governed by cooperative and, more predominantly, exclusionary relationships between bacterial species2. Freter’s nutrient niche theory posits that bacteria consuming identical nutrients with equal efficiency cannot coexist stably within the same ecological niche3,4. This principle holds particularly true in nutrient-limited environments such as the gastrointestinal tract, where a variety of bacterial adaptive mechanisms and competitive interactions have been selected and have been studied in recent years5,6,7. In this context, overlapping rivals can engage in direct competition through antagonistic strategies that favour the dominance of the single bacterium most efficient at consuming shared resources8. This strategy, referred to as ‘exploitation’, is exemplified by certain commensal Escherichia coli and Klebsiella strains, which can directly reduce pathogen fitness by depleting the environment of galactitol and beta-glycosides9,10,11. Furthermore, the probiotic E. coli Nissle can actively use oxygen and iron, thereby limiting the growth of pathogens such as Salmonella enterica serovar Typhimurium (S. Tm)12,13. Despite numerous examples of bacterial exploitation strategies documented in the literature, we still lack a mechanistic understanding of how niche exclusion occurs among metabolically related rivals. Specifically, it is unknown how bacteria with identical fitness-associated genes can compete metabolically, or what determines the outcome of exploitation interactions between strains whose growth relies on equivalent sets of nutrient utilization genes.

Protein translation is a complex cellular process shared by all living organisms. This multilayered process starts with an initiation phase orchestrated by a dynamic interplay between mRNAs, ribosomal subunits, the initiator tRNA and initiation factors. Together, they ensure precise recognition of the start codon and delineate the open reading frame of the transcript14. The nucleotide triplet ATG is the predominant start codon in all kingdoms of life and is considered the universal initiation codon in every known genetic code. The strong evolutionary pressure to use ATG as the start codon relates to its intrinsic properties, which confer the highest ribosome binding strength and translation initiation rate compared with other codon sequences15,16. Notably, recent advances in proteomic and ribosome profiling techniques have identified thousands of eukaryotic, archaeal and prokaryotic ORFs harbouring non-ATG start codons17,18,19,20. In bacteria, GTG and TTG are found in approximately 12% and 8% of genes, respectively, making them among the most common alternative start codons18. Mechanistically, and despite recruiting the same N-formyl methionyl-tRNA21,22, the use of GTG or TTG start codons leads to suboptimal translation efficiency, reflected by an 8- to 12-fold decrease in the expression rate compared with ATG15,23. Given the significant effect of a single nucleotide change in the start codon on the gene expression level, it remains unclear what benefit organisms derive from such a suboptimal translation system and how this is reflected in vivo.

Here we aimed to explore the role of non-canonical start codons in bacterial metabolic adaptation in the context of gut colonization.

Results

Non-ATG start codons in E. coli carbohydrate regulators

Given their direct role in metabolic adjustment, we focused on the start codon sequences of metabolic gene regulators. We analysed the occurrence of non-ATG start codons within characterized carbohydrate utilization regulator genes in 10,643 E. coli genomes (Extended Data Fig. 1a, Supplementary Table 1 and Methods)24. Of the 32 regulator genes analysed, 13 featured non-ATG start codons (Fig. 1a). Notably, rbsR, murR, mltR, malT, gntR, gatR, fucR, araC and alsR were found with both ATG and non-ATG start codons, in distinct patterns. By contrast, lacI, rhaR, mlc and cra almost exclusively carry a non-ATG start codon, suggesting a strong evolutionary preference for unconventional start codons in these metabolic gene regulators. To explore this phenomenon, and given the extensive literature on the lactose operon, we focused on the function of the non-canonical start codon in the E. coli lacI gene. lacI encodes a negative regulator that tightly controls the lactose utilization operon lacZYA by binding to its operator and preventing transcription in the absence of lactose25,26. On closer examination of the lacI start codons encoded in E. coli strains, we found that more than 99% are predicted to be GTG (Fig. 1b). Moreover, the vast majority of the analysed E. coli strains harbour multiple potential GTG start codons located in frame within ten nucleotides of each other (Fig. 1c). To our surprise, an additional scenario arose in which ATG and GTG codons were in close vicinity (Fig. 1c). For these ambiguous cases, we selected the start codons with the highest prediction scores and calculated their relative fractions (Methods). Genomes with a predicted lacI ATG start codon were scattered across the phylogeny within a subset of E. coli strains (Extended Data Fig. 1b), suggesting that the preference for the GTG or ATG start codon in lacI is attributable to selection rather than genetic drift. This prompted us to investigate mechanistically the benefit associated with the lacI GTG start codon in the natural habitat of a gut commensal E. coli strain. For this purpose, we selected the recently isolated mouse strain E. coli 8178 as our model organism27.
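The tallying step described above, in which the highest-scoring candidate is kept for ambiguous lacI start sites and relative codon fractions are then computed, can be sketched in Python. This is a minimal illustration only: the input format (one list of (codon, score) pairs per genome), the function names and the scores are hypothetical, and the actual pipeline described in Methods is not reproduced here.

```python
from collections import Counter


def start_codon_fractions(candidates_per_genome):
    """Fraction of genomes whose best-scoring start codon is each codon.

    candidates_per_genome: one entry per genome, each a list of
    (codon, score) tuples for the candidate start sites of one gene.
    This input format is a hypothetical simplification, not Prodigal's output.
    """
    counts = Counter()
    for candidates in candidates_per_genome:
        if not candidates:
            continue  # gene absent or unannotated in this genome
        # Ambiguous cases: keep only the highest-scoring candidate start
        codon, _score = max(candidates, key=lambda c: c[1])
        counts[codon.upper()] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {codon: n / total for codon, n in counts.items()}


def non_atg_fraction(fractions):
    """Summed fraction of genomes using any start codon other than ATG."""
    return sum(f for codon, f in fractions.items() if codon != "ATG")


# Illustrative (made-up) candidate scores for four genomes
lacI_candidates = [
    [("GTG", 12.1), ("GTG", 9.4)],  # two in-frame GTG candidates
    [("ATG", 5.2), ("GTG", 10.8)],  # ambiguous ATG/GTG case, GTG scores higher
    [("ATG", 14.0)],
    [],                             # gene not found in this genome
]
fractions = start_codon_fractions(lacI_candidates)
print(fractions)                    # GTG 2/3, ATG 1/3
print(non_atg_fraction(fractions))  # 2/3
```

Genomes without the gene are skipped rather than counted, so fractions are relative to genomes in which a start codon could be assigned.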

Fig. 1: Start codon distribution in metabolic gene regulators across E. coli genomes.

a, The start codon sequences of 32 carbohydrate metabolism regulator genes were analysed across 10,643 E. coli genomes. The distribution of start codon sequences is indicated for each regulator. Non-ATG start codons were classified as ‘other’. Of the predicted start codons in the lacI gene, 99% are non-canonical in Escherichia genomes. b, Among the different non-ATG codons, only GTG was found to initiate lacI translation in E. coli strains. c, The lacI sequences of two E. coli strains (SMN152SH1 and K-12 MG1655) are shown to illustrate cases with multiple ambiguous start codons. In the top half of each box, the ribosome binding site (RBS), the stop codon and the predicted start codons are indicated. The bottom half shows the Prodigal output, including the prediction score for each potential start codon and the corresponding protein lengths and ribosome binding sites.

Functional lacZYA promotes E. coli 8178 growth in vivo

E. coli 8178 is a murine commensal strain (NCBI reference sequence: NZ_JAEFCJ010000001.1)28 that encodes a complete and functional lacZYA operon (Extended Data Fig. 2a,b). The lacZYA operon repressor gene lacI (NCBI reference sequence: WP_000805859) harbours a rare, ambiguous ATGTG sequence in which the ATG and GTG start sites overlap out of frame (Extended Data Fig. 2c). Translation from the canonical ATG codon leads to immediate termination (Extended Data Fig. 2c). Conversely, translation from the GTG reading frame generates a full-length, functional protein that actively represses the lacZYA operon in the absence of lactose, with repression relieved by increasing concentrations of lactose (Fig. 2a,b and Extended Data Fig. 2c). To study the advantage conferred by lactose metabolism in vivo, we used 129S6/SvEvTac mice harbouring a complex specific pathogen-free (SPF) microbiota. The microbiota of these mice is devoid of endogenous commensal E. coli, allowing us to monitor the gut colonization properties of E. coli 8178 without interference from related endogenous strains29. Mice were treated with a single dose of streptomycin (25 mg, by gavage) 1 day before inoculation to transiently disrupt the gut microbiota and facilitate E. coli 8178 colonization (Fig. 2c)30. Inoculation with a 1:1 mixture of the wild-type (WT) and lacZ-deficient E. coli 8178 strains, followed by bacterial enumeration from faecal samples, revealed a growth advantage of the WT over the lacZ mutant (normalized competitive index (C.I) < 1) (Fig. 2d). To further probe the role of lactose metabolism in the competitive growth of E. coli 8178, we inoculated an additional group of mice that received drinking water supplemented with lactose. We chose a concentration of 3% (w/v) lactose, which is comparable to the lactose concentrations found in human (6–8%), cow (5%) and mouse (3–4%) milk31,32,33. While lactose supplementation did not affect the total load of E. coli in the gut (Extended Data Fig. 2d), the increased lactose intake aggravated the growth disadvantage of the lacZ-deficient strain, as reflected by a reduced C.I (Fig. 2d). Expression of lacZ in trans partially restored the growth of the lacZ mutant (Fig. 2e), which ruled out any indirect negative effect of the gut microbiota and indicated that lactose metabolism in E. coli 8178 plays an important role during intestinal colonization. Overall, these data show that the lacZYA operon of E. coli 8178 is functional and promotes luminal growth in the murine gut under these conditions.
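The normalized C.I used throughout is defined in the Fig. 2 legend as the mutant-to-WT ratio in the faecal sample divided by the same ratio in the inoculum. It can be computed as in the following Python sketch. The CFU values and the function name are illustrative assumptions, as is the choice to reject zero counts, because the handling of counts below the detection limit is not described in this section.

```python
def competitive_index(mutant_out, wt_out, mutant_in, wt_in):
    """Normalized competitive index (C.I).

    (mutant / WT in the faecal sample) / (mutant / WT in the inoculum).
    C.I = 1: fitness-neutral; C.I < 1: the mutant is outcompeted by the WT.
    """
    counts = (mutant_out, wt_out, mutant_in, wt_in)
    if min(counts) <= 0:
        # Assumption: zero counts are rejected here; the study's handling of
        # counts below the detection limit is not stated in this section.
        raise ValueError("all counts must be positive")
    return (mutant_out / wt_out) / (mutant_in / wt_in)


# Illustrative CFU counts (not data from this study)
inoculum = {"mutant": 5e7, "wt": 5e7}  # 1:1 mixture
day3 = {"mutant": 1e7, "wt": 3e8}      # CFU per gram of faeces
ci = competitive_index(day3["mutant"], day3["wt"], inoculum["mutant"], inoculum["wt"])
print(f"C.I = {ci:.4f}")               # C.I = 0.0333, a ~30-fold disadvantage
```

Dividing by the inoculum ratio is what makes C.I values comparable between mixtures inoculated at different starting ratios.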

Fig. 2: Lactose metabolism facilitates E. coli 8178 growth in the mouse intestine.

a, Detection of LacI expression in E. coli 8178. Soluble extracts of E. coli 8178 strains (WT and lacI mutant) were subjected to 12.5% acrylamide SDS-PAGE followed by immunodetection with an anti-LacI antibody. Detected LacI is indicated on the right. The molecular weight marker (in kDa) is indicated on the left. The western blot was performed independently twice, and a representative experiment is shown. b, β-galactosidase activity assays. WT and lacI-deficient E. coli 8178 strains were grown for 8 h in minimal medium supplemented with glucose (20 mM) or lactose (10 mM or 100 mM). Bacterial cells were collected and subjected to a β-galactosidase activity assay. The β-galactosidase activity is expressed in Miller units (MU). Activities are the mean of six biological replicates, and the error bars indicate standard deviation. c, Experimental scheme. Streptomycin-pretreated SPF 129S6/SvEvTac mice were inoculated with an equal mixture of the E. coli 8178 WT and lacZ-deficient strains. The drinking water was supplemented (3% w/v) or not (0%) with lactose. d, Competitive experiment. The WT and lacZ-deficient strains were used to inoculate streptomycin-pretreated 129S6/SvEvTac mice receiving drinking water supplemented (3%) or not (0%) with lactose. The WT and lacZ counts were determined by selective plating. The normalized C.I is calculated as the mutant (lacZ)-to-WT ratio in the sample divided by the same ratio in the inoculum. e, Complementation experiments. The lacZ mutant was transformed with a plasmid constitutively expressing the lacZ gene. The fitness of the resulting strain (lacZ + p-lacZ) was analysed in a competitive experiment in streptomycin-pretreated 129S6/SvEvTac mice receiving lactose (3%) in the drinking water. d,e, The x axis represents the time post-inoculation (in days). The bars represent the median, and the dotted lines represent the C.I expected for a fitness-neutral mutation. The results from at least two independent replicates are shown. b,d,e, Two-tailed Mann–Whitney U tests were used to compare two groups in each panel. NS, not significant (P ≥ 0.05); **P < 0.01; ***P < 0.001.


Microbiota and diet composition modulate E. coli 8178 lacZ fitness in vivo

Recent studies have presented inconclusive findings on the role of lactose in E. coli gut colonization34,35,36,37. To assess whether the lactose-related fitness of E. coli 8178 was a general trait or context dependent, we analysed its lactose-dependent competitiveness in C57BL/6 mice. This commonly used mouse strain harbours an E. coli-free SPF microbiota that differs from that of 129S6/SvEvTac animals29. C57BL/6 mice were pretreated with streptomycin and inoculated with a 1:1 mixture of the WT and lacZ strains (Fig. 3a). As in 129S6/SvEvTac animals, E. coli 8178 reached a high level of gut colonization, and the lacZ mutant exhibited a fitness defect (Fig. 3b (black symbols) and Extended Data Fig. 3a). However, adding lactose (3%) to the drinking water did not further increase the competitive advantage of the WT over the lacZ strain in this mouse model (Fig. 3b). Such an increase was observed at a higher lactose concentration (8%), revealing a different response to lactose supplementation in C57BL/6 compared with 129S6/SvEvTac mice (Fig. 3c and Extended Data Fig. 3a). To explore the potential role of the gut microbiota in the lactose-dependent fitness defect of the lacZ mutant, we used germ-free C57BL/6 mice (Fig. 3a). In inoculated germ-free C57BL/6 mice, the lacZ mutant showed a competitive defect that, in contrast to SPF C57BL/6 animals, was significantly exacerbated upon 3% lactose supplementation (Fig. 3b (orange symbols) and Extended Data Fig. 3b). Unlike SPF C57BL/6 mice, germ-free mice are inherently permissive to gut colonization and were therefore not subjected to antibiotic pretreatment. To verify that the difference in lacZ-associated fitness between germ-free and antibiotic-pretreated SPF C57BL/6 mice arises from the microbiota rather than from an effect of the antibiotic on the host, we evaluated the C.I of the lacZ mutant in streptomycin-pretreated germ-free mice. As expected, the C.I of the lacZ mutant did not differ between antibiotic-pretreated and non-pretreated germ-free animals exposed to 3% lactose in the drinking water (Extended Data Fig. 3c,d). Together, these observations show that the reliance of E. coli 8178 on the lacZYA operon varies in a microbiota-dependent manner.
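Group comparisons such as the effect of 3% lactose on the day-3 C.I are reported with two-tailed Mann–Whitney U tests. As a minimal sketch of what that test computes, the Python code below derives U from mid-ranks and obtains an exact two-sided p-value by enumerating every reassignment of the pooled values to the two groups. The C.I values are hypothetical, and the statistical software used by the authors is not specified in this section; in practice, scipy.stats.mannwhitneyu(x, y, alternative="two-sided") provides this test.

```python
from itertools import combinations


def midranks(values):
    """Ranks 1..n, with tied values given the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j < len(order) and values[order[j]] == values[order[i]]:
            j += 1
        for k in order[i:j]:
            ranks[k] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    return ranks


def mann_whitney_exact(x, y):
    """U statistic for x and an exact two-sided permutation p-value."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    offset = n1 * (n1 + 1) / 2
    u_obs = sum(ranks[:n1]) - offset
    centre = n1 * n2 / 2  # expected U under the null hypothesis
    extreme = total = 0
    # Enumerate every way of assigning n1 of the pooled observations to group x
    for idx in combinations(range(n1 + n2), n1):
        u = sum(ranks[k] for k in idx) - offset
        total += 1
        if abs(u - centre) >= abs(u_obs - centre) - 1e-9:
            extreme += 1
    return u_obs, extreme / total


# Hypothetical day-3 C.I values for two groups of mice
ci_no_lactose = [0.40, 0.55, 0.31, 0.62, 0.48]
ci_lactose = [0.05, 0.12, 0.08, 0.02, 0.10]
u, p = mann_whitney_exact(ci_no_lactose, ci_lactose)
print(u, p)  # U = 25.0, p = 2/252 (about 0.0079)
```

Full enumeration is practical only for small groups such as mouse cohorts; larger samples call for a normal approximation.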

Fig. 3: Lactose-dependent E. coli 8178 growth in the murine gut is microbiota and diet dependent.

a, Experimental scheme. Streptomycin-pretreated SPF 129S6/SvEvTac or C57BL/6 mice were colonized with an equal mixture of the E. coli 8178 WT and lacZ-deficient strains. C57BL/6 germ-free mice received no antibiotic pretreatment before inoculation. The drinking water was supplemented (3%) or not (0%) with lactose. b, Influence of the mouse microbiota on E. coli 8178 lacZ mutant fitness. The E. coli 8178 WT and lacZ mutant strains were used to colonize streptomycin-pretreated SPF 129S6/SvEvTac, streptomycin-pretreated SPF C57BL/6 or C57BL/6 germ-free mice. The counts of the WT and lacZ mutant strains were determined by selective plating from faecal samples and used to calculate the C.I at day 3 post-inoculation. c, Increased lactose supplementation exacerbates the lacZ-associated growth defect in streptomycin-pretreated SPF C57BL/6 mice. The C.I of the E. coli 8178 lacZ strain in antibiotic-pretreated, inoculated SPF C57BL/6 mice is shown at day 3 post-inoculation. The drinking water was supplemented with lactose at a concentration of 5% or 8%. d, Lactose content differs between diets. SPF and germ-free facility chows (n = 8 pieces) were subjected to lactose measurement assays. The mean and standard deviation are shown for each diet. e, E. coli 8178 lactose-dependent fitness is influenced by diet composition. The C.I of the lacZ mutant at day 3 post-inoculation was assessed in streptomycin-pretreated SPF 129S6/SvEvTac mice fed different mouse chows (standard or germ-free facility diets). b,c,e, The bars represent the median, and the dashed lines represent the C.I expected for a fitness-neutral mutation. Two-tailed Mann–Whitney U tests were used to compare two groups in each panel. NS, not significant (P ≥ 0.05); *P < 0.05; **P < 0.01. The results from at least two independent replicates are shown.


In contrast to antibiotic-pretreated SPF 129S6/SvEvTac and C57BL/6 mice, the importance of lactose utilization by E. coli 8178 in germ-free animals became evident only at day 3 post-inoculation (Fig. 3b and Extended Data Fig. 3e). To maintain their germ-free status, C57BL/6 germ-free mice were housed in a controlled environment (germ-free facility) until the experiment began, at which point they were transferred to the SPF facility (day 0). Because the diet in our germ-free facility (germ-free facility diet) differed slightly in composition from that in our SPF facility (standard diet; Methods), we wondered whether this difference could explain the discrepancy in E. coli 8178 lactose-dependent fitness between SPF and germ-free animals. We measured the lactose content of both diets and found a significantly higher amount of lactose in the SPF chow than in the germ-free diet (Fig. 3d). To confirm the effect of diet on E. coli 8178 lacZ fitness, we inoculated 129S6/SvEvTac mice fed the germ-free facility diet and found that the fitness advantage of the WT over the lacZ mutant was lost compared with mice kept on the standard diet (Fig. 3e). Conversely, providing lactose-supplemented water to mice fed the germ-free facility diet restored the competitive advantage of the WT over the lacZ mutant (Fig. 3e). These findings indicate that variation in lactose content between diets can lead to different lacZ fitness outcomes.

lacI GTG start codon enhances lactose utilization in vitro

We next examined the fitness conferred by the non-canonical GTG start codon in lacI and compared it with that conferred by its more common counterpart, ATG. We selected the lacI start codon on the basis of sequence similarity to E. coli K-12 and the extensive functional and structural validation in the literature38,39,40. By targeting the historically established LacI start codon, we expected to alter the protein expression level without generating isoforms. The native GTG lacI start codon was then replaced by ATG in E. coli 8178, yielding the ATGlacI strain. In addition, we created a lacI-deficient mutant, as well as a GTGlacI strain derived from the ATGlacI mutant by reverting the lacI ATG start codon to the original GTG; this complementation step serves to further validate our functional studies. Western blot analyses of the native LacI protein showed a higher abundance of LacI in the ATGlacI mutant than in the WT and GTGlacI strains (Fig. 4a), confirming the effect of the start codon mutation on protein expression. To assess the fitness associated with this codon change, we performed in vitro growth assays. The WT, ATGlacI, GTGlacI and lacI-deficient strains were first cultured in a defined medium supplemented with glucose and then incubated under similar conditions with either glucose or lactose as the sole carbon source. No fitness defect was observed with glucose as the carbon source, indicating that none of these mutations has a global effect on bacterial growth (Extended Data Fig. 4). In the presence of lactose, all strains showed similar growth rates. However, we observed distinct lag phases after the switch from glucose to lactose, which were markedly shorter in the lacI, WT and complemented GTGlacI strains than in the ATGlacI strain (Fig. 4b). This effect was further exacerbated when bacteria were incubated in nutrient-rich lysogeny broth (LB) before transitioning to minimal medium supplemented with lactose (Fig. 4c). Under these conditions, the lacI mutant adapted and grew the fastest. The WT and GTGlacI strains exhibited a significantly longer lag phase, which was further extended in the ATGlacI strain. To explain the observed variations in the lactose-adaptation phase mechanistically, we designed a gfp-based transcriptional reporter system and measured the expression level of the lacZYA operon in the WT, GTGlacI, ATGlacI and lacI strains. Bacterial cells were incubated with glycerol and isopropyl-β-d-thiogalactopyranoside (IPTG), an allolactose analogue. Among all tested strains, the lacI mutant showed a high gfp signal that remained unchanged across all IPTG concentrations tested, reflecting derepression of the lacZYA operon in this strain (Fig. 4d). At the highest IPTG concentration tested (500 µM), the WT, GTGlacI and ATGlacI strains showed reporter signals similar to that of the lacI background, indicating full derepression of the lacZYA operon under these conditions. Notably, the gfp signal resulting from lacZYA transcription in the WT and GTGlacI strains was consistently higher than in the ATGlacI mutant at every intermediate IPTG concentration tested (1 µM, 5 µM, 20 µM and 100 µM). This trend held even in the absence of IPTG, showing that use of the GTG start codon in lacI was sufficient to increase both the sensitivity and the basal expression of the lacZYA operon (Fig. 4d). These observations indicate that the lower LacI protein level caused by the GTG start codon alleviates repression of the lacZYA operon, allowing cells to adapt faster to lactose consumption. Whole-genome sequencing verified that the observed phenotypes were linked to the mutation of the lacI start codon in the ATGlacI strain, ruling out the involvement of secondary mutations.
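This section does not state how lag phases were quantified. One common approach, shown here as an assumption rather than the authors' method, is the tangent method: find the maximal specific growth rate on ln(OD600) and extrapolate that tangent back to the initial OD600. The Python sketch below uses a synthetic growth curve, and the function name lag_time is hypothetical. With real, noisy plate-reader data, a sliding-window regression over several points would be more robust than the consecutive-point slopes used here.

```python
import math


def lag_time(times, ods):
    """Lag time by the tangent method on ln(OD600).

    Finds the steepest interval between consecutive points (the maximal
    specific growth rate, mu_max), extends that tangent backwards and
    returns the time at which it crosses ln(OD600) at t = times[0].
    """
    log_od = [math.log(od) for od in ods]
    best_rate, best_i = -math.inf, None
    for i in range(len(times) - 1):
        rate = (log_od[i + 1] - log_od[i]) / (times[i + 1] - times[i])
        if rate > best_rate:
            best_rate, best_i = rate, i
    if best_i is None or best_rate <= 0:
        raise ValueError("no exponential growth detected")
    # Tangent: ln OD(t) = log_od[i] + mu * (t - t_i); solve for ln OD(t) = log_od[0]
    lag = times[best_i] - (log_od[best_i] - log_od[0]) / best_rate
    return lag, best_rate


# Synthetic curve: flat at OD600 0.05 for 3 h, then exponential growth at 0.5 per h
hours = list(range(11))
od600 = [0.05 * math.exp(0.5 * max(0, t - 3)) for t in hours]
lag, mu_max = lag_time(hours, od600)
print(f"lag = {lag:.2f} h, mu_max = {mu_max:.2f} per h")  # lag = 3.00 h, mu_max = 0.50 per h
```

Because all strains grew at similar rates on lactose, differences in the value returned by such a procedure would mainly reflect the time needed to induce lacZYA after the carbon source switch.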

Fig. 4: The lacI non-canonical GTG start codon fosters lactose utilization in vitro.

a, Effect of the start codon sequence on the LacI expression level. Soluble extracts of E. coli 8178 strains (WT, ATGlacI, GTGlacI and lacI mutant) were subjected to 12.5% acrylamide sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) followed by immunodetection with an antibody specific for LacI (upper gel) or for the β-subunit of RNA polymerase (lower gel). The detected LacI protein is indicated on the right. The molecular weight marker (in kDa) is indicated on the left. The western blot was performed independently twice, and a representative experiment is shown. b,c, In vitro growth assays. The indicated strains were individually grown in minimal medium supplemented with lactose (10 mM). Bacterial growth was quantified by measuring the optical density at 600 nm (OD600). Strains were previously incubated overnight in minimal medium supplemented with glucose (20 mM) (b) or in rich LB medium (c). The solid coloured lines represent the means of biological triplicates, and the error bars represent standard deviation. d, The lacZ expression level is influenced by the lacI start codon sequence. The lacI, WT, ATGlacI and GTGlacI strains were transformed with the p-PlacZ-gfp reporter plasmid. The strains were incubated in minimal medium supplemented with glycerol (50 mM) in the presence of the indicated IPTG concentrations. The gfp signal resulting from lacZ expression at mid-exponential phase represents the mean of a biological triplicate and is expressed as the mean fluorescence intensity (MFI) normalized by bacterial density (optical density unit: uOD600). The error bars represent standard deviation. Student's t-tests were used to compare two groups in each panel. NS, not significant (P ≥ 0.05); *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001.


lacI GTG start codon increases E. coli 8178 fitness in vivo

Given the importance of lactose in supporting E. coli 8178 growth in vivo, we hypothesized that use of the non-canonical lacI start codon would similarly enhance E. coli 8178 fitness in the murine gut. To test this hypothesis, we colonized streptomycin-pretreated 129S6/SvEvTac mice with an equal mixture of the WT and ATGlacI strains and determined the C.I. In mice fed a standard diet with 3% lactose in the drinking water, the WT strain outperformed the isogenic ATGlacI mutant by approximately 30-fold within 3 days (Fig. 5a). Notably, the ATGlacI mutant did not go extinct but rather coexisted stably (albeit at a lower proportion) with the WT strain when the experiment was extended to 2 weeks (Extended Data Fig. 5a,b). By contrast, the complemented GTGlacI mutant was as competitive as the WT strain (Fig. 5a and Extended Data Fig. 5a,b). The load of the GTGlacI strain tended to be slightly lower than that of the WT. This can be attributed to the growth defect associated with enumerating the mutant on antibiotic-containing selective plates, whereas the WT count is obtained by subtracting the mutant count from the total load on antibiotic-free plates. Altogether, we found that a single nucleotide change in the lacI start codon is sufficient to alter the in vivo fitness of isogenic E. coli 8178 strains carrying identical sets of metabolic genes. Strikingly, the fitness advantage of the WT over the ATGlacI strain diminished in mice kept on germ-free facility chow, showing that the increased competitiveness conferred by the lacI GTG start codon is context dependent and relies on the presence of lactose in the environment (Fig. 5a and Extended Data Fig. 5a,b). We next explored how the initial mutant frequency influences the impact of the lacI ATG start codon mutation. To this end, we inoculated streptomycin-pretreated 129S6/SvEvTac mice with mixtures of the WT and ATGlacI strains at varying ratios and evaluated mutant fitness over a 2-week experiment in the presence of 3% lactose in the drinking water. When present in excess, the ATGlacI mutant was progressively outnumbered by the WT strain, revealing a fitness disadvantage conferred by the lacI ATG start codon even at high frequency (Fig. 5b and Extended Data Fig. 5c,d). Interestingly, the fitness of the ATGlacI mutant remained unchanged when it was diluted a thousand-fold with the WT strain, indicating that at low frequency this mutation is no longer detrimental under our conditions (Fig. 5b and Extended Data Fig. 5c,d).
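To put the 30-fold advantage within 3 days on a per-day scale, one can assume a constant selective effect and convert the overall fold change into a daily rate. The Python sketch below does only this arithmetic. The function names are hypothetical, and constant selection is an assumption: the stable coexistence observed over 2 weeks indicates that selection does not stay constant over longer periods.

```python
import math


def per_day_fold(total_fold, days):
    """Constant daily fold change that compounds to total_fold over `days`."""
    return total_fold ** (1 / days)


def selection_rate(ci, days):
    """ln(C.I) per day: 0 is neutral, negative means the mutant loses ground."""
    return math.log(ci) / days


# WT outperforming ATGlacI by approximately 30-fold within 3 days
print(round(per_day_fold(30, 3), 2))        # 3.11
print(round(selection_rate(1 / 30, 3), 2))  # -1.13
```

A C.I that stays constant over time, as for the ATGlacI mutant diluted a thousand-fold, corresponds to a daily rate near zero.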

Fig. 5: A single nucleotide change within the lacI start codon impacts E. coli 8178 fitness in the gut.

a, The C.I of the ATGlacI strain (ATGlacI/WT) or GTGlacI mutant (GTGlacI/WT) in the streptomycin-pretreated 129S6/SvEvTac mouse model. Mice were kept under different regimes (standard diet with lactose supplementation in the drinking water, or germ-free facility diet without lactose supplementation) to analyse the effect of lactose on bacterial fitness. b, Mice were inoculated with the WT and ATGlacI strains at the indicated ratios. The C.I of the ATGlacI mutant relative to the WT strain is shown for each dilution tested. Mice were kept on a standard diet with lactose (3%) in the drinking water. c, Barcoded E. coli 8178 strains carrying different genetic backgrounds (WT, lacI, lacZ, ATGlacI and GTGlacI) were mixed in equal proportions and used to orally inoculate streptomycin-pretreated SPF 129S6/SvEvTac mice kept on a standard diet supplemented with 3% lactose or on a germ-free facility diet. The different tagged E. coli 8178 populations were quantified by qPCR from faecal samples. The fitness of each single mutant is represented as a C.I. The x axis represents the C.I of each indicated mutant compared with that of the WT strain. a–c, The bars represent the median, and the dotted lines represent the C.I expected for a fitness-neutral mutation. The results from at least two independent replicates are shown. Two-tailed Mann–Whitney U tests were used to compare two groups in each panel. NS, not significant (P ≥ 0.05); *P < 0.05; **P < 0.01.


Finally, given the slight alleviation of lacZYA repression in the WT strain compared with the ATGlacI mutant, we reasoned that, as in our in vitro observations, E. coli 8178 fitness in vivo would be negatively correlated with the degree of lacZYA repression. To explore this potential correlation, we inoculated 129S6/SvEvTac mice with a 1:1:1:1:1 mixture of barcoded versions of the lacI-deficient strain, WT, GTGlacI, ATGlacI and, as an internal control, the lacZ mutant. The relative abundance of each strain was assessed by detecting strain-specific, fitness-neutral 40 bp barcodes41 (Extended Data Fig. 5e). In the lactose-supplemented group, the normalized abundance of the lacZ, ATGlacI and lacI strains at 3 days post-inoculation was below 1, revealing a growth defect relative to the WT strain under these conditions (Fig. 5c). In contrast to our in vitro results, these data indicate that after 3 days of colonization, complete derepression of the lacZYA operon is as detrimental as the elevated LacI expression resulting from introduction of the canonical ATG start codon in the ATGlacI strain. The fitness of the lacI and ATGlacI mutants stabilized over the 2-week course of the experiment, highlighting the coexistence of multiple lactose-consuming variants, with the WT prevailing under our conditions. In contrast to the lactose-supplemented group, the fitness defect of the ATGlacI strain was less pronounced under lactose-limited conditions (Fig. 5c and Extended Data Fig. 5f). Taken together, these observations highlight the frequency- and context-dependent metabolic benefit attributed to the lacI start codon in E. coli 8178.
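The barcoded strains were quantified by qPCR from faecal samples (Fig. 5c). The quantification details are not reproduced in this section, so the following Python sketch assumes the widely used 2^(−ΔΔCt) approach with equal, ideal amplification efficiency for each barcode assay. Under that assumption, the normalized abundance of a mutant relative to the WT could be computed as shown; the Ct values and function names are hypothetical.

```python
def template_ratio(ct_mutant, ct_wt, efficiency=2.0):
    """Mutant:WT template ratio from qPCR threshold cycles (Ct).

    Template amount scales as efficiency ** (-Ct), so the ratio is
    efficiency ** (Ct_wt - Ct_mutant). efficiency=2.0 assumes perfect
    doubling per cycle and equal efficiency for both barcode assays.
    """
    return efficiency ** (ct_wt - ct_mutant)


def normalized_abundance(ct_mut_faeces, ct_wt_faeces, ct_mut_inoc, ct_wt_inoc, efficiency=2.0):
    """Mutant:WT ratio in faeces divided by the same ratio in the inoculum."""
    return (template_ratio(ct_mut_faeces, ct_wt_faeces, efficiency)
            / template_ratio(ct_mut_inoc, ct_wt_inoc, efficiency))


# Hypothetical Ct values: equal in the inoculum, mutant barcode 3 cycles later at day 3
print(normalized_abundance(24.0, 21.0, 20.0, 20.0))  # 0.125
```

A value below 1, as here, corresponds to the growth defect relative to the WT described above; a value of 1 would indicate a fitness-neutral strain.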

Non-ATG start codons in Enterobacteriaceae carbohydrate regulators

Building on our functional studies, we next analysed the lacI start codons across 16 genera within the Enterobacteriaceae family to investigate whether similar start codons are conserved in related species (Fig. 6a). Although Salmonella species are conventionally characterized as ‘Lac-negative’, all Salmonella species selected for this analysis were ‘Lac-positive’. Unusual lacI start codons were heterogeneously distributed across nearly all the selected genera. Closer examination of the sequences indicated that, in most instances, the predominant non-ATG start codon in Enterobacteriaceae lacI genes was GTG (Fig. 6a). Besides Escherichia, more than 80% of the lacI start codons in the Shigella and Citrobacter genomes analysed are predicted to be GTG, suggesting that the benefit associated with the lacI GTG start codon may extend, in a strain-dependent manner, to other bacterial genera. Phylogenetic examination of the genomes revealed that the start codon variants are not distributed monophyletically within the different lineages. This observation indicates that the appearance and distribution of non-canonical lacI start codons resulted from multiple independent evolutionary events, further supporting the notion of context-dependent fitness associated with start codon choice. We expanded our bioinformatic analysis to include genes encoding the main transcriptional regulators of carbohydrate utilization (Supplementary Table 1 and Fig. 6b). At the Enterobacteriaceae family level, all examined metabolic regulator genes showed alternative start codons in at least a subset of strains within specific genera. Notably, non-canonical start codons were further enriched in cra and mlc, which start exclusively with a GTG codon in almost all the genera analysed (Fig. 6b). Given the potential benefit of non-canonical start codons in carbohydrate metabolism, we next asked whether carbohydrate-unrelated regulators show a similar pattern. To this end, we analysed the start codons of 29 additional regulators that do not appear to be exclusively involved in carbohydrate metabolism (Supplementary Table 2). The overall proportion of non-canonical start codons was significantly lower in carbohydrate-unrelated than in carbohydrate-related regulators (Fig. 6c and Extended Data Fig. 6a). This trend persisted across positive, negative and bidirectional regulators (Extended Data Fig. 6b). Taken together, these observations suggest that the functional advantages of non-canonical start codons extend beyond lactose utilization in E. coli 8178 to other carbohydrate metabolic pathways within the Enterobacteriaceae family.

Fig. 6: Distribution of metabolic regulator start codons across the Enterobacteriaceae family.

a, Non-ATG start codons are conserved among several genera of the Enterobacteriaceae family. Shigella and Citrobacter genomes show a level of lacI GTG start codon conservation similar to that of Escherichia. The bacterial families were grouped according to their phylogenetic lineage. b, Alternative start codons are found in multiple carbohydrate-related regulators within the Enterobacteriaceae family. The data shown in this figure rely on existing annotations, which can be ambiguous or missing. Conservation levels are represented by colour-filled circles for each regulator and genus. c, Percentage of non-ATG start codons in carbohydrate-related (n = 32) and carbohydrate-unrelated (n = 29) transcriptional regulators in Enterobacteriaceae. A two-tailed Mann–Whitney U test was used to compare the two groups (****P = 1.5 × 10⁻¹¹). Data are shown as the median (black horizontal line) with the 25th and 75th percentiles (hinges). Whiskers extend from the hinges to the maximum and minimum values, but no further than 1.5× the interquartile range.
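The box-plot convention in the caption (median, 25th/75th-percentile hinges, whiskers capped at 1.5× the interquartile range) can be reproduced with a minimal Python sketch. It assumes type-7 quantiles (the R default, equivalent to method="inclusive" in Python's statistics.quantiles); the function name tukey_box_stats and the example values are hypothetical and not taken from the authors' plotting code.

```python
import statistics


def tukey_box_stats(values):
    """Summarize values following the box-plot convention of the caption.

    Hinges are the 25th and 75th percentiles (type-7 quantiles, assumed);
    whiskers stop at the most extreme observations lying within
    1.5 x IQR of the hinges, and anything beyond is reported as an outlier.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "median": median,
        "hinges": (q1, q3),
        "whiskers": (
            min(x for x in data if x >= lower_fence),
            max(x for x in data if x <= upper_fence),
        ),
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }


# Hypothetical percentages of non-ATG start codons for five regulators.
stats = tukey_box_stats([1, 2, 3, 4, 100])
```

In this example the value 100 lies beyond the upper fence, so the upper whisker stops at 4 and 100 is reported as an outlier.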

Discussion

Non-ATG start codons are widespread across genomes, and the evolutionary pressures driving their selection remain debated. In eukaryotic genomes, translation from unusual and alternatively positioned start codons has been proposed to contribute to the generation of protein isoforms42. In bacteria, whether the presence of multiple alternative start codons in close proximity generates protein variants remains to be determined. Recent work in E. coli proposed stress-related functions for non-ATG start codons, given their high prevalence in genes involved in essential processes43. However, that study offered only a partial view of the function of non-canonical start codons, as many of them are also found in non-essential genes44. Our computational analysis revealed a high level of conservation of various non-ATG start codons among known carbohydrate utilization regulators across the Escherichia genus and the broader Enterobacteriaceae family (Figs. 1a and 6b). Yet, upon closer examination, we also observed cases in which ATG start codons were more prevalent, or even exclusively used, in certain strains and regulators. This distribution suggests that the choice between ATG and non-ATG start codons is shaped by distinct evolutionary pressures that determine the preference for the universal or an atypical start codon, which presents an intriguing avenue for future research. Focusing on the lactose utilization operon, we found that GTG accounted for 99% of the analysed start codon sequences of the E. coli 8178 lacI gene (Fig. 1b). Lactose is primarily found in the gut of mammalian neonates, where it may benefit intestinal colonization by E. coli early in life45. Consistent with this idea, mutations abrogating LacI production are selected for in mother-to-offspring microbiota transmission experiments46.
Nevertheless, the introduction of solid food may compromise the long-term, lactose-dependent persistence of E. coli in adults. In line with this hypothesis, various studies revealed that E. coli does not rely on lactose utilization to colonize the mouse intestine34,36. By contrast, our competitive experiments suggested not only that E. coli 8178 relies on lactose metabolism to efficiently colonize the mouse gut but also that this phenotype was highly variable depending on the mouse microbiota and diet composition37. This conditional importance may result from differences in dietary lactose content which, together with differences in the model organisms used, could explain the distinct outcomes of studies examining the importance of lactose for intestinal colonization by E. coli.

To study the role of the lacI GTG start codon in E. coli 8178, we introduced a single point mutation (G→A) at the first position of the annotated lacI start codon, converting GTG into ATG. We therefore expected the WT and ATGlacI strains to produce LacI proteins of identical sequence but in different amounts. Under lactose-rich conditions, the growth of both the lacI and ATGlacI mutants was impaired compared with that of the WT. This suggests that the reduced LacI level resulting from the GTG start codon confers optimal growth in this environment. Indeed, use of the canonical ATG codon in lacI (ATGlacI) results in stronger repression, reducing the responsiveness of the lactose utilization operon when switching to lactose-dependent growth (Fig. 4c,d). Conversely, complete derepression of the lacZYA operon (lacI mutant) becomes detrimental for bacterial growth, as lactose is neither the sole nor the primary carbon source in the adult murine gut46,47,48. The growth advantage of the WT was also visible, albeit to a lesser extent, under the lactose-limited germ-free food condition, implying that expression of the lactose utilization operon in the WT strain is balanced to avoid a fitness burden. From these observations, we concluded that the lacI GTG start codon provides an optimal metabolic balance between full derepression (lacI) and stringent repression (ATGlacI) of the lacZYA operon, allowing cells to transition faster to lactose-dependent growth (Extended Data Fig. 7). Finally, the strong preference for the GTG start codon in lacI, and in certain other sugar utilization regulators, may indicate a function of non-canonical start codons that extends beyond translation. Further study is required to investigate this hypothesis thoroughly.
The disadvantage associated with the lacI ATG start codon was primarily visible in the early stages of gut colonization (days 1–3 post-inoculation). Beyond this period, the ATGlacI strain did not go extinct but persisted at a stable proportion (Extended Data Fig. 5a). Together with the neutral fitness of the ATGlacI mutant when present at a lower proportion (Fig. 5b and Extended Data Fig. 5c), this led us to conclude that the lacI ATG start codon is subject to frequency-dependent selection. This phenomenon, similarly observed for galactitol and sorbitol metabolism49,50, may explain the relatively low frequency of the ATG lacI start codon in the analysed E. coli genomes (Fig. 1). Furthermore, the long-term coexistence of the WT, ATGlacI and lacI strains indicated the maintenance of a polymorphism, with the WT strain prevailing under our conditions. The simultaneous presence of different lactose metabolism variants, albeit at different proportions, concurs with recent findings highlighting the coexistence of different genotypes through negative frequency-dependent selection50,51,52,53. This suggests that, beyond gene regulation, the long-term preservation of different lactose metabolism variants may play an important role in bacterial adaptation to the fluctuating conditions of the gastrointestinal tract.

Besides revealing the in vivo advantage associated with the lacI GTG start codon, our study provides a conceptual example of how bacteria can fine-tune their metabolic properties to gain an advantage over closely related competitors. Similarly, recent work on epidemic Clostridioides difficile ribotypes highlighted a single point mutation in the trehalose repressor that abrogated repression of the trehalose utilization operon and increased growth competitiveness at low trehalose concentrations54. Whereas trehalose utilization is completely derepressed in certain C. difficile strains, our experimental data show that an alternative strategy has been selected for the lactose operon in E. coli 8178, in which repression is alleviated by an unusual start codon. These two examples illustrate how gene regulation can shape metabolic capacities and suggest that such metabolic rewiring strategies may be generalizable. From this perspective, our bioinformatic analysis pinpointed several metabolic regulators in which non-ATG codons were enriched. Some of them, such as UxuR, GatR, FucR and RhaR, play important roles in carbohydrate metabolism and foster the growth of S. Tm, Citrobacter rodentium and E. coli in the gut10,55,56,57,58,59. Moreover, the cra and mlc genes almost exclusively use GTG start codons, suggesting that the role of non-canonical start codons extends to other metabolic pathways across the Enterobacteriaceae family.

Methods

Strains, media and chemicals

All strains, plasmids and oligonucleotides used in this study are listed in Supplementary Tables 3–5. The physiological function of the non-canonical lacI start codon in intestinal colonization was assessed using the recently isolated murine gut commensal E. coli 8178 strain27. Bacterial strains were routinely grown in LB broth or on LB agar (1% Bacto agar). Plasmids were maintained and mutants were selected by adding the appropriate antibiotics: streptomycin (100 μg ml⁻¹), ampicillin (100 μg ml⁻¹), kanamycin (50 μg ml⁻¹) and chloramphenicol (30 μg ml⁻¹). Fitness-neutral barcodes were inserted, and the lacZ gene was deleted, in E. coli 8178 using a modified version of the lambda Red recombinase-dependent one-step inactivation procedure60. Briefly, the antibiotic resistance cassette (kanamycin for gene deletion, ampicillin for barcode insertion) was PCR amplified using primer pairs carrying 50-nucleotide extensions homologous to the regions adjacent to the target. Mutants were obtained by electroporating the PCR product into E. coli 8178 cells expressing the lambda Red recombinase from plasmid pSIM5, followed by plating on selective media61. Gene deletion was confirmed by colony PCR. The lacI point mutation at the native locus was introduced by allelic replacement using the pKO3 suicide vector62. Briefly, the lacI region, including the start codon and 500 base pairs upstream and downstream, was cloned into pKO3, which carries the sacB gene for counter-selection on sucrose-containing media. After mutagenesis of the start codon by PCR amplification, the plasmid was introduced into E. coli 8178, and mutants with a single-crossover integration of the plasmid into the chromosome were selected on ampicillin plates. Sucrose-based counter-selection, colony PCR and sequencing were then used to identify clones carrying the desired marker-less nucleotide exchange at the lacI start codon.

Animals

Male and female 8- to 12-week-old mice were used in this study and randomly assigned to experimental groups. 129S6/SvEvTac (Jackson Laboratory) and C57BL/6 (Jackson Laboratory) mice were housed under SPF conditions in individually ventilated cages at the EPIC mouse facility of ETH Zurich (light–dark cycle 12 h:12 h, room temperature 21 ± 1 °C, humidity 50 ± 10%). C57BL/6 germ-free animals were bred in flexible film isolators under strict exclusion of microbial contamination at the isolator facility (EPIC). All animal experiments were reviewed and approved by the Tierversuchskommission, Kantonales Veterinäramt Zürich, under licences ZH158/2019, ZH108/2022 and ZH109/2022, in compliance with cantonal and Swiss legislation. All mice were maintained on the mouse maintenance diet (SPF diet: Kliba Nafag number 3537). When indicated, mice were switched to a different diet (germ-free diet: Kliba Nafag number 3302) 1 day before inoculation. For lactose-supplemented conditions, lactose was dissolved in 200 ml of drinking water and filtered using 0.22 μm syringe filters. The drinking water was replaced with 3% (w/v) lactose 1 day before inoculation for SPF mice and on the inoculation day for germ-free mice, and lactose supplementation was maintained throughout the experiment.

Selection of Enterobacteriaceae and E. coli genomes

Genomes classified as Enterobacteriaceae or Yersiniaceae in the proGenomes v3 database63 were selected for the analysis. To retain only high-quality genomes, CheckM2 (ref. 64) assembly statistics provided by S. Schmidt were used to remove genomes with completeness below 95% or contamination above 10%. From this initial selection, genomes that were exceptionally small compared with others annotated as the same species were excluded, as were genomes with excessively high numbers of contigs. In total, 10,643 E. coli genomes were downloaded and used in the subsequent analysis. To cover the entire Enterobacteriaceae family, large genera were randomly subsampled to a maximum of 100 genomes, whereas genera with fewer than 10 genomes were discarded. This yielded a final dataset of 1,158 genomes from 16 genera. The identifiers and taxonomic classification of all genomes are provided in Supplementary Table 6.
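The genome selection steps can be summarized as a minimal Python sketch. It covers only the CheckM2 thresholds and the per-genus subsampling; the genome-size and contig-number filters are omitted because their cut-offs are not specified. The GenomeRecord structure, the fixed random seed and the assumption that genus sizes are counted after quality filtering are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class GenomeRecord:
    genome_id: str
    genus: str
    completeness: float   # CheckM2 completeness (%)
    contamination: float  # CheckM2 contamination (%)


def select_genomes(records, max_per_genus=100, min_per_genus=10, seed=0):
    """Quality-filter genomes, then cap large genera and drop small ones."""
    # Keep genomes with >= 95% completeness and <= 10% contamination.
    passing = [r for r in records
               if r.completeness >= 95.0 and r.contamination <= 10.0]

    by_genus = defaultdict(list)
    for record in passing:
        by_genus[record.genus].append(record)

    rng = random.Random(seed)  # hypothetical fixed seed for a reproducible subsample
    selected = []
    for genus in sorted(by_genus):
        members = by_genus[genus]
        if len(members) < min_per_genus:
            continue  # genera with fewer than 10 genomes are discarded
        if len(members) > max_per_genus:
            members = rng.sample(members, max_per_genus)  # random subset of 100
        selected.extend(members)
    return selected
```

Sorting the genera before sampling keeps the result independent of input order for a given seed.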

Start codon analysis in E. coli carbohydrate regulators

A total of 10,643 E. coli genomes were downloaded from the proGenomes3 database as described previously63. The sequences of established metabolic regulator genes from E. coli K-12 MG1655 were obtained from the BioCyc database24 and converted into a BLAST database for each regulatory gene using makeblastdb. A homology search using nucleotide BLAST (blastn) was performed for each genome against every database with the following parameters: -max_target_seqs 100 -evalue 1e-5 -outfmt '6 qseqid qstart qend bitscore'65. For each genome, the best hit plus 75 nucleotides on either side was extracted. Prodigal version 2.6.3 (ref. 66) trained on the E. coli K-12 MG1655 genome was then used to identify potential start codons for each gene. In certain cases, the predicted start codon was ambiguous, meaning that two or more potential start codons were located in frame and in close proximity to each other. In such cases, we selected all start codons whose scores were within 80% of the best-scoring hit. In addition, the predicted genes had to span at least 90% of the best-hit sequence length, as shorter, fragmented proteins would most probably not be functional. If the best hit included a predicted ribosome binding site, only alternative start codons that were also predicted to have a ribosome binding site were considered. We then calculated the relative fractions of potential start codons for each genome (for example, if a gene in one genome had three potential high-scoring start codons located in close vicinity, ATG, GTG and GTG, they would count as 2/3 GTG and 1/3 ATG). Python 3.7.6 was used to compute the relative percentages of start codons over all genomes, and bar plots were generated using ggplot2 v3.3.5 in R 3.6.3 (ref. 67). For the phylogenetic tree showing the distribution of ATG and GTG start codons in lacI within E. coli, we chose the two genomes depicted in Fig. 1c and randomly selected an additional 25 genomes with a high-scoring ATG start codon and 75 genomes without one. The phylogenetic tree was constructed with GTDB-Tk v2.1.0 (ref. 68), using Salmonella as an outgroup, with the command 'gtdbtk de_novo_wf --genome_dir genomes/ --outgroup_taxon g__Salmonella --bacteria --taxa_filter g__Escherichia --out_dir de_novo_output_2 --cpus 24', and visualized in iTOL v6.9.1 (ref. 69).
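The per-genome start codon fractions described above can be expressed as a minimal Python sketch; the BLAST search, the 75-nucleotide flank extraction and the Prodigal run itself are not shown. It assumes that "within 80% of the best-scoring hit" means a score of at least 0.8 times the best score (which presupposes positive Prodigal start scores) and that the 90% length criterion applies to every candidate, including the best one. The StartCandidate structure and both function names are hypothetical and do not reflect the authors' scripts.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class StartCandidate:
    codon: str        # e.g. "ATG", "GTG" or "TTG"
    score: float      # Prodigal start score (assumed positive)
    gene_length: int  # length (nt) of the gene predicted from this start
    has_rbs: bool     # whether a ribosome binding site was predicted


def start_codon_fractions(candidates, best_hit_length):
    """Return the fraction of each start codon among retained candidates."""
    best = max(candidates, key=lambda c: c.score)
    kept = [
        c for c in candidates
        if c.score >= 0.8 * best.score               # within 80% of the best score
        and c.gene_length >= 0.9 * best_hit_length   # exclude fragmented genes
        and (c.has_rbs or not best.has_rbs)          # RBS required if the best hit has one
    ]
    counts = Counter(c.codon for c in kept)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


def percentages_over_genomes(per_genome_fractions):
    """Average per-genome fractions into percentages over all genomes."""
    informative = [f for f in per_genome_fractions if f]  # skip genomes with no retained start
    sums = defaultdict(float)
    for fractions in informative:
        for codon, value in fractions.items():
            sums[codon] += value
    return {codon: 100 * s / len(informative) for codon, s in sums.items()}


# The worked example from the text: ATG, GTG and GTG all score highly.
example = [
    StartCandidate("ATG", 10.0, 1080, True),
    StartCandidate("GTG", 9.0, 1077, True),
    StartCandidate("GTG", 8.5, 1071, True),
    StartCandidate("TTG", 5.0, 1065, True),  # below 80% of the best score, ignored
]
fractions = start_codon_fractions(example, best_hit_length=1000)  # ATG 1/3, GTG 2/3
```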

Start codon distribution in Enterobacteriaceae carbohydrate regulators

Sequences of the 61 selected metabolic regulators (32 carbohydrate-related and 29 carbohydrate-unrelated regulators; Supplementary Tables 1 and 2) were obtained by applying PIRATE v1.0.5 to the dataset of 1,158 Enterobacteriaceae genomes70. In short, PIRATE generates orthologous gene clusters at different sequence identity thresholds using GFF3 files as input. GFF3 files were generated using Prodigal v2.6.3 (ref. 66). PIRATE was run with default settings, with the inflation parameter of the Markov Cluster Algorithm set to 3. Gene clusters containing sequences with annotations of interest, such as the metabolic regulators listed in Supplementary Tables 1 and 2, were manually verified using BLAST (blastx, -max_target_seqs 10000 -evalue 1e-5) against the UniProtKB/Swiss-Prot database71. Manually curated sequences were then merged by gene short name, resulting in a set of 65,679 sequences for the 61 metabolic regulators. Start codon distributions were calculated as described above, with Prodigal v2.6.3 trained on the genome from which each sequence originated. Regulators were categorized as negative or positive according to RegulonDB (ref. 72). Regulators influencing gene expression both positively and negatively were classified as bidirectional. Self-regulation was not considered in this classification: a regulator with an exclusively positive or exclusively negative effect on its target genes was classified accordingly, regardless of how it regulates its own expression. Two-tailed Mann–Whitney U tests were used to compare alternative start codon usage between carbohydrate-related and carbohydrate-unrelated regulators. P < 0.05 was considered to indicate statistical significance.
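For readers who want to reproduce the group comparison outside GraphPad Prism, a two-tailed Mann–Whitney U test can be sketched in plain Python. This version uses the large-sample normal approximation with a tie correction and no continuity correction, so its P values can differ slightly from exact tests or from library defaults (SciPy's mannwhitneyu, for example, applies a continuity correction when it uses the asymptotic method). It is an illustration under these assumptions, not the authors' analysis code.

```python
import math


def mann_whitney_u(x, y):
    """Two-tailed Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U for sample x, two-sided P value). No continuity correction.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(value, 0) for value in x] + [(value, 1) for value in y])

    # Assign 1-based ranks, giving tied values the average of their ranks.
    ranks = [0.0] * len(pooled)
    tie_term = 0.0  # sum of (t^3 - t) over groups of t tied values
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u_x, 1.0  # all observations tied: no evidence of a difference
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_x, min(p_value, 1.0)


# Hypothetical, completely separated groups.
u, p = mann_whitney_u([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
```

Swapping the two samples changes U to n1 × n2 − U but leaves the two-sided P value unchanged.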

Mouse colonization experiments

Mice aged 8–12 weeks were orally pretreated with streptomycin (25 mg) 24 h before inoculation. Because germ-free mice have low intestinal colonization resistance, they received no antibiotic pretreatment. E. coli cultures were grown in LB at 37 °C for 4 h and washed twice with phosphate-buffered saline (PBS: 137 mM NaCl, 2.7 mM KCl, 10 mM Na2HPO4 and 1.8 mM KH2PO4). Before colonization, E. coli strains were made streptomycin resistant by transferring the streptomycin-resistance-conferring plasmid pRSF1010 from S. enterica serovar Typhimurium SL1344 (refs. 30,73). Each mouse was orally given a single 50 µl dose containing ~5 × 10⁷ colony-forming units (cfu) of an inoculum mixture composed of equal proportions of the indicated strains. Faecal samples were collected 24 h and 72 h post-inoculation. Animals were euthanized by CO2 asphyxiation on day 3 post-inoculation. Faecal samples were suspended in 1 ml PBS and homogenized using a TissueLyser (Qiagen). The bacterial load was determined by plating the suspension on MacConkey or LB agar with or without antibiotics. In competitive experiments with two strains (WT and mutant), the load of the mutant was determined directly on the antibiotic-containing plate, whereas the load of the WT strain was calculated by subtracting the mutant load from the total bacterial load on an antibiotic-free plate (cfuWT = cfutotal − cfumutant). The load of each mutant strain was then normalized to the inoculum and used to calculate the competitive index (C.I.), which compares its fitness with that of the WT. The C.I. of an individual mutant in a competitive experiment was determined as the ratio of cfumutant to cfuWT in the sample divided by the same ratio in the inoculum.
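The competitive index calculation, including the subtraction used to infer the WT load, can be written as a short Python sketch. The function and argument names are hypothetical, and the counts in the example are illustrative values, not data from this study.

```python
def competitive_index(cfu_total_sample, cfu_mutant_sample,
                      cfu_total_inoculum, cfu_mutant_inoculum):
    """C.I. = (mutant/WT in the sample) / (mutant/WT in the inoculum).

    The WT load is not plated selectively; it is inferred by subtraction
    (cfu_WT = cfu_total - cfu_mutant), as described in the text.
    """
    wt_sample = cfu_total_sample - cfu_mutant_sample
    wt_inoculum = cfu_total_inoculum - cfu_mutant_inoculum
    if wt_sample <= 0 or wt_inoculum <= 0:
        raise ValueError("inferred WT load must be positive; check plate counts")
    sample_ratio = cfu_mutant_sample / wt_sample
    inoculum_ratio = cfu_mutant_inoculum / wt_inoculum
    return sample_ratio / inoculum_ratio


# Hypothetical counts: a 1:1 inoculum of 5 x 10^7 cfu in total, and a faecal
# sample with 1 x 10^9 total cfu, of which 1 x 10^8 are mutant.
ci = competitive_index(1e9, 1e8, 5e7, 2.5e7)
```

Here the C.I. is 1/9 (about 0.11); a value below 1 indicates that the mutant was outcompeted by the WT.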

Fitness measurement of barcoded strains

Faecal E. coli cells were inoculated into 3 ml LB supplemented with ampicillin and grown overnight at 37 °C to select and enrich for viable barcoded E. coli strains. The bacterial cells were pelleted and stored at −20 °C. DNA was extracted from thawed pellets using commercial kits (Qiagen Mini DNA) according to the manufacturer’s instructions. The relative densities of the different barcodes were determined by real-time PCR using tag-specific primers. The resulting ratios were multiplied by the number of cfu recovered from selective plating to calculate the absolute load of each tagged strain.
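The conversion from barcode ratios to absolute loads can be sketched in Python. The text does not state how qPCR signals were converted into ratios; this sketch assumes that the relative template amount of each tag is proportional to E^(−Ct), with an ideal and equal amplification efficiency E = 2 for every tag-specific primer pair. The function name, tag names and example values are hypothetical.

```python
def barcode_loads_from_ct(ct_values, cfu_selective, efficiency=2.0):
    """Convert per-tag Ct values into absolute loads (cfu) of each tagged strain.

    ct_values: mapping of barcode tag -> Ct measured with tag-specific primers.
    cfu_selective: total cfu recovered on the selective (ampicillin) plate.
    efficiency: fold amplification per cycle; 2.0 assumes ideal, equal efficiency.
    """
    # Relative starting template for each tag (lower Ct means more template).
    quantities = {tag: efficiency ** (-ct) for tag, ct in ct_values.items()}
    total = sum(quantities.values())
    # Each tag's fraction of the barcoded population times the selective cfu count.
    return {tag: cfu_selective * q / total for tag, q in quantities.items()}


# Hypothetical sample: one-cycle Ct difference between two barcoded strains.
loads = barcode_loads_from_ct({"tag_WT": 20.0, "tag_ATGlacI": 21.0}, cfu_selective=3e6)
```

A one-cycle difference corresponds to a twofold difference in template, so the 3 × 10⁶ cfu are split 2:1 between the two tags in this example.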

Lactose measurement

Mouse chow was suspended in PBS and homogenized using a TissueLyser (Qiagen). The lactose concentration in the homogenate was determined using a commercially available lactose assay kit (Sigma-Aldrich, MAK017) following the manufacturer’s instructions.

In vitro growth assays

Overnight cultures of each strain were started 1 day before the experiment in either LB or minimal medium supplemented with glucose (50 mM) at 37 °C. The following day, the cultures were washed with PBS and diluted into the appropriate medium to a final density of 0.025 uOD600 ml⁻¹. A 200 µl volume of each culture was dispensed into a 96-well plate placed in an Infinite 200 Pro plate reader (Tecan), which measured the OD600 automatically at 37 °C for 24 h.
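Reaching the starting density used in these growth assays is a simple C1V1 = C2V2 dilution. The minimal Python helper below computes the volume of washed culture needed for a given final volume, reading the target as OD600 = 0.025; its name and the example values are hypothetical, and it assumes that OD600 scales linearly with cell density in the measured range.

```python
def culture_volume_for_target_od(measured_od600, target_od600, final_volume_ml):
    """Volume (ml) of washed culture so that the diluted culture reaches
    target_od600 in final_volume_ml (C1 * V1 = C2 * V2)."""
    if measured_od600 <= target_od600:
        raise ValueError("culture is not denser than the target; cannot dilute")
    return target_od600 * final_volume_ml / measured_od600


# Hypothetical example: a washed culture at OD600 = 2.5 diluted to 0.025 in 10 ml.
volume_ml = culture_volume_for_target_od(2.5, 0.025, 10.0)
medium_ml = 10.0 - volume_ml
```

In this example, 0.1 ml of culture is mixed with 9.9 ml of medium.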

lacZYA reporter expression analysis

The WT, ATGlacI and GTGlacI strains had previously been transformed with the p-PlacZ-gfp plasmid, which carries the lacZ promoter fused to the gfp reporter gene. The resulting strains were individually grown overnight in minimal medium supplemented with glycerol (50 mM). The next day, the cultures were diluted into freshly prepared medium of the same composition to a final density of 0.025 uOD600 ml⁻¹. A 200 µl volume of each culture was dispensed into a 96-well plate containing increasing concentrations of IPTG. Bacterial growth and GFP fluorescence were recorded automatically for 24 h at 37 °C using an Infinite 200 Pro plate reader (Tecan). The fluorescence signal was corrected by subtracting autofluorescence and normalized to bacterial density (OD600).
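The fluorescence correction described above can be written as a minimal Python sketch. It assumes that an autofluorescence reading is available for each time point (for example, from a matching control without the gfp reporter; the text does not specify the control), and the min_od cutoff is a hypothetical safeguard against dividing by near-zero densities.

```python
def normalized_gfp(gfp, autofluorescence, od600, min_od=0.0):
    """Return (GFP - autofluorescence) / OD600 for each time point.

    Time points whose OD600 is not above min_od are returned as None,
    because dividing by a (near-)zero density is not meaningful.
    """
    normalized = []
    for signal, background, od in zip(gfp, autofluorescence, od600):
        if od <= min_od:
            normalized.append(None)
        else:
            normalized.append((signal - background) / od)
    return normalized


# Hypothetical plate-reader readings for two time points of one well.
values = normalized_gfp([1000.0, 5000.0], [200.0, 400.0], [0.1, 0.5])
```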

β-galactosidase activity measurement

The activity of the LacZ β-galactosidase was assessed in the WT and lacI-deficient backgrounds. Briefly, strains were incubated at 37 °C in minimal medium supplemented with either lactose or glucose as the sole carbon source. After 8 h of incubation, cells were collected and lysed, and β-galactosidase activity was measured using o-nitrophenyl β-d-galactopyranoside (Sigma-Aldrich) as described in ref. 74.
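β-Galactosidase activities derived from o-nitrophenyl β-d-galactopyranoside hydrolysis are often reported in Miller units. Because the exact protocol of ref. 74 is not reproduced here, the minimal Python sketch below simply assumes the classic Miller formula, 1,000 × (A420 − 1.75 × A550) / (t × v × OD600), with t the reaction time in minutes and v the culture volume in ml; the function name and example readings are hypothetical.

```python
def miller_units(a420, a550, od600, reaction_time_min, culture_volume_ml):
    """Beta-galactosidase activity in Miller units (classic formula, assumed here).

    a420: absorbance of released o-nitrophenol at 420 nm
    a550: absorbance at 550 nm, used to correct A420 for light scattering
    od600: cell density of the culture at the time of sampling
    """
    if reaction_time_min <= 0 or culture_volume_ml <= 0 or od600 <= 0:
        raise ValueError("time, volume and OD600 must be positive")
    corrected_a420 = a420 - 1.75 * a550
    return 1000.0 * corrected_a420 / (reaction_time_min * culture_volume_ml * od600)


# Hypothetical reading: 0.1 ml of culture at OD600 = 0.5, 20 min reaction.
activity = miller_units(a420=0.8, a550=0.0, od600=0.5,
                        reaction_time_min=20.0, culture_volume_ml=0.1)
```

With these illustrative readings the activity is 800 Miller units.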

Analysis of the intracellular LacI protein level

The WT, lacI-deficient, ATGlacI and GTGlacI strains were grown in LB to late-exponential phase. Cells were then pelleted, resuspended in Laemmli loading buffer supplemented with 1 mM β-mercaptoethanol (Sigma-Aldrich) and heated for 10 min at 96 °C before analysis by sodium dodecyl sulfate–polyacrylamide gel electrophoresis (SDS-PAGE) and immunoblotting.

SDS-PAGE and immunoblotting

SDS-PAGE was performed on Bio-Rad Mini-PROTEAN systems using standard protocols with homemade 12.5% polyacrylamide gels. For immunostaining, proteins were transferred onto a 0.2 µm nitrocellulose membrane, which was then blocked with 5% milk and probed with primary antibodies (anti-LacI: Sigma-Aldrich 05-503-I, dilution 1:1,000; anti-RNA polymerase beta: ThermoFisher MA125425, dilution 1:1,000). An alkaline phosphatase-conjugated anti-mouse secondary antibody (Sigma-Aldrich AP124A, dilution 1:1,000) was then added, and blots were developed using a commercial alkaline phosphatase substrate (Promega, S3841) containing 5-bromo-4-chloro-3-indolyl phosphate and nitro blue tetrazolium.

Molecular biology

Custom oligonucleotides were synthesized by Microsynth and are listed in Supplementary Table 5. PCRs were performed using Phusion DNA polymerase (Thermo Scientific), and PCR products were purified using the NucleoSpin Gel and PCR Clean-up Mini kit (Macherey-Nagel). The complementation plasmid p-lacZ, the suicide plasmids pT2214/pT2215 and the lacZ reporter p-PlacZ-gfp were constructed by restriction-free cloning (Gibson Assembly, E5510S, New England Biolabs) and verified by DNA sequencing (Microsynth). Plasmids were extracted from bacterial pellets using the QIAprep Spin Miniprep kit (Qiagen). Real-time PCR was performed using FastStart Universal SYBR Green Master (Roche) according to the manufacturer’s instructions.

Whole genome sequencing analysis

The genomic DNA of the WT, ATGlacI and GTGlacI strains was extracted from overnight cultures using the NucleoBond (AXG20; Macherey-Nagel) kit. Library preparation and sequencing (ONT and Illumina) were performed by BMKGENE. Long reads were assembled using Flye 2.9.2, and the draft genome was polished with the Illumina reads using CLC Workbench 23.0.1 (‘polish with reads 1.1’; Racon). The closed genomes were annotated with prokka 1.14.5. To find mutations in the ATGlacI and GTGlacI strains, short genomic reads were mapped (‘map reads to reference 1.9’; CLC Workbench) to the closed genome of the parental strain. Basic Variant Detection 2.4 (CLC Workbench) was used to detect mutations.

Statistical analysis

No statistical methods were used to predetermine sample sizes. The current sample sizes (n ≥ 5) are similar to those reported in previous publications75,76. Data collection and analysis were not performed blind to the conditions of the experiments. No animals or data points were excluded from the analyses. Data distribution was assumed to be normal, but this was not formally tested. Statistical analysis and graphical representation of the data were performed using GraphPad Prism version 9.2.0 for Windows (GraphPad Software; www.graphpad.com). When two groups were compared, statistical significance was assessed using the unpaired Mann–Whitney U test (comparison of ranks). P < 0.05 was considered to indicate statistical significance.

Material availability

Mouse lines used in this study can be obtained from the Jackson Laboratory. Gnotobiotic mice are available upon request.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.