Cage and maternal effects on the bacterial communities of the murine gut

Findings from gut microbiome studies are strongly influenced by both experimental and analytical factors that can unintentionally bias their interpretation. Environment is also critical. Both co-housing and maternal effects are expected to affect microbiomes and have the potential to confound other manipulated factors, such as genetics. We therefore analysed microbiome data from a mouse experiment using littermate controls and tested differences among genotypes (wildtype versus colitis prone-mdr1a−/−), gut niches (stool versus mucus), host ages (6 versus 18 weeks), social groups (co-housed siblings of different genotypes) and maternal influence. We constructed a 16S phylogenetic tree from bacterial communities, fitting random forest models using all 428,234 clades identified. Models discriminated all criteria except host genotype, where no community differences were found. Host social groups differed in abundant, low-level, taxa whereas intermediate phylogenetic and abundance scales distinguished ages and niches. Thus, a carefully controlled experiment treating evolutionary clades of microbes equivalently without reference to taxonomy, clearly identifies whether and how gut microbial communities are distinct across ecologically important factors (niche and host age) and other experimental factors, notably cage effects and maternal influence. These findings highlight the importance of considering such environmental factors in future microbiome studies.

. Distribution of phyla across the phylogenetic tree. Colonic tissue sections from a male wildtype (WT) mouse was stained with a fluorescent DNA probe specific for the 16S rRNA gene to identify bacteria (red), a Muc2 antibody (green) to identify mucus and counterstained with DAPI (blue) (a). A phylogenetic tree of 16S rRNA sequences derived from the gut microbiota of FVB wildtype (WT) mice and mdr1a −/− mice (b). The distribution of major gut phyla are highlighted on the tree: Firmicutes (grey), Bacteroidetes (pink), Proteobacteria (olive), Actinobacteria (red), and Deferribacteres (gold). The same tree is shown, coloured by other criteria, in Supplementary Figure S1. Figure (b) was produced in R 3.6.0 for Windows.
Isolation of genomic DNA. Sample collection and processing was performed as described by Glymenaki et al. 17 . In brief, samples were harvested from mice at two time points, 6 and 18 weeks of age. Stool samples were collected from mice in individual autoclaved cages into sterile tubes and snap frozen on dry ice. Mice were sacrificed via CO 2 inhalation, the proximal colon was cut open and the colonic mucus scraped using cell scrapers and Inhibitex buffer (QIAGEN, Manchester, UK) and snap frozen until use. Genomic DNA was extracted using QIAamp Fast Stool Mini-Kits according to the manufacturer's instructions (QIAGEN).
Histology. Snips of the proximal colon were fixed in Carnoy's solution (60% methanol, 30% chloroform, 10% glacial acetic acid), incubated in two changes of dry methanol (Sigma-Aldrich, Dorset, UK) for 30 min each, followed by absolute ethanol (ThermoFisher Scientific, Paisley, UK) for two incubations at 30 min each. Finally, tissue cassettes were processed in a Micro-spin Tissue Processor STP120 (ThermoFisher Scientific) and immersed in paraffin. Colon snips were embedded in paraffin blocks using a Leica Biosystems embedding station (Leica Biosystems, Milton Keynes, UK), with the luminal surface of the colon exposed for tissue sectioning. 5 µm tissue sections were cut using a Leica Biosystems microtome and adhered to uncoated microscope slides (ThermoFisher Scientific). Slides were dried for 48 h at 50 °C before use. Histological analysis was used to determine that all five of the 18 week-old mdr1a −/− mice had indications of moderate or mild colitis, with a loss of healthy gut architecture 17 . Fluoresence in situ hybridisation (FISH). FISH was performed as described previously 17 . In brief, FISH staining was performed using the universal bacterial probe-EUB338 (5′-Cy3-GCT GCC TCC CGT AGG AGT -3′), followed by immunostaining with a rabbit polyclonal MUC2 antibody and goat anti-rabbit Alexa-Fluor 488 antibody (Life Technologies, Paisley, UK). Slides were imaged using a BX51 upright microscope and a Coolsnap EZ camera (OLYMPUS, Tokyo, Japan) and images were processed using Image J 19 .
16S rRNA gene sequencing processing. 16S amplicon sequencing targeting the V3 and V4 variable regions of the 16S rRNA (341F: 5′-TCG TCG GCA GCG TCA GAT GTG TAT AAG AGA CAG CCT ACGGGNGGC  WGC AG-3′ and 805R: 5′-GTC TCG TGG GCT CGG AGA TGT GTA TAA GAG ACA GGA CTACHVGGG TAT CTA  ATC C-3′) was performed on the Illumina MiSeq platform (Illumina, California, USA) according to manufacturer's guidelines and generated paired-end reads of 300 bp in each direction. DNA from all samples was extracted using the same extraction kit. However, they were sequenced across different runs, with technical replicate samples sequenced multiple times as an internal control between each run. Illumina reads were demultiplexed to remove adapter sequences and trim primers. Illumina paired-end reads were merged together using SeqPrep 20 and submitted to MG-RAST's metagenomics pipeline 21 . Reads were pre-processed to remove low-quality and uninformative reads using SolexQA 22 . The quality-filtering process included removal of reads with low quality ends (i.e. ambiguous leading/trailing bases) and the removal of reads with a read length two standard deviations below the mean. Artificial duplicate reads were then removed based on MG-RAST's pipeline.
The resulting FASTQ files for every sample were merged into a single file of 590,822 sequences to simplify processing, manually adding 3 known Archaeal 16S rRNA sequences from Acidilobus saccharovorans, Sulfolobus tokodaii and Methanobrevibacter smithii. Sequences were aligned using a specialist 16S RNA aligner using the Infernal algorithm 23 , via a web-based interface provided by the Ribosomal Database Project 24 . This file was then manually curated in R 25 . In brief, we determined the first and last position of each base for every sequence. 92% of all sequences started around position 710. 87% of sequences had their last base around position 2800. We therefore trimmed all positions before 710 and after 2800 to remove unnecessary spaces introduced by the aligner. The number of aligned bases in each sequence was then recorded and the distribution of continuously aligned bases was examined. The large majority of sequences (~ 84%) had > 437 bases and so any sequence that had less than 437 continuously aligned bases was discarded. The remaining 496,550 sequences were taken forward for analysis. All sequences were identified using BLAST+ and the top hit for each sequence was recorded 26 . The 'classification' function in the 'taxize' R package 27 was then used to assign full taxonomic information to each identified taxon where possible. Unless otherwise stated, all analyses were performed using custom scripts in R. www.nature.com/scientificreports/ Phylogenetic tree. A phylogenetic tree of all sequences was generated using FastTree 2.1 28 , using the general time reversible (GTR) + CAT model and default parameters. The tree was rooted using the archaeal sequences as an outgroup. Phylogenetic clades were obtained using the ' Ancestor' function in the 'phangorn' R package 29 . A relative abundance matrix, with abundance based on how many times sequences belonging to a phylogenetic clade appeared in a sample, was calculated.
Ordination. Bray-Curtis dissimilarity and Jaccard Index values were calculated among all samples (based on the relative abundance matrix) and used for non-metric multidimensional scaling (NMDS) via the 'MASS' 30 and 'ecodist' R packages 31 , checking to ensure convergence in all cases.
Machine learning. Random forest (RF) models were run using the 'randomForest' package 32 in R. Specifically, the clade relative abundance matrix was used as an input for the RF, using a forest of 100,000 trees and the mtry value was left at default settings (the square root of the number of clades). Separate forests were run to predict whether a sample was 6 or 18 weeks old, whether a sample was stool or mucus, whether it was a WT or an mdr1a −/− sample, what cage the sample was taken from and the mother of each respective offspring. Each forest was controlled for all other treatments (i.e. a random forest predicting age included genotype and microbial niche as explanatory variables, in addition to the generated clades). The 'MeanDecreaseAccuracy' (MDA) value was used as a measure of how important each clade (or treatment) was at predicting treatment information and the out-of-bag (OOB) error rate was used to determine the predictive accuracy of the model. Nodes were ranked based on MDA value, taking the five most important nodes, determining the descendant tips and confirming the identity of the tip sequences via the BLAST+ results 26 . Additionally, the depth of each node was determined using the 'distances' function in the igraph R package 33 . A phylogenetic tree annotated with the resulting information was plotted using the 'plot.phylo' function in the 'ape' package 34 .

Model validation.
In order to validate each model, we included a 'randomised' negative control RF where relative abundances of each node were permuted with respect to each sample and the predictive accuracy was assessed. In addition, we took the relative abundances of an important node for age and redistributed the abundance to only WT samples. The RF was repeated to investigate whether this node would appear as important for genotype. We also ran RF's with an increasing number of trees, using three different random seeds and performed Spearman's Rank correlation on the MDA values obtained among each set of three RFs of the same size. The Monod/Michaelis-Menten model was fitted, to determine how an increasing number of trees affected correlation of the MDA values. Finally, we included technical replicates of one stool sample that was used as an internal control between sequencing runs, in our forest models. We examined the MDA values for all the clades in each of these replicates to see how tightly correlated they were. Additionally, we compared the predictive accuracy of the RF model when using these different replicates.
Statistical analysis. Analysis of the real vs null RF models predictive accuracy was performed using 2-way ANOVA, with a Sidak's post hoc test in GraphPad Prism 8 (GraphPad Software, La Jolla, USA). Permutational multivariate analysis of variance (PERMANOVA) was used to determine interactions between taxa and treatment groups, using the 'adonis' function in the vegan R package 35 . Effects of age, niche, genotype and all possible interactions were considered. All significant effects (P < 0.05) are reported.

Results
Phylogenetic tree of 16S rDNA data derived from the gut microbiota. Microbiota  Strong separation of the gut microbiota by microbial niche, age and cage but not host genotype. To avoid bias by taxonomic level, we constructed a data matrix comprising the relative abundance [(number of tips in clade in sample)/(total number of tips in sample)] of clades corresponding to all 428,234 internal nodes of the phylogenetic tree in each of our samples. This avoided assigning OTUs, or using a reference database. To visualise the major differences in the microbial communities in an unsupervised fashion, Bray-Curtis dissimilarity values were calculated from our data matrix between all samples and used as an input for a 2-dimensional non-metric multidimensional scaling (NMDS) ordination (Fig. 2). A stress plot for this ordination (Supplementary Figure S3a), shows a stress value of 0.18, within the 0.20 acceptability threshold 36 . There was clear separation of samples by niche (PERMANOVA, p = 0.0001) (Fig. 2a). There was less visual separation by mouse age (6 vs 18 weeks, PERMANOVA, p = 0.0001) (Fig. 2b) which is in concordance with our previous work 17 . Samples from the same cage localised closely in the ordination. Cages containing different litters from the same mother (see Supplementary Figure S1) were adjacent or overlapping in the plot, suggesting maternal effects influencing, but not fully explaining, cage-specific microbiomes (Fig. 2c, www.nature.com/scientificreports/ www.nature.com/scientificreports/ or in interaction with age (p = 0.07) or niche (p = 0.1) (Fig. 2e). A similar analysis using only the presence/ absence of taxa (Jaccard Index) gives similar conclusions for niche (PERMANOVA, p = 0.01) and age (PER-MANOVA, p = 0.001), but visually separates treatments less clearly (Supplementary Figure S4). As with the Bray Curtis matrix, genotype had no significant effect on its own (PERMANOVA, p = 0.38) or in interaction with age (p = 0.38) or niche (p = 0.97). The stress value was 0.15 (Supplementary Figure S3b).
Specific microbiota are strongly associated with age, microbial niche, cage and mother but not host genotype. To determine the taxa driving the observed differences in community structure, the relative abundance matrix was used to construct machine learning models (random forests, RFs). Separate RF models were created to identify age, genotype, niche, cage and mother based on the relative abundance of the clades (as defined by the phylogenetic tree, Fig. 1b) in each sample. These models were compared against a null (negative control) model where relative abundances were permuted among taxa within samples to remove true associations, while keeping the characteristics of the individual samples (e.g. those arising from any differences in coverage). Further testing to validate the reproducibility of this approach is in the Supplementary Information (comprising Supplementary Figures S5-S7). Niche was determined from the microbiota with 92% accuracy, age and mother with ~ 98% accuracy and cage with 80% accuracy (averaged across six technical replicates), in all instances substantially higher than the negative control model (Fig. 3a) (Two Way ANOVA-Sidak's post hoc test: P < 0.0001). Genotype could not be determined from the microbiota using our RF models any better than in the negative control (Fig. 3a). Models considering genotype were therefore not considered further. The RFs give an importance value for each clade (the tree's internal nodes) in discriminating between groups. To identify which bacteria the clades encompassed, we used BLAST+ on all sequences, recording the taxonomic identity of the top hit (hits that had a percentage coverage < 100% were discarded). The finest-scale taxonomic grouping containing all sequences descending from the five most important clades is shown in Fig. 3b-e for niche, age, cage and mother RFs respectively. The named clades do not represent all bacteria within that taxon, rather they represent specific bacteria, all of which fall within the taxon. For microbial niche, the most important distinguishing clades were all Gram negative and mostly comprised Proteobacteria: the order Pseudomonadales and three clades in the families Burkholderiaceae and Deferribacteraceae and all these clades were more abundant in the mucus samples (Supplementary Figure S8a). The Deferribacteraceae containing clade mostly comprised Mucispirillum, a known mucus-associated bacteria 37 . The genus Porphyromonas was more abundant in stool samples. The most important clades separating ages were the families Erysipelotrichaceae and Lachnospiraceae within the Firmicutes phylum (which have each been specifically associated with young mouse microbiomes before 38 ) plus three genera: Natranaerovirga, Desulfovibrio, and Vampirovibrio in the Firmicutes, Proteobacteria and Cyanobacteria phyla respectively. With the exception of Natranaerovirga, all these bacteria were prevalent in the 18 week old mice (Supplementary Figure S8b). The most important clades separating cages and mothers were Natranaerovirga (a different clade from that separating ages) plus four clades within the order Bacteroidales-three comprising the genera Bacteroides and one comprising Barnesiella (Supplementary Figure S8c,

d).
Abundant, low-level taxa distinguish cage and maternal microbiomes but not age or niche. Having identified taxa at different phylogenetic levels as particularly important for separating microbiomes, we looked systematically at the phylogenetic scales that are important for separating different microbiomes. Clade importance was analysed as a function of the number of nodes between the clade and the root of the phylogenetic tree (Fig. 4) or the distance from each clade to the root (Supplementary Figure S9). These measures distinguish clades close to the root (high-level taxa with fewer nodes and shorter branch length from the base of the clade to the root) corresponding, e.g. to phyla, and clades far from the root (low-level taxa with more nodes and longer branch lengths from the base of the clade to the root) corresponding e.g. to genera. For both age and niche, neither the lowest nor the highest level clades were consistently important but the most clearly important clades were of intermediate taxonomic levels (Fig. 4a,b). For age, this separation between true and null models differs between the metrics: while the intermediate level taxa are important across both metrics, the highest level clades are the most important with the distance metric (Supplementary Figure S9b vs. Fig. 4b).
However, for differences among cages and mothers, while intermediate level clades were important, many of the most important groups were at the extreme of low level taxa, i.e. differences in sub-specific groupings (Fig. 4c,d,  Supplementary Figure S9c,d).
The number of sequences within a clade of the tree that were present in a particular set of samples is an estimate of its abundance in that microbiome. We therefore asked how abundance of bacterial taxa correlated with its importance in distinguishing microbiomes. We found that, for separating niche, age, cage or maternal microbiomes, moderately abundant taxa were important, whereas the rarest taxa were never important (Fig. 4e-h). The most abundant taxa were important for distinguishing cage and maternal microbiomes (Fig. 4g,h) but much less so for distinguishing different ages or niches (Fig. 4e,f) where the true and null models converge on the right).
Finally, we asked how the importance of particular taxa related across models distinguishing the different criteria. The distributions of importance values are wide for all RFs, including null models ( Supplementary Figure S10). Nonetheless, we expect a positive association between the importance of particular taxa in distinguishing cages and in distinguishing mothers, since individual litters were housed in separate cages (Fig. 2). These are indeed associated (rank correlation of 0.291, cf. 0.05 for the null model, Supplementary Figure S10). However, the importance of particular taxa was uncorrelated across all other forests that distinguished different criteria (age versus niche, age versus cage, age vs mother, niche versus cage and niche vs mother had rank correlations − 0.002, 0.005, 0.002, − 0.001 and − 0.0007 respectively, each less than their respective null models). Thus, while we can identify broad trends (Fig. 4) and a few key taxa associated with particular gut features (Fig. 3), we did not find widely applicable 'indicator' taxa that were individually sensitive to multiple effects on the gut microbiome.

Discussion
We were able to discriminate clearly between the microbiomes of 6 and 18 week old mice, mucus and stool samples, different groups of mixed genotype co-housed mice and mice with different mothers. This confirms the robustness of our models as it is consistent with others' work, for instance showing microbiome changes with age in both humans and mice 39,40 , work identifying microbial niche as the strongest factor for separation of the microbiota 17,41 and the impact of environment on the microbiome 4,11,42 . We associated the microbiota with cage with approximately 80% accuracy, suggesting that each cage has a distinct microbial signature. Mice were housed according to their litter and the mother could be distinguished with 98% accuracy, so the maternal microbiome is likely to be critical in determining this signature. Differences in environment can impact the microbiome and have functional impacts on the host, including altered permeability of the mucus barrier 4 . However, the risk of confounding phenotypes of interest with environmental variation is much broader-in humans, the microbiota is also both vertically transmissible (from mother to offspring) and horizontally transmissible (between household members) 43 . Nonetheless, it is predominantly animal studies that are used to assign causality of changes in the microbiome to phenotypic effects. Therefore, it is imperative that environmental factors, including both mothers and housing, are both reported and controlled for. Currently, this is by no means universal 12 .
Our mice are still relatively young, with initial samples taken only ~ 3 weeks after weaning and at 18 weeks old. Therefore the strong differences seen by age may be due to the microbiota still adjusting at 6 weeks due to changed diet from milk to solid food. Solid food may itself contain plant and bacterial sequences (including plant sequences that may be mistaken by taxonomy databases for cyanobacteria 44 ). It is also true that, even in our experimental design, because it is only possible to obtain a single mucus sample from a mouse, we have confounded the age effect with the cage and maternal effects, meaning that we cannot exclude the possibility that the true effect of age is in fact less distinct. Nonetheless, dietary-derived microbial changes in mice can happen within a much shorter timeframe 45,46 , and the nature of the age effect identified (Figs. 3c and 4b,f) is clearly distinct from the cage and maternal effects, consistent with it being a distinct age effect. The fact that we saw these differences clearly validates our approach to modelling them. In addition, the microbes identified as key to these changes include expected taxa such as the family Erysipelotrichaceae. This family distinguished our older mice (Fig. 3c) and has been associated with the development of IBD (both positively and negatively) 47,48 and colorectal cancer 49 .
We found no consistent differences between the gut microbiomes of wildtype and colitis-prone (mdr1a −/− ) genotypes co-housed together in mixed genotype cages. Differences in the microbiota of WT and mdr1a −/− mice have been reported 17,50 (with and without littermate controls respectively) so the fact that we do not see them here (Figs. 2e, 3a) is unexpected. Discrepancies in sample size between treatment groups can be a problem for RFs applied to such data 51 and machine learning typically uses much larger sample sizes. However, here sample sizes are well balanced (10 wildtype and 10 mdr1a −/− mice with 2 samples from each, albeit individual mothers gave rise to different numbers of offspring, Supplementary Figure S1) and sufficient for distinguishing other criteria. The older mdr1a −/− mice were starting to develop colitis. Therefore, changes in the microbiome with colitis may have obscured any consistent differences among genotypes. Alternatively, previous analyses may have been misled by large cage effects (Figs. 2c, 3a) into erroneously attributing some of that variation to differences among genotypes. For instance, studies finding differences between the stool microbiota of eosinophil-deficient mice compared to wildtype mice (e.g. 52 ) do not report controls for cage effects, e.g. via littermate controls. Thus, they cannot rule out the possibility that differences in the microbiota are due to environmental effects.
Changes in the gut microbiome, at any taxonomic level have been attributed as leading to functional impacts on the host. For instance, a reduction in the abundance and diversity of Firmicutes is associated with IBD in human patients [53][54][55] and Bacteroidetes has been shown to be both increased 56 and decreased 53 with respect to inflammation. However, our data did not find such high-level taxa to show consistent differences in any of our microbiome comparisons (Fig. 4a-d, Supplementary Figure S9). This could be because our phylogenetic tree does not fully capture the relationships among the highest level taxa (Fig. 1b), because there is limited phylogenetic information in amplicons from the subset of the 16S rRNA used. Even trees using the complete 16S rRNA sequence from carefully chosen bacteria do not fully capture their evolutionary history 57 and partial 16S rRNA trees can only be made to agree with accepted evolutionary relationships by incorporating many constraints (as done by Louca et al. 58 ). Here, we did not want to create the biases that such constraints would impose. Even taking this approach, phyla themselves are largely resolved (Fig. 1b), so inadequacies in the tree seem unlikely to account for the lack of consistent differences in these taxa among criteria (Fig. 4a-d, Supplementary Figure S9). These high-level taxa are also abundant taxa. While there has been a focus on the importance of rare taxa (e.g. 59 ), a priori, it might have been reasonable to expect that the more abundant taxa would have the most important Abundant, low-level taxa distinguish cage microbiomes but not age or niche. The phylogenetic scale of each clade was measured as the number of nodes in the phylogenetic tree between the clade and root (small values associated with large-scale taxa such as phyla). Phylogenetic scale was compared against the 'mean decrease in accuracy' (MDA) value when running a random forest that distinguished the niche (a), age (b), cage (c) and mother (d). Clade abundance was measured as the number of sequences (tips) descending from each clade. Each clade's abundance was compared against its MDA value, when running a forest that distinguished the niche (d), age (e), cage (f) and mother (g). The smoothed mean for the 'real' random forest model is illustrated in blue and for a null (negative control) random forest model in red. The grey areas refer to confidence intervals. Each small vertical 'rug' line above the horizontal axis indicates the location of a single taxon. Note the logarithmic scales on all axes. Similar plots using a different measure of phylogenetic scale (distance of clades from the root of the tree) are in Supplementary Figure S5. Figure produced in R 3.6.0 for Windows. www.nature.com/scientificreports/ functional consequences for the host (as posited in other microbiomes e.g. 60 ) and therefore be the most likely to differ between different circumstances. However, the only microbiome comparison in which we find the most abundant taxa to be important was in distinguishing among cages and mothers. In those cases, it was low-level taxonomic groupings (e.g. clades within the abundant genus Bacteroides), not phyla, that distinguished cagespecific microbiomes. This findings suggests the importance of relatively rare microbial species. Rare bacterial species are thought to play a large role in a range of ecosystems, including host and environmental microbiomes 61,62 . Specifically, rare taxa have been associated with inflammation 63 . Here however, we did not find the rarest taxa to be important in discriminating between microbiomes (Fig. 4e-h). This could be an artefact of the fact that, almost by definition in a complex microbiome, rare taxa are likely to be missed from at least a subset of samples through random sampling. Therefore, rare taxa would not show consistent differences among the factors considered (niche, age, cage or mother), as we find (Fig. 4e-h).

Scientific
The presence or absence of certain taxa will allow other bacterial families/species to flourish or be inhibited, which in turn will alter host/microbial homeostasis, emphasising the need to consider communities and not bacteria in isolation. Furthermore, changes in one bacterium may not be significant functionally if the clade as a whole is unaffected. RF models can account for such interactions among taxa, and the 'importance' assigned to a taxon (Fig. 3b-e) takes these into account 64 . RF approaches have previously proved effective where ratios of Firmicutes to Bacteroidetes could not 51 and could discriminate between patients with active Crohn's disease and those in remission with ~ 70% accuracy 65 . Here we go one step further, by using the full range of clades in a phylogenetic tree as explanatory variables in the RF model. This avoids over-stretching the data by assigning a sequence read to one taxon rather than another, when it is in fact similarly close or distant to both. It also ensures that we do not lose power that is in the data e.g. clear phylogenetic structure among sequences that are closer than a given threshold (typically 97% identity used for OTUs 66 ). The development of 'de-noising' approaches such as DADA2 67 and DEBLUR 68 to generate amplicon sequence variants (ASVs) also goes some way to avoiding the problems of using universal similarity thresholds to define OTUs. However, different de-noisers can lead to different results 69 . We avoid such issues by using all sequence variants, whether true ASVs or sequencing errors. Given our well-controlled experiment, we do not expect different sequencing errors in different treatments. This expectation is consistent with the fact that we find the very rarest variants, which will be highly enriched for sequencing errors, are no better than random at distinguishing any of our treatments (Fig. 4e-h). This approach and our focus on differences among treatments comes at a cost-we do not even attempt to estimate the 'true' community composition of any particular sample or how well it's captured by the data. Despite this, we are able to identify clear compositional differences and phylogenetic patterns in communities across the treatments studied.
In conclusion, taking a carefully designed factorial experiment involving co-housing of different genotypes of littermate mice, we have been able to identify major changes in the gut microbiome with age, niches, cages and mothers, but not genotype (Figs. 2, 3a). In particular, we highlight a clear impact on the gut microbial communities associated with these experimental factors, particularly cage and maternal effect with phylogenetic patterns that are in stark contrast to niche and age. Together, this work reveals the subtlety of the balance between homeostasis and difference in the gut microbiome, and emphasises the need to carefully account for host environment when performing future studies.

Data availability
The sequence data analysed during the current study is available on the European Bioinformatics Institute (EBI, https:// www. ebi. ac. uk/ ena) (study accession number PRJEB6905). Code to reproduce the main text figures produced in R will be available on FigShare (https:// doi. org/ 10. 48420/ 13649 837).