Understanding artificial mouse-microbiome heterogeneity and six actionable themes to increase study power

Data clustering caused by intra-class (spatial) correlation is well known in statistics to interfere with interpretation and to reduce study power. It is therefore unclear why the literature abounds with studies that house many laboratory mice (≥4) per cage, instead of one or two, and that misuse or underreport clustered-data statistics. Among other sources of ‘artificial’ confounding, including cyclical oscillations of the ‘cage microbiome’, we quantified the heterogeneity of modern husbandry practices and perceptions. The objective was to identify actionable themes to re-launch emerging protocols and intuitive statistical strategies that increase study power. ‘Cost-vs-science’ discordance, which is amenable to intervention, was a major factor explaining heterogeneity and the reluctance to change. Combined, four sources of information (scoping reviews, professional surveys, expert opinion, and ‘implementability-score statistics’) indicate that a framework of six actionable themes could minimize ‘artificial’ heterogeneity. Together with a ‘Housing Density Cost Simulator’ in Excel and fully annotated statistical examples, this framework could reignite the use of ‘study power’ to monitor the success and reproducibility of mouse-microbiome studies.


Laboratory mice are critical to understanding human biology in a variety of fields, from inflammatory bowel diseases, neurology, and cancer, to microbiome and nutrition. In the current era of microbiome research, multiple factors are becoming evident as sources of confounding. Integrating microbiome science into animal research requires experiments to control for confounding derived from emerging artificial factors, especially the 'cage microbiome', 1-5 which we recently discovered causes 'cyclical microbiome bias' due to the periodic accumulation of excrement in mouse cages. 1

Understanding the factors that contribute to research heterogeneity will address this need. Primary factors causing artificial analytical heterogeneity and low study power include putting many mice into one cage, having insufficient cages per group, and using statistical methods that treat multiple mice in a cage as independent rather than clustered observations. 2

In statistics and science, heterogeneity is a concept that describes the non-uniformity, or variability, of an organism, a surface, or the distribution of data. Sources of study heterogeneity can be natural or artificial. Artificial heterogeneity refers to study variance introduced by humans or anthropogenic factors, including animal husbandry and the 'cage microbiome', which non-uniformly affect mouse biology. Fundamental to hypothesis testing, data heterogeneity determines which statistical methods are needed to decisively quantify whether two independent, naturally heterogeneous groups truly differ. To appropriately select statistics controlling for cage-clustered data, scientists

intake is perceived as a collective of all aspects consumed orally, including the microbial content of diet (Figure 4D).
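Why putting many mice into one cage lowers study power can be sketched with the standard (Kish) design effect for equal-sized clusters, 1 + (m − 1)·ICC, where m is the number of mice per cage. Dividing the number of mice by the design effect gives the number of effectively independent mice. This is a minimal sketch assuming equal cage sizes and one ICC shared by all cages; the function names and the ICC value of 0.5 are illustrative assumptions, not values from this study.

```python
def design_effect(mice_per_cage: int, icc: float) -> float:
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    if mice_per_cage < 1:
        raise ValueError("mice_per_cage must be >= 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must be in [0, 1]")
    return 1.0 + (mice_per_cage - 1) * icc


def effective_sample_size(total_mice: int, mice_per_cage: int, icc: float) -> float:
    """Number of independent mice carrying the same information as total_mice clustered mice."""
    return total_mice / design_effect(mice_per_cage, icc)


if __name__ == "__main__":
    icc = 0.5  # hypothetical ICC, for illustration only
    for m in (1, 2, 5):
        n_eff = effective_sample_size(10, m, icc)
        print(f"10 mice, {m} per cage, ICC = {icc}: effective n = {n_eff:.2f}")
```

At the extreme ICC = 1, ten mice housed as two cages of five carry the information of only two independent mice, that is, the number of cages.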

Clusters and scientific-financial discordance when housing five mice in a study of five mice.

To interrogate whether cost is a contributing factor to animal housing density practices, we posed two identical multiple-choice questions that differed only by the assumption of financial vs. scientific preference.

individuals do not think that this practice is economically feasible (Figure 5B), which is consistent with the current literature, where only 15% (95% CI = 9.6, 20.3) of studies reported exclusively housing 1 MxCg (see Figure 2C).
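The financial side of the 'cost-vs-science' discordance comes down to simple arithmetic: per-diem charges scale with the number of cages, not the number of mice, so single housing multiplies the housing bill. The sketch below assumes a flat per-cage daily rate; the rate, group size, and study length are hypothetical placeholders, and this is not the authors' Excel 'Housing Density Cost Simulator'.

```python
import math


def housing_cost(total_mice: int, mice_per_cage: int,
                 per_diem_per_cage: float, days: int) -> tuple[int, float]:
    """Return (cages needed, total per-diem cost) to house total_mice at a given density."""
    if total_mice < 1 or mice_per_cage < 1 or days < 0 or per_diem_per_cage < 0:
        raise ValueError("invalid input")
    cages = math.ceil(total_mice / mice_per_cage)  # a partially filled cage still costs a cage
    return cages, cages * per_diem_per_cage * days


if __name__ == "__main__":
    # Hypothetical inputs: 20 mice, 1.00 currency unit per cage per day, 8-week study
    for m in (1, 2, 4, 5):
        cages, cost = housing_cost(20, m, per_diem_per_cage=1.00, days=56)
        print(f"{m} mice/cage -> {cages} cages, cost = {cost:.2f}")
```

Read alongside the design effect, this makes the trade-off explicit: the densities that are cheapest to house are also the ones that yield the fewest independent experimental units.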

Considering that the majority of respondents' facilities implement weekly or every-two-weeks 'cage change' protocols, with a wide array of drinking water sources across facilities (Figure 5C-D), our data suggest that cage

Although scientists could argue that statistical methods exist to control for clustering, 58

Implementability of a multi-theme framework to favor study power and reproducibility.

To objectively determine whether the 'Recommendations' described below (supporting a multi-theme actionable framework, Figures 1 and 7A) were i) clearly drafted as a sentence (sentence clarity), ii) potentially beneficial for improving power and reproducibility (potential benefit), and iii) deemed appropriate for readers to recommend to others (would you recommend it?), we asked active academicians and scientists conducting research to grade each recommendation and provide comments to create an 'implementability grade metric' (Supplementary Table 3; Figure 7B).

[Figure: Graphical summary of framework implementability scores; panels (a) implementability scores and (b) IGS.]

[Figure: Framework panels: 'Gut Microbiota/Infections Effect' (see Table 2; effect on animal physiology); 'Effect of Husbandry' (see Table 1; causes of cage-cage effect); 'Internal vs. External Validity' (studies should have excellent internal validity and provide evidence for good external validity).]

The   Figure 3), studying/sampling mice in clean cages and/or the use of slatted floors 69

Recommendation theme 6 on 'Implementing statistical models to consider ICC in clustered data'.

Depending on the experiment, we recognize that it is not always possible to single-house mice. Our review showed that scientists often analyze clustered observations using methods that mathematically assume data independence (Student's t-test, Mann-Whitney test, one-/two-way ANOVA), without implementing statistics for intra-class ('intra-cage') correlated, cage-clustered data (multivariable linear/logistic, marginal, generalized estimating equation, or mixed-effects (random/fixed) regression models). 47,76,77 The intraclass correlation coefficient (ICC) describes how strongly units in a cluster resemble one another, and can be interpreted as the fraction of the total variance due to variation between clusters. 47 Keeping housing densities (MxCg) homogeneous across study groups is logistically challenging when few cages are used. To expand the outreach of our multi-theme framework, and to support scientists in analyzing and publishing justifiable clustered experiments, we recommend: 'Use statistical methods designed for analyzing clustered data when multiple mice are housed in one cage, and when data points are obtained from mice over time, to i) properly assess treatment effects, ii) determine the intraclass correlation coefficient for each study, and then iii) use that information to rapidly generate experiment-specific, customizable study power tables to aid in the

The statistical example we provide is based on data extracted (using ImageJ 78 analysis) from a published dot plot figure in a reviewed study that exclusively reported cohousing 5 MxCg, and in which the authors compared two diets using 8 and 9 MxGr (2 TCgxGr; Figure 9A). The published p-value was 0.058, but to emphasize our message, we slightly and evenly adjusted the extracted data to achieve a univariate p<0.050. By simulating 5 possible cage-clustering scenarios, Figure 8 was designed to help readers visualize the benefits of computing the ICC and experiment-specific, customizable power tables for determining whether more cages/group or mice/cage are needed to achieve a study power of, ideally, >0.8.
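The ICC used throughout this section (the fraction of total variance due to variation between cages) can be estimated directly from cage-clustered measurements. The sketch below uses the classical one-way ANOVA (method-of-moments) estimator, a simpler stand-in for the mixed-effects-model ICC described in the text; it handles unequal cage sizes through the usual adjusted average cage size n0 and truncates negative estimates at zero. The function name and the example measurements are hypothetical.

```python
from statistics import fmean


def anova_icc(cages: list[list[float]]) -> float:
    """One-way ANOVA estimate of the intraclass correlation for cage-clustered data.

    `cages` holds one list of mouse measurements per cage.
    Negative estimates (between-cage variance ~0) are truncated to 0.
    """
    k = len(cages)
    sizes = [len(c) for c in cages]
    n_total = sum(sizes)
    if k < 2 or any(n == 0 for n in sizes) or n_total <= k:
        raise ValueError("need >= 2 non-empty cages and at least one cage with >= 2 mice")
    grand_mean = fmean([y for cage in cages for y in cage])
    cage_means = [fmean(c) for c in cages]
    # Between-cage and within-cage mean squares
    msb = sum(n * (mu - grand_mean) ** 2 for n, mu in zip(sizes, cage_means)) / (k - 1)
    msw = sum((y - mu) ** 2 for cage, mu in zip(cages, cage_means) for y in cage) / (n_total - k)
    # Adjusted average cage size; equals m when every cage holds m mice
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)
    denominator = msb + (n0 - 1) * msw
    if denominator == 0:
        raise ValueError("all measurements are identical; ICC is undefined")
    return max(0.0, (msb - msw) / denominator)


if __name__ == "__main__":
    # Hypothetical measurements: two cages of three mice each
    print(f"ICC = {anova_icc([[1, 2, 3], [4, 5, 6]]):.3f}")
```

Because each estimate depends on how mice happen to be distributed across cages, recomputing it for alternative allocations reproduces, in miniature, the point that the ICC varies with mouse-to-cage allocation.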

When using clustered-data methods, we showed that only one of the five scenarios yielded a significant diet treatment effect (i.e., scenario 2, in which all cages were unbiased, containing mice with both high and low response values, something unlikely to occur naturally in clustered settings; Figure 9B). These data show that artificial heterogeneity due to mouse caging and unsupervised 'cage effects' can lead to poor reproducibility (in 80% of the scenarios, i.e., four of five, an analysis ignoring clustering would misleadingly show that the test diet induces an effect on the mouse response). Graphically, we show that the ICC (computed after running the mixed-effects models) varies with the hypothetical allocation of mice to cages, which in turn influences the post-hoc estimates of study power (Figure 9C-D).
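The contrast between a per-mouse analysis and a cluster-aware one can be illustrated with the simplest cluster-aware approach: treating the cage, not the mouse, as the experimental unit and comparing cage means. For balanced designs this matches the classical nested-ANOVA test of treatment, but it is a swapped-in simplification, not the mixed-effects models used in this study. The data below are hypothetical (2 cages/group, 3 mice/cage), and the critical values are the standard two-sided alpha = 0.05 values of Student's t for the two degrees of freedom that occur.

```python
from statistics import fmean, variance

# Two-sided alpha = 0.05 critical values of Student's t (standard tables)
T_CRIT_975 = {2: 4.303, 10: 2.228}


def pooled_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (fmean(a) - fmean(b)) / (sp2 * (1 / n1 + 1 / n2)) ** 0.5
    return t, n1 + n2 - 2


# Hypothetical data: 2 cages per group, 3 mice per cage
control = [[10, 11, 12], [14, 15, 16]]
diet = [[15, 16, 17], [19, 20, 21]]

# Naive analysis: every mouse treated as an independent observation
naive_t, naive_df = pooled_t([y for c in control for y in c], [y for c in diet for y in c])
# Cluster-aware analysis: one mean per cage, so the cage is the unit
cage_t, cage_df = pooled_t([fmean(c) for c in control], [fmean(c) for c in diet])

for label, t, df in (("per-mouse", naive_t, naive_df), ("per-cage", cage_t, cage_df)):
    verdict = "significant" if abs(t) > T_CRIT_975[df] else "not significant"
    print(f"{label}: t = {t:.2f}, df = {df}, {verdict} at alpha = 0.05")
```

With these numbers the per-mouse test crosses its critical value (|t| ≈ 3.66 > 2.228, df = 10), while the per-cage test does not (|t| ≈ 1.77 < 4.303, df = 2): the same data support opposite conclusions depending on whether cage clustering is respected.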

As a final practical product of this manuscript, we provide the statistical scheme/code in a GitHub repository (https://github.com/axr503/cagecluster_powercode) to implement this streamlined analysis and compute comprehensive power tables, based on the ICC derived for each simulation, to help scientists determine the mice-to-cage combinations that best match their resources (Figure 9E).

Figure 9, study selection was based on the use of 5 mice/cage and on study results being published as dot plots in the manuscript (allowing us to infer the raw data for our analysis). Parametric statistics (e.g., one-way ANOVA) were employed if their assumptions were fulfilled; when assumptions were not fulfilled, nonparametric methods (e.g., Kruskal-Wallis) were used.
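An ICC-based power table of cages/group versus mice/cage can be approximated in a few lines. This sketch is not the code in the GitHub repository; it assumes a two-sided, two-group comparison, a standardized effect size (mean difference divided by the total SD), equal cage sizes, and a normal approximation in which clustering enters only through the design effect 1 + (m − 1)·ICC. The function names and the inputs (ICC = 0.3, effect size 1.0) are hypothetical.

```python
from statistics import NormalDist


def approx_power(cages_per_group: int, mice_per_cage: int, icc: float,
                 effect_size: float, alpha: float = 0.05) -> float:
    """Normal-approximation power for comparing two groups of cage-clustered mice."""
    if cages_per_group < 1 or mice_per_cage < 1 or not 0.0 <= icc <= 1.0:
        raise ValueError("invalid design parameters")
    std_normal = NormalDist()
    design_effect = 1 + (mice_per_cage - 1) * icc
    n_eff = cages_per_group * mice_per_cage / design_effect  # effective mice per group
    se = (2 / n_eff) ** 0.5  # SE of the standardized difference between two groups
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # The negligible chance of rejecting in the wrong direction is ignored
    return std_normal.cdf(abs(effect_size) / se - z_crit)


def print_power_table(icc: float, effect_size: float,
                      cages=range(2, 9), mice=range(1, 6)) -> None:
    """Rows: cages per group; columns: mice per cage (MxCg)."""
    print("cages/grp | " + " ".join(f"{m} MxCg" for m in mice))
    for k in cages:
        row = " ".join(f"{approx_power(k, m, icc, effect_size):6.2f}" for m in mice)
        print(f"{k:9d} | {row}")


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only
    print_power_table(icc=0.3, effect_size=1.0)
```

Cells at or above 0.80 mark combinations that reach the usual power target. With ICC > 0, adding mice to existing cages gives diminishing returns, because the effective sample size per group can never exceed (cages per group)/ICC. With very few cages the normal approximation ignores the small degrees of freedom and therefore overstates power; a t-based or simulation-based calculation is more conservative.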

As needed, 95% confidence intervals are reported to account for sample size (e.g., MxCg; surveyed participants) and to provide context for external validity. Significance was set at p<0.05. Analyses, study-power calculations, and graphics were performed with R, STATA, Python 3.0 (Anaconda), GraphPad, and G*Power. 74
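For proportions such as the share of reviewed studies or survey respondents reporting a given practice, a 95% confidence interval can be computed as sketched below. The text does not state which interval method was used; this sketch assumes the common Wald (normal-approximation) interval, clipped to [0, 1], and the counts in the example are hypothetical, not taken from the review. For small samples or proportions near 0 or 1, a Wilson interval is usually preferred.

```python
from statistics import NormalDist


def wald_ci(successes: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Wald (normal-approximation) confidence interval for a proportion, clipped to [0, 1]."""
    if n <= 0 or not 0 <= successes <= n or not 0.0 < level < 1.0:
        raise ValueError("invalid counts or confidence level")
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * (p * (1 - p) / n) ** 0.5
    return max(0.0, p - half_width), min(1.0, p + half_width)


if __name__ == "__main__":
    # Hypothetical counts, not from the review: 20 of 100 studies
    low, high = wald_ci(20, 100)
    print(f"20/100 = 20.0% (95% CI = {100 * low:.1f}, {100 * high:.1f})")
```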