All scientific disciplines grapple with the issues of standardization of methods and reproducibility of results. Standardizing seems especially difficult when dealing with genetically modified mice, which have become an invaluable tool for studying the influence of genes on brain and behavior. Neuroscientists have struggled with this issue for decades: How can we design mouse studies so that they are reproducible? How should we control for genetic background in mice? And how can we control or standardize the mice's environment? Although we've made some progress in addressing potential genetic confounds, the issue of environmental standardization remains problematic. We need to be much more aware of the potential confounds introduced by genetic and environmental variability. As a first step, scientists must take greater care to fully detail the genetic backgrounds of the mouse strains that they use, as well as the conditions in which they house their mice and conduct their experiments.

Generations of inbreeding have resulted in substantial differences between strains of mice. Considerable information can be gained by associating the behavioral differences between strains of mice with their underlying genotype. However, strain differences also mean that each strain has the potential to have a unique reaction to an introduced mutation, which then might have a unique interaction with the mouse's environment. Confining all mutant studies to one specific background, however, may lead researchers to miss interesting and relevant phenotypes.

Related to this issue is mouse breeding and colony maintenance. Mutant mice and the wild-type mice that they're being compared with should not be produced from originally segregating populations and/or maintained as homozygous lines. However, even when mutant mice are maintained correctly as a randomly bred colony or as a continually backcrossed congenic strain, there are potential confounds. Although studying a mutation on a heterogeneous background has the advantage of revealing new phenotypes, intercrossing strains to transfer mutations can cause mutant and wild-type mice to differ in the alleles that flank the locus of the mutation, in addition to the mutated gene.

Effectively controlling for environmental variability across laboratories seems to be an even harder problem to solve. It is also not clear that standardizing environmental conditions would be optimal, as recent work¹ suggests that limiting the variability in experiments may increase the number of significant, but irrelevant or spurious, behavioral differences observed in mice. Würbel and colleagues used a dataset that contained behavioral data from three different mouse strains, ordered in different batches, housed in either 'enriched' or 'unenriched' cages in three different laboratories. By simulating two different experimental designs, one in which the mice had the same combination of batch, housing and laboratory conditions (the standardized replicates) and one in which the mice were pseudo-randomly selected from different conditions (the heterogenized replicates), the authors found an increased rate of false positives in the standardized replicates. Würbel and colleagues suggested that environmental standardization may actually increase the number of spurious significant effects reported and ultimately decrease the applicability of a study's findings.
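
The logic behind this result can be illustrated with a toy simulation. The sketch below is not the analysis performed by Würbel and colleagues, who resampled real behavioral data; it is a minimal model built on stated assumptions. It assumes that a mutation has no effect on average, but that each combination of batch, housing and laboratory shifts the mutant group by a random amount (the hypothetical parameter `interaction_sd`). All function names, group sizes and parameter values are illustrative choices, not values from the study.

```python
import random
import statistics

# Two-sided 5% critical value of Student's t with 22 degrees of freedom,
# matching the assumed group size of 12 + 12 mice used below.
T_CRIT = 2.074


def t_statistic(a, b):
    """Pooled-variance two-sample t statistic for two equal-sized groups."""
    n = len(a)
    pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
    return (statistics.mean(a) - statistics.mean(b)) / (2 * pooled_var / n) ** 0.5


def false_positive_rate(heterogenized, n_experiments=2000, n_per_group=12,
                        interaction_sd=0.5, seed=0):
    """Fraction of simulated experiments reporting a 'significant' difference.

    Assumption: the mutation has zero effect averaged over conditions, so
    every significant result counts as a false positive.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        if heterogenized:
            # Each mutant mouse comes from a different condition,
            # so each one gets its own condition-specific shift.
            shifts = [rng.gauss(0, interaction_sd) for _ in range(n_per_group)]
        else:
            # Standardized: one condition, hence one shared shift for all mutants.
            shifts = [rng.gauss(0, interaction_sd)] * n_per_group
        mutant = [s + rng.gauss(0, 1) for s in shifts]
        wild_type = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if abs(t_statistic(mutant, wild_type)) > T_CRIT:
            hits += 1
    return hits / n_experiments


standardized_rate = false_positive_rate(heterogenized=False)
heterogenized_rate = false_positive_rate(heterogenized=True)
print(f"standardized:  {standardized_rate:.3f}")
print(f"heterogenized: {heterogenized_rate:.3f}")
```

Under these assumptions, the standardized design mistakes a condition-specific shift for a general effect of the mutation, whereas the heterogenized design absorbs the same variability into within-group noise. The size of the gap depends entirely on the assumed `interaction_sd`, so the numbers should be read as qualitative only.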

Given these problems, are there any guidelines that the community ought to follow? The editors of Genes, Brain and Behavior recently decided on specific guidelines for mutant studies to be considered by their journal², which detailed acceptable breeding procedures and experimental controls. Although it is unclear whether all of these guidelines will lead to improved reproducibility of mutant mouse studies, their first guideline is likely to help: it insists on increased transparency by requiring authors to present full strain and substrain information and sufficient details of breeding procedure to allow others to replicate their experiments. Nature Neuroscience does not currently have a specific acceptability policy regarding mutant mouse studies. We feel that expert peer reviewers must evaluate studies on a case-by-case basis, also considering other lines of evidence presented in the paper. The feasibility of following the set guidelines remains to be shown, especially when dealing with double or triple mutant mice. However, we recognize that it would be best practice for all studies using mice to disclose full strain information.

We therefore urge our authors to include this information in their manuscripts, whether in the Methods section or in Supplementary Information, and to consider discussing how their results may be affected or limited by strain choice and breeding practice. We also ask that authors present as accurate a picture as possible of the mouse's environment. Reporting environmental details is far less straightforward, but differences between laboratory-specific factors, such as testing equipment, personnel and animal housing, are just as critical to reproducibility as genetic differences. Behaviorists already introduce some within-laboratory heterogeneity regarding experiment timing by testing small batches of animals starting on different days and stretched over weeks or months. Different members of the laboratory should try to replicate the results. Factors such as animal housing, handling, food, lighting and noise conditions, all of which affect behavior and brain chemistry, can be varied or not. The key to reproducibility is accurate reporting of these seemingly mundane details, which potentially have large effects.

Awareness of the potential problems and caveats that arise from such genetic and environmental variability is becoming increasingly relevant, as mice are now used to study mechanisms that may underlie complex human disorders such as psychiatric disease. Hopefully, increased awareness will help generate creative strategies for dealing with these complex issues, thereby avoiding wasted time and resources, both animal and financial, on poorly designed mouse studies.