
  • Perspective

Systematic heterogenization for better reproducibility in animal experimentation

Abstract

The scientific literature is full of articles discussing the poor reproducibility of findings from animal experiments and the failure to translate results from preclinical animal studies to clinical trials in humans. Critics even speak of a “reproducibility crisis” in the life sciences, a term that increasingly appears in high-impact journals. Viewed cynically, Fett's law of the lab, “Never replicate a successful experiment”, has thus taken on a completely new meaning. So far, poor reproducibility and translational failures in animal experimentation have mostly been attributed to biased animal data, methodological pitfalls, current publication ethics and animal welfare constraints. More recently, standardization itself has been identified as a potential source of these problems. By reducing within-experiment variation, rigorous standardization regimes limit inference to the specific conditions under which an experiment was run. In doing so, however, they largely neglect individual phenotypic plasticity. The result can be findings that are statistically significant but possibly irrelevant, because they are not reproducible under slightly different conditions. By contrast, systematic heterogenization has been proposed as a way to improve the representativeness of study populations, thereby increasing external validity and hence reproducibility. Although the first heterogenization studies are very promising, it is still unclear how this approach can be put into practice in a logistically feasible and effective way. Further research is therefore needed to explore different heterogenization strategies, as well as alternative routes toward better reproducibility in animal experimentation.
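A toy Monte Carlo sketch can illustrate the statistical core of the standardization argument. It is not taken from the article, and its model and parameter values are assumptions chosen for illustration. The treatment effect is assumed to vary from one environmental condition to the next, drawn from a normal distribution around a mean `mu` with standard deviation `sd_interaction`. A "replicate study" is the same experiment rerun under freshly drawn conditions. A standardized study (`n_conditions=1`) tests all animals under a single condition. A heterogenized study splits the same number of animals across several conditions. The helper names `simulate_study` and `spread_across_replicates` are hypothetical.

```python
import random
import statistics


def simulate_study(rng, n_per_group, n_conditions, mu, sd_interaction, sd_error):
    """Estimated treatment effect of one hypothetical two-group study.

    The n_per_group animals per arm are split evenly across n_conditions
    environmental conditions drawn at random for this study.
    n_conditions=1 corresponds to a fully standardized study.
    """
    per_condition = n_per_group // n_conditions
    differences = []
    for _ in range(n_conditions):
        # Assumed reaction norm: the true treatment effect differs by condition.
        effect_here = mu + rng.gauss(0.0, sd_interaction)
        # Condition main effect: shifts both groups and cancels in the difference.
        baseline = rng.gauss(0.0, 1.0)
        control = [baseline + rng.gauss(0.0, sd_error) for _ in range(per_condition)]
        treated = [baseline + effect_here + rng.gauss(0.0, sd_error)
                   for _ in range(per_condition)]
        differences.append(statistics.mean(treated) - statistics.mean(control))
    # Equal animals per condition, so the mean of per-condition differences
    # equals the overall treated-minus-control difference.
    return statistics.mean(differences)


def spread_across_replicates(n_conditions, n_studies=2000, n_per_group=12,
                             mu=1.0, sd_interaction=1.0, sd_error=1.0, seed=1):
    """Standard deviation of effect estimates across independent replicate studies."""
    rng = random.Random(seed)
    estimates = [simulate_study(rng, n_per_group, n_conditions, mu,
                                sd_interaction, sd_error)
                 for _ in range(n_studies)]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    # Standard error one standardized study would report from its own data
    # (default sd_error=1 and 12 animals per group); it ignores the interaction.
    nominal_se = (2 / 12) ** 0.5
    print(f"nominal SE of one standardized study: {nominal_se:.2f}")
    print(f"spread across labs, standardized:     {spread_across_replicates(1):.2f}")
    print(f"spread across labs, 4 conditions:     {spread_across_replicates(4):.2f}")
```

Under these assumed settings, theory puts the spread of standardized estimates across replicate studies at about sqrt(1 + 2/12) ≈ 1.08. That is far larger than the nominal standard error of about 0.41 that each study would compute from its own data. This gap is one way to see how a result can be significant in one lab yet fail to replicate in another. Spreading the same animals over four conditions should lower the spread to about sqrt(1/4 + 2/12) ≈ 0.65. The sketch only compares the variability of effect estimates. It does not address how a heterogenized study should be analysed.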

Figure 1: Simplified schematic illustration of the standardization fallacy.
Figure 2: Systematic heterogenization over time (“batch heterogenization”).
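Figure 2 concerns heterogenization over time (“batch heterogenization”), in which an experiment is split into several batches run at different times. Below is a minimal allocation sketch. It assumes that each batch is a complete mini-experiment containing every treatment group in equal numbers, with treatments randomized within each batch. The function `batch_allocation` and its parameters are hypothetical and are not taken from the article.

```python
import random
from collections import Counter


def batch_allocation(animal_ids, treatments, n_batches, seed=None):
    """Hypothetical helper: return {animal: (batch, treatment)}.

    Animals are randomly split into n_batches equal batches; within each batch,
    every treatment is assigned to the same number of animals in random order,
    so each batch is a complete, balanced mini-experiment.
    """
    ids = list(animal_ids)
    cell_size, remainder = divmod(len(ids), n_batches * len(treatments))
    if remainder or cell_size == 0:
        raise ValueError("animals must split evenly into batches x treatments")
    rng = random.Random(seed)
    rng.shuffle(ids)  # random assignment of animals to batches
    batch_size = cell_size * len(treatments)
    plan = {}
    for b in range(n_batches):
        members = ids[b * batch_size:(b + 1) * batch_size]
        labels = [t for t in treatments for _ in range(cell_size)]
        rng.shuffle(labels)  # randomize treatments within the batch
        for animal, treatment in zip(members, labels):
            plan[animal] = (b + 1, treatment)
    return plan


if __name__ == "__main__":
    plan = batch_allocation(range(24), ["control", "treatment"], n_batches=3, seed=7)
    # Each (batch, treatment) cell should contain 24 / (3 * 2) = 4 animals.
    print(sorted(Counter(plan.values()).items()))
```

Because every batch holds all treatment groups, batch would typically be included as a blocking factor in the analysis. Batch-to-batch differences are then accounted for rather than inflating the error term. How best to model batch is a separate design decision that this sketch does not settle.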

Acknowledgements

Current research on heterogenization and reproducibility is funded by the German Research Foundation (DFG, RI 2488/3-1). Furthermore, I would like to thank Norbert Sachser, Hanno Würbel, Sara Hintze, Niklas Kästner, and Vanessa von Kortzfleisch for their helpful comments on earlier drafts of this manuscript.

Author information

Correspondence to S Helene Richter.

Ethics declarations

Competing interests

The author declares no competing financial interests.

About this article

Cite this article

Richter, S. Systematic heterogenization for better reproducibility in animal experimentation. Lab Anim 46, 343–349 (2017). https://doi.org/10.1038/laban.1330

  • DOI: https://doi.org/10.1038/laban.1330