Experiments that use only a small number of animals are common, but might not give meaningful results. Credit: Kevin Glover/MRC

Replace, refine, reduce: the 3 Rs of ethical animal research are widely accepted around the world. But now the message from UK funding agencies is that some experiments use too few animals, a problem that leads to wastage and low-quality results.

On 15 April, the research councils responsible for channelling government funding to scientists, and their umbrella group Research Councils UK, announced changes to their guidelines for animal experiments. Funding applicants must now show that their work will provide statistically robust results — not just explain how it is justified and set out the ethical implications — or risk having their grant application rejected.

The move aims to improve the quality of medical research, and will help to address widespread concerns that animals — mostly mice and rats — are being squandered in tiny studies that lack statistical power.

“If the study is underpowered your results are not going to be reliable,” says Nathalie Percie du Sert, who works on experimental design at the National Centre for the Replacement, Refinement and Reduction (NC3Rs) of Animals in Research in London. “These animals are going to be wasted.”

Researchers say that sample size is sometimes decided through historical precedent rather than solid statistics. There is also a lack of clarity: last year, an analysis of selected papers published in Nature or Public Library of Science journals describing animal experiments revealed that few reported the use of statistical tests to determine sample size, even though both publishing groups had endorsed guidelines to improve reporting standards (D. Baker et al. PLoS Biol. 12, e1001756; 2014).

Animals feature in a wide range of experiments (see ‘Animal use’), many of which are designed to test drugs before trials are done in people. The effects that researchers are looking for in these preclinical studies are often subtle, and ‘power calculations’ are needed to work out how many animals are required to show an effect. But an international academic partnership called the CAMARADES project (Collaborative Approach to Meta-Analysis and Review of Animal Data from Experimental Studies) has shown that many animal studies are underpowered: studies in stroke, for example, are typically powered at between 30% and 50%, meaning that there is just a 30–50% chance of detecting a biological effect if it exists.
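To illustrate the arithmetic behind such a power calculation, the sketch below uses the statsmodels Python library to ask two questions: how many animals per group are needed to detect a given effect with 80% power, and what power a small group actually delivers. The design (a two-sample t-test), the effect size and the group size of eight are hypothetical choices for illustration, not figures taken from the CAMARADES analyses.

```python
# Illustrative power calculation for a hypothetical two-group animal study.
# Assumptions (not from the article): two-sample t-test, standardized
# effect size (Cohen's d) of 0.8, two-sided alpha of 0.05.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# 1) Animals per group needed to detect d = 0.8 with 80% power.
n_per_group = analysis.solve_power(effect_size=0.8, alpha=0.05, power=0.8)
print(f"Animals per group for 80% power: {n_per_group:.1f}")  # about 26

# 2) Power actually achieved with a small group of 8 animals per arm.
achieved = analysis.solve_power(effect_size=0.8, nobs1=8, alpha=0.05)
print(f"Power with 8 animals per group: {achieved:.2f}")  # roughly 0.3
```

With these hypothetical numbers, eight animals per group yields roughly the 30% power that CAMARADES reports as typical of stroke studies, whereas detecting the same effect reliably would take around 26 animals per group.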

Chart: Animal use. Source: UK Organisation Data Service

Malcolm Macleod, a neuroscientist at the University of Edinburgh, UK, blames, among other things, a lack of training and support in experimental design, as well as limited funds: animals are expensive to work with.

Some say that pressure to ‘reduce’ may be one reason for small experiments, but others counter that this is a misinterpretation of the 3 Rs: an underpowered experiment is itself ethically problematic, because the animals it uses cannot yield reliable results.

The problem is not limited to Britain: last year, Francis Collins, director of the US National Institutes of Health (NIH), and Lawrence Tabak, NIH deputy director, warned about a lack of reproducibility in preclinical research and mentioned a dearth of sample-size calculations as one of the problems (see Nature 505, 612–613; 2014).

The situation infuriates animal-welfare proponents. “It’s completely unethical to use animals in studies that aren’t properly designed,” says Penny Hawkins, head of the research-animals department at the Royal Society for the Prevention of Cruelty to Animals in Southwater, UK.

Boosting the number of animals in specific experiments need not mean that more animals are used overall, because multiple small experiments can often be replaced by fewer, larger ones. “One potential implication is we need to ask for money to do larger studies,” says Marcus Munafò, a psychologist at the University of Bristol, UK.

Another way to increase sample sizes would be to link up researchers working on similar topics. Munafò notes that this is what geneticists now do for studies that require scanning a large number of genomes. “That template already exists,” he says. “The question is, how do you initiate that cultural change?”

More immediately, du Sert is developing an online tool for the NC3Rs that will help researchers to design robust studies. “We’re not blaming anyone for the way they were doing things before,” she adds. “That was the practice at the time.”