Ten years ago this week, researchers pointed out that many important findings in human experimental psychology cannot be generalized because study participants are predominantly drawn from a small, unrepresentative subset of the world’s population: societies that are Western, Educated, Industrialized, Rich and Democratic (WEIRD)1. Mounting evidence suggests that there could be similar sampling problems in research on animals. Behavioural studies of a wide range of species — from insects to primates — could be affected, with researchers testing individuals that are not fully representative of the wider populations they seek to understand. For example, certain sampling protocols are likely to trap the boldest animals, potentially skewing experimental results2.
It is high time for scientists who work on animal behaviour to identify, and mitigate, potential sampling biases. Simply gathering more data is not a solution, because researchers should always strive to minimize the number of experimental animals used. Instead, we propose a framework with a fitting acronym — STRANGE — that researchers can use to interrogate how unusual their study subjects are. This will aid the design of new studies, and enable potential biases to be declared and discussed when publishing completed work.
In June 2010, human-evolutionary biologist Joe Henrich and his co-authors published a landmark paper challenging the widely held assumption that human behaviour varies little across populations1. They highlighted that the vast majority of research on human behaviour, and its cognitive basis, was conducted using participants from societies who are often outliers in broader comparisons. For example, WEIRD people are unusual in how they find their way around, what they consider to be fair, and their willingness to punish others1.
Several research fields chart the behaviour of non-human animals, including comparative psychology, ethology, behavioural ecology, evolutionary biology and conservation science. They attempt to uncover the evolutionary origins, developmental pathways, adaptive function, underlying mechanisms and conservation relevance of a wide variety of behaviours. Their common goal is to identify general biological principles that apply within and across taxa. Many animal studies are susceptible to sampling biases.
One well-known source of bias is the excessive reliance in some research fields on a few model organisms3, such as fruit flies (Drosophila spp.) and laboratory mice (Mus musculus), which inevitably limits the generality of findings. Less obvious, but perhaps much more problematic, is the fact that many species exhibit substantial variation in behaviour, and only certain subsets of that diversity might (unintentionally) turn up in test samples. In primates, for example, individuals of particular ‘personalities’ can be more likely to take part in trials voluntarily4 (see ‘Beware of STRANGE’). Lifelong participation in experiments can also significantly alter subjects’ natural behaviour.
When developing the framework, our starting observation was that an animal’s behaviour is shaped5 by its genetic make-up, experience and social background. Ensuring that subjects are representative with regard to these three factors is paramount. Adding a few specific effects, which are well documented yet often overlooked, provides the acronym STRANGE. It stands for: Social background; Trappability and self-selection; Rearing history; Acclimation and habituation; Natural changes in responsiveness; Genetic make-up; and Experience. STRANGE-related biases can affect both laboratory and field studies. They can influence which animals are sampled for testing, the extent to which they participate in experiments and, importantly, the behaviours that they exhibit during trials. This, in turn, can complicate comparisons between studies, hampering both the generalizability and the reproducibility of findings.
Our approach uses a simple test to evaluate the robustness of a completed or planned study. Researchers should ask: in any of the seven categories of the framework, are my animal subjects strange compared with the wider population about which I wish to make inferences? Here, using selected examples, we showcase how sampling biases could affect the behaviours observed in animal studies (for further examples, see ‘Beware of STRANGE’ and Supplementary Information, Table S1).
Social background. This includes an animal’s social status, the nature and frequency of its interactions with others, and its past opportunities to learn socially from other individuals or their products. For example, when pheasants (Phasianus colchicus) were given a spatial-discrimination task to learn which of two holes contained bait, they performed better when they had been housed in groups of five rather than in groups of three6.
Trappability and self-selection. These are closely related processes. They mean that individuals with particular traits2 are most likely to be caught or to participate voluntarily in experiments (see ‘Beware of STRANGE’). In a classic study on pumpkinseed sunfish (Lepomis gibbosus), individuals collected using funnel traps (into which fish have to actively swim) were faster to start eating in the laboratory compared with those trapped using more indiscriminate nets7.
Trappability effects are expected to be prominent in bio-logging studies in which animals are fitted with electronic tags for remote observation. For example, when researchers tag surfacing whales or seabirds in breeding colonies that are difficult to access, they might inadvertently obtain non-random samples. Self-selection biases are a well-known — but usually neglected — problem in laboratory and field studies of animal cognition4.
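The statistical consequence of trappability bias can be illustrated with a toy simulation (all numbers and trait names here are hypothetical, chosen only for illustration): if bolder individuals are more likely to enter a funnel trap, the trapped sample will systematically overestimate the population’s mean boldness, whereas an indiscriminate net will not.

```python
import random

random.seed(42)

# Hypothetical population: each individual has a 'boldness' score in [0, 1].
population = [random.random() for _ in range(10_000)]

# Trappability bias: the chance of entering a funnel trap rises with boldness
# (capture probability equals the boldness score itself, purely for illustration).
trapped = [b for b in population if random.random() < b]

# An indiscriminate net catches individuals regardless of boldness.
netted = random.sample(population, len(trapped))

def mean(xs):
    return sum(xs) / len(xs)

print(f"population mean boldness: {mean(population):.2f}")  # close to 0.50
print(f"funnel-trap sample mean:  {mean(trapped):.2f}")     # biased upward
print(f"net sample mean:          {mean(netted):.2f}")      # close to 0.50
```

Under these assumptions the funnel-trap sample’s mean converges on about 0.67 rather than the true 0.50, a bias that no amount of extra trapping effort with the same traps would remove.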
Rearing history. This describes an animal’s developmental experiences, including the extent to which it has been exposed to a stimulating physical environment, to other animals and to humans. One study showed that captive-reared jumping spiders (Phidippus audax) were less active and less exploratory, and showed less interest in prey, than those collected from the wild8. Exposure to enrichment, social stimulation and exercise during development can affect brain development and, in turn, cognitive and motor performance9. This relationship is well established for rats and mice10.
Acclimation and habituation. These refer to behavioural changes over time following handling, tagging or exposure to new testing situations. For instance, green turtles (Chelonia mydas) spent less time swimming and more time feeding the day after they were tagged with video cameras than they did on the day of tagging. This probably reflects the turtles’ recovery from the tagging procedure and their habituation to the cameras11.
Natural changes in responsiveness. These sometimes follow daily, reproductive or seasonal cycles, or the transition from one life stage to another. This means that the timing of experiments is often crucial, as highlighted by a study that found that honeybees (Apis mellifera) learn more effectively in the morning than at other times of the day12.
Genetic make-up. This can have profound effects on behaviour. For example, the experience of losing territorial fights early in life has different effects in wild-type brown rats (Rattus norvegicus) than in individuals from a common genetic strain. The laboratory rats end up spending less time investigating intruders as adults13. There can also be marked differences in behaviour between wild populations, and between males and females, as has been shown in an experimental study on anti-predator behaviour in Trinidadian guppies (Poecilia reticulata): females, for instance, spent more time shoaling than did males14.
Experience. This encompasses opportunities for individual learning, such as participation in earlier experiments. (This means there can be overlap with some of the other framework categories.) After male chaffinches (Fringilla coelebs) had been lured by a playback of a rival’s song, captured, handled and released again, they sang fewer territorial songs in response to simulated territory intrusions than did birds that had not been captured15. Long-lived animals can accumulate complex experimental histories in research laboratories, which in our view must be better documented and accounted for.
Some general points about the framework are worth highlighting. First, there are important conceptual differences between WEIRD and STRANGE. The former identifies attributes of a particular demographic group; the latter refers to a suite of factors that can affect behaviour. The seven categories of STRANGE are not problematic by themselves. In fact, they are often the focus of well-designed research projects, such as those we mention here, or are confounding factors that have been explicitly controlled for. Concerns arise whenever samples of study subjects are unwittingly biased with regard to any of these categories, and when researchers overlook that fact.
Second, there is overlap and strong interdependence between some STRANGE categories. For example, the origin of an animal — whether it was wild-caught or captive-bred — will often simultaneously affect its genetic make-up, social background, experience and rearing history. Third, we designed the categories to be broad enough to accommodate future extensions. Finally, although STRANGE refers to samples of animal subjects, effects can be moderated by study protocols. For instance, depending on the species, testing with or without others present can significantly affect an animal’s willingness to participate, as often observed in fish, birds and primates.
What will our critics say? Some might note that several of the effects we discuss — such as self-selection and experience biases — have been highlighted as problematic in the past. We feel that, because research practice seems to have improved little in response to specific warnings, it is time to introduce a memorable framework that integrates all factors that could affect the generalizability of animal-behaviour studies.
Others might point out that most of the examples we mention simply illustrate drivers of behavioural variation, and not sampling bias. As we will explain, systematic studies are urgently required to assess our contention that the potential for bias is both widespread and routinely ignored.
It is possible to identify and mitigate STRANGE-related biases at little or no extra cost, with a simple ‘3D’ approach: design, declare, discuss (see also Supplementary information, Box S1).
Design. There are many opportunities for researchers to make their test samples more representative (see Box S1, step 1). For example, we recommend that projects that rely on trapping wild subjects consider using a variety of trap types or bait preparations. They can also sample across multiple populations to reduce systematic biases. Similarly, in studies in which animals effectively select themselves for participation, we encourage researchers to think about ways of altering the testing environment or the task itself, to encourage more inclusive participation. Anecdotal evidence reveals that some crows, for instance, are hesitant to approach experimental tasks that are placed on the floor, but will readily engage with them when they are mounted just one metre above the ground.
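The logic of this design advice can be sketched with another toy simulation (again with invented numbers): two trap types with opposite, hypothetical catch biases each yield a skewed sample on their own, but pooling their catches recovers a sample much closer to the population average.

```python
import random

random.seed(1)

# Hypothetical population of shy-to-bold individuals (boldness in [0, 1]).
population = [random.random() for _ in range(10_000)]

def trap(pop, catch_prob):
    """Return the individuals caught, given a boldness-dependent catch probability."""
    return [b for b in pop if random.random() < catch_prob(b)]

# Two complementary (hypothetical) methods: funnel traps favour bold animals,
# baited box traps favour shy ones.
funnel_catch = trap(population, lambda b: b)
box_catch = trap(population, lambda b: 1 - b)
pooled = funnel_catch + box_catch

def mean(xs):
    return sum(xs) / len(xs)

print(f"funnel traps only: {mean(funnel_catch):.2f}")  # biased high
print(f"box traps only:    {mean(box_catch):.2f}")     # biased low
print(f"pooled sample:     {mean(pooled):.2f}")        # close to the true 0.50
```

Real trap biases are rarely this neatly complementary, but the principle stands: diversifying capture methods dilutes any one method’s systematic bias.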
Declare. When submitting a manuscript, researchers should supply — and journals should ask for — an objective evaluation of a system’s ‘STRANGEness’ so that editors and reviewers can gauge the scope for bias. Although reporting standards have significantly improved over the past few years, a surprising number of journals — including many specializing in animal behaviour and cognition — still lack robust reporting policies. We urge journals to insist that all behavioural studies report detailed subject-attribute data as set out in the ARRIVE guidelines (Animal Research: Reporting of In Vivo Experiments)16, with some additions (such as a full declaration of the attributes of non-participating subjects; see Box S1, step 2a).
Our STRANGE framework can then be used together with this information to evaluate the scope for sampling biases (Box S1, step 2b), and to describe which precautionary steps were taken to avoid bias, if any (Box S1, step 2c). These declarations enhance transparency, provide a valuable resource for systematic reviews and formal meta-analyses, and will hopefully encourage better planning of future studies.
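One lightweight way to make such declarations systematic is a structured, machine-readable checklist alongside the methods section. The sketch below is our own illustrative invention, not a published standard; every field name and entry is hypothetical.

```python
# A hypothetical STRANGE declaration for a study's supplementary material.
# Field names and entries are illustrative only, not a published standard.
strange_declaration = {
    "Social background":      "Group-housed (n = 5 per aviary); no isolates tested.",
    "Trappability":           "Funnel traps only; bold individuals may be over-represented.",
    "Rearing history":        "All subjects captive-bred in enriched housing.",
    "Acclimation":            "48 h acclimation to the test arena before trials.",
    "Natural responsiveness": "All trials run 09:00-12:00 to control for diel variation.",
    "Genetic make-up":        "Single outbred laboratory population.",
    "Experience":             "12 of 30 subjects used in one previous foraging task.",
}

# Flag the categories in which the sample may be unrepresentative
# (here, simply by searching for hedged wording in each entry).
flagged = [category for category, entry in strange_declaration.items()
           if "may be" in entry]
print(flagged)  # ['Trappability']
```

A structured record like this would let editors, reviewers and meta-analysts scan the scope for bias at a glance, rather than hunting for it across the methods text.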
Discuss. It is essential to detail any potential issues prominently in the main body of research papers (Box S1, step 3). This would force authors to explicitly link their findings to the studied sample, rather than to the population or species as a whole17.
As well as supporting all steps of this 3D approach, STRANGE provides a convenient memory aid: we encourage animal-behaviour researchers to routinely ask how unusual their samples are. (For further questions about specific categories, see Table S1.)
There is a large body of literature demonstrating how the categories in our STRANGE framework can affect the behaviours researchers observe. On the basis of this evidence, our personal research experience and our discussions with many colleagues, we suspect that STRANGE-related problems are widespread. We now urgently need retrospective analyses of published work to quantify how often test samples are biased with regard to these categories, and how often researchers fail to account for, or declare, these confounding factors.
Human experimental psychology has made great strides towards addressing sampling biases by improving reporting standards. For example, the Association for Psychological Science recommends that authors identify the participant population, explain their selection and consider how generalizable their findings are. However, despite these efforts, problems are surprisingly persistent, with many published studies still providing insufficient detail about their participants17.
Animal-behaviour scientists have a lot to learn from the WEIRD debate. We hope that our STRANGE framework will help to improve how animal-behaviour research is conducted, reported and interpreted.
Nature 582, 337-340 (2020)