Evaluation of the effects of space allowance on measures of animal welfare in laboratory mice

We studied how space allowance affects measures of animal welfare in mice by systematically varying group size and cage type across three levels each in both males and females of two strains of mice (C57BL/6ByJ and BALB/cByJ; n = 216 cages, a total of 1152 mice). This allowed us to disentangle the effects of total floor area, group size, stocking density, and individual space allocation on a broad range of measures of welfare, including growth (food and water intake, body mass); stress physiology (glucocorticoid metabolites in faecal boli); emotionality (open field behaviour); brain function (recurrent perseveration in a two-choice guessing task); and home-cage behaviour (activity, stereotypic behaviour). While increasing group size was associated with a decrease in food and water intake in general, and more specifically with increased attrition due to escalated aggression in male BALB mice, no other consistent effects of any aspect of space allowance were found with respect to the measures studied here. Our results indicate that within the range of conditions commonly found in laboratory mouse housing, space allowance as such has little impact on measures of welfare, except for group size which may be a risk factor for escalating aggression in males of some strains.

cages). A third mouse was then added to the cages containing groups of 8 (Step 3; 3 cages).
Steps 1 to 3 were then repeated, moving on to the second box of animals when the first was empty, and then steps 1 and 2 were repeated. Cage position on the rack (height and side) was counterbalanced by sex, strain, and group size between batches.

d) Ear-Tattooing of subjects
All mice were individually marked by ear tattoo. Animals of batch 1 were tattooed on the day of arrival, immediately after allocation to cages. However, due to the death of one animal during restraint, mice of batches 2 to 6 were tattooed on the day following arrival to allow recovery from transport and acclimatization to re-housing. The same two experimenters (JDB and EM) tattooed the animals across all batches. Briefly, each animal was restrained by the scruff (JDB) and then tattooed (EM) in one of eight pre-defined ear locations.

Supplementary Methods 2: Open Field
The open field test was originally developed by Hall 1 as a means of assessing emotionality in rats. Since then, many variations of this test have been produced, including differences in light intensity, time of testing, size of the apparatus, and measurement of outcome variables, to name a few. Thus, the term "open field test" has virtually no meaning without explicit specification of what construct is being measured [2][3][4].
The open field remains a popular apparatus in rodent research, and in particular, has been used to identify and validate behavioural differences related to anxiety in both rats and mice 5,6. More specifically, longitudinal assessment of the pattern of locomotor behaviour in the open field across repeated exposures has been demonstrated to provide information about how groups of animals cope with the presentation of a stressor and how the HPA axis differentially operates between groups of animals 7.
In the present experiment, four open field arenas made of polycarbonate with dimensions 45 x 45 x 45 cm³ were used to test mice in squads of four. Behaviour in the arena was digitally recorded (960P, 30 fps) across four days between 10:00 and 13:00. Overhead lighting was maintained at 40 lux in the centre of each field and at 30 lux near the walls, based on the recommendations of Martin-Arenas and Pintado 8. Mice from four cages were tested per squad in a pseudo-random manner by two experimenters (EM and JDB), counterbalanced between batches for strain, sex, and treatment. Nine squads of animals (n = 36 mice, 4 per squad) were tested from batches 1 to 4 and seven squads (n = 27 mice, 3 per squad) from batches 5 to 6. Data from four squads (12 animals) on day 1 of testing were lost from batch 2 because of digital recorder failure.
Briefly, the cages which contained the four animals to be tested were brought to the testing room and the doors to the housing room were closed. The overhead lights were turned on and the four animals were removed from the cages. Each animal was placed into a single arena and allowed to explore for ten minutes. The animals were then weighed, replaced in the home-cage, and returned to the housing room. The arenas were then cleaned with 70% ethanol and the next squad was tested.
Digital recordings were processed in Noldus EthoVision XT (version 11.5) by JDB. In EthoVision, the unit of distance was centimetres (cm) and the unit of time was seconds (s).
The floor of the arena was divided into sixteen equal squares (11.25 x 11.25 cm²), with the four squares in the middle representing the centre of the arena and the remaining squares the periphery. The detection settings for tracking were selected so that both the percentage of samples in which the subject was not found and the percentage of samples skipped were less than 1% per trial. To ensure accuracy, a human observer verified the tracking of the software live as the recordings were analysed. Furthermore, each trial was edited within EthoVision such that each point was accurately scored and issues associated with automated tracking were eliminated 9. The outcome variables of interest across the four days of testing were: 1) distance travelled, and 2) time spent in the centre.
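As a rough illustration of how the two outcome variables relate to the zone layout described above, the following Python sketch computes distance travelled and time in the centre from a list of tracked positions. It is not the EthoVision procedure. The function names, the coordinate origin at one arena corner, and the assumption of one position sample per video frame (30 fps) are illustrative assumptions only.

```python
import math

ARENA_CM = 45.0                 # arena side length, from the text
CENTRE_MIN = ARENA_CM / 4       # 11.25 cm: inner edge of the outer ring of squares
CENTRE_MAX = 3 * ARENA_CM / 4   # 33.75 cm: far edge of the four middle squares
FPS = 30.0                      # assumption: one tracked sample per video frame


def in_centre(x, y):
    """True if (x, y) in cm lies within the four middle squares of the 4 x 4 grid."""
    return CENTRE_MIN <= x < CENTRE_MAX and CENTRE_MIN <= y < CENTRE_MAX


def summarise_track(points, fps=FPS):
    """Return (distance travelled in cm, time in centre in s).

    points: list of (x, y) positions in cm, one per sample,
    with the origin at one corner of the arena (hypothetical convention).
    """
    # Sum of straight-line distances between consecutive samples.
    distance = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    # Each sample inside the centre zone contributes 1/fps seconds.
    centre_samples = sum(1 for x, y in points if in_centre(x, y))
    return distance, centre_samples / fps


# Hypothetical track of four samples, for illustration only.
track = [(5.0, 5.0), (5.0, 20.0), (20.0, 20.0), (20.0, 5.0)]
distance_cm, centre_s = summarise_track(track)
```

Here the centre zone is taken to be the 22.5 x 22.5 cm square formed by the four middle squares, which follows from dividing the 45 cm floor into a 4 x 4 grid.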

Supplementary Methods 3: Guessing Task
In the present study, we used the same apparatus as described by Novak and colleagues 10. The apparatus consisted of a box made of polycarbonate (20 x 50 cm²) with a start box (10 x 10 cm²) and two goal compartments (10 x 20 cm²) each containing a goal-pot (Supplementary Figure 3). Subjects from each batch were tested in two cohorts in a pseudorandom manner by the same two experimenters (JDB, EM), counterbalanced for sex, strain, and treatment under red light in the dark phase. The timeline for a single cohort is shown below (Supplementary Figure 4).
Food Restriction: On the day following the completion of the open field test, cages were changed. The first squad of animals was fed 85% of their ad libitum daily consumption (based on the previous week's intake) for the next seven days. Of the 85% ad libitum ration, approximately 3 g of Bio-Serv™ dustless chocolate precision pellets (20 mg) were placed in a goal-pot and put in the home cage on the first three days of food restriction, to reduce neophobia to the reward and to associate the goal-pot with the presence of reward. All animals in the cage were weighed daily to ensure that body weight did not fall below 85% of the pre-restriction weight. If an animal's weight dropped below 85%, it was placed in a separate cage and fed ad libitum for 30 minutes.
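The ration and weight criteria above reduce to simple arithmetic, sketched below in Python. The function names and example values are hypothetical. Treating intake as a per-cage daily figure and subtracting exactly 3 g of pellets from the ration are assumptions made for illustration, not details confirmed by the text.

```python
RATION_FRACTION = 0.85        # ration as a fraction of ad libitum intake
WEIGHT_FLOOR_FRACTION = 0.85  # minimum body weight as a fraction of baseline
PELLET_G = 3.0                # approximate pellet mass included in the ration
PELLET_DAYS = 3               # pellets offered on the first three days


def daily_ration_g(ad_lib_intake_g):
    """Total daily ration: 85% of the previous week's mean daily intake."""
    return RATION_FRACTION * ad_lib_intake_g


def chow_portion_g(ad_lib_intake_g, restriction_day):
    """Chow portion of the ration on a given day of restriction (1-based).

    Assumption: on the first three days the pellets count toward the ration.
    """
    ration = daily_ration_g(ad_lib_intake_g)
    if restriction_day <= PELLET_DAYS:
        return ration - PELLET_G
    return ration


def needs_ad_lib_feeding(current_weight_g, baseline_weight_g):
    """True if body weight has dropped below 85% of the pre-restriction weight."""
    return current_weight_g < WEIGHT_FLOOR_FRACTION * baseline_weight_g


# Hypothetical example: a cage eating 20 g/day and a mouse with a 25 g baseline.
ration = daily_ration_g(20.0)
flagged = needs_ad_lib_feeding(21.0, 25.0)
```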
Habituation: Habituation to the apparatus occurred across three days. On the second day of food restriction, the focal test animal and one randomly selected cage-mate were placed into the apparatus. Both goal-pots were present and six pellets were placed in each goal-pot as well as on the floor. After ten minutes, both animals were removed, the entire cage weighed, and then fed. On the third day, the same procedure was repeated but with only the focal animal. The procedure for the fourth day was the same as on day three, with the exception that there were two sessions of habituation (morning and afternoon) for five minutes. Each cage was fed only after the second session.

Shaping: On the fifth day of food restriction, each mouse received 12 training trials across two sessions (morning and afternoon), in which both goal-pots were baited. In all trials, both goal-pots contained five inaccessible pellets at the bottom, covered with wire mesh, which served as a control for odour cues. Between mice, but not between trials, the apparatus was cleaned with a 70% ethanol solution. As soon as the mouse entered one compartment, access to the other compartment was blocked by closing the guillotine door. If the mouse chose the same side three times in succession, that side was closed in the following trial to avoid shaping the mouse to one side. A trial was completed when the animal's head (nose) was above the goal-pot, after which the animal was left to eat the reward. The mouse was then returned to the start box and the next trial began. The cage was fed only after the second session.
Testing: The test phase consisted of 100 trials conducted over a maximum of three sessions, although all but 3 animals completed 100 trials across 2 days. For each trial, the start box door was opened and when the animal selected a goal-pot, the other compartment was closed. The animal was left to eat the pellet (if the choice was correct) and then returned to the start box. Each session was terminated after 30 minutes or as soon as the mouse started showing off-task behaviour 11.
For each trial, only one goal-pot was baited, with a probability equalling the proportion of responses to the other side in the previous 20 trials. This randomization procedure was used to eliminate side biases which may confound the experimental paradigm and was determined by a custom written computer program 12. In trials 1 to 19, side bias was calculated from all previous trials. Although reward side is unpredictable, choosing each side equally often will maximize the number of rewards. The mouse can do so by producing either a random or patterned sequence of responses. Patterned sequences (which show high sequential dependence) can be apparent as either series of repetitions or alternations or more complex sequential patterns, and indicate recurrent perseveration 13,14.
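The baiting rule described above can be sketched in Python as follows. This is not the custom program cited in the text but a minimal reconstruction of the stated rule. The function names, the "L"/"R" coding of responses, and the use of a probability of 0.5 on the very first trial (when no previous responses exist) are assumptions.

```python
import random

WINDOW = 20  # number of previous trials used to estimate side bias


def prob_bait_left(previous_choices, window=WINDOW):
    """Probability of baiting the left goal-pot on the next trial.

    Equals the proportion of responses to the RIGHT side over the last
    `window` trials, or over all previous trials if fewer are available.
    """
    recent = previous_choices[-window:]
    if not recent:
        return 0.5  # assumption: no history on trial 1, so both sides equally likely
    return recent.count("R") / len(recent)


def choose_baited_side(previous_choices, rng=random):
    """Draw the baited side ('L' or 'R') for the next trial."""
    return "L" if rng.random() < prob_bait_left(previous_choices) else "R"


# Hypothetical history: a mouse that chose right on 3 of its first 4 trials.
history = ["R", "R", "L", "R"]
p_left = prob_bait_left(history)
```

Under this rule a mouse with a strong bias toward one side finds the reward mostly on the other side, which is why responding to each side equally often maximizes the number of rewards.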

Perseveration score (logit P) was used as the primary outcome measure of recurrent perseveration. The score was calculated using 3rd order Markov chain analysis 15, which describes the probability of a behaviour occurring as a function of previous behaviour (where the 3rd order considers the three previous behavioural responses) and provides a method to assess sequential independence while controlling for side bias. These analyses were performed using a custom written computer program which calculated the observed and

Table 3. Ethogram for the recording of stereotypic behaviour. Behaviour patterns were considered stereotypic if the same movement sequence was repeated continuously for at least 3 s (bar-mouthing) or at least three times in a row without pauses longer than 3 s between bouts (circling, cage-top twirling, back-flipping, route-tracing).
Table 4. Distribution of stereotypic behaviour by strain and sex. Numbers outside brackets represent the number of cages (focal animal/cage). The proportion of the total number of cages is expressed within brackets. For stereotypy types, proportions are calculated from the total number of animals displaying stereotypic behaviour.
Table 5. Mean proportion of intervals (± 95% CI) of observed stereotypic behaviour by type. Average values are calculated within stereotypy type, while the estimate of stereotypic behaviour is calculated between stereotypy type.
Supplementary Table 6. Distribution of missing data (%) for BALB males that reached early termination criteria. Numbers outside brackets represent the number of cages removed due to attrition from the experiment and the proportion of the total number of cages, n=4, is expressed within brackets.
Table 7. Summary of outcome variables relating to sex and age by strain
Supplementary Table 8. Distribution of outcome data, collected and missing, by treatment, strain and sex.

As the treatments in this experiment were contingent upon the maintenance of a constant group size throughout the duration of the experiment, complete data was not available for some of our outcome measures. However, for outcomes with repeated measurement such as body weight, data at available time points were used to inform parameter estimates of values when possible.