Novel insights into the behavioral analysis of mice subjected to the forced-swim test

The forced-swim test (FST) is one of the most widely used rodent behavioral assays, in which the immobility of animals is used to assess the effectiveness of antidepressant drugs. However, the existing, and mostly arbitrary, criteria used for quantification could lead to biased results. Here we report what we believe are new confounding factors, introduce new indices for interpreting the behavior of mice and propose an unbiased means of quantifying the FST.


INTRODUCTION
Major depressive disorder is a key public health concern. The results of the first Global Burden of Disease study in 1990 revealed a group of disorders that primarily cause nonfatal burden, quantified by years lived with disability and disability-adjusted life years.[1] According to data from the Global Burden of Disease study 2010,[2-4] mental and substance disorders accounted for 7.4% of disability-adjusted life years, and they were the leading cause (22.9%) of years lived with disability worldwide.[4] Within the mental and substance disorders group, major depressive disorder contributed most of the disability-adjusted life years and years lived with disability, with a global point prevalence of 4.7%.[5] From 1990 to 2010, the contribution of major depressive disorder to both disability-adjusted life years[3] and years lived with disability[2] increased by 37%. Major depressive disorder was also predicted to become the second leading contributor to the burden of disease by 2020.[6] Clinically, depression is a heterogeneous disorder in humans, and the psychological aspects of the disease are difficult to model in animals.[7] Only a limited number of tests are available for detecting a 'depressed' phenotype in rodents;[7-9] among these, the forced-swim test (FST) is one of the most widely used across laboratories for assessing depression-like symptoms. The FST was originally introduced by Porsolt et al. in rats[10,11] and mice[12] for the screening of antidepressants. In this test, if treatment with a drug reduces behavioral immobility,[10,12,13] which is thought of as a measure of despair, the drug can be considered an 'antidepressant'.[10-12] The advantages of the FST include its ease of use, its reliability across laboratories and trials, and its ability to detect a broad spectrum of antidepressants.[14,15]
However, the test also has drawbacks, such as its unreliability in detecting the effects of selective serotonin reuptake inhibitors (SSRIs), a major group of antidepressant drugs.[13] Another important question is how to interpret the immobility of the animals in the FST. Researchers have argued that immobility in the rat FST depends largely on learning and memory,[14,16] but only limited research has addressed the role of cognitive processes in the mouse FST. Because the FST was originally developed for rats, a species much better adapted to water than mice, technical issues remain when the test is used in mice, for which it has yet to be adequately adapted.[7,8] In Porsolt's original FST for rats, the animals undergo two exposures.[10,11] The first exposure induces a stable level of immobility, and the second quantifies immobility after drug or vehicle treatment. Mice, by contrast, show a sufficiently stable level of immobility during the last 4 min of a single 6-min swim test, for reasons that are still unclear.[13] Also, particularly in mice, immobility may be influenced by factors other than variation in emotional state,[17] which makes the interpretation of the results dependent on the strain of mice,[18-20] the water temperature[21,22] and possibly other factors. Despite the disadvantages inherent to carrying out the FST in mice and its inability to truly measure depression, the test is still widely used to quantify depression-like behavior. The possibility of modifying mice genetically, which enables better insights into the molecular mechanisms underlying mental disorders, has created a demand for better adapting the test to this species. To date, almost 40 strains of mice have been generated with a depression- or antidepressant-related phenotype.[7]
Over the years, researchers have modified the FST to enhance its sensitivity, specificity and reliability.[15,17,23-25] In this study, we set out to determine whether some of the factors affecting the behavior of mice in the FST can be quantified. First, we identified buoyancy as a confounding factor in the FST. We also devised new, unbiased quantitative measures of mouse behavior, including a systematic way to determine the latency to immobility, and we describe an oscillatory pattern in the behavior of mice during the FST.
facilities. Mice were kept in the vivarium on a 12-h light/dark cycle with free access to water and food. Calbindin knockout (Calb1−/−) mice weighing between 32 and 43 g (male, 62-70 weeks old) were used for the customized 8-min FSTs (see below). Heterozygous α5 subunit-containing GABA_A receptor knockout (Gabra5+/−) mice (male and female, 10-13 weeks old) were used for some of the standard 6-min FSTs. For all other FSTs, wild-type (WT) mice weighing between 20 and 25 g (male, 10-18 weeks old) were used.

Standard 6-min FST
All 6-min FSTs were carried out using C57BL/6J mice (male and female, 8-70 weeks old, 20-43 g; see Animals section). The animals swam in standard 2-l glass beakers (diameter 13.1 cm) filled with water to 4.5 cm from the top, so that the animals could neither touch the bottom with their tails nor escape from the top. Before each test, the container was thoroughly cleaned. Water temperature was kept between 23°C and 26°C.[10-12,17] At the start of each test, an animal was gently picked up by its tail from the home cage and rapidly placed into the middle of the container. Recording started when the animal was placed in the water, and the duration of each standard FST was set to 6 min (for the customized FST, see below). The entire FST session was videotaped for later analysis (Figure 1b). After the test, the animal was removed from the water, dried with a towel and put into a warm cage (bedding temperature 31-33°C) for 15 min before being returned to its home cage. All animals were first-time swimmers, and none was used for more than one FST.

Customized 8-min FST
For the customized FST, the setup was adapted for liquid exchange (Figure 1a). The container was made of transparent plexiglass and was designed to be large enough (diameter 14.6 cm, height 27.9 cm) that the animals could neither touch the bottom with their tails nor escape from the top. To keep the water level in the container constant during a liquid exchange, a wide (diameter 2.6 cm) vertical plastic pipe was connected to the side of the container, forming a communicating vessel with it, so that any liquid above 21.0 cm from the bottom could flow out. A plastic inflow pipe (diameter 1.0 cm) was fixed above the container to allow the exchange of solutions. The inflow was positioned above the center of the container so that the incoming liquid would not disturb the swimming animals, as mice swim along the edge of the container. In this manner, the entire 3-l content of the container could be exchanged within 30 s (Figure 1b). In contrast to the 6-min FST, with 2 min of acclimatization and 4 min for quantification, the 8-min FST had two swim epochs. In epoch 1, mice swam in water for 4 min. Epoch 2 then started with a liquid exchange lasting <30 s. At the end of epoch 2, which also lasted 4 min in total, the animals were removed from the water, dried with a towel and put into a warm cage (bedding temperature 31-33°C) for 15 min before being returned to their home cage. All animals were first-time swimmers, and none was used for more than one FST.

Analysis
(1) Definition of immobility. All FST videos were scored by one individual in a double-masked manner. Each time point was scored as '0' for immobility or '1' for mobility, with a time resolution of 0.1 s. A mouse was considered immobile when floating and/or making only the movements necessary to keep its body balanced or its head above the water.[10-13] Following the current standard for FST analyses, the latency to immobility (t_lat) was defined as the time from the start of the test to the first bout of immobility lasting longer than 1 s,[13] unless otherwise stated. The fraction of total immobile time (F_im) was calculated as the total immobile time divided by the duration of the scoring window used to quantify immobility.
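The scoring rule above can be sketched in Python. This is a minimal illustration, not the software used in the study (scoring was done in Etholog): it assumes the manual scores have been exported as a list of 0/1 values, one per 0.1-s time step (0 = immobile, 1 = mobile), and the function names `immobility_bouts` and `latency_to_immobility` are hypothetical.

```python
from typing import List, Optional, Tuple

DT = 0.1  # scoring resolution in seconds, as in the Methods


def immobility_bouts(trace: List[int], dt: float = DT) -> List[Tuple[float, float]]:
    """Return (onset_s, duration_s) for every run of 0s (immobility) in a 0/1 trace."""
    bouts = []
    start = None  # index where the current immobility bout began
    for i, state in enumerate(trace):
        if state == 0 and start is None:
            start = i
        elif state == 1 and start is not None:
            # round() strips floating-point noise such as 3.0000000000000004
            bouts.append((round(start * dt, 3), round((i - start) * dt, 3)))
            start = None
    if start is not None:  # the animal was still immobile when scoring ended
        bouts.append((round(start * dt, 3), round((len(trace) - start) * dt, 3)))
    return bouts


def latency_to_immobility(trace: List[int], threshold_s: float = 1.0,
                          dt: float = DT) -> Optional[float]:
    """t_lat: onset of the first immobility bout lasting longer than threshold_s."""
    for onset, duration in immobility_bouts(trace, dt):
        if duration > threshold_s:
            return onset
    return None  # no bout exceeded the threshold


# 3 s mobile, 0.5 s immobile, 1 s mobile, 1.5 s immobile, 2 s mobile
trace = [1] * 30 + [0] * 5 + [1] * 10 + [0] * 15 + [1] * 20
print(immobility_bouts(trace))            # [(3.0, 0.5), (4.5, 1.5)]
print(latency_to_immobility(trace))       # 4.5
print(latency_to_immobility(trace, 2.0))  # None
```

Note how the same trace yields a latency with a 1-s threshold but none with a 2-s threshold; the threshold is a free parameter of the calculation.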
(2) Scoring of the FSTs. For the standard FST (6 min), the first 2 min were considered time for the animals to explore and acclimate to the environment. Mobility/immobility was scored for the full 6 min, but only the last 4 min were used to calculate F_im (F_im = total immobile time during the last 4 min/240 s). All 6 min were used to calculate t_lat.
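The windowed F_im calculation can be sketched as follows, again assuming a 0/1 trace sampled every 0.1 s; `fraction_immobile` is a hypothetical helper. For the standard FST the analysis window is 120-360 s; the same function can be applied to each 240-s epoch of the customized FST with a 60-240 s window.

```python
from typing import List


def fraction_immobile(trace: List[int], window_start_s: float,
                      window_end_s: float, dt: float = 0.1) -> float:
    """F_im: immobile time within [window_start_s, window_end_s) divided by the window length."""
    i0 = round(window_start_s / dt)
    i1 = round(window_end_s / dt)
    if i1 > len(trace) or i0 >= i1:
        raise ValueError("analysis window must lie inside the recorded trace")
    window = trace[i0:i1]
    return window.count(0) / len(window)  # 0 = immobile


# Standard 6-min FST: mobile for 4 min, then immobile for the last 2 min
trace = [1] * 2400 + [0] * 1200
print(fraction_immobile(trace, 120, 360))  # 0.5  (120 s immobile / 240 s)
```

Because every sample covers the same 0.1 s, counting immobile samples and dividing by the number of samples in the window is equivalent to dividing immobile time by 240 s (or 180 s).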
For the customized FST (8 min), the first 1 min of each epoch was considered time for the animals to explore and acclimate to the environment. In each epoch, mobility/immobility was scored for the full 4 min, but only the last 3 min were used to calculate F_im (F_im = total immobile time during the last 3 min/180 s). All 4 min of each epoch were used to calculate t_lat.

(3) Angle during immobility. We defined the angle (α) of an immobile mouse with respect to the water surface. After the acclimatization period (2 min for the standard FST, 1 min for each epoch of the customized FST), one frame was taken from each video at the first moment when the mouse stopped swimming and was in full profile view. The angle α was determined from the depth of the base of the tail and the distance between the base of the tail and the point where the body intersects the water surface (white lines, Figure 2b).
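The text does not give the formula for α. One plausible reading, used in this sketch as an assumption, is that the straight-line distance from the body/water intersection to the tail base is the hypotenuse of a right triangle whose opposite side is the tail-base depth, so that α = arcsin(depth/distance). Both lengths must be measured in the same units (for example, pixels in the extracted frame); `floating_angle_deg` is a hypothetical name.

```python
import math


def floating_angle_deg(tail_base_depth: float, waterline_to_tail_base: float) -> float:
    """Angle (degrees) between the water surface and the body axis of an immobile mouse.

    Assumption: waterline_to_tail_base is the straight-line distance from the point
    where the body meets the water to the base of the tail (hypotenuse), and
    tail_base_depth is the vertical depth of the tail base (opposite side).
    """
    if waterline_to_tail_base <= 0:
        raise ValueError("distance must be positive")
    if not 0 <= tail_base_depth <= waterline_to_tail_base:
        raise ValueError("depth cannot exceed the waterline-to-tail-base distance")
    return math.degrees(math.asin(tail_base_depth / waterline_to_tail_base))


print(round(floating_angle_deg(40, 80), 1))  # 30.0
```

Under this reading, a mouse hanging straight down in the water (depth equal to distance) gives α = 90°, and a mouse lying flat on the surface gives α = 0°.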
(4) Defining new thresholds for screening the first critical immobility. The durations of the bouts of immobility (t_im) were taken from the entire swim session of each mouse. The cumulative probability of the immobility durations, Φ(t_im), was fitted with the sum of two cumulative normal distributions:

$$\Phi(t_{im}) = A_1 \int_{-\infty}^{t_{im}} \frac{1}{\sigma_1\sqrt{2\pi}}\, e^{-(x-\mu_1)^2/(2\sigma_1^2)}\,dx + A_2 \int_{-\infty}^{t_{im}} \frac{1}{\sigma_2\sqrt{2\pi}}\, e^{-(x-\mu_2)^2/(2\sigma_2^2)}\,dx$$

where A_1 + A_2 = 1.
In the equation, t_im is the duration of each bout of immobility, and A, μ and σ are the amplitudes, means and standard deviations, respectively, of the two normal distributions. The distributions were fitted using the NORM.DIST function in Microsoft Office Excel 2011.
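The fit can also be done outside Excel. The sketch below is an assumption-laden Python translation, not the spreadsheet used in the study: it builds the empirical cumulative probability of one mouse's t_im values (using i/n as the cumulative probability, an assumed convention) and fits the two-component model described above by least squares with SciPy's `curve_fit`, parameterizing A_2 as 1 - A_1. The starting values (lower and upper quartiles as initial means) are an arbitrary choice, and the fit may return the long-bout component as component 1, so components should be sorted by mean before interpretation.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm


def two_normal_cdf(t, a1, mu1, s1, mu2, s2):
    """Phi(t) = A1*CDF(t; mu1, s1) + A2*CDF(t; mu2, s2), with A2 = 1 - A1."""
    return a1 * norm.cdf(t, mu1, s1) + (1.0 - a1) * norm.cdf(t, mu2, s2)


def empirical_cdf(durations):
    """Sorted bout durations and their cumulative probabilities i/n."""
    x = np.sort(np.asarray(durations, dtype=float))
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y


def fit_two_normals(durations):
    """Least-squares fit of the two-component model; returns (params, RSS)."""
    x, y = empirical_cdf(durations)
    spread = max(float(np.std(x)), 1e-2)
    p0 = [0.5, np.percentile(x, 25), spread / 2, np.percentile(x, 75), spread / 2]
    lower = [0.0, -np.inf, 1e-3, -np.inf, 1e-3]  # 0 <= A1 <= 1, sigmas > 0
    upper = [1.0, np.inf, np.inf, np.inf, np.inf]
    params, _ = curve_fit(two_normal_cdf, x, y, p0=p0, bounds=(lower, upper))
    rss = float(np.sum((y - two_normal_cdf(x, *params)) ** 2))
    return params, rss


# Synthetic example: 40 short bouts around 0.5 s and 20 long bouts around 5 s
short_bouts = norm.ppf(np.linspace(0.02, 0.98, 40), 0.5, 0.15)
long_bouts = norm.ppf(np.linspace(0.02, 0.98, 20), 5.0, 1.5)
params, rss = fit_two_normals(np.concatenate([short_bouts, long_bouts]))
print(np.round(params, 2), round(rss, 4))
```

A one-component fit for the F-test can be obtained the same way by fixing A_1 = 1 and fitting only μ_1 and σ_1.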
To make sure that Φ(t_im) is better described by two normal distributions than by one, we used an F-test to determine whether two normal distributions fit the data significantly better than a single distribution. For the F-test, we fitted the Φ(t_im) curves with both one and two normal distributions. When fitting with one distribution, A_1 = 1 and A_2 = 0 in the equation shown above. The F-values were calculated with the following equation:

$$F = \frac{(RSS_1 - RSS_2)/(p_2 - p_1)}{RSS_2/(n - p_2)}$$

in which RSS_1 and RSS_2 are the residual sums of squares from fitting with one and two normal distributions, respectively; p_1 and p_2 are the numbers of parameters used for fitting with one and two normal distributions, respectively; and n is the number of points used for fitting. According to the equation of the normal distribution, p_1 = 2 and p_2 = 5. We then performed the F-test and calculated P-values using the F.DIST function in Microsoft Office Excel 2011. As an example, for the mouse shown in Figure 3f, fitting the Φ(t_im) curve with one normal distribution (Figure 3f, green line) gave RSS_1 = 0.201, whereas fitting with two normal distributions (Figure 3f, black line) reduced it to RSS_2 = 0.007. Accordingly, the F-value calculated from RSS_1 and RSS_2 was 223.689, and the P-value from the F-test was 1.897 × 10^−19, which means that two distributions fit the Φ(t_im) curve of this mouse significantly better than one distribution.

Figure 2e and f: Significant difference in F_im (e) and α (f) of mice with water (Water, n = 11) or 1% soap solution (Part soap, n = 11) applied to their caudal areas. WT, wild-type.

Quantitative analysis of FST in mice L Chen et al
Of the 68 mice used in our tests, only 7 (~10%) had data that two normal distributions did not fit significantly better than a single normal distribution. Therefore, the t_im distributions contained two distinct populations in ~90% of the swim tests.
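The extra-sum-of-squares F-test described above can be written in a few lines. This is a sketch that uses SciPy's F distribution (`scipy.stats.f.sf` returns the right-tail probability) in place of the Excel function named in the text. The number of fitted points n for the Figure 3f mouse is not reported, so the example uses a hypothetical n = 30 and does not reproduce the published F-value exactly.

```python
from scipy.stats import f as f_dist


def compare_one_vs_two_normals(rss1, rss2, n, p1=2, p2=5):
    """Extra-sum-of-squares F-test: does the p2-parameter fit beat the p1-parameter fit?"""
    if not (p2 > p1 and n > p2):
        raise ValueError("need p2 > p1 and more points than parameters")
    f_value = ((rss1 - rss2) / (p2 - p1)) / (rss2 / (n - p2))
    p_value = f_dist.sf(f_value, p2 - p1, n - p2)  # P(F > f_value)
    return f_value, p_value


# RSS values reported for the Figure 3f mouse; n = 30 is a hypothetical point count
f_value, p_value = compare_one_vs_two_normals(0.201, 0.007, n=30)
print(round(f_value, 1))  # 231.0
```

The degrees of freedom are (p_2 - p_1, n - p_2), so the P-value depends on n even for fixed RSS values.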
When Φ(t_im) was fitted with two normal distributions, the new threshold for screening the first critical immobility (t_c, the critical threshold) was defined as the weighted mean

$$t_c = \frac{A_1\mu_1 + A_2\mu_2}{A_1 + A_2} = A_1\mu_1 + A_2\mu_2$$

where A and μ are the amplitudes and means, respectively, of the two normal distributions. For the seven cases with a single distribution, t_c = μ_1 was used as the threshold value.
The new latency to immobility (t_lat) was then calculated as the latency to the first bout of immobility that was longer than t_c.
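Given fitted parameters, t_c and the new latency follow directly. In this minimal sketch (hypothetical function names), bouts are (onset, duration) pairs in seconds, and a single-component fit is signaled by passing no second mean, in which case t_c = μ_1 as stated above.

```python
from typing import List, Optional, Tuple


def critical_threshold(a1: float, mu1: float, mu2: Optional[float] = None) -> float:
    """t_c = A1*mu1 + A2*mu2 with A2 = 1 - A1; t_c = mu1 if only one component was needed."""
    if mu2 is None:
        return mu1
    if not 0.0 <= a1 <= 1.0:
        raise ValueError("A1 must lie in [0, 1]")
    return a1 * mu1 + (1.0 - a1) * mu2


def latency_with_threshold(bouts: List[Tuple[float, float]], t_c: float) -> Optional[float]:
    """New t_lat: onset of the first immobility bout longer than t_c (None if none qualifies)."""
    for onset, duration in bouts:
        if duration > t_c:
            return onset
    return None


bouts = [(3.0, 0.5), (10.0, 2.0), (20.0, 4.0)]
t_c = critical_threshold(0.75, 1.0, 9.0)  # 0.75*1.0 + 0.25*9.0 = 3.0
print(t_c, latency_with_threshold(bouts, t_c))  # 3.0 20.0
```

With the customary 1-s threshold the same bouts would give a latency of 10.0 s, which illustrates how strongly t_lat depends on the threshold.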

Statistics
All results are expressed as mean ± s.e.m. Statistical differences between control and experimental groups were determined by an unpaired two-tailed Mann-Whitney test unless otherwise stated. P < 0.05 was considered statistically significant.
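For readers reproducing these statistics in Python (the text does not name the statistics software used), the group summary and the default test might be sketched as below. It assumes SciPy, whose `mannwhitneyu` accepts an `alternative` argument; the s.e.m. is computed from the sample standard deviation (n - 1 in the denominator). The values are placeholders, not data from this study.

```python
import math
from statistics import mean, stdev

from scipy.stats import mannwhitneyu


def mean_sem(values):
    """Mean and standard error of the mean (sample s.d. / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))


def compare_groups(control, experimental, alpha=0.05):
    """Unpaired two-tailed Mann-Whitney test; returns (P, significant?)."""
    result = mannwhitneyu(control, experimental, alternative="two-sided")
    return result.pvalue, result.pvalue < alpha


control = [0.31, 0.25, 0.40, 0.28, 0.35, 0.22]       # placeholder F_im values
experimental = [0.55, 0.62, 0.48, 0.71, 0.58, 0.66]  # placeholder F_im values
print(mean_sem(control))
print(compare_groups(control, experimental))
```

Depending on the SciPy version and sample size, `mannwhitneyu` may use an exact or an asymptotic P-value, so small differences from other software are expected.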

Equipment
The behavior of mice during the swim test was recorded using a Casio Exilim camera (Dover, NJ, USA). The software used for scoring was Etholog (Eduardo B.

This soap solution was used in certain swim tests to reduce the surface tension of the water, diminishing the amount of air trapped in the fur. Consequently, the total buoyancy of animals in a soap solution is decreased.

RESULTS
For research not specifically described here, WT and α5 subunit-containing GABA_A receptor heterozygous (Gabra5+/−) There was a significant difference between the fractions of total immobile time (F_im) of WT and Gabra5+/− mice (F_im,WT = 30.4 ± 3.7% vs F_im,Gabra5+/− = 58.9 ± 5.9%, P = 0.0010, Figure 2a). However, we also observed marked differences in the animals' floating postures, which we quantified by measuring the angle between the water surface and the animal's body axis while immobile (α, Figure 2b; see Materials and Methods). In WT mice, α was significantly larger than in Gabra5+/− mice (α_WT = 61 ± 5° vs α_Gabra5+/− = 35 ± 6°, P = 0.0028, Figure 2c). Furthermore, α correlated inversely with F_im (F_im = −0.7(±0.1)α + 79.0(±5.2), P < 0.0001, R² = −0.57, Figure 2d), indicating that mice with narrower angles swam less than those with wider angles. These results suggest that the different FST outcomes of Gabra5+/− and WT animals may not be caused by a difference in emotional/behavioral state (for example, depression or helplessness), but by physical properties that also underlie the differences in α. The major upward force supporting an animal in water is buoyancy, which should be reflected in the floating angle during immobility. A wider α corresponds to a larger part of the body being submerged; that is, the animal is less buoyant and must displace more water to be supported, and vice versa. The inverse correlation between F_im and α therefore suggests that the buoyancy of mice could be a confounding factor in the FST.
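The linear relation between F_im and α can be obtained by ordinary least squares. The sketch below is a plain-Python illustration with placeholder numbers (not the study's data); it also returns Pearson's r, whose square equals the coefficient of determination for a straight-line fit with an intercept and whose sign gives the direction of the correlation.

```python
import math
from typing import List, Tuple


def linear_fit(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Least-squares line y = slope*x + intercept, plus Pearson's correlation coefficient r."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Placeholder angles (degrees) and F_im values (%) lying exactly on F_im = -0.7*alpha + 79
alphas = [20.0, 40.0, 60.0, 80.0]
f_im = [65.0, 51.0, 37.0, 23.0]
slope, intercept, r = linear_fit(alphas, f_im)
print(round(slope, 2), round(intercept, 2), round(r, 2))  # -0.7 79.0 -1.0
```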
To investigate this potential confound, we manipulated the buoyancy of mice by altering the amount of air trapped in their fur, a key factor in keeping them afloat in water.[26-28] Less air should be trapped in the fur when the surface tension of the water is reduced with surfactants. Hence, soap can be used to decrease an animal's buoyancy.
To change buoyancy during the FST, we designed a customized setup for liquid exchange (Figure 1a) and devised a customized 8-min FST protocol (Figure 1b; see Materials and Methods). At the midpoint of the customized FST, we replaced the water with either water (control) or a low-concentration (0.5%) soap solution. Adding soap eliminated immobility (F_im,epoch1 = 56.2 ± 3.6% vs F_im,epoch2 = 0.0 ± 0.0%, P < 0.0001, paired two-tailed t-test, Figure 1c), making it impossible to measure α as previously defined (see Materials and Methods). In the mild soap solution, the animals assumed a vertical body posture, completely lost their ability to float and consequently were forced to swim constantly. For continuously swimming mice, α could not be measured exactly, but as all of these mice were almost vertical in the water, we estimated α to be 90°. Adding soap solution therefore significantly increased α (α_epoch1 = 28 ± 2° vs α_epoch2 = 90 ± 0°, P < 0.0001, paired two-tailed t-test, Figure 1d). These results support the notion that buoyancy confounded the results of the example experiment comparing Gabra5+/− and WT animals described above. More generally, they show that buoyancy can affect the outcome of the FST.
In the FST, the latency to the first immobility (t_lat) is also considered a key measure for quantifying the state of 'behavioral despair' of the animal.[29-33] Traditionally, t_lat is the delay to the first bout of immobility lasting longer than 1 s[13,33] or sometimes 2 s.[30,31] As far as we can tell, no reason has ever been given for choosing this exact threshold of 1 or 2 s. The threshold is therefore arbitrary, even though it can determine the final conclusion (see below). To address this, we systematically studied the durations of immobility (t_im) of all the mice used in the standard FST in water. We observed that different mice have very different distributions of t_im (Figures 3a and b): for some (for example, Figures 3a and b, open circles), most stops were shorter than 1 s, whereas for others (for example, Figures 3a and b, filled circles), most stops were much longer than the 1- or 2-s threshold. We then analyzed the very first stops of all the mice used for standard 6-min FSTs and observed that the cumulative probability plot of the durations of the first bouts of immobility (t_first_im, Figure 3c) shows a wide distribution (n = 57 mice), with most t_first_im values longer than 1 or 2 s. The histogram of the t_first_im values follows a log-normal distribution (mean = 0.5 ± 0.1 log(s), s.d. = 0.4 ± 0.1 log(s), R² = 0.65, corresponding to a mean of 3.2 ×/÷ 1.3 s, Figure 3d). This wide distribution of t_first_im values suggests that the t_im values within a single FST may be similarly dispersed. Indeed, we found that in a given animal, t_im varies greatly during the FST (Figures 3a and e). We further found that the cumulative probability of the t_im values of a single mouse could be best fitted with the sum of two normal distributions or, in a few cases (7/68), with one normal distribution (Figure 3f; see Materials and Methods).
Thus, it appears that most mice (61/68) have distinct populations of short and long t_im. To account for the highly variable t_im values, we propose that the threshold for screening the first critical immobility (t_c) be determined objectively, taking into account the means and fractional contributions of the two distributions (see Materials and Methods).
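The back-transformation of the log-normal summary of t_first_im into seconds is simple arithmetic. Assuming base-10 logarithms (consistent with the reported numbers, whereas natural logarithms would give a mean of only about 1.6 s), a mean of 0.5 ± 0.1 log(s) corresponds to a geometric mean of 10^0.5 ≈ 3.2 s with a multiplicative uncertainty factor of 10^0.1 ≈ 1.3. The helper name is hypothetical.

```python
def log10_to_seconds(mean_log10: float, spread_log10: float):
    """Back-transform a mean and spread given in log10(seconds) into a geometric mean (s)
    and a multiplicative (×/÷) factor."""
    return 10 ** mean_log10, 10 ** spread_log10


geo_mean, factor = log10_to_seconds(0.5, 0.1)
print(f"{geo_mean:.1f} ×/÷ {factor:.1f} s")  # 3.2 ×/÷ 1.3 s
```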
To apply and test this objective method for determining t_c, we compared the effects of using different t_c values to calculate t_lat of WT and Gabra5+/− (Hets) mice in the standard FST (Figure 5; see Materials and Methods). Using the customary and arbitrary threshold of 1 s for all mice, there was a significant difference in t_lat between the two groups (t_lat,WT = 5.6 ± 1.4 s vs t_lat,Hets = 10.1 ± 1.7 s, P = 0.0231, Figure 5a). However, the difference was not significant with the equally arbitrary threshold of 2 s (t_lat,WT = 15.3 ± 6.6 s vs t_lat,Hets = 11.9 ± 1.7 s, P = 0.0760, Figure 5b). Moreover, the latencies of the WT mice differed significantly when calculated with the two commonly used thresholds of 1 and 2 s (P = 0.0313). Therefore, t_lat values calculated with arbitrary thresholds are unreliable. Next, we applied our systematic and unbiased method to define the critical thresholds (t_c; see Materials and Methods). First, we used the distribution of all t_im values from all mice in both groups and obtained a single t_c of 5.53 s for all animals. With this t_c, t_lat did not differ significantly between the two groups (t_lat,WT = 64.3 ± 22.2 s vs t_lat,Hets = 23.4 ± 2.3 s, P = 0.9172, Figure 5c). Then, we calculated a separate t_c for each group (t_c,WT = 3.20 s, t_c,Hets = 10.30 s) from the distribution of all t_im values in that group. With these two t_c values, t_lat differed significantly between the two groups (t_lat,WT = 17.9 ± 6.9 s vs t_lat,Hets = 38.1 ± 3.4 s, P < 0.0001, Figure 5d). Because t_c,WT and t_c,Hets differed considerably, we next tested statistically whether the t_c values of the two groups differ. To do this, we used the distribution of the t_im values of each mouse to determine an individual t_c for each animal.
We calculated the new latencies using the individual t_c values, and t_lat differed significantly between the two groups (t_lat,WT = 23.5 ± 7.8 s vs t_lat,Hets = 39.5 ± 4.1 s, P < 0.0001, Figure 5e). Moreover, the individual t_c values differed significantly between the two groups (WT: 3.7 ± 0.5 s vs Hets: 12.6 ± 1.6 s, P < 0.0001, Figure 5f). Therefore, the t_c determination introduced here can reveal important differences in the behavior of mice during the FST. Because the different thresholds yielded significantly different t_lat values within each of the two groups (P < 0.0001, Friedman test, Figures 5g and h), t_lat should be measured using objectively determined t_c values.
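The within-group comparison of threshold rules can be sketched with SciPy's `friedmanchisquare`, which takes one sequence per condition (here, one per threshold rule) with the mice in the same order in each. The bout lists and thresholds below are made-up placeholders; a real analysis would use each mouse's scored bouts and the t_c values obtained from the fits described in Materials and Methods.

```python
from typing import Dict, List, Tuple

from scipy.stats import friedmanchisquare

Bouts = List[Tuple[float, float]]  # (onset_s, duration_s) immobility bouts of one mouse


def first_onset_longer_than(bouts: Bouts, threshold_s: float) -> float:
    """t_lat for one mouse; NaN if no bout exceeds the threshold."""
    for onset, duration in bouts:
        if duration > threshold_s:
            return onset
    return float("nan")


def latencies_per_rule(bouts_per_mouse: List[Bouts],
                       thresholds: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """thresholds maps a rule name to one threshold per mouse (same order as bouts_per_mouse)."""
    return {rule: [first_onset_longer_than(b, t) for b, t in zip(bouts_per_mouse, ths)]
            for rule, ths in thresholds.items()}


mice = [
    [(5.0, 1.5), (12.0, 2.5), (30.0, 4.0)],
    [(4.0, 1.2), (15.0, 3.0), (41.0, 6.0)],
    [(6.0, 1.8), (10.0, 2.2), (25.0, 5.0)],
    [(3.0, 1.1), (18.0, 2.8), (33.0, 4.5)],
]
rules = {"1 s": [1.0] * 4, "2 s": [2.0] * 4, "individual t_c": [3.5, 3.2, 4.1, 3.6]}
lat = latencies_per_rule(mice, rules)
stat, p = friedmanchisquare(*lat.values())
print(lat)
print(round(stat, 2), round(p, 4))  # 8.0 0.0183
```

A mouse with no bout above its threshold yields NaN here, which would need an explicit decision (exclusion or censoring) before testing.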
In the current quantification of the FST, all measured criteria (for example, F_im and t_lat) summarize the activity of the animals as single static values for the entire duration of the FST. But why not quantify FST behavior as it changes continuously during the test? Such continuous quantification has been used before in the rat FST,[34] with kicking frequency as the readout. We devised a continuous FST behavior plot (Figure 4a) that, for a given subject, marks two binary states (mobile and immobile) against time (resolution = 0.1 s; top two plots, Figure 4a). Group behavior can be expressed by averaging the states of all animals at every time point, yielding the probability of being mobile (P_mob) as a function of time (bottom plot, Figure 4a). The plots are from the same mice shown in Figures 2e and f. Consistent with their overall activity, mice partially treated with soap had a higher P_mob throughout the test. The plots reveal that the animals started out mobile and switched states more frequently at first, but later became immobile for longer periods, resulting in a lower P_mob.
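Averaging binary traces into P_mob(t) is a sample-by-sample mean. The sketch below assumes that every animal's trace has the same length and the same 0.1-s time base, with 1 = mobile; `probability_mobile` is a hypothetical name.

```python
from typing import List


def probability_mobile(traces: List[List[int]]) -> List[float]:
    """P_mob(t): fraction of animals scored mobile (1) at each time step."""
    if not traces:
        raise ValueError("need at least one trace")
    n_steps = len(traces[0])
    if any(len(tr) != n_steps for tr in traces):
        raise ValueError("all traces must share the same time base and length")
    n_animals = len(traces)
    return [sum(tr[i] for tr in traces) / n_animals for i in range(n_steps)]


print(probability_mobile([[1, 1, 0, 0], [1, 0, 0, 1]]))  # [1.0, 0.5, 0.0, 0.5]
```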
Closer analysis of the P_mob plots revealed similar synchronous behavior in the two groups. P_mob oscillated in both groups regardless of the treatment (Figures 4b and c), starting at comparable times after the start of the FST (174.2 s and 172.5 s, Figures 4b and c) and showing strong behavioral alternations (controls: 0.067 ± 0.004 Hz; part soap: 0.060 ± 0.004 Hz; P = 0.263), which indicates that the animals switch between mobile and immobile states with a cycle of 15-17 s. Notably, these analogous temporal sequences of behavior were present in both groups despite differences in their static behavioral measures (Figures 2e and f). Furthermore, the onset of this oscillatory swimming behavior must be fairly synchronous between animals; otherwise, it would disappear in the averaged signal. Also, because the oscillations start in both P_mob plots at around 173 s, all the animals appear to have a similar 'waiting time' before starting the stop-and-go swimming behavior.
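The text does not state how the oscillation frequency was estimated. One common approach, shown here only as an illustrative assumption, is to take the largest non-zero-frequency peak of the discrete Fourier amplitude spectrum of P_mob(t) after the oscillation onset; the cycle length is then 1/f (for example, 1/0.067 Hz ≈ 15 s and 1/0.060 Hz ≈ 17 s, matching the 15-17 s cycle above). The example signal is synthetic, and `dominant_frequency` is a hypothetical name.

```python
import numpy as np


def dominant_frequency(p_mob, dt=0.1, start_s=0.0):
    """Frequency (Hz) of the largest non-DC peak in the amplitude spectrum of P_mob(t >= start_s)."""
    x = np.asarray(p_mob, dtype=float)[int(round(start_s / dt)):]
    if x.size < 4:
        raise ValueError("too few samples after start_s")
    x = x - x.mean()  # remove the mean so the 0-Hz bin does not dominate
    amplitude = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=dt)
    k = int(np.argmax(amplitude[1:])) + 1  # skip the 0-Hz bin
    return float(freqs[k])


# Synthetic P_mob: 0.0625-Hz oscillation (16-s cycle) starting at 180 s in a 340-s trace
t = np.arange(3400) * 0.1
p = np.where(t >= 180, 0.5 + 0.3 * np.sin(2 * np.pi * 0.0625 * t), 0.8)
f = dominant_frequency(p, dt=0.1, start_s=180)
print(round(f, 4), round(1 / f, 1))  # 0.0625 16.0
```

The frequency resolution of this approach is 1/(analysis window length), here 1/160 s = 0.00625 Hz, so short windows limit how finely 0.060 and 0.067 Hz can be distinguished.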

DISCUSSION
We addressed some pitfalls in the measurements currently used for the FST in mice. First, we identified buoyancy as a confounding factor in the quantification of immobility, and we propose that the buoyancy of mice be quantified to account for its impact on the interpretation of the FST. Second, we devised a systematic analysis of t_im in individual mice, defining an objective t_c for the calculation of t_lat. Last, we introduced a new analysis of the temporal profile of FST behavior. These findings should help obtain more quantifiable results and provide better insight into the complex behavior patterns of mice during the FST.
According to our findings, the buoyancy of mice, reflected by the measured angle α, is a confounding factor in the FST. As buoyancy results partly from air trapped in the fur, it will be affected by fur characteristics that help trap air, such as the amount of surface lipids and the length of the fur.[35-38] As shown here, the animals' buoyancy can be estimated from angle measurements and should be considered when interpreting FST results, because various drugs or treatments may alter the factors responsible for trapping air in the animals' fur. We have also shown large individual variability in the t_c that defines immobility. We therefore propose that t_c henceforth be objectively defined for each animal subjected to the FST as a new variable of the test. This value should then be used for an unbiased determination of the latency to immobility (t_lat), to uncover potential differences in behavior that would previously have gone unnoticed. Interestingly, we found distinct oscillatory patterns in the swimming behavior of mice during the FST. These results may indicate the presence of an invariant intrinsic biological clock that influences swimming patterns in mice, providing ground for further investigation.