Introduction

Collective behavior emerges based on interactions between individuals in a group. This is observed at many different scales, from wound healing at the cellular level1, to task allocation in social insects2, group search behavior3, and information exchange on human social networks4. Hierarchical structures are common in animal groups, for example, in the grooming relationships of chimpanzees5, leadership and movement of pigeons6, and reproduction in cichlid fish7. Social network structure influences decision-making8,9, and dominance position within a network can influence an individual’s fitness10.

With rats, previous work has shown that individuals within a group have a social status related to dominance and that aggression and avoidance behavior are key elements of social interactions11,12,13. However, it is not known how individual interactions lead to the overall social structure of the group and how social structures change over time14. Fortunately, new automated tracking methods enable long-term tracking of individuals within social groups, providing a quantitative description of behavior and interactions15. For example, recent work has analyzed the ontogeny of collective behavior in zebrafish16, lifetime behavioral differences in honeybees17, and how genetic relatedness corresponds to group social structure in mice18.

While it is acknowledged that long-term multi-modal characterizations are required to describe complex social behaviors, there are still a number of challenges19. An approach applicable to both lab and field conditions is to tabulate appropriate pairwise interaction information of animals in groups and use this to define measures of an individual’s position in the social hierarchy. With mice, both aggressive and non-aggressive interactions have been used to define dominance hierarchies20,21,22,23,24,25,26. With primates, for example, pairwise “supplanting” interactions, such as when one individual displaces another from a food source, have been used to determine an individual’s ranking27,28,29.

Automated identification of approach-avoidance interactions has been used in previous work as a scalable method to characterize group hierarchy and leader-follower relationships6. Previous work with rodents, including rats30 and mice15, has also considered approach and avoidance events. An approach-avoidance event occurs when one individual approaches another, but the other individual moves away (either by retreating or escaping).

There are multiple ways to compute social rankings based on pairwise interactions such as approach-avoidance events. The Elo score, which was originally developed to rank chess players and predict the outcome of future matches31, is commonly used in animal behavior to calculate an individual’s ranking using pairwise contest or interaction information32,33,34. If network “flow” is defined from winners to losers in an interaction network, the network measure of “flow hierarchy,” which refers to how information flows through the network, can also be used to quantify hierarchical structure. The local reaching centrality (LRC) considers flow hierarchy in a network and quantifies a node’s ability to efficiently reach other nodes in its immediate neighborhood35. Since “flow” occurs in the direction from winner to loser (or dominant to less-dominant), dominant individuals have higher local reaching centrality. Global reaching centrality (GRC) is calculated using the distribution of LRC scores, and this metric has been used to quantify the steepness of hierarchical structures in many different systems, including groups of horses36, ant colonies37, brain networks38, industrial trade networks39,40, and scientific citation networks41. In animal behavior, in addition to the individual Elo scores, metrics employed to measure social dominance hierarchies also include directional consistency of contests or interactions, proportions of interactions based on rank, transitivity, linearity of the hierarchy, and David’s score or Elo score distributions across the group24,42,43,44,45.
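As a minimal sketch of how an Elo-type ranking can be built from a sequence of pairwise events, the following uses the standard logistic Elo update. The starting rating `start`, the update constant `k`, the `scale` of 400, and the example event list are illustrative assumptions, not the settings used in this work.

```python
def expected_win_probability(rating_a, rating_b, scale=400.0):
    """Probability that A dominates B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def update_elo(ratings, winner, loser, k=20.0):
    """Update ratings in place after one event dominated by `winner`."""
    p_win = expected_win_probability(ratings[winner], ratings[loser])
    delta = k * (1.0 - p_win)  # unexpected outcomes move ratings more
    ratings[winner] += delta
    ratings[loser] -= delta
    return ratings


def elo_from_events(individuals, events, start=1000.0, k=20.0):
    """Process (winner, loser) events in time order and return final ratings."""
    ratings = {name: start for name in individuals}
    for winner, loser in events:
        update_elo(ratings, winner, loser, k=k)
    return ratings


# Hypothetical event sequence: each tuple is (displacer, displaced).
events = [("a1", "a2"), ("a1", "a3"), ("a2", "a3"), ("a1", "a2")]
ratings = elo_from_events(["a1", "a2", "a3"], events)
ranking = sorted(ratings, key=ratings.get, reverse=True)
```

Because each update adds to the winner exactly what it subtracts from the loser, the sum of ratings is conserved, and only the order of events and the choice of `k` affect the final scores.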

In this work, we developed an open-source vision-based automated tracking and behavioral characterization system to analyze the social behavior of small animals like rodents. We use this system to continuously monitor interactions and behavior of rat groups, enabling us to quantitatively examine the temporal evolution of social structure and the roles of individuals within these groups. We note, however, that rodent groups have inherently complex behavior and social structures, making it difficult to draw broad conclusions about the rules and processes that govern observed structures. We, therefore, focus on characterizing the results of our experiments and on using multiple metrics to describe various dimensions of the social structure.

For an extended 36-week period, we tracked the social behavior of 28 rats divided into several groups, and calculated behavioral metrics and interactions to analyze both individual and group behavior. We examined how individual behavioral differences persist and combine to form new social structures when the composition of the group is altered. At first, the rats were divided into 4 groups of 7, and following this, we merged groups. For the final sequence of group experiments, we then created 4 new groups of 7 and altered the size of the living areas. Following the completion of the group experiments, we ran individual behavioral assays on each rat and compared the results to those of the group experiments. This combination of methods and experiments enabled us to (1) identify a wide range of individual locomotion and social behaviors, (2) investigate the formation and details of dominance hierarchies, (3) investigate the effect of group composition changes and the associated social stress on the behavior of rats living in groups in enriched environments, and (4) compare behavioral assay results to behavior observed in a group setting. Overall, our work demonstrates scalable methods for describing long-term changes in animal group social structure and emphasizes the need to use such methods to obtain a full picture of group social structure and interactions in natural or semi-natural environments.

Results

Long-term tracking and quantifying individual behavior

We tagged individuals with color markers and employed automated tracking to determine each rat’s movement over time (Fig. 1A–D). Over the course of the experiment, we performed manipulations to alter the group composition and the living area available for the group to use. We used two different breeding lines of Wistar laboratory rats, denoted A and B (see “Methods” section for details), with associated individual labels \(a*\) or \(\alpha *\) and \(b*\) or \(\beta *\), respectively (see Fig. 1E). We initially divided the rats into four groups of seven, with A rats in groups A1 and A2 and B rats in groups B1 and B2. The rats remained in these groups for the first observation period, which lasted a total of 21 weeks; we denote this time as phase 1. Following this, in phase 2, we merged groups A1–A2 and B1–B2 for three weeks and then merged all groups for three weeks by opening portals between their compartments. For the final series of group experiments in phase 3, rats from each original group were mixed together to create four new groups. The reshuffling in phase 3 was done according to body mass at the end of phase 2 (mean = 480 g; min = 364 g, max = 613 g; Q1 = 423 g, median = 481 g, Q3 = 532 g), allocating rats to new groups by ensuring that each group had the full range of masses and included members from every previous group (\(G1_{min}\) = 394 g, \(G1_{max}\) = 541 g, \(G2_{min}\) = 372 g, \(G2_{max}\) = 587 g, \(G3_{min}\) = 400 g, \(G3_{max}\) = 613 g, \(G4_{min}\) = 364 g, \(G4_{max}\) = 555 g). Figure 1E shows the experimental structure and the associated measurement periods.

We calculated automated trajectory-based behavioral metrics to quantify behavior over the duration of the experiment. We calculated and averaged each metric over successive time periods of 3 weeks (denoted as Pd), with associated numbers 1–12. We use the summary metrics to ask how behavior changes over time, how individuals differ, how groups differ, and how previous individual and group behavior predicts changes when new groups are formed.

Fig. 1

Experiment setup and timeline. (A) Photo of the rats with color-codes for individual identification and tracking. (B) Still image from the video that was used for tracking (from group G1, during Pd 10) taken by a light-sensitive camera at low lighting conditions. Image overlaid with labels indicating the important objects (water, nestbox, etc.). (C) Continuous tracking allowed for the reconstruction of each individual’s space use. The heatmap shows the space use of two rats during a 3-week period at the beginning of phase 3. Areas used only by a3 are shown with red, only by \(\beta 1\) with green, and areas visited by both (e.g. at the water and the feeder) are shown with yellow. (D) Trajectories were used to identify dominance interactions in the form of approach-avoidance events, where one individual approaches another, but the other moves away (by backing up or fleeing). Shown is an example of trajectories from group G3 in period 10. Lines show locations for 60 seconds, with the semitransparent circles of increasing size showing the more recent positions. (E) Overview of experimental manipulations. We calculate behavioral metrics over each 3-week “period” (abbreviated as Pd). Phase 1 had rats in original breeding line-sorted groups A1–A2 (line A), B1–B2 (line B), for a total of 7 periods. Each rat is labeled with lowercase letters a/\(\alpha\) or b/\(\beta\) according to breeding line. Individual numbers within each group are sorted in ascending order according to rank as determined by Elo score at the end of phase 1, i.e. a1/a7 were the highest/lowest ranking individuals in A1 during Pd 7, \(\alpha 1\)/\(\alpha 7\) were the highest/lowest in A2, etc. In phase 2, the groups were mixed together by breeding line during Pd 8, and then all together for Pd 9. At the beginning of phase 3 (Pd 10), new groups were formed (G1–4). During Pds 11 and 12 in phase 3, the compartment area sizes were changed (see “Methods” section and Fig. S8). 
At the end of the experiments, individual behavior was assessed by traditional individual and pairwise assays.

To assess dominance-related interactions and social structure, we tabulated approach-avoidance events between all pairs of rats in each group. This automated method defines an “event” as a pair of rats coming close to each other, after which the “displacer”, i.e. the dominant rat in the event, stays in place or continues moving forward, while the other (the “displaced”, i.e. the subordinate rat) moves away6. This type of approach-avoidance interaction can also be dynamic, such as when one individual chases another. We use the matrix of approach-avoidance events to calculate metrics that describe the dominance structure of each group and each individual’s position in this structure.

Breeding line and group differences

We use automated measures of space use and pairwise interaction events to characterize individual and group behavior. We first examine general differences between breeding lines.

In the beginning, the rats were juveniles and were growing rapidly, as shown by the large increases in body mass during this time period. The A rats were, on average, significantly larger than the B rats during each period (T-test comparing average mass of A rats to B rats yields p < 0.001 for each period). All rats had approach-avoidance events during the experiment, and there were no consistent significant differences among the breeding lines. However, there was an increase in the number of events per rat in phase 3 compared to phases 1 and 2 (Mean number of events per rat in phases 1, 2, 3, respectively: 447, 494, 976; T-test mean of phase 1 to phase 2, p = 0.64; mean of phase 1 to phase 3, p = 0.0044; mean of phase 2 to phase 3: p = 0.0176).

The metrics of time at feeder, distance from wall, home range46, time at top of nestbox, and time on wheel describe space use. While the breeding lines did not have general differences in time at feeder or time on wheel, line A rats tended to be farther from the wall, visited more parts of the living compartment (larger home range), and spent less time on top of the nestbox in comparison with B rat groups. However, while these differences were clear during phase 1, the differences in distance from wall and home range decreased when the lines were mixed, with home range no longer significantly different from Pd 9 onward, and distance from wall no longer significant in Pd 12. Breeding line differences in time spent at top of nestbox showed a large increase when group membership was changed in Pds 9 and 10, but subsequently decreased and were not significant in Pds 11 and 12 (Fig. 2). Note that one group (G1) in phase 3 displayed a different pattern of wheel usage than other groups, with several rats spending a very large amount of time on the wheel at the same time and thus unable to use it for running (Figs. S3, S4); however, there were no breeding line differences in this behavior. Overall these metrics suggest that the different breeding lines differed in their space use tendencies, but differences decreased when rats were placed in mixed groups in phase 3.

Fig. 2

Breeding line comparison and correlation. (A) Per-line body mass, average number of events, and space use metrics. Significant differences between breeding lines for a designated period, as determined with a T-test for difference in means, are denoted as follows: p < 0.05 with *, p < 0.01 with ** and p < 0.001 with ***. See also Fig. S3 for space use compared according to group, and Fig. S4 for space use metrics for each individual rat. (B) Correlation with the previous period, calculated across all rats with respect to a particular metric. Shaded area shows confidence interval calculated via bootstrapping. Note that values significantly different from zero are when the confidence intervals do not contain zero. (C) Correlation with Pd 7 (the last measurement in phase 1). Shaded area shows confidence interval calculated via bootstrapping.

We quantify changes in individual behavior using the correlation coefficient across periods. This shows that individuals have consistency in number of events and space use, as demonstrated by the generally positive correlations during the entire observation period (Fig. 2B). However, while there is consistency from one period to the next, Fig. 2C shows that small behavioral shifts over time can accumulate. Moreover, we see that the re-groupings facilitated changes in behavior. This is demonstrated by the sharper decrease in the correlation of behavioral metrics with Pd 7 in phases 2 and 3 compared to that in phase 1 preceding Pd 7. In particular, while home range and time at top of nestbox were highly correlated with the previous period during Pds 11 and 12 (Fig. 2B), the correlation of these measurements with Pd 7 values was lower (Fig. 2C). For example, for Pd 12, the correlation with the previous period for home range was 0.77 (95% CI [0.68, 0.87]) and for top of nestbox was 0.89 (95% CI [0.71, 0.95]), while the correlation values with Pd 7 were 0.26 (95% CI [−0.13, 0.57]) and −0.17 (95% CI [−0.5, 0.16]), respectively. This indicates that the new behavioral routines of phase 3 differed from those of phase 1.
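The period-to-period correlations and their bootstrapped confidence intervals can be illustrated with a short sketch. It assumes a Pearson correlation and a percentile bootstrap over individuals; the number of resamples `n_boot`, the `seed`, and the example values are hypothetical and are not taken from the data.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample individuals with replacement."""
    rng = random.Random(seed)
    n = len(x)
    samples = []
    while len(samples) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # correlation is undefined for a constant resample
        samples.append(pearson_r(xs, ys))
    samples.sort()
    lo = samples[int((alpha / 2) * n_boot)]
    hi = samples[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical metric values for the same eight individuals in two periods.
metric_prev = [0.10, 0.40, 0.35, 0.80, 0.60, 0.90, 0.20, 0.70]
metric_now = [0.15, 0.38, 0.50, 0.75, 0.55, 0.95, 0.30, 0.60]
r = pearson_r(metric_prev, metric_now)
lo, hi = bootstrap_ci(metric_prev, metric_now)
```

Under this convention, a correlation is read as significantly different from zero when the interval `(lo, hi)` excludes zero.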

Metrics for group social structures

With the pairwise approach-avoidance interaction matrices for each period, we use multiple metrics to characterize different aspects of group social structure and an individual’s placement in this structure. The metrics to characterize individual social placement include Elo score, David’s score, local reaching centrality, and fraction of events dominated, and those to characterize group social structure include Elo score steepness, David’s score steepness, global reaching centrality, directional consistency index, and triangle transitivity index. In this section we use idealized networks (shown in Fig. 3) to illustrate what the group social structure metrics represent. Note that while other work has used similar idealized or artificial networks as “categories” to label group social structure47, here we use the ideal networks (including connected hierarchy, line, layered hierarchy, layered-half, non-transitive, single dominant, single out, and symmetric) not as categories, but rather to give intuition for how the different metrics describe different aspects of the social structure. In the following section, we report the metrics for each group and use them to describe the experimentally observed structures.

The Elo score steepness (ESS) is a measure of the spread of the distribution of Elo scores across the group. It is calculated by converting the Elo score to a success probability, summing normalized values across group members, and calculating the slope of a linear regression fit to the resulting values45. The David’s score steepness (DSS) (often referred to simply as hierarchy ‘steepness’, or ‘classic steepness’42,45) is calculated as the slope of a linear regression fit to the normalized David’s scores among group members48. Individual local reaching centrality (LRC) uses the directed network of excess pairwise event outcomes (positive entries for the rat in a pair that was dominant in more events, and zero for the other rat; see “Methods” section) in order to assign higher scores to individuals in higher positions within a group hierarchy. For an unweighted directed network, LRC is the fraction of nodes reachable from a given node; a generalization of the metric accounts for weighted connections35. Global reaching centrality (GRC) is the average difference between each node’s LRC and the highest LRC of any node in the graph, and a higher GRC indicates a more hierarchical network35.
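For intuition, the unweighted versions of LRC and GRC can be sketched as follows. This is a simplified illustration rather than the weighted generalization: it assumes an event matrix `events[i][j]` holding the number of events in which individual `i` displaced `j`, draws an edge from `i` to `j` whenever `i` dominated more events in the pair, and normalizes both quantities by N − 1. The function names are hypothetical.

```python
from collections import deque


def excess_edges(events):
    """Directed edges i -> j wherever i dominated more events than j did."""
    n = len(events)
    return {i: [j for j in range(n) if events[i][j] > events[j][i]]
            for i in range(n)}


def local_reaching_centrality(adj):
    """Fraction of the other nodes reachable from each node (unweighted)."""
    n = len(adj)
    lrc = []
    for source in adj:
        seen = {source}
        queue = deque([source])
        while queue:  # breadth-first search along winner -> loser edges
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        lrc.append((len(seen) - 1) / (n - 1))
    return lrc


def global_reaching_centrality(lrc):
    """Sum of differences from the maximum LRC, divided by N - 1."""
    top = max(lrc)
    return sum(top - c for c in lrc) / (len(lrc) - 1)


# Single dominant: individual 0 displaces everyone, no other events.
single = [[0, 5, 5, 5], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
# Line: 0 displaces 1, 1 displaces 2, 2 displaces 3.
line = [[0, 5, 0, 0], [0, 0, 5, 0], [0, 0, 0, 5], [0, 0, 0, 0]]

lrc_single = local_reaching_centrality(excess_edges(single))
lrc_line = local_reaching_centrality(excess_edges(line))
grc_single = global_reaching_centrality(lrc_single)
grc_line = global_reaching_centrality(lrc_line)
```

In this toy case the single dominant network gives a larger GRC than the line network, because all reach is concentrated in one node rather than spread along the chain.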

The directional consistency index (DCI) measures how consistently the more dominant individual of each pair dominates events: it is the difference between the number of events dominated by the more dominant and by the less dominant individual of each pair, summed over all pairs and divided by the total number of events. A value of 1 corresponds to perfect predictability in the outcome of a pairwise event (i.e. one individual is always dominant), and 0 represents an even exchange of approach-avoidance outcomes (i.e. each individual dominates the same number of events)42,49. The triangle transitivity index (TTRI) is the fraction of triad relationships that show transitivity in pairwise event dominance outcomes (i.e. if \(a\rightarrow b\) and \(b \rightarrow c\), then \(a \rightarrow c\) for a transitive triad)43,50.
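Both indices can be computed directly from the event matrix, as in the sketch below. It takes the DCI as (H − L)/(H + L), where H and L are the events dominated by the more and the less dominant individual of each pair, summed over pairs. It takes the TTRI as the plain fraction of transitive triads among triads in which all three pairs have a clear dominance direction. Treating ties as "no relationship" and returning `None` when an index is undefined are assumptions of this sketch.

```python
from itertools import combinations


def directional_consistency(events):
    """(H - L) / (H + L) summed over all pairs; None if there are no events."""
    diff = total = 0
    for i, j in combinations(range(len(events)), 2):
        diff += abs(events[i][j] - events[j][i])
        total += events[i][j] + events[j][i]
    return diff / total if total else None


def triangle_transitivity(events):
    """Fraction of fully decided triads that are transitive; None if none exist."""
    transitive = complete = 0
    for triad in combinations(range(len(events)), 3):
        out_degree = {node: 0 for node in triad}
        decided = 0
        for i, j in combinations(triad, 2):
            if events[i][j] > events[j][i]:
                out_degree[i] += 1
                decided += 1
            elif events[j][i] > events[i][j]:
                out_degree[j] += 1
                decided += 1
        if decided < 3:
            continue  # at least one pair has no dominance direction
        complete += 1
        # A transitive triad has out-degrees 2, 1, 0; a cycle has 1, 1, 1.
        if sorted(out_degree.values()) == [0, 1, 2]:
            transitive += 1
    return transitive / complete if complete else None


linear = [[0, 5, 5], [0, 0, 5], [0, 0, 0]]     # 0 > 1 > 2
cyclic = [[0, 5, 0], [0, 0, 5], [5, 0, 0]]     # 0 > 1 > 2 > 0
symmetric = [[0, 3, 3], [3, 0, 3], [3, 3, 0]]  # every pair is tied

dci_linear, ttri_linear = directional_consistency(linear), triangle_transitivity(linear)
dci_cyclic, ttri_cyclic = directional_consistency(cyclic), triangle_transitivity(cyclic)
dci_sym, ttri_sym = directional_consistency(symmetric), triangle_transitivity(symmetric)
```

The three toy matrices separate the two ideas: the cyclic network is fully one-sided within pairs (DCI of 1) yet has no transitive triads, while the symmetric network has a DCI of 0 and no triad for which transitivity can be evaluated.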

From Fig. 3 we note that the ESS and DSS, which both aim to measure the steepness of hierarchy within a group, show similar trends at times and differ at others; both have high values for the connected hierarchy network but differ for the line network. We also note that the aspects of the network structure described by ESS/DSS versus GRC are different (cf. differences in the connected hierarchy, layered-half, and single dominant networks); the former is maximized when a well-connected structure exists (i.e. the hierarchy shows a clear distribution that lends itself to a linear regression fit), while the latter is maximized when more extremes in hierarchical structures exist (for example, the single dominant). Although a comprehensive evaluation of these metrics is beyond the scope of this study (see ref. 45, for example), here we calculate and examine multiple metrics to ensure a robust interpretation of the data, as well as to facilitate comparison of our findings with other assessments of group social structure found in the literature.
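To make the steepness calculation concrete, the following sketch computes David’s scores from an event matrix and then the slope of normalized scores against rank. It assumes the common formulation DS = w + w2 − l − l2 based on raw win proportions and the normalization (DS + N(N − 1)/2)/N; the chance-corrected dyadic index used in some implementations is not applied here, so values may differ from those of other tools.

```python
def davids_scores(events):
    """David's score per individual from a matrix of dominated-event counts."""
    n = len(events)
    # p[i][j]: proportion of events between i and j that i dominated (0 if none).
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = events[i][j] + events[j][i]
            if i != j and total > 0:
                p[i][j] = events[i][j] / total
    w = [sum(p[i]) for i in range(n)]
    l = [sum(p[j][i] for j in range(n)) for i in range(n)]
    # Second-order terms weight each opponent by its own wins or losses.
    w2 = [sum(p[i][j] * w[j] for j in range(n)) for i in range(n)]
    l2 = [sum(p[j][i] * l[j] for j in range(n)) for i in range(n)]
    return [w[i] + w2[i] - l[i] - l2[i] for i in range(n)]


def steepness(scores):
    """Absolute least-squares slope of normalized scores against rank 1..N."""
    n = len(scores)
    norm = sorted(((s + n * (n - 1) / 2) / n for s in scores), reverse=True)
    ranks = range(1, n + 1)
    mx = sum(ranks) / n
    my = sum(norm) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(ranks, norm))
    sxx = sum((x - mx) ** 2 for x in ranks)
    return abs(sxy / sxx)


linear = [[0, 5, 5], [0, 0, 5], [0, 0, 0]]     # strict linear hierarchy
symmetric = [[0, 3, 3], [3, 0, 3], [3, 3, 0]]  # all pairs evenly matched

dss_linear = steepness(davids_scores(linear))
dss_symmetric = steepness(davids_scores(symmetric))
```

A strict linear hierarchy gives the maximum steepness of 1 and an evenly matched group gives 0, which is why sparse networks with few interacting pairs can score low on this measure even when every observed pair is one-sided.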

Fig. 3

Idealized networks and group social metrics. The table at the top shows the scores calculated: Elo score steepness (ESS), David’s score steepness (DSS), global reaching centrality (GRC), directional consistency index (DCI), and triangle transitivity index (TTRI). Note that the GRC is not defined for the symmetric network, and the TTRI is not defined for networks that do not contain any dominance triads. The different idealized networks have 7 nodes, and individual entries are either 100 or 0 (for the layered-half network, 100, 1, and 0 are used). The connected hierarchy network has a non-symmetric structure. The line network has a single “line” of pairwise interactions, where each individual only interacts with one other. The layered hierarchy network has a single individual who dominates all others and two other sub-dominant individuals who only dominate the four others below them. The layered-half network has the same structure but lower values for the subordinate individuals. The non-transitive network has individual 1 dominating 2–3, 2–3 dominating 4–7, but 4–7 dominating 1. The single dominant network only has events with the dominant individual. The single-out network only has events with the subordinate individual. The symmetric network has equal event dominance probability among all pairs. In the table, the highest score for each metric (rows) is in bold, and the lowest score is in italics. The ESS is highest for the connected hierarchy and line networks, the DSS is highest for the connected hierarchy network while low for the line network, and the GRC is highest for the layered-half (i.e. structured but nonlinear hierarchy) and single dominant networks. For the symmetric network, the DCI is 0 because there are no consistent dominance relationships; for other networks, dominance is one-sided and the DCI is 1. The TTRI is 0 for the non-transitive network, and 1 for other networks where dominance triads are predicted. 
In addition to matrix plots, each network is visualized by showing connections in the direction of the more to less dominant individuals in each pair (note that no connections are shown for the symmetric network because, in this case, there are no differences in event dominance probability). We show both a circular layout and a layout based on Elo scores, where individual nodes with higher Elo scores are shown higher up on the y-axis.

Group social structures

We find that groups differ in their social structure, but within-group structure shows consistency when group membership remains unchanged. We compare phase 1 and phase 3 group social structures because the associated periods all featured groups of 7 rats. We show results for all group social structure metrics, as well as the mean number of events and fraction of events with the dominant rat, and note instances where the trends for the metrics are similar versus contrasting.

In phase 1, in contrast to space use, which showed clear breeding line-based differences (Fig. 2), we do not see clear differences between lines A and B in terms of overall group social structure. While the fraction of events with the dominant rat was higher for the A groups in comparison to the B groups, other metrics do not show large or consistent differences (Fig. 4). Within phase 1, each group showed consistency in the social structure over time, with the exception of Pd 6 for group A2, where a large change in the individual rank ordering in the social structure of the group took place. This is seen in the network visualizations, as well as in the positive correlation of Elo scores from one period to the next (Fig. 5A,B).

Fig. 4

Measures of group social structure. The mean number of pairwise events and the fraction of total events with the most dominant rat are shown in addition to the hierarchy-related metrics of Elo score steepness (ESS), David’s score steepness (DSS), global reaching centrality (GRC), directional consistency index (DCI), and triangle transitivity index (TTRI). The fraction of events with the dominant rat (where “dominant” is defined as the individual with the highest Elo score) is analogous to the measure of “despotism” used in other work22; the dashed line shows the expected value if all pairs of rats have the same number of events. The metrics are calculated for each period and are shown as boxplots for each phase 1 and phase 3 group. See also Fig. S7 for values for each group over time.

Fig. 5

Social structure network visualizations. (A,C) A visualization of group networks during (A) the last 3 periods of phase 1, and (C) phase 3. Columns correspond to different groups and rows for each period. The position of each individual on the y-axis is set according to their Elo score. The direction of each connection indicates which individual dominated more events in the pair (e.g. a connection \(a4\rightarrow a6\) indicates that a4 more often displaced a6 than vice versa), the color indicates the fraction of events dominated, and the transparency is proportional to the total number of pairwise events relative to the mean for that group and period. (B,D) The correlation of individual Elo scores with the previous period, for (B) phase 1 groups, and (D) phase 3 groups. Dashed line shows baseline correlation value calculated by shuffling groups and periods during phase 1 (B) or phase 3 (D). See also Fig. S5 for individual metrics (including num. events, fraction dominated, Elo score, David’s score, and reaching centrality) for each individual rat plotted for each period.

In general, we saw larger differences between the groups in phase 3 in comparison to those in phase 1. In phase 3, G1 and G3 each had a single consistent dominant individual, G2 had ongoing changes in social structure, and G4 had a stable hierarchy but with ongoing events. All groups had consistency in structure, but the correlation of individual Elo scores with the previous period was higher for groups G1, G3, and G4 in comparison with group G2 (Fig. 5C,D).

Compared to other phase 3 groups, G1 and G3 had a relatively low number of events and a high fraction of events with the dominant individual. Each of these groups had a single individual that was consistently ranked as most dominant (see Fig. 5C). However, in comparison to G1, G3 had on average higher DSS, ESS, and GRC. Together with the higher DCI, this suggests that group G3 had a steeper hierarchical structure than group G1.

In contrast to groups G1 and G3, group G2 did not have a single individual that remained dominant during each period. Group G2 had many events, the lowest fraction of events with the dominant rat, and the lowest transitivity (TTRI) of the groups. Together with the lower correlation coefficient in Elo scores compared to other phase 3 groups, this suggests an ongoing struggle for position within the social network, in which pairwise relationships were maintained by continued events. However, we note the differences obtained in the hierarchy steepness measures for group G2: the David’s score steepness suggests a weak hierarchy, while the Elo score steepness and GRC suggest that a hierarchy does exist.

Group G4 had similar patterns of metrics to group G3, but with several distinct differences: these include lower magnitudes of ESS, DSS, GRC, and DCI, a lower fraction of events with the dominant individual, and overall many more events (although the mean number of events decreased dramatically from Pd 11 to Pd 12; see Fig. S7). With this, we can describe G4 as having a moderately steep hierarchy that was maintained by many ongoing events among pairs. This differs from G1 and G3, where the high fraction of events with the dominant individual suggests that the hierarchy was maintained mostly by these events.

During phase 3, the area available to each group was changed during Pds 11 and 12 by moving the compartment borders that separated the groups. In Pd 11, G1 & G4 had a larger area and G2 & G3 a smaller area. Pd 12 had these sizes switched. These manipulations did not have a consistent effect on space use or group social structure metrics (Fig. S8).

Individual social rankings, changes over time, and body mass

We found that previous social status in phase 1 did not predict an individual’s placement in the new group social structures of phase 3 (Fig. 6A). This result holds if, instead of using absolute Elo score values as shown in Fig. 6A, rank scores of subordinate and dominant are used for the lowest two and highest two Elo scores, with others assigned as middle ranking (see Fig. S6). Because the individual ranking metrics are correlated (Fig. S5), for clarity we focus on showing results with the often-used Elo score32,33,34; however, we also note that the other individual social metrics (including fraction of events dominated, David’s score, and local reaching centrality) showed similar trends (Fig. S6) with respect to predicting individual placement.

Because social ranking can be used to regulate access to resources such as food, we further examine the relationship between social rank and weight gain/loss. We compare average individual social rankings to weight gain or loss during phase 3. At the beginning of phase 1, all rats were young and gaining weight. However, by the end of phase 1, the average weight gain from the previous period was small, and not all rats were still gaining weight. When the groups were merged in phase 2, the average change in body mass (\(\Delta\) body mass) continued to decrease and was negative for the last period of phase 2 and the first period of phase 3. Specifically, in the new groups of phase 3, the variance in the distribution of \(\Delta\) body mass increased, with one rat (rat \(\alpha 4\), which was subsequently permanently excluded from the experiment) losing nearly 100 g relative to the previous period (Fig. 6B).

Figure 6C shows that although the two most dominant rats during phase 3 gained the most weight (rats a3 and a1 from groups G1 and G3, respectively), average social dominance rank was not a robust general predictor of body mass changes across all rats. When these dominant rats are included, the average Elo score during phase 3 is significantly correlated with the change in body mass during phase 3 (values shown in Fig. 6C). However, this result is not robust: if these two rats are removed, the correlation drops greatly to \(r=0.16\) and is no longer significant (p = 0.44). The same pattern holds if subordinate-middle-dominant ranks are used instead of absolute Elo score values (\(r=0.48\) and p = 0.013 including all individuals; \(r=0.26\) and p = 0.23 removing rats a3 and a1). This demonstrates the complex relationship between dominance and body size in rodent social groups51,52. While there was likely a feedback between social rank and body mass for the two dominant rats in groups G1 and G3, respectively, it is difficult to link weight gain/loss to social rank in a general sense.

Fig. 6

Phase 1 to phase 3 individual rankings, body mass changes and social ranking. (A) Comparison of individual rat ranking metrics at the end of phase 1 (Pd 7) to those at the start of phase 3 (Pd 10), after the new groups were formed. (B) Distribution of body mass changes over time for all individuals. The middle line is the mean, and rugged curves indicate the maximum/minimum across all individuals. (C) Average Elo score during phase 3 compared to body mass change during phase 3 for each rat. Change in body mass during phase 3 is calculated as the body mass in Pd 12 minus body mass in Pd 10.

Individual metrics compared to behavioral assays

We used individual and pairwise assays performed after the group experiments to test the behavior of each rat. The individual black and white box, canopy, and elevated plus-maze results were used to define a composite boldness score. A pairwise social test with an unfamiliar rat, where two individuals are placed together and various behaviors characterizing interactions, such as sniffing the other, are scored (see “Methods”; Fig. S10), was used to define a social interaction score. This social test, which is also referred to in the literature as the “reciprocal social interaction test”, has been widely employed for behavioral phenotyping related to anxiety and autism53,54,55,56.

In general, we find low and/or inconsistent relationships between behavior in groups and behavior in the assays (Fig. 7A). This is shown by the comparison of behavioral metrics from the last period in phase 3 with the individual boldness and social interaction scores. In particular, the social metrics measured in a group setting, including the number of events and the Elo score, do not exhibit consistent or significant correlations with the pairwise social interaction score. Although we see positive correlations for the boldness score compared to the related metrics of distance from wall and home range, and a negative correlation with top of nestbox, these correlations are not significant (p > 0.05). We note, however, that the 2 most dominant individuals (a3 of G1, a1 of G3) have high boldness scores within their groups. When considering the individual assays separately, we do find a significant correlation between top of nestbox and time spent in the open area during the elevated cross assay (Fig. S11A). We also find that breeding line and group membership do not consistently predict differences in individual test scores (Fig. 7B).

The social interaction score shown in Fig. 7A,B was obtained via pairwise behavioral assays performed with an unfamiliar rat. We repeated these tests with a second unfamiliar rat in order to test repeatability (tests were also performed with a familiar rat from the respective phase 3 groups—see Fig. S13). The composite scores from the tests with the second unfamiliar individual show a low correlation with the scores obtained with the first unfamiliar individual (Fig. 7C, p > 0.05).

Other work has noted that individual behavior in assays may depend not only on an individual’s social dominance status, but also on the nature of the social hierarchy of the group to which the individual previously belonged47. To test this, we fit a linear regression model predicting individual behavior assay results based on a combination of individual metrics and the Elo score steepness (ESS) of the group where the individual was located during Pd 12. While including this additional information increased explanatory power, we did not observe consistent significant patterns (Fig. S12).

Fig. 7

Behavioral metrics at the end of phase 3 compared to assays. (A) Pearson correlation values for space use and social behavioral metrics from the final period in phase 3 (Pd 12) with individual assay scores. Labels and color scales denote correlation values. Note that none of the correlations are significant (all p-values \(>0.05\), calculated using t-distribution). (B) Individual score distributions according to breeding line (left), and by phase 3 group membership (right). Scores are normalized by the mean and standard deviation of values measured for all rats. (C) Comparison of social interaction scores calculated from tests with a first unfamiliar rat (x-axes; values shown in A,B), with scores calculated from tests with a second unfamiliar rat (y-axes). See also Figs. S11 and S13.

Discussion

We utilized automated tracking techniques to describe how rat groups develop and maintain a dynamic social structure over time, as well as how the social structure changes after regrouping. Across successive periods, we observed a general consistency in the behavior of both individuals and groups. However, over longer time scales spanning multiple periods, we observed that the gradual accumulation of small changes can result in substantial behavioral changes. In addition, when the group composition was altered, we observed accelerated changes in behavior. Multiple metrics, including the Elo score, David’s score, and reaching centrality, were employed to describe the overall hierarchical structure of each group as well as individual social rankings; these metrics revealed both similar and dissimilar aspects of the structure. Groups can differ substantially in their structure, and in particular, we found distinct social structures in the newly formed groups of phase 3. We found that an individual’s new position in the social hierarchy cannot be predicted based on their prior status when group composition changes. While conventional individual assays (including the elevated cross, canopy, and other tests) produce consistent test results, we found that these measures have little correlation with individual behavior in a group setting. Moreover, we found low repeatability in the scores measured with standard social test assays when the same test was performed with different partner individuals; this also contributes to why behavior assays have little correlation with group behavior.

At the beginning of phase 1, the rats were still juveniles. Social interactions, particularly those related to aggression and dominance, are known to develop over time11,57,58. Our observations are consistent with this, in particular, because we observed an increase in the number of approach-avoidance events and fights at the end of phase 1 (Fig. S2). It is likely that in addition to group composition and interactions, the development stage also influenced the number of events and the differences in social structure from phase 1 to phase 3. In a natural population, groups consist of individuals of different ages57. Targeted group mixing experiments—for example, with both juveniles and adults in the same group—could be used to ask how these effects interplay to generate overall emergent group social structures.

Behavioral assays are often used to quantify the behavior of rodents, and many new tools are being developed for both individual and social tracking and behavioral scoring59,60,61,62,63,64. Tests that have been used to quantify social behavior in rodents include, for example, reciprocal social interaction, social approach partition, social preference, social transmission of food preference, food allocation, and reciprocal cooperation13,54,65.

Most of these tests rely on creating artificial situations in order to measure the corresponding behavioral outcome. We compared social behavior measured in a group setting to scores of individuals measured with the reciprocal social interaction test. The pairwise social interaction test has known limitations, such as environmental dependencies, the possibility of aggression, and limited clarity as to which rat initiated an interaction66,67. Although other behavior assays, such as the social preference test using a T-maze or modified home-cage tests, attempt to address these limitations, they also have limitations of their own66,67. It is an open question as to whether the social behavior measured in such tests can predict the social behavior under more natural conditions that include complex social interactions68. In this respect, social interest/interaction and social dominance may represent different aspects of behavior, with the latter possibly only able to be measured in a group setting. Our comparison highlights the need for further work in this area.

We also note that while our group-living experiments provided space for complex environmental and social interactions, the conditions in the experiments were still much different from those that rats experience in the wild. In this respect, our methods are similar to recent studies with mice, such as experiments with groups in the “social box”20,21,22. These experimental setups can, therefore, be described as semi-natural. A unique aspect of our study is the extended observation period, which allowed us to examine not only group social structure but also its change over time.

The study of social behavior is particularly important in animal models utilized for the understanding and treatment of social-related neuropsychiatric disorders. Rodents such as mice and rats have been indispensable as model biological organisms, with particular relevance to clinical research due to their short lifespan and tolerance to laboratory environments69,70,71. However, the laboratory environment typically restricts the development of complex social behaviors; for example, rats are often kept in small cages and then transferred to separate environments to examine social interactions using simplified behavioral assays like the 3-chambered social preference test56. It is therefore not surprising that the translational relevance of these tests is limited72,73. Incorporating environmental and social complexity into experiments can increase the generalizability of conclusions drawn from laboratory studies74,75. Moreover, long-term studies to examine group behavior may be an essential component to include in translational research applications, for example, in the testing of psychotherapeutic drugs to treat social anxiety72,73. In particular, an important topic for future work is to establish standardized and reproducible tests and measures that are properly representative of a full range of social behavior75,76. Furthermore, we note that much of our neuroscientific understanding of social behavior comes from dyadic interactions and reduced forms of social interactions74,77,78,79. In light of the growing interest in the neuroscience of natural social behavior68, going beyond basic social testing paradigms lends the opportunity to unravel a richer repertoire of neural mechanisms.

We note that while our social behavior analysis was used with video tracking data, it could be applied to other types of data, for example, data derived from markerless tracking methods, motion capture or QR-code tracking. Future work can continue to expand on methods in this area, for example, including more detailed posture data, which can be used to describe social interactions in more detail80,81,82,83. Systems in this direction have already been developed for use with mice61. While we used automated detection of approach-avoidance interactions to define pairwise events, we note that a more detailed approach could use a combination of multiple behavioral interactions in order to define event “winners” and “losers”84. Moreover, detailed insight could be gained by using a hybrid method in which the automated detection is followed by manual scoring of the behavior85,86. Approach-avoidance as a measure of dominance has limitations: the subordinate animal can freeze in place, with lack of movement signifying its subordinate status87, and this would be missed by the automated scoring scheme. Another area for future work is testing the functional consequences of group composition and social structure on individual or group performance, for example, with respect to search3.

Methods

Experimental model and subject details

Subjects

We used 28 Wistar male rats from 2 inbred breeding lines (14 Crl:WI BR and 14 HsdBrlHan:WIST, 7 litters/line, 2 individuals/litter; ordered from Toxi-Coop Zrt, Hungary) in this study. The rats arrived on 24 May 2011 at an age of 6 weeks. Rats were separated into 4 same-line groups (i.e. the phase 1 groups), each containing 7 rats from different litters, and were housed and tested together.

Each rat was marked with a unique 3-color barcode on its back using nontoxic “Special Effects” hair dye in 5 distinctive colors (Red: Nuclear Red, Orange: Napalm, Green: Sonic Green, Blue: Londa color 0/88, Purple: 4 units Atomic Pink and 1 unit Wild Flower). These codes were applied/renewed every 3 weeks.

Ethical guidelines

The procedures comply with national and EU legislation and institutional guidelines. The experiments were performed in the animal facilities of Eötvös University, Hungary, and in accordance with Hungarian legislation and the corresponding definition by law (1998. évi XXVIII. Törvény 3. §/9.—the Animal Protection Act), which states that noninvasive studies on animals bred for research are allowed to be performed without the requirement of any special permission (PE/EA/1360-5/2018).

Method details

Experimental conditions and monitoring

Animals were housed in 4 compartments (sized 100 \(\times\) 125 \(\times\) 100 \(\hbox {cm}\)) with polypropylene covered wooden walls and sawdust changed weekly on a tiled floor (see Fig. S1). Their room was kept at a controlled temperature of 21 ± \(2\,^{\circ }\hbox {C}\), and with controlled light conditions featuring a daily cycle with 13h/11h dark/light. The dark (active) period was from 6 am to 7 pm with illumination at floor level \(\sim\) 3–4 lux; the light period followed from 7 pm to 6 am of the following day with illumination of 300 lux. We video recorded the compartments 24/7 using a low-light sensitive camera fixed to the ceiling (Sony HDR-AX2000, 2.9 \(\times\) 1.8 \(\hbox {m}\) field of view, 1920 \(\times\) 1080 resolution, 25 fps de-interlaced). Rats had ad libitum access to water and a shelter (nestbox), and access to food based on a fixed schedule for automated feeding. The feeding schedule followed a weekly cycle: 3 days (Sat, Sun, Mon) access to food 3 times for 1 h (at 6 am, noon, and 6 pm); 3 days (Tue, Wed, Thu) access to food 2 times for 1 h (at 6 am and 6 pm); and 1 day (Fri) access to food ad libitum between 6 am and 7 pm. The housing compartments were cleaned once a week.

We measured the weight of each individual three times a week (Mon, Thu, Fri; between 5 pm and 6 pm) and inspected their health. Some rats had injuries, and when a rat had a larger wound, we temporarily removed it until it had recovered. This happened on one occasion: \(\alpha 7\) was taken out from week 33 to week 36 of the experiment. One rat (\(\alpha 4\)) was permanently removed from the experiment due to weight loss at week 31 at the age of 37 weeks.

Individual and social interaction tests

These tests were performed at the end of the group measurement period, and included a total of 27 male rats at the age of 44 weeks. Subjects completed a test battery consisting of seven subtests examining fear-related and social behaviors in the following order (see descriptions below): black and white box, canopy, elevated plus-maze, social interaction test with outgroup conspecific, and social interaction test with in-group conspecific. The illumination during these tests was set according to the dark period mentioned above. Body weights were 480 ± 70g (mean ± SD) at the time of these tests. The behavioral tests were conducted on three consecutive days between 10 a.m. and 6 p.m. (26 to 28 Feb 2012). The apparatuses were constructed from plastic sheets and cleaned between tests. Depending on the test, either automated analysis was used to obtain trajectories or the behavior was coded by observers using the Solomon Coder software (beta 19.08.02). To ensure inter-observer reliability, pairs of observers overlapped in 20% of the behavioral tests they scored. Using this overlap, the inter-observer reliability was calculated using the intraclass correlation coefficient (ICC) for all variables except the video frame variables, and we found all observations to be reliable (ICC > 0.9).

  1. Black and white box. As described in Ramos et al.88, the apparatus had a black and white compartment (each sized 27 \(\times\) 27 \(\times\) 27 cm). The white compartment was strongly illuminated by a white bulb (\(\sim\) 825 lux), while the black compartment was illuminated with a red bulb (\(\sim\) 90 lux; Fig. S9A). The bulbs were positioned 37 cm above the apparatus floor. Each rat was initially placed in the center of the white compartment in a direction facing the opposing black compartment, and behavior was then recorded for 5 min. We tallied the number of video frames when the animal was in the (1) white compartment, (2) black compartment, or (3) at the border of the two areas and thus could not be clearly assigned (labeled as “both” in the data).

  2. Canopy. The apparatus consisted of a circular platform (104 cm in diameter) and a canopy (semitransparent red Plexiglas of 70 cm diameter) 10 cm above the platform (Fig. S9B). The mean illumination was 90 lux under the canopy and 400 lux outside of the canopy. At the beginning of the test, the animal was placed under the canopy. The test lasted for 5 min. We counted the number of video frames when the animal was (1) under the canopy, (2) in the exposed zone.

  3. Elevated plus-maze. Based on Ramos et al.88, the apparatus had four elevated arms (66 cm from the floor), 45 cm long and 10 cm wide (Fig. S9C). Two closed arms enclosed by a 50 cm high wall were located on opposing sides, and two open arms on the other two sides; the wall structure led to different illumination, with 25 lux for the closed arms and 65 lux for the open arms. The central platform (10 \(\times\) 10 cm) connected the four arms to allow access to any. Each rat was first placed in the central platform facing an open arm, and subsequently, behavior was recorded for 5 min. To describe behavior, we counted the number of video frames in which the animal was in the (1) closed arms, (2) open arms, and (3) central platform (labeled as “both” in the data).

  4. Social interaction test with an unfamiliar (out-group) conspecific. In an open field arena, we placed an unfamiliar adult male next to a focal rat that had been part of the long-term experiment. Two different unfamiliar rats were used for each phase 3 group (i.e. G1 rats were tested with unfamiliar rats 1 and 2, G2 rats with unfamiliar rats 3 and 4, etc.). The test apparatus was made out of glass, with a green floor of 74 \(\times\) 74 cm and transparent walls (\(\sim\)40 cm high; Fig. S9D). The unfamiliar rats were significantly younger and smaller than the focals (12 weeks old and 360 ± 20 g (mean ± SD)). We recorded the behavior for 10 min. The test was repeated with other rats (unfamiliar and familiar) after a break that lasted on average 75 ± 42 min, with a minimum of 35 min. The coded behavior included the following: duration of bipedal orienting stance (%), duration of self-grooming (%), duration of exploration (%), duration of sniffing non-genital body parts (%), duration of sniffing the genitals of the partner (%), number of steps on the partner, number of fights. For each focal rat, the coded parameters were stored separately for the trials with unfamiliar rats 1 and 2.

  5. Social interaction test with a familiar (in-group) conspecific. The social interaction test was also performed with familiar conspecifics, chosen at random from the focal rat’s phase 3 group. Each individual was measured with four randomly selected members from their group. The coded variables were the same as above, but they were averaged over the trials for each focal rat.

Quantification and statistical analysis

Data processing and behavioral metrics

We calculated metrics of space use from the individual trajectory data for each rat. Time at feeder is the fraction of total time spent at the feeder during nightlight (active period). Distance from wall is the average distance from the walls during nightlight. Home range is the area of an individual’s space-use heatmap during all times (number of bins where it was detected more than 10 frames per day, calculated using bins with a linear size of \(\sim\) 2 mm) (see Fig. 1C), normalized by total number of bins and frames counted. Top of nestbox is the fraction of total time spent on top of the nestbox/shelter area during nightlight.
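As a rough illustration of the home range definition, the sketch below counts the spatial bins in which a rat was detected in more than 10 frames on a given day. The function name, the position format (centimetres), the arena size arguments, the coarse default bin size, and the normalization by the total bin count alone are all assumptions made for illustration; the analysis described above uses bins of about 2 mm and additionally normalizes by the number of frames counted.

```python
from collections import Counter


def home_range_fraction(positions_cm, arena_size_cm=(100.0, 125.0),
                        bin_cm=5.0, min_frames=10):
    """Fraction of spatial bins occupied in more than `min_frames` frames.

    positions_cm: iterable of (x, y) positions (cm) for one rat on one day.
    All names and defaults here are illustrative assumptions.
    """
    n_x = int(arena_size_cm[0] // bin_cm)
    n_y = int(arena_size_cm[1] // bin_cm)
    counts = Counter()
    for x, y in positions_cm:
        # Clamp to the last bin so points on the far wall stay in the grid.
        bx = min(int(x // bin_cm), n_x - 1)
        by = min(int(y // bin_cm), n_y - 1)
        counts[(bx, by)] += 1
    # A bin belongs to the home range only if visited more than min_frames times.
    visited = sum(1 for c in counts.values() if c > min_frames)
    return visited / (n_x * n_y)
```

The strict inequality mirrors the wording "more than 10 frames per day"; a bin visited in exactly 10 frames is not counted.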

An approach-avoidance (AA) event was defined for a given pair of individuals (\(i \ne j\)) if, for a 0.4 s long time window, the time-averaged dot product of i’s velocity \(v_i(t)\) and the normalized relative direction vector pointing from i to j was within predetermined thresholds for both individuals. Here the direction vector is the unit vector \({\hat{d}}_{ij}(t)={d}_{ij}(t)/|{d}_{ij}(t)|\), where \({d}_{ij}(t)=x_j(t)-x_i(t)\) is the relative position. The thresholds used were \(AA_{ij}=\langle v_i(t) \cdot {\hat{d}}_{ij}(t) \rangle _t>0.8\) for the approacher, and \(AA_{ji}<-0.5\) for the avoider. In addition, we required that i and j were within 40 cm of each other (\(|d_{ij}(t)| \le d_{max} =40\) cm) and that both i and j were moving at speeds of at least 0.25 m/s (\(|v(t)| \ge v_{min} =0.25\) m/s).
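The AA criterion can be sketched for a single time window as follows. This is a minimal illustration, not the original analysis code: the function name and the input format are hypothetical, velocities are estimated by finite differences, the projection uses unnormalized velocity in m/s (taking the definition above literally), and the distance and speed requirements are assumed to apply to every frame in the window.

```python
import math


def is_aa_event(xi, xj, fps=25.0, approach_thr=0.8, avoid_thr=-0.5,
                d_max=0.40, v_min=0.25):
    """Return True if rat i approaches and rat j avoids within one window.

    xi, xj: lists of (x, y) positions in metres, one per video frame
    (11 positions span a 0.4 s window at 25 fps).
    """
    dt = 1.0 / fps
    proj_i, proj_j = [], []
    for t in range(len(xi) - 1):
        # Finite-difference velocities (m/s); an assumption of this sketch.
        vix = (xi[t + 1][0] - xi[t][0]) / dt
        viy = (xi[t + 1][1] - xi[t][1]) / dt
        vjx = (xj[t + 1][0] - xj[t][0]) / dt
        vjy = (xj[t + 1][1] - xj[t][1]) / dt
        # Relative position pointing from i to j.
        dx = xj[t][0] - xi[t][0]
        dy = xj[t][1] - xi[t][1]
        dist = math.hypot(dx, dy)
        if dist == 0.0 or dist > d_max:
            return False
        if math.hypot(vix, viy) < v_min or math.hypot(vjx, vjy) < v_min:
            return False
        ux, uy = dx / dist, dy / dist
        proj_i.append(vix * ux + viy * uy)      # v_i projected on i -> j
        proj_j.append(-(vjx * ux + vjy * uy))   # v_j projected on j -> i
    if not proj_i:
        return False
    aa_ij = sum(proj_i) / len(proj_i)
    aa_ji = sum(proj_j) / len(proj_j)
    return aa_ij > approach_thr and aa_ji < avoid_thr
```

With these sign conventions, a positive `aa_ij` means i is moving toward j, and a negative `aa_ji` means j is moving away from i. Sliding this check along a pair's trajectories and counting the windows that return True for each ordered pair would yield an event count matrix.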

We use the approach-avoidance event network to quantify the social interaction structure in each group. The values \(A_{ij}\) are the number of times rat i dominated approach-avoidance events with rat j. Using this, the number of events rat i dominated is \(w_i = \sum _{j}A_{ij}\), and the number of events lost is \(l_i = \sum _{j}A_{ji}\). The fraction of events dominated for rat i is then

$$\begin{aligned} f_i = \frac{w_i}{w_i+l_i}. \end{aligned}$$
(1)
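As a small worked example of Eq. (1), the sketch below computes the wins, losses, and fraction of events dominated from an event count matrix. The function name and the example matrix are hypothetical, and returning NaN for a rat with no recorded events is a choice made here for illustration.

```python
def win_loss_fractions(A):
    """Compute w_i, l_i and f_i = w_i / (w_i + l_i) from a count matrix.

    A[i][j] is the number of approach-avoidance events in which rat i
    dominated rat j.
    """
    n = len(A)
    wins = [sum(A[i][j] for j in range(n) if j != i) for i in range(n)]
    losses = [sum(A[j][i] for j in range(n) if j != i) for i in range(n)]
    fractions = [
        w / (w + l) if (w + l) > 0 else float("nan")
        for w, l in zip(wins, losses)
    ]
    return wins, losses, fractions


# Hypothetical counts for a group of three rats.
A_example = [
    [0, 10, 10],
    [2, 0, 10],
    [1, 3, 0],
]
wins, losses, fractions = win_loss_fractions(A_example)
```

For this toy matrix the first rat dominated 20 events and lost 3, giving a fraction of 20/23.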

Reaching centrality is calculated using the normalized network of excess wins, \(W_{ij}\). This network has positive entries for the rat in a pair that dominated in more events and zero for the other rat, and is determined as

$$\begin{aligned} W_{ij} = \frac{1}{Z}\max \left( A_{ij}-A_{ji}, 0\right) , \end{aligned}$$
(2)

where Z is a normalization factor, which we define so that the maximum entry of \(W_{ij}\) is equal to 1. This network is then provided as input to the networkx function local_reaching_centrality to calculate the local reaching centrality (LRC) for each individual, and to the function global_reaching_centrality to calculate global reaching centrality (GRC) for the group. Note that we set the flag normalized=False for calling these functions, because we use the definition in Eq. (2) where \(W_{ij}\) is already normalized. With this, the LRC and GRC scores are in the range of 0 to 1.
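The construction of the excess-win network in Eq. (2) can be sketched in plain Python as below. The function name and example matrix are hypothetical; returning an all-zero matrix when no rat has excess wins is an assumption of this sketch. The resulting matrix is what would then be loaded into a weighted directed graph and passed to the networkx functions named above.

```python
def excess_win_network(A):
    """Build W_ij = max(A_ij - A_ji, 0) / Z, with Z chosen so max(W) = 1.

    A[i][j] is the number of events in which rat i dominated rat j.
    """
    n = len(A)
    raw = [
        [max(A[i][j] - A[j][i], 0) if i != j else 0 for j in range(n)]
        for i in range(n)
    ]
    Z = max(max(row) for row in raw)
    if Z == 0:
        # No pair has an excess of wins; nothing to normalize.
        return [[0.0] * n for _ in range(n)]
    return [[value / Z for value in row] for row in raw]


# Hypothetical counts for a group of three rats.
W_example = excess_win_network([
    [0, 10, 10],
    [2, 0, 10],
    [1, 3, 0],
])
```

In each pair only the rat with more wins receives a nonzero entry, so the network is directed from the more dominant to the less dominant rat of each pair.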

We used the EloRating package89 to calculate the individual David’s score, David’s score steepness (DSS), directional consistency index (DCI), and triangle transitivity index (TTRI). For the individual Elo scores, we used the EloChoice package89, which has an improved and more efficient implementation of the randomization of interaction sequences used to calculate the Elo score. We used the EloSteepness package90 to calculate the Elo score steepness (ESS). These R packages were integrated into our Python-based analysis code using rpy2.
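For readers unfamiliar with Elo scores, the following sketch shows a textbook logistic Elo update averaged over randomized event orders. It is not the implementation used in the R packages cited above: the function names, the logistic win probability, the scale of 400, the starting value of 1000, the k factor of 100, and the number of randomizations are all assumptions made for illustration.

```python
import random


def elo_scores(events, ids, k=100.0, start=1000.0):
    """Sequentially update Elo ratings from (winner, loser) events."""
    ratings = {r: start for r in ids}
    for winner, loser in events:
        # Expected probability that the winner would win (logistic form).
        p_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
        delta = k * (1.0 - p_win)
        ratings[winner] += delta
        ratings[loser] -= delta
    return ratings


def randomized_elo(events, ids, n_rand=100, seed=0, **kwargs):
    """Average Elo ratings over randomly shuffled event sequences."""
    rng = random.Random(seed)
    shuffled = list(events)
    totals = {r: 0.0 for r in ids}
    for _ in range(n_rand):
        rng.shuffle(shuffled)
        for r, score in elo_scores(shuffled, ids, **kwargs).items():
            totals[r] += score
    return {r: total / n_rand for r, total in totals.items()}


# Hypothetical events: "a" dominates "b" and "c", "b" dominates "c".
example_events = [("a", "b")] * 5 + [("b", "c")] * 5 + [("a", "c")] * 5
example_scores = randomized_elo(example_events, ["a", "b", "c"])
```

Averaging over shuffled sequences removes the dependence of the final ratings on the order in which events happened to be recorded, which is the role the randomization plays in the description above.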

Individual and social interaction assays

We used principal component analysis (PCA) to define the boldness and social interaction scores for each rat from the individual and pairwise assays.

Boldness score

The 8 videoframe variables calculated from the individual tests (black and white box, elevated plus-maze, and canopy test) were used to define the boldness score, which reflects the time spent in exposed portions of an unfamiliar environment. We converted each frame count to a fraction of the test time and normalized the input variables before applying PCA. The first component explains 44.7% of the variance, and positive projections onto this component represent more time in open areas (Fig. S10A). We used the projection of each rat onto the first component as the “boldness” score. For comparison, we also calculated fractions of open-area time for each test: fraction of time in the white area during the black and white box test (BWB-whitefrac = BWB-White/(BWB-White + BWB-Black)), fraction of time in the open arms during the elevated cross test (ElevX-openfrac = ElevX-Open/(ElevX-Open + ElevX-Closed)), and fraction of time outside the canopy during the canopy test (Canopy-outfrac = Canopy-Out/(Canopy-Out + Canopy-Under)).
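The open-area fractions and the normalization step can be illustrated with a short sketch. The function names and the frame counts are hypothetical, and the use of z-scores (mean 0, population standard deviation 1) is an assumption about what "normalized" means here; the PCA step itself is not shown.

```python
def open_fraction(open_frames, closed_frames):
    """Fraction of frames in the exposed zone, ignoring unassigned frames."""
    total = open_frames + closed_frames
    if total == 0:
        raise ValueError("no frames assigned to either zone")
    return open_frames / total


def zscore(values):
    """Standardize a variable across rats before PCA (assumed normalization)."""
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / sd for v in values]


# Hypothetical frame counts for one rat (5 min at 25 fps = 7500 frames).
bwb_whitefrac = open_fraction(open_frames=1500, closed_frames=6000)
elevx_openfrac = open_fraction(open_frames=900, closed_frames=5100)
canopy_outfrac = open_fraction(open_frames=3000, closed_frames=4500)
```

Frames labeled “both” are left out of the denominator, matching the fraction definitions given above.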

Social interaction score

We applied PCA to measures from the pairwise social interaction tests and used the results to define a composite score related to social interaction; we additionally used the 2nd PCA component to define a “self grooming” score. The variables included are duration of sniffing genitals (%), duration of sniffing nongenital body parts (%), duration of bipedal orienting stance (standing up) (%), number of steps on the partner, number of mating attempts, number of fights, duration of exploration (%), and duration of self grooming (%). The components shown in Fig. S10B were determined using data from the first test with an unfamiliar rat; the scores for the other tests with a familiar rat and a second unfamiliar rat were calculated by projecting the associated variables onto these components. The first PCA component represents interaction with the other rat, with positive projections indicating more interactions. We use this component as the “social interaction” score. The second PCA component is weighted most strongly by self grooming (positive) and exploration (negative); we additionally compare this component as the “self grooming” score.

Figure S13 shows a comparison of the scores, including the boldness score and measures from the individual tests, and the social interaction and self grooming scores from the first test with an unfamiliar rat as well as tests with a familiar rat and a second unfamiliar rat.