Extant mammals exhibit a wide diversity of grouping arrangements or social organizations, including solitary living, pair-living, and various forms of group-living1, e.g., multilevel society (e.g., Rhinopithecus spp.)2 and eusociality (e.g., Heterocephalus glaber)3. Mammals also show an extreme 100-fold variation in maximum lifespan (or longevity), ranging from ~2 years in shrews (e.g., Sorex spp.) to more than 200 years in bowhead whales (Balaena mysticetus)4. The evolutionary relationships between sociality and longevity in mammals are complex5 yet important for understanding evolutionary strategies, i.e., life history diversity across organisms. In mammals, most of the evidence for links between sociality and longevity comes from single species. For example, affiliative social bonds, which are pervasive among group-living species, can extend a species’ lifespan by decreasing mortality and enhancing health and survival outcomes. In humans, strong social relationships can reduce the risk of physiological dysregulation6. With respect to other mammals, female chacma baboons (Papio ursinus) with strong and stable social bonds live longer than those with weak connections7,8; similar results have been reported in rhesus macaques (Macaca mulatta)9. Conversely, a negative correlation between affiliative relationships and longevity has been reported in female yellow-bellied marmots (Marmota flaviventer)10. Even though a small number of cross-species studies have tested the association between sociality and aging or longevity, they primarily focused on eusocial species11,12,13 and cooperatively breeding species14,15. Therefore, it remains unclear whether associations between longevity and other types of social organization are a common feature across the mammalian phylogeny.

In addition, the molecular mechanisms underlying the evolutionary association between social organization and longevity are not fully understood. Previous studies have suggested some possible processes, e.g., stress reduction, parasite infections, and pace of life (fast-slow continuum)16. According to the stress-buffering hypothesis, strong social bonds or social support can reduce adverse environmental stimuli or stress and enhance health and longevity in humans17. Social organization can also influence the spread of parasites in the population. For example, group-living species are vulnerable to infectious diseases because of the high social contact rates and close social interactions among individuals, but social species may have evolved a strong immune defense to minimize disease risk and protect themselves against pathogens18. One more possible link between social organization and longevity is the pace of life, which reflects an organism’s strategic allocation of resources between survival and reproduction. Species with a fast life history are characterized by rapid development, high reproductive rates, and short lifespan, whereas species with a slow life history are characterized by slow development, low reproductive rates, and a long lifespan19. Given that sociality and fitness are positively associated in some mammals20,21,22 and social bonds require major time investments before they yield survival benefits, social bonds are expected to have evolved in species with a slow life history and a longer lifespan23. In summary, we are only beginning to understand the evolution and the molecular mechanisms underlying the diversity of social organizations24,25,26,27,28 and longevity29,30.

In this work, we compare models with different evolutionary conjectures between social organization (i.e., solitary, pair-living, and group-living)31 and longevity across ~1000 mammals using a Bayesian framework. Moreover, we conduct a comparative brain transcriptomic analysis of 94 mammals to detect candidate genes and pathways associated with social organization and longevity, after controlling for body mass, ecological factors, life history traits, and phylogenetic relationships. We show that group-living species lived longer than solitary species and identify 31 genes, hormones, and immunity-related pathways involved in the correlated evolution of social organization and longevity.


Evolutionary pathways for social organization and longevity

To assess the evolutionary transitions among social states and evolutionary pathways for longevity, we collected data on the social organization for as many extant mammalian species as possible through a comprehensive literature survey. We assigned 974 species into three types of social organization: solitary (n = 497), pair-living (n = 115), and group-living (n = 412). Fifty species had more than one state (Fig. 1a, details of classification in “Methods”). Data on body mass and longevity (defined by the maximum lifespan of a given species) for these species were also collected (Supplementary Table 1 and Supplementary Data 1). We first used phylogenetic comparative methods to calculate the phylogenetic signal of social organization and longevity. The maximum likelihood estimates of Pagel’s λ for the three social states was 0.94 when taking into account social polymorphism for a given species (phylogenetic signal test: n = 974, logλ = −788.03, log0 = −1344.67, P < 0.001) and was 0.94 when using the uni-state species subset (n = 924, logλ = −535.38, log0 = −938.78, P < 0.001). Pagel’s λ for longevity was 0.97 (n = 974, logλ = 319.18, log0 = 1475.26, P < 0.001), illustrating that closely-related taxa generally have similar social organizations and longevity.

Fig. 1: Evolutionary analyses of social organization and longevity in 974 mammalian species.
figure 1

a Phylogenetic distribution of social organization, adult body mass, and longevity (n = 974). The inner circle represents a species’ social organization: solitary (blue), pair-living (orange), and group-living (red). The middle layer indicates the absolute adult body mass (g) and the outer layer indicates the longevity (years); both variables were log10 transformed. Colors of shadings distinguish different mammals’ orders. b Difference in absolute longevity and c relative longevity (residuals of longevity, which was adjusted for body mass) across the three social organization states (solitary: n = 491, pair-living: n = 65, group-living: n = 368). We accounted for the effects of phylogenetic non-independence among species using a Phylogenetic ANOVA. Two-sided and Hommel method adjust P values are reported. The white dot represents the median in two violin plots, and the black box represents interquartile ranges (IQRs), i.e., the 25th and 75th percentiles. The whiskers extend up to the largest value within 1.5-fold IQR. Species numbers (n) are indicated in each social organization, respectively. Correlated evolution analysis for absolute short-lived (cyan) or long-lived state (purple): d non-solitary or solitary (blue); e non-pair-living or pair-living (orange); f non-group-living or group-living (red). d and f demonstrate correlated evolution. The number of species used in the analyses was n = 974. Arrows depict the likelihood of a transition between states, and their thickness corresponds to the magnitude of the various rates. Numbers indicate the transition rate across ten independent runs, and data are presented as mean ± SD. Silhouette images of animals are from PhyloPic database []. Source data are provided as a Source data file.

To determine the evolutionary pathways connecting the three states of social organization within mammals, we tested four models that allowed the transition rates between any of two states to vary differently. These four alternative models were: (a) the equal rates model (ER), in which all transition rates were the same; (b) the increasing complexity model (IC), which allowed transitions between solitary and pair-living, pair-living and group-living, but not solitary and group-living; (c) the all-rates-different model (ARD) or parameter-rich model24, in which all transition rates were different; and (d) the reversible-jump Markov chain Monte Carlo-derived model (RJ-MCMC), which is derived from the data using the reversible-jump procedure in Bayes Traits and has the highest posterior support32 (Supplementary Fig. 1a–d, Supplementary Table 2). Model comparisons showed that the ARD model was the best-supported model, which was significant against the RJ-MCMC model (Log BF = 9.24), the ER model (Log BF = 33.36), and the IC model (Log BF = 71.70) (Table 1). The ARD model showed that the transition rates varied across the three states of social organization (Supplementary Fig. 2a). For example, the transition rate from pair-living to solitary was 14 times higher than from solitary to pair-living (qpair-living-solitary = 4.00 ± 1.55 × 10−3; qsolitary-pair-living = 0.29 ± 1.50 × 10−4), suggesting that the pair-living state was relatively unstable.

Table 1 Comparison of evolutionary models for social organization and longevity

We then reconstructed the evolutionary pathway of longevity by constructing three alternative models (i.e., ER model, ARD model, and RJ-MCMC model) (Supplementary Fig. 1e–g). Comparisons of these three models showed that the RJ-MCMC and ARD models were better supported, on the basis of Bayes factors (BF), than the ER model (Table 1 and Supplementary Table 3). The average transition rate from a long-lived state to a non-long-lived state (qabsolute = 2.09 ± 3.85 × 10−4; qrelative = 1.71 ± 6.23 × 10−4) was about four times greater than that from a non-long-lived to a long-lived state (qabsolute = 0.54 ± 1.42 × 10−4; qrelative = 0.53 ± 1.13 × 10−4) (Supplementary Fig. 2b, c).

Correlated evolution of social organization and longevity

We conducted phylogenetic ANOVA analyses to estimate differences in longevity among the three states of social organization while controlling for phylogenetic non-independence among species. Longevity was significantly different between the solitary state and the group-living state, with group-living species showing higher longevity than solitary species (phyloAVOVA: nmulti-states = 974, t = 12.40, P-adjust = 0.04; nuni-state = 924, t = 12.28, P-adjust = 0.02, Fig. 1b). Since longevity is correlated with adult body mass (Spearman’s rank test: r = 0.71, P < 2.20 × 10−16), we also measured relative longevity which was calculated using the body mass adjusted residuals with the equation from the AnAge database (“Methods”). Similar results were obtained for relative longevity (phyloAVOVA: nmulti-states = 974, t = 12.01, P-adjust = 4.80 × 10−2; nuni-state = 924, t = 11.94, P-adjust = 0.02, Fig. 1c). In addition, we conducted MCMCglmm models to control for phylogeny, body mass and factors related to external mortality: activity (diurnal, nocturnal or others), lifestyle (terrestrial, aerial, arboreal, semi-arboreal, freshwater, marine, or terrestrial-marine), and fossoriality (non-fossorial or subterranean). The results consistently showed that pair-living or/and group-living species lived longer than solitary species when using multi-states of the social organization dataset (MCMCglmm: nmulti-states = 947, pair-living vs. solitary, post mean = 0.10, pMCMC = 1.11 × 10−3; group-living vs. solitary, post mean = 0.06, pMCMC < 6.00 × 10−4; pair-living and group-living, post mean = 0.06, pMCMC = 0.03) and uni-state of the social organization dataset (nuni-state = 897, pair-living vs. solitary, post mean = 0.10, pMCMC < 6.00 × 10−4; group-living vs. solitary, post mean = 0.06, pMCMC = 1.11 × 10−3). The results of activity, lifestyle and fossoriality are shown in Supplementary Table 4.

To evaluate whether changes in longevity depended on social organization, we compared the independent and dependent RJ-MCMC models of three combinations of variables: non-solitary/solitary and absolute short-lived/long-lived (>26 years), non-pair-living/pair-living and absolute short-lived/long-lived as well as non-group-living/group-living and absolute short-lived/long-lived. The results favored the dependent model for both solitary (Log BF = 3.18, Table 2 and Supplementary Table 5) and group-living (Log BF = 9.58, Table 2 and Supplementary Table 6), suggesting the existence of correlated evolution between social organization and longevity across the mammalian phylogeny. We also considered the effect of taxonomic sampling and the different classifications of long-lived species on the correlated evolution analyses. Random taxon sampling (i.e., randomly selecting 50 to 95% of the total number of species at a 5% interval) and repeated comparisons of independent and dependent RJ-MCMC models provided further support for the correlated evolution of solitary living and longevity (>26 years, 51% of model comparisons), as well as group-living and longevity (>26 years, 63% of model comparisons, Supplementary Fig. 3a). Consistent with these findings, random taxon sampling and model comparisons with two different cut-offs for long-lived species (>17 or >35 years), suggested the correlation between social organization and longevity. There was a correlation between solitary living and longevity when using the 17-year cut-off (79% of model comparisons) and 35-year cut-off (52% of model comparisons); and also a correlation between group living and longevity when using the 35-year cut-off (98% of model comparisons, Supplementary Fig. 3b, c). In addition, when taking the uncertainty of phylogenetic relationships into account and using a different phylogenetic tree33, the correlated evolution between solitary and longevity was supported by the analyses with the 26-year and the 17-year cut-off; the correlated evolution between group-living and longevity was supported by the analyses with the 17-year and 35-year cut off (nmulti-states = 969; Supplementary Table 7).

Table 2 Likelihoods of dependent and independent models estimated for the correlated evolution of social organization and longevity

We then attempted to determine whether transitions to a long-lived state were more likely in group-living than solitary species. Model estimation revealed that the transition rate from a short-lived state to a long-lived state was higher for non-solitary than solitary species (qnon-solitary = 12.44 ± 1.84 × 10−3; qsolitary = 2.77 ± 3.79 × 10−3), and higher for group-living than non-group-living species (qgroupliving = 11.86 ± 1.55 × 10−3; qnon-group-living = 2.86 ± 9.89 × 10−4; Fig. 1d–f). This result is consistent with the prediction that group-living species are more likely to be long-lived. We then tested if transitions to a group-living state were different for long-lived and short-lived species; we found that the transition rate from a solitary to a non-solitary state was the same in long-lived species and short-lived species (qlong-lived = 12.47 ± 2.52 × 10−3; qshort-lived = 12.44 ± 2.37 × 10−3; Fig. 1d). The transition rate from a non-group-living state to a group-living state was also the same in long-lived species and short-lived species (qlong-lived = 11.86 ± 1.53 × 10−3; qshort-lived = 11.86 ± 1.54 × 10−3; Fig. 1f), suggesting that longer longevity does not promote the formation of group-living. In addition, the correlated evolution of social organization and longevity was also supported when body mass was taken into account (Table 2; residuals of longevity > 1.38, solitary: Log BF = 17.68; group-living: Log BF = 8.28, Supplementary Fig. 4a–c). The random taxon sampling of different classifications of relative long-lived species further supported the correlated evolution between solitary living and longevity (residuals of longevity >1.38, 92% of model comparisons; residuals of longevity >1.83, 98% of model comparisons; Supplementary Fig. 5a–c), and the correlated evolution between group living and longevity (residuals of longevity >1.38, 55% of model comparisons; residuals of longevity >1.83, 95% of model comparisons; Supplementary Fig. 5a–c). Similarly, when a different phylogenetic tree and different cut-offs of the residuals of longevity were used, the correlated evolution between social organization and relative longevity was also supported (nmulti-states = 969; Supplementary Table 7).

In addition, to investigate whether longevity favors any social organization transformation, we compared the independent and dependent RJ-MCMC models using species with a uni-state of social organization. The results supported that social organization transformation favors longer life during solitary transit to the group-living state rather than from solitary transit to pair-living, or from pair-living transit to the group-living state (Supplementary Table 8). The transition rate from short-lived to long-lived species was higher in group-living than solitary species (absolute longevity: qsolitary = 2.98 ± 6.97 × 10−3; qgroup-living = 10.45 ± 5.29 × 10−3; relative longevity: qsolitary = 3.37 ± 8.70 × 10−3; qgroup-living = 10.30 ± 9.51 × 10−3, Supplementary Fig. 6a–f).

Gene expression of social organization and longevity

To identify genes that could underpin the correlated evolution of social organization and longevity, we generated brain transcriptomics of 94 mammals belonging to 14 orders, 39 families, and 67 genera (Fig. 2a, Supplementary Data 2, “Methods”). Specifically, 57% of species and 62% of samples (166 samples of 54 species) were newly collected in this study. The sampled species were assigned to three states of social organization (solitary, n = 26; pair-living, n = 11; group-living, n = 65); eight species had more than one state. The sampled species also covered a longevity range from 3.2 years in Chinese mole shrew (Anourosorex squamipes) to 122.5 years in Homo sapiens (Fig. 2a, Supplementary Data 3, and Supplementary Table 9). Using the human coding sequences as a reference, we employed a reciprocal-blast approach to identify the orthologous gene set. The orthologous genes that were shared by >70% of the total number of species (i.e., 66 of 94 species) were selected for subsequent analyses (“Methods”). Finally, gene expression for 13,402 orthologous genes was measured across all brain samples. We then used MCMCglmm models to identify genes whose expression significantly correlated with any of the social organization states; these models also controlled for phylogenetic relationships and other confounding factors, including adult body mass, activity (nocturnal, diurnal, and other), diet (carnivore, herbivore, and omnivore), and lifestyle (non-aerial and aerial). Hundreds of genes were significantly associated with solitary living (up: 366 genes, down: 254 genes), pair-living (up: 393 genes, down: 66 genes), and group-living (up: 162 genes, down: 321 genes) (Supplementary Fig. 7a–f, Supplementary Data 4). There were three overlapping genes among the three states of social organization: ATP1A2, ALDH1L2, and WDFY1. We also detected genes that were shared by two states (solitary-pair-living: 21 genes; solitary-group-living: 284 genes; pair-living-group-living: 14 genes, Supplementary Fig. 7f). We detected 262 genes whose expression was significantly correlated with longevity; this was supported by all four different models in the MCMCglmm analyses (“Methods”, Supplementary Fig. 8, Supplementary Data 5; see Supplementary Data 6 for the results of each model).

Fig. 2: Genes and pathways whose expression was correlated with social organization and longevity in 94 mammalian species.
figure 2

a Species (n = 94) with RNA-seq and six life history traits (social organization, activity, diet, lifestyle, adult body mass, and longevity) used in the MCMCglmm analyses. Colorful shadings display different mammals’ orders. Silhouette images of animals are from PhyloPic database []. b Venn diagram showing the number of significant overlapping genes across solitary, pair-living, group-living, and longevity. c Clusters according to the function of 31 genes that were significantly associated with social organization and longevity. Each node represents a pathway from the Reactome database34, 35; pathways in which significant genes were involved are colored brown. Light green shading represents two clusters of gene function. d, e Example of a significant gene (XRCC6) that was downregulated in solitary species, upregulated in group-living species and also positively correlated with lifespan; the regression lines were generated from the linear regression model. Purple range display 95% confidence interval around the smooth line. Coefficients (post mean) and P-values (pMCMC) from the MCMCglmm analyses are also shown. The number of species used in the MCMCglmm was n = 94. f A heat map showing pathways that were significantly associated with social organization and longevity. S: solitary; PL: pair-living; GL: group-living; ML1–ML4: longevity in model 1 to model 4 (see “Methods”). Color code for social organization and longevity: blue = solitary; orange = pair-living; red = group-living; purple = lifespan. Source data are provided as a Source data file.

In total, we found 31 genes whose expression levels were significantly associated with both social organizations and longevity (Fig. 2b, Supplementary Data 7). The pathway topology analysis of these genes using the Reactome platform34,35 revealed two strong clusters. The first cluster included immune-related genes (Fig. 2c). Nine genes (i.e., UBL7, TNNT3, XRCC6, ATP2A2, NPHS1, KALRN, C1QC, MCL1, and ZFP36) that are involved in the innate immune response36. Gene C1QC (MCMCglmm: solitary, post mean = 1.16, pMCMC = 0.04; longevity, post mean = −2.33 ± 0.05, pMCMC = 0.03 ± 7.17 × 10−3) participates in encoding a complex heterotrimer C1q that plays a vital recognition role in the complement pathway. C1q has diverse biological functions, including complement activation, innate immune defense, cellular regulation, reproduction, development and neurodegenerative disorders37,38. The well-known immune gene ZFP36 (MCMCglmm: solitary, post mean = 1.07, pMCMC = 0.02; longevity, post mean = 2.21 ± 0.07, pMCMC = 0.02 ± 5.05 × 10−3) modulates anti-viral immunity by controlling T cell activation39 and protects against inflammatory diseases through regulating inflammatory cytokines, such as TNF-α40,41. ZFP36 also plays a role in neuroprotection and inhibits neuronal apoptosis42. Another gene of interest was XRCC6 (MCMCglmm: solitary, post mean = −1.50, pMCMC = 3.33 × 10−3, Fig. 2d; group-living, post mean = 1.39, pMCMC = 4.44 × 10−3, Fig. 2e; longevity, post mean = 2.27 ± 0.05, pMCMC = 0.01 ± 4.37 × 10−3), which encodes subunit p70 of the p70/p80 autoantigen. A recent study has shown that a splicing variant in XRCC6 may cause autism, a disorder that causes significant social, communication, and behavioral challenges43. Knockout of XRCC6 decreases lifespan in mice44 and the high expression of XRCC6 leads to a longer average lifespan in humans45. Thus, this gene likely plays a role in both longevity and social organization.

The second cluster of genes whose expression was correlated with longevity and social organization consisted of genes involved in the regulation of hormones, neural systems, and signal transduction (Fig. 2c), e.g., MTM1, SLC29A2, ATP2A2, KALRN, RHOBTB2, SLC6A19, and MCL1. Some of these genes are suggested to play a role in social behavior. For instance, the KALRN gene (MCMCglmm: group-living, post mean = −1.16, pMCMC = 0.02; longevity, post mean = −2.40 ± 0.11, pMCMC = 0.02 ± 6.93 × 10−3) produces several alternatively spliced forms of kalirin, which is essential for synaptic connections, spine development, cognition, learning, fear conditioning and social behavior46. Knockout of this gene in mice caused working memory deficits, locomotor hyperactivity and reduced social behavior47,48. Gene SLC29A2 (MCMCglmm: pair-living, post mean = −1.65, pMCMC = 0.02; longevity, post mean = 2.30 ± 0.05, pMCMC = 0.04 ± 3.18 × 10−3) is linked to the development of depression. Knockout of the ATP2A2 (MCMCglmm: pair-living, post mean = −1.60, pMCMC = 0.02; longevity, post mean = 2.42 ± 0.21, pMCMC = 0.03 ± 0.02) impaired fear memory and changed behaviors in novel environments49. Nonetheless, the contribution of these genes to longevity is currently unknown and worthy of further exploration.

To gain an overall view of gene expression related to social organization and longevity, we employed a modified summary statistic approach (i.e., the polysel method, “Methods”). This approach identifies pathways that show accumulated correlation rather than outlier genes50,51. The sum of the posterior means (generated from MCMCglmm models) of the genes in each pathway was calculated as the SUMSTAT score and compared to a null distribution of random gene sets. We found 56, 56, and 45 pathways showing significant correlations with solitary, pair-living, and group-living species compared with non-solitary, non-pair-living, and non-group-living species, respectively (Supplementary Fig. 9, Supplementary Data 8). We also identified 14 longevity-associated pathways that occurred in four models (Supplementary Fig. 10, Supplementary Data 9; see Supplementary Fig. 11 and Supplementary Data 10 for the results of each model). A total of 10 pathways showed accumulated correlations with both social organization and longevity (Fig. 2f, Supplementary Data 11). Among them, the hormones-related pathway “G-protein-coupled receptors (GPCRs), class B secretin-like” was positively associated with both solitary living and longevity, but negatively associated with group living (polysel: solitary, score = 6.18, P = 4.71 × 10−2; group-living, score = −6.04, P = 3.96 × 10−2; longevity, score = 7.15, P = 3.83 × 10−2). The secretin-like family of GPCRs include receptors for polypeptide hormones, such as secretin, parathyroid hormone and vasoactive intestinal peptide, which play vital roles in physiological homeostasis, nervous diseases, the stress response and longevity52,53,54. Acting as a catalyst in steroid hormone synthesis55, the cytochrome P450-related pathway has been enriched (“drug metabolism - cytochrome P450”, polysel: pair-living, score = 18.52, P = 1.40 × 10−4; longevity, score = 16.00 ± 0.04, P = 0.02 ± 1.06 × 10−3). The mutation of cytochrome P450 has been shown to increase longevity in Caenorhabditis elegans56,57. In addition, cytochromes P450 regulate inflammation and infection58 and the generation of eicosanoids (the “eicosanoid synthesis” pathway, polysel: solitary, score = 7.94, P = 7.71 × 10−3; group-living, score = −9.07, P = 7.77 × 10−4; longevity, score = 8.11, P = 4.68 × 10−2). Eicosanoids have a broad range of functions, including reproduction, physiological homeostasis, and cell growth regulation; in particular, they play a role in regulating immune response and inflammatory processes in various diseases59,60,61,62. Another immunity-related pathway “immunoregulatory interactions between a lymphoid and a non-lymphoid cell” was negatively correlated with longevity (polysel: longevity, score = −24.20 ± 0.28, P = 7.78 × 10−3 ± 1.24 × 10−3). Interestingly, this pathway is downregulated in solitary species, but upregulated in group-living species (solitary, score = −11.19, P = 0.01; group-living, score = 11.99, P = 5.52 × 10−3), and may be an immune response to elevated pathogen transmission among hosts and infectious disease risks in a gregarious setting. Taken together, both the function annotation of overlapping genes and the gene set enrichment analysis of all genes identified the hormones and immunity processes underlying the association between social organization and longevity.

Selection features of social organization and longevity

Whether social organizations or longer lifespans are under selection remains controversial63,64. To characterize the selection features of social organization and longevity, we used RELAX65 to estimate the selection coefficients (K) for orthologous genes under different states of social organization (solitary, pair-living, and group-living) and longevity (long-lived vs. short-lived) (Fig. 3a–d, Supplementary Data 12, Supplementary Data 13). In solitary branches, genes mostly experienced intensified selection (5448 genes, K > 1, likelihood ratio test (LRT) P < 0.05) rather than relaxed selection (3200 genes, K < 1, LRT P < 0.05, Fig. 3a). We identified 3747 genes that showed evidence of intensified selection and 4589 genes that showed evidence of relaxed selection in the pair-living state (Fig. 3b). A larger number of genes associated with group-living experienced relaxed selection (5570 genes, K < 1, LRT P < 0.05) than intensified selection (3170 genes, K > 1, LRT P < 0.05, Fig. 3c). Longevity appeared to be under intensified selection (Fig. 3d), as more genes were subjected to intensified selection (5364 genes, K > 1, LRT P < 0.05) than relaxed selection (3564 genes, K < 1, LRT P < 0.05) in the long-lived state. Moreover, a larger number of genes that experienced intensified selection for longevity were found under intensified selection in solitary rather than group-living species (Pearson’s chi-squared test: χ2 = 527.96, df = 1, P < 0.001). By contrast, a greater number of genes under relaxed selection in the long-lived state also experienced more relaxed selection in group-living species than solitary species (Pearson’s chi-squared test: χ2 = 430.42, df = 1, P < 0.001). These results suggest that the long-lived state in group-living mammals involves relaxation selection.

Fig. 3: Cross-talk between expression and selection in social organization and longevity.
figure 3

The pattern of selection characterizing social organization identified with a RELAX analysis (species: n = 94): a solitary species, P = 1.95 × 10−7, b pair-living species, P = 5.89 × 10−6, c group-living species, P = 5.81 × 10−7, and d longevity, P = 6.76 × 10−7. P value was calculated using Likelihood-ratio test (LRT). K < 1 indicates relaxed selection and K > 1 indicates intensified selection. Genes were under purifying selection when dN/dS < 1 and positive selection when dN/dS > 1. Arrows represent the direction of change in dN/dS. The median value of the proportion of sites is shown with a bar plot. Social organization and longevity are colored as follows: solitary: blue; pair-living: orange; group-living: red; short-lived state: cyan; and long-lived state: purple. e A heat map of significant pathways that overlapped between social organization and longevity. S: solitary; PL: pair-living; GL: group-living; ML: maximum lifespan. f A Venn diagram showing the number of genes with changes in expression levels and selection among solitary, pair-living, group-living, and longevity. Examples of genes that were associated with expression and selection in both social organization and longevity: g gene XRCC6 was under relaxed selection in group-living species (P < 1.00 × 10−17) and h long-lived species (P = 1.06 × 10−5). P value was calculated using Likelihood-ratio test (LRT). The interpretation of K and dN/dS is the same as (ad). The number of species used in RELAX analyses was n = 94. i Network of pathways showing correlations with expression (red round) and selection features (purple round) in both social organization and longevity. The circle size represents the number of genes in this pathway. The thickness of connective lines displays the number of shared genes between two pathways. Source data are provided as a Source data file.

We then employed the same pathway enrichment approach as described above (polysel method) using the selection coefficient (K) of each gene as the statistic. For the trait social organization, we identified 132 pathways under significant intensified or relaxed selection (solitary: 50 pathways; pair-living: 56 pathways; group-living: 53 pathways; P < 0.05, Supplementary Data 14; Supplementary Fig. 12a). We detected more intensified pathways than relaxed pathways in solitary species (34 vs. 16) and pair-living (36 vs. 20), but fewer intensified than relaxed pathways in group-living species (21 vs. 32). One pathway was shared among all three states of social organization (“Spliceosome, U2-snRNP” pathway), and a few pathways were also identified in two states (i.e., solitary-pair-living: 2; solitary-group-living: 12; pair-living-group-living: 4). For long-lived species, we also detected signatures of intensified selection in 40 pathways and relaxed selection in 23 pathways (Supplementary Fig. 12b, Supplementary Data 15). In total, 20 overlapping pathways were detected under selection for both social organization and longevity (Fig. 3e, Supplementary Data 16); however, most of these pathways did not show an identical trend in the selection force. For example, “B cell receptor signaling pathway” showed accumulated relaxed selection in group-living species, but intensified selection in long-lived species (polysel: group-living, score = −21.03, P = 3.04 × 10−3; longevity, score = 36.61, P = 2.42 × 10−3); “glycosphingolipid biosynthesis-lacto and neolacto series” experienced accumulated intensified selection in group-living species, but relaxed selection in long-lived species (polysel: group-living, score = 5.37, P = 0.03; longevity, score = −3.07, P = 0.04). These findings suggest that even though common pathways can be utilized by natural selection for longevity and social organization, the underlying molecular mechanisms and regulatory approaches are different.

Cross-talk between gene expression and selection

We further discovered eight genes with changes in their expression levels and selection for both social organization (Supplementary Fig. 12c–e) and longevity (Supplementary Fig. 12f), i.e., SHKBP1, MTM1, XRCC6, UBL7, VWA5A, PUS3, MCL1, and COX7A1) (Fig. 3f, Supplementary Data 17). In particular, gene XRCC6 was not only identified in the gene expression analyses (see above), but also experienced selection in solitary, group-living and long-lived species (RELAX: solitary, K = 2.58, P = 4.34 × 10−13; group-living, K = 0.31, P = 1.00 × 10−17, Fig. 3g; longevity, K = 0.70, P = 1.06 × 10−5, Fig. 3h). MTM1 was upregulated in both solitary and long-lived species (MCMCglmm: solitary, post mean = 1.38, pMCMC = 0.03; longevity, post mean = 2.55 ± 0.21, pMCMC = 0.03 ± 0.01). This gene was also under intensified selection in solitary, but relaxed selection in long-lived species (RELAX: solitary, K = 2.52, P = 1.00 × 10−17; longevity, K = 0.52, P = 2.05 × 10−7). Loss-of-function of MTM1 leads to a genetic neuromuscular disorder, X-linked centronuclear myopathy66, as evidenced by a decreased lifespan in knockout mice67. Another gene, MCL1, was upregulated (MCMCglmm: pair-living, post mean = 1.31, pMCMC = 0.04) and under relaxed selection in pair-living species (RELAX: K = 0.54, P = 2.97 × 10−5). In addition, MCL1 was negatively associated with longevity (MCMCglmm: post mean = −2.33 ± 0.13, pMCMC = 0.04 ± 8.48 × 10−3) and experienced intensified selection in long-lived species (RELAX: K = 2.28, P = 4.55 × 10−6). As a notable member of the anti-apoptotic Bcl-2 family, MCL1 can regulate cell cycle, cell proliferation, and DNA damage repair, which may contribute to longevity68. MCL1 is critical for neuronal development69, where the loss of MCL1 leads to apoptosis of neuronal progenitors70.

One pathway was correlated with the expression and selection features for both social organization and longevity: “sulfur relay system” (polysel: expression in solitary, score = −4.17, P = 0.02, in pair-living, score = 5.37, P = 0.03 and in longevity, score = 6.81 ± 0.20, P = 0.02 ± 3.41 × 10−3; selection in solitary, score = 3.40, P = 0.03 and in longevity, score = 4.26, P = 0.03). The sulfur relay systems are involved in the complex process of trafficking and delivery of sulfur, which is an essential element for living organisms and a component of major biomolecules71,72. For example, sulfur-containing nucleosides in tRNA molecules have diverse functions, including stabilization of tRNA structure, proper codon-anticodon base pairing, and insurance of accurate and efficient translation73,74. Sulfur-containing modification at tRNA position 34 has revealed a biological role of sulfur in growth, oxidative stress, and metabolic cycles in yeast73. The lack of this modification can cause myoclonic epilepsy with ragged-red fibers (MERRF), which clinically manifests as cerebellar ataxia in humans75. Besides, only one pathway whose expression was associated with both social organization and longevity was under significant selection in social organization, but not longevity: “GPCRs, class B secretin-like.” In addition, five pathways that were under selection for both social organization and longevity also showed significant changes of expression levels in social organization (i.e., “tight junction interactions,” “axon guidance”) or longevity (i.e., “SRP-dependent cotranslational protein targeting to membrane,” “mitochondrial protein import,” “GPCRs, class A rhodopsin-like”). Among them, “GPCRs, class A rhodopsin-like” (polysel: expression in longevity, score = 42.45 ± 2.02, P = 4.86 × 10−3 ± 1.25 × 10−3; selection in solitary, score = −37.57, P = 2.50 × 10−6 and in longevity, score = 29.16, P = 1.27 × 10−3) is the largest group of GPCRs, representing members such as light, hormones and neurotransmitter receptors76,77. These receptors are associated with regulation of neuroendocrine function, sleep-wake cycle, energy metabolism, feeding, anxiety, and stress responses78,79. Since mutations in class A GPCRs can lead to a large number of diseases, including depressive disorders, schizophrenia, and bipolar disorder, they also serve as drug targets in humans80,81. Another pathway, “axon guidance” (polysel: selection in pair-living, score = 33.48, P = 6.17 × 10−4 and in longevity, score = 22.66, P = 0.04), is key to brain development and neural circuit formation82. In addition, the “tight junction interactions” pathway (polysel: selection in solitary, score = −7.03, P = 0.02; in group-living, score = 9.27, P = 1.25 × 10−4 and in longevity, score = 8.20, P = 0.01) regulates cell-cell communication and cellular growth, development, differentiation, and pathogen infection83. Dysregulation of tight junctions not only increases the entry and spread of viruses or bacteria84, but also affects age-related neurodegenerative disorders85. Similarly, “SRP-dependent cotranslational protein targeting to membrane” (polysel: expression in longevity, score = 38.39, P = 0.02; selection in solitary, score = −26.09, P = 2.45 × 10−4 and in longevity, score = 37.12, P = 2.50 × 10−6) regulates viral infections86, but its function in longevity remains unclear. Very few pathways were shared by social organization and longevity in the gene expression and selection analyses. This finding points to a fine-tuned network (e.g., Fig. 3i), which involves beneficial mutations and changes of expression in different, but functionally connected genes or pathways, and is a favored approach to maintain the plasticity and stable evolution of social organization and longevity.


In this study, we provide evidence for the correlated evolution of social organization and longevity across the mammalian phylogeny and show that group-living species lived longer than solitary species. There was no significant difference in longevity between pair-living and group-living species, or between pair-living and solitary species, suggesting that pair-living alone is unable to mediate lifespan extension despite it can generate an association between a pair of individuals. Long lifespan favored by group-living species may be because group living reduces extrinsic mortality by limiting the risks of predation and starvation, and the strong and stable social bonds formed among group members have the power to enhance longevity8,9,87. These benefits are expected to override the costs inherent in group living, such as competition for mating partners and food, stress from higher-ranking individuals, and the spread of infectious diseases via social contacts88. Another explanation for the correlated evolution is that kin selection may be a driver of longevity89. Group living leads to the co-residence of males or females and sex-specific philopatry. Preferential association among kin can influence coalition formation90, cooperative breeding91, parallel dispersal92, and the establishment of social hierarchies93, which further enhance individual health or offspring survival7,93,94 and ultimately increase an individual’s or their relatives’ evolutionary fitness95. The length of a lifespan may be affected by inclusive fitness benefits. For example, to maximize the rate of offspring survival, the lifespan of parents or grandparents may be extended to allow for the provision of parental care or even grandparental care to offspring16,96.

The transcriptomic features associated with the correlated evolution of social organization and longevity indicated that hormonal regulation and immunity constitute the mechanistic foundation for the association between social organization and longevity. Peptide hormones (e.g., growth hormone, insulin-like growth factor-1, and insulin) have also been shown to perform crucial roles in aging and longevity. For example, defects in growth hormone production extend longevity in Snell dwarf mice97. Reduced insulin-like growth factor-1 signaling has been shown to increase lifespan in mice, fruit flies, yeast, and worms98,99. Other types of hormone, steroids (e.g., testosterone, estradiol, and progesterone), control a range of social behaviors, including copulatory behavior, aggression, grooming behavior, and paternal behavior100,101. Specifically, neuroactive steroids produced in the nervous system and their receptors also play a role in the regulation of learning capacity, memory, decision-making, and depression102. A number of studies have demonstrated that steroid hormones also regulate lifespan; for example, the inhibition of sulfatase increases lifespan in C. elegans103.

The immunity or inflammation pathways and genes identified in this study support the view that immunity is instrumental to the correlated evolution of sociality and longevity104. Social organizations can affect immune responses. For example, in captive group-living long-tailed macaques (Macaca fascicularis), a higher rate of affiliation enhances an individual’s immune response105. In contrast, social isolation or a limited number of social ties can activate neuroendocrine regulation, accumulated inflammation burden, and impair immune function106,107. Moreover, several lines of evidence have demonstrated the effects of immunity and inflammation on social behavior, fitness, health and lifespan of social mammals. For example, interleukin-17a (IL-17a), a well-described mediator in inflammatory diseases, can rescue sociability deficits in offspring mice exposed to maternal immune activation by directly affecting their neuronal activity108. Immunity is also linked to reproductive behavior and thus indirectly affects the fitness of an individual109. A recent study has shown that male mice avoid mating when a female mouse is unhealthy110. Research has also shown that age-related changes in immunity, such as inflammatory markers interleukin-6 (IL-6) and TNF-alpha, increase with age in humans111 whereas the proportion of naive CD4 T cells in the blood declines with age in wild Soay sheep (Ovis aries)112.

We assumed that solitary species are generally less social than pair-living species, and both are less social than group-living species. However, mammal societies vary enormously in individual composition, size, patterns of parental care, cooperation, social relationships, and spatiotemporal dynamics of group members. Although some studies have provided conceptual frameworks and indices to quantify sociality or social complexity31,113, a consensus and more accurate measurements that could be used in large-scale comparative studies are needed. The accumulation of long-term field data on variables such as relatedness, affiliative relationships, social network cohesion, cooperation, and agonistic relationships among individuals will solidify our understanding of the evolutionary interplay between sociality and longevity. In summary, our study provides insights into the correlated evolution of social organization and longevity and serves as a basis for experimental validation and follow-up studies on the mechanistic drivers of this correlated evolution.


Data collection and compilation

All animal care and research protocols of this study were approved by the Institute of Zoology, Chinese Academy of Sciences (No. IOZ-IACUC-2021-129). We compiled data on various mammalian traits, including life history, social system, behavior, and habitat. These data were obtained from the literature114,115,116,117,118, reviews25,26, and databases, such as PanTHERIA119, PHYLACINE120, and AnAge4. The last search date was August 5, 2022. The sources of each data are listed in Supplementary Data 1, Supplementary Data 3 and Supplementary References. A complete dataset comprising data on adult body mass, maximum lifespan, and social organization, activity, lifestyle, fossoriality for 974 mammal species was used for subsequent analyses (Supplementary Data 1). To identify the candidate genes associated with social organization or longevity, we further generated brain transcriptomes for 94 mammal species. To control for possible effects of confounding variables on gene expression in the regression analyses, data on six traits (i.e., adult body mass, maximum lifespan, social organization, activity, diet, and lifestyle) for these 94 species were included in the models (Supplementary Data 3). Adult body mass and maximum lifespan were log10 transformed prior to analysis.

Moreover, we measured the adult body mass of 7 bat species in the wild for which no data were available in the existing literature and databases. We used imputation methods121 to estimate the maximum lifespan of 35 species that were missing from the database. In this estimation, adult body mass and female time to maturity were used since they are strongly correlated with maximum lifespan. The dataset included 1250 species with information on adult body mass and 816 species with information on adult body mass, female time to maturity, and maximum lifespan. In brief, we first used the finalfit package in R122 to evaluate the missing data pattern of the maximum lifespan (6% of NAs) and female time to maturity (30% of NAs) in the dataset with 1250 species: (a) missing completely at random (MCAR), assuming that data values do not relate to any other variables; (b) missing at random depending on adult body mass (MAR-BM), assuming that data values relate to adult body mass; and (c) MAR depending on phylogeny (MAR-Phy), assuming that data values relate to phylogeny. Then, to find the best imputation approach for our dataset, we employed three popular imputation methods: mice123, missForest124, and Phylopars125, following the process of Penone et al.121. The complete dataset with three traits for 816 species was set as the test dataset. We introduced missing data (5%, 10%, 20%, 25%, 30%, 40%, and 50%) in the test dataset and repeated the analyses at least ten times. Five approaches were then used to impute each missing dataset: mice with and without phylogeny, missForest with and without phylogeny, and Phylopars. With mice and missForest, we also obtained ten imputed datasets per imputation. The Phylopars approach used the phylogenetic trees directly, whereas the mice and missForest approaches used the phylogenetic eigenvectors from principal coordinate analysis in the R package PVR126. The normalized root mean squared error and Bias (see Penone et al.121 for equations) were calculated to compare the imputation methods121. Finally, we selected the best-performed method, Phylopars, to impute missing values in the dataset with 1250 species.

Classification of social organization, longevity, and other traits

Social organization, i.e., the size, composition, and kinship structure of a basic social unit, varies across mammal species31. We used the criteria developed by Kappeler and van Schaik1 to classify the social organization of mammals into three states: solitary, pair-living and group-living. These definitions focus on adult males and adult females and do not consider subadults, juveniles, and infants. Solitary species were defined as individuals who live alone and rarely synchronize their general activity (e.g., forage, rest) and move with others other than for mating and raising their young. A species whose members spend the majority of their life in a solitary state but occasionally forage, migrate, travel, or rest in temporary groups, was categorized as solitary, e.g., Cervus nippon, Enhydra lutris, and some of Balaena spp.114,115,116. Species were classified as pair-living when an adult male and an adult female associate for more than one year within a common home range; this definition considers behavioral evidence of living together but not genetic evidence. Species were categorized as group-living if at least three adults live together, synchronizing their daily activities and interacting with each other25. Cooperatively breeding species were categorized as pair living if a group contains one breeding pair and their non-adult offspring; if there are more than two adults per group, they were classified as group living. We also considered multiple states of social organization for a particular species. We classified activity as one of three states: nocturnal, diurnal, and other (nocturnal/crepuscular, cathemeral, crepuscular, or diurnal/crepuscular) using the definitions and classification of Jones et al.119. Following the approach used by Kissling et al.127, we divided diet into three categories: carnivore, herbivore, or omnivore. In addition, we categorized lifestyle into terrestrial, aerial, arboreal, semi-arboreal, freshwater, marine, or terrestrial-marine, according to Faurby et al.120 and Thorley et al.15. The fossoriality of species included non-fossorial and subterranean15.

Longevity was defined by the maximum lifespan of a given species. This definition is widely used in comparative longevity studies29,128,129. The maximum lifespan of a species was referred to as absolute longevity. Given a strong positive correlation between adult body mass and maximum lifespan, we also measured “relative longevity” to control for the confounding effects of body mass. The “relative longevity”, or the residuals of longevity, were calculated using the body mass adjusted residuals with the equation from the AnAge database: residuals of longevity = maximum lifespan/(4.88 × adult body mass0.153)29. For analyses requiring categorical variables, mammals were divided into long-lived species and short-lived species. We used the 3rd quartile of longevity (26 years) as the cut-off value for the long-lived state. This value is consistent with a previous study that labeled species whose maximum lifespan was more than 26 years as medium or long-lived130. Thus, we divided 974 mammalian species into absolute long-lived species (longevity > 26 years, n = 246) and absolute short-lived species (longevity ≤ 26 years, n = 728). To explore the effects of the classification of long-lived species on the phylogenetic comparative analyses, we also used two different cut-off values for longevity: 17 years (median of longevity) and 35 years (same range from median to the third quartile) (see section “Taxonomic sampling”). In addition, considering the influence of adult body mass on longevity, we used the 3rd quartile of the residuals of longevity as an index to divide mammals into relative long-lived species (n = 244) and relative short-lived species (n = 730). The residuals of longevity were calculated using the equation from the AnAge database as described above. Mammals were classified as relatively long-lived species if the residuals of longevity were greater than the value of the 3rd quartile (1.38), whereas species not meeting this criterion were classified as relatively short-lived species. Similarly, we also used two different cut-off values for relative longevity: 0.93 (median of the residuals of longevity) and 1.83 (same range from median to the third quartile). Long-lived species that were frequently identified in previous studies (e.g., Balaena mysticetus, Heterocephalus glaber, Myotis brandtii, Homo sapiens) were included as long-lived species, demonstrating the accuracy of the two classification approaches.

Mammal phylogeny and phylogenetic signal

We used the mammalian phylogeny tree from TimeTree131 in our analyses. The name.check function of the package Geiger in R was used to check the concordance of species names between a trait dataset and a phylogenetic tree132. Extra tree tips were removed using the drop.tip function of the R package ape133. For the analyses that required binary trees, we randomly converted multifurcating trees to binary trees using the multi2di function of the R package ape. In addition, considering the uncertainty of phylogenetic relationships among species, we also re-analyzed data using a different phylogenetic tree from Upham et al.33.

We calculated Pagel’s lambda using the fitDiscrete function in the R package Geiger for categorical variables (social organization) and the phylosig function in the phytools for continuous variables (longevity). The phylogenetic signals of both absolute and relative longevity were calculated. There is no phylogenetic signal when Pagel’s λ = 0, whereas Pagel’s λ = 1 indicates that a strong phylogenetic signal is detected and the trait has evolved consistently with a Brownian motion model (i.e., that close relatives are more similar than expected).

Evolutionary modeling of social organization and longevity

Evolutionary model of social organization

To evaluate the evolutionary pathway among solitary, pair-living and group-living, we assessed four alternative models in 974 mammalian species using a Bayesian framework, which was implemented in BayesTraits V3134,135. The transition rates between any of the two states of social organization were allowed to vary differently in these four alternative models. First, the equal rates model (ER) is the simplest model in which all transition rates were the same. That is, the transition rates were equal among solitary, pair-living and group-living (Supplementary Fig. 1). Second, the increasing complexity model (IC) posits that transitions are only permitted between solitary and pair-living, and pair-living and group-living, but not between solitary and group-living; transition rates can vary. Third, the all-rates-different model (ARD) dictates that all transition rates are different; this is also known as a parameter-rich model24. Fourth, the reversible-jump Markov chain Monte Carlo-derived model (RJ-MCMC) is derived from the data by the reversible-jump procedure in BayesTraits; this model structure is generated by the highest posterior support from the reversible-jump analysis (see details below).

To generate the best-supported RJ-MCMC model, we used the reversible-jump MCMC procedure with the hyper-prior approach, seeding the mean of the exponential prior from a uniform distribution on the interval of 0 to 224. To prevent the rates from being too small to estimate, we scaled the branch lengths of the tree to have a mean of 0.0132. We ran each Markov chain Monte Carlo simulation for 100 million iterations. The sample frequency of the MCMC chain was set as 100 iterations. We set the first 50 million iterations as the burn-in period. The model was identified as convergent when the posterior distribution was approximately normal and the trance of harmonic mean log-likelihoods remained stable across runs. We also plotted the transition rates across the MCMC chain in Tracer136 and verified that the effective sample size for the parameters of interest was above 200 (i.e., ESS > 200)26. We repeated the run of the MCMC chain ten times and ranked models that were visited by the MCMC chain according to their posterior probability for each run. Ten independent reversible-jump analyses were conducted to generate the highest posterior support RJ-MCMC models, which accounted for 31.79%, 34.74%, 32.01%, 32.96%, 35.54%, 33.54%, 33.36%, 32.99%, 32.28%, and 33.61% of 500,000 visits, respectively; these analyses consistently showed that transitions occurred between any two states of social organization, except from solitary to pair-living. Moreover, group-living was an intermediate step toward pair-living from solitary living.

We initially generated the RJ-MCMC model with the highest posterior support as described above. We subsequently compared four alternative evolutionary models of social organization: ER model, IC model, ARD model, and RJ-MCMC model. Each model was run using the hyper-prior approach (which seeds the mean of the exponential prior from a uniform distribution of 0-2) for 100 million iterations. Every 100 iterations were sampled and the first 50 million iterations were used as the burn-in. We set the stepping-stone sampler to 1000 stones and 10,000 iterations per stone to estimate the marginal likelihood. We conducted ten runs independently for each model. We calculated the variance of the log marginal likelihood values over the ten iterations to represent the stability of the models. The variances of all models were less than 0.03. We compared the four models using Log Bayes Factors (Log BF), calculated using the following equation: Log BF = 2 × (log marginal likelihood complex model − log marginal likelihood simple model). The simple model is favored when Log BF is less than 2, while there is positive evidence to support the complex model when Log BF is greater than 2 (strong evidence, 5–10; very strong evidence, >10)32.

Evolutionary model of longevity

To determine the evolutionary direction between long-lived and short-lived species, longevity was treated as a categorical variable. Both absolute longevity (i.e., absolute short/long-lived species) and relative longevity (i.e., relative short/long-lived species) were used in the following analyses. Three alternative models were constructed to estimate the evolutionary pathway of longevity: (a) ER model, in which the transition rate from a long-lived to a short-lived state and back is equal; (b) ARD model, in which the transition rate from a long-lived to a short-lived state and back is unequal; (c) RJ-MCMC model (as described above), which is derived from the data by the reversible-jump procedure. The priors, total iterations, burn-in periods, and sampled generations were the same as those used in the social organization models above (hyper-prior approach: the mean of the exponential prior from a uniform distribution of 0-2; iterations: 100 million; sample: 100 iterations; burn-in: the first 50 million iterations; stepping-stone sampler: 1000 stones and 10,000 iterations per stone). Similarly, the same model selection procedure as described for social organization was used to compare the three alternative evolutionary models for longevity. In addition, we repeated ten independent runs for absolute and relative longevity. All models were stable (variances <0.01).

Correlated evolution of social organization and longevity

To test whether social organization influences absolute longevity or relative longevity, we first used phylogenetic ANOVA (phyloAVOVA function) in the R package phytools137 for 974 mammalian species. P-values were adjusted using the Hommel method138, and two-sided p-values were reported. To control for factors related to external mortality, we also constructed a regression of the Markov chain Monte Carlo sampler for multivariate generalized linear mixed models (MCMCglmm), incorporating the phylogenetic relationship as the covariance structure139. In the MCMCglmm model, longevity was fitted as a Gaussian response variable and social organization, adult body mass, activity, lifestyle, and fossoriality as predictor variables. We then tested whether social organization is associated with longevity by analyzing the correlated evolution between the two characteristics using the Discrete package in BayesTraits140. Discrete package tests if two binary traits are correlated over a phylogeny by comparing the likelihoods of an independent and dependent model. All possible jumps between the states of each trait are allowed, but in an independent model, the two traits are assumed to evolve independently by placing some restrictions on the transition rate parameters; in a dependent model, which assumes the transitions in one trait depend on the state of the other, the rate parameters are not restricted140. To fit the “Discrete” test requirements, we divided social organization into three categories: non-solitary (0) and solitary (1); non-pair-living (0) and pair-living (1); non-group-living (0) and group-living (1); we also divided lifespan into two categories: absolute short-lived (0) and absolute long-lived (1); relative short-lived (0) and long-lived (1). We ran the “Discrete” analysis with a pairwise combination of the three categories of social organization and two categories of lifespan. That is, six “Discrete” analyses: non-solitary/solitary and absolute short-lived/long-lived; non-pair-living/pair-living and absolute short-lived/long-lived; non-group-living/group-living and absolute short-lived/long-lived; non-solitary/solitary and relative short-lived/long-lived; non-pair-living/pair-living and relative short-lived/long-lived; non-group-living/group-living and relative short-lived/long-lived). For each “Discrete” test, RJ-MCMC chains with an independent and dependent model were tested. We used the same procedure as described above to set the model parameters (hyper-prior approach: the mean of the exponential prior from a uniform distribution of 0-2; iterations: 100 million; sample: 100 iterations; burn-in: the first 50 million iterations; stepping-stone sampler: 1000 stones and 10,000 iterations per stone), check model convergence, and compare model performance. RJ-MCMC chains with an independent and dependent model were compared using Log BF. Ten independent runs indicated that all models were stable (variance <0.5).

Taxonomic sampling

To evaluate the impact of sample size on the phylogenetic analyses, we followed the taxonomic sampling methods of Kappeler and Pozzi26 to subsample the original dataset. We randomly selected species from the original dataset and generated subsets including 95%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, and 50% of the 974 species; we repeated this step 10 times. A total of 100 subsets were created. Each subset was used to run an RJ-MCMC independent and dependent model of six combinations: non-solitary/solitary and absolute short-lived/long-lived, non-pair-living/pair-living and absolute short-lived/long-lived, non-group-living/group-living and absolute short-lived/long-lived, non-solitary/solitary and relative short-lived/long-lived, non-pair-living/pair-living and relative short-lived/long-lived, non-group-living/group-living and relative short-lived/long-lived.

In addition, to explore the influence of the classification of long-lived species on the phylogenetic analyses, we did not only use 26 years (absolute longevity) and 1.38 (relative longevity) as cut-off values, but also two different cut-off values for absolute longevity (17 years and 35 years) and for relative longevity (0.93 and 1.83). If the longevity of a species was higher than the cut-off value, it was considered to be a long-lived species. For each classification, we randomly sampled the taxa similar to the procedure described above and generated 100 subsets, including 95%, 90%, 85%, 80%, 75%, 70%, 65%, 60%, 55%, and 50% of the original species. Then, we used each subset and compared the RJ-MCMC independent model and the RJ-MCMC dependent model for six combinations: solitary (0/1) and absolute longevity (0/1), pair-living (0/1) and absolute longevity (0/1), and group-living (0/1) and absolute longevity (0/1); solitary (0/1) and relative longevity (0/1), pair-living (0/1) and relative longevity (0/1), and group-living (0/1) and relative longevity (0/1).

Sample collection and RNA extraction

Comparative transcriptomics is a powerful approach to infer molecular changes underlying complex traits141,142,143. The brain is a central organ that mediates social behavior and social systems, including partner preference144, parental care145, social hierarchy146, eusociality141,142, and mating system28. Brain transcriptomics has also been used to identify significant genes and pathways that are associated with longevity trait30,147. Therefore, we performed comparative transcriptomics of the brain to test whether there are overlapping pathways and genes underlying the correlated evolution between longevity and social organization. We collected 267 fresh brain samples from 94 mammals that encompassed three types of social organization (solitary, pair-living, and group-living) and various longevities (3–122 years). The brain transcriptomes of 101 samples of 42 species were obtained from the published literature30,148,149,150,151,152,153,154,155,156,157,158,159. 57% of species and 62% of samples (166 samples of 54 species) were newly collected in this study. 96% of sampled individuals were adults and 73% of individuals were males (Supplementary Data 2). 71% of 94 species were prepared in more than two biological repeats or technical repeats; the number of repeats for each species are shown in Supplementary Data 2. Since we aimed to characterize the conserved genes and pathways that are related to social organization and longevity among species, the mammal species were chosen based on the availability of brain transcriptome and life-history data, and also the representation of mammal diversity and taxa distribution in the phylogenetic tree. These 94 mammal species are belonging to 14 orders (Artiodactyla, Carnivora, Chiroptera, Cingulata, Didelphimorphia, Diprotodontia, Eulipotyphla, Hyracoidea, Lagomorpha, Monotremata, Perissodactyla, Primates, Rodentia, and Scandentia). We calculated the ratio of species of each order to 974 mammals with longevity available (Ratiom), and the ratio of RNA-seq sampled species of each order to 94 species (Ratios). There was no significant difference between Ratiom and Ratios (Ratiom: Median = 2.09%, IQR = 0.38% ~ 7.57%; Ratios: Median = 1.06%, IQR = 1.06% ~ 10.37%; Wilcoxon signed rank test: n = 14, V = 38, P = 0.38), indicating that sampled species in the transcriptomic analyses have a good taxa representation in general. The mean longevity of the 974 mammals and 94 species was 19.55 ± 15.96 and 20.87 ± 16.53, respectively. The medians were 17.15 (IQR = 8.30 ~ 26.18) and 17.40 (IQR = 11.40 ~ 23.55). A summary of the social organization and other life-history traits is shown in Supplementary Table 9.

For the newly collected samples in this study, species identification was performed based on their morphological characteristics and the sequences of the mitochondrial DNA cytochrome oxidase I gene (COI) or cytochrome b gene (cytb). We sampled the forebrain or frontal lobe of the brain of large mammals (e.g., Carnivora) or the entire brain other than the olfactory bulb, cerebellum, pituitary and brain stem for small species (e.g., Chiroptera). We dissected the brains, rapidly froze them in liquid nitrogen, and stored them at −80 °C. Following homogenization of brain tissue, we used the TRIzol protocol to extract total RNA from all samples, according to the user guide (Invitrogen). We assessed the RNA quality and concentration using the Agilent 2100 Bioanalyzer (Lexington, MA, USA) prior to library construction.

RNA sequencing and gene expression

For each brain RNA sample, a library size of 150 bp was constructed using NEBNext Ultra RNA Library Prep Kit for Illumina (NEB, USA) and paired-end sequencing was performed on the Illumina NovaSeq 6000 platform (Novogene Co. Ltd), generating approximately 2.7 billion reads. We used in the package NGS QC Toolkit to remove low-quality reads of raw data160. Draft transcriptomes for 51 without-reference genomes were assembled using Trinity (parameters: k-mer = 25, minimum contig length = 200 bp, paired fragment length = 500 bp, maximum memory = 25 G)161. We then ran cd-hit-est (90% similarity) to reduce redundancy in the assembly by retaining the longest transcript in each cluster162,163. We applied AUGUSTUS164 to predict protein coding genes in non-redundant transcripts using the default parameters, except for –species = human and produced annotation GTF files. Reference genomes for 43 species were publicly available from the NCBI and Ensembl databases.

To generate the species-specific ortholog sets and calculate expression values, we first generated a human reference. We extracted the human CDS sequences using the gffread function of the Cufflinks package165. The longest transcript for each gene was extracted after filtering for incomplete and pseudogene transcripts. We used BLAST166 to remove highly repetitive or similar genes (cut-offs: e-value <1.0 × 10−6 and identity > 90%). We then extracted the longest transcript of each gene for other species and conducted mutual BLAST between each of those species and humans as a reference (cut-offs: e-value <1.0 × 10−6 and identity >30%)167. Genes with reciprocal best hits were identified as orthologous genes. Given that differences were generated when mapping the transcriptome to whole-genome and de novo genome, we set the CDS of orthologous genes as the reference genome for each species and produced annotation files (GTF format). We used STAR to create an index for each species based on the total sequence size of all orthologous genes and the read length168, then aligned the RNA-seq reads to the reference genome using the default parameters in STAR. We counted reads using the featureCounts function of the Subread package169 without including multi-mapping reads. Orthologous genes were discarded when the raw counts of genes were low (<10 counts in >3 samples; 2564 genes were removed) or high (read counts >5% of the total counts; 1 gene was removed). Batch effects (Bioproject and sequencing platform) were detected and removed with the comBat_seq function of the R package sva170. We kept orthologs that were detected in 70% of the total species (i.e., 66 of 94 species). The number of orthologous genes for different cut-off values was 14,730 (50% of species), 14,152 (60% of species), 13,402 (70% of species), 12,476 (80% of species), 10,138 (90% of species), and 0 (100% of species). 70% was chosen as it was a trade-off between the number of orthologous genes and the number of species that could be used in the sequencing analyses. The gene expression of missing orthologs in a species was treated as missing data. The average number of species used in a model differed for each gene, with a mean of 87.96 ± 6.27 in 13,402 genes. To normalize the raw counts, we scaled the library size (i.e., the number of total reads for each sample) by the TMM (trimmed mean of M-values), and used RPKM (reads per kilobase per million mapped reads) to normalize gene length in the R package edgeR171. One was added to the value generated from the RPKM-TMM method before log2 transforming to avoid an infinite value. We used these log-transformed gene expressions for all downstream analyses.

Modeling gene expression and traits

To identify the candidate genes associated with a specific social organization or longevity trait across the 94 species, we conducted MCMCglmm model, incorporating the phylogenetic relationship as the covariance structure in the model139. For each of the 13,042 orthologous genes, we constructed four MCMCglmm models; in all four models, the expression value of one gene from the 94 species was fitted as a Gaussian response variable and adult body mass, longevity, activity, diet, lifestyle, and social organization as predictor variables. Since for some genes the species did not include all levels of lifestyle as defined above, we categorized lifestyle into aerial and non-aerial (i.e., terrestrial, arboreal, semi-arboreal, freshwater, marine, and terrestrial-marine) in the gene expression models. The differences between these four models were the different categories of the variable social organization: (a) to identify the solitary-associated genes, all species were classified as solitary and non-solitary in the first MCMCglmm model; (b) to identify the pair-living-associated genes, all species were classified as pair-living and non-pair-living in the second MCMCglmm model; (c) to identify the group-living-associated genes, all species were classified as group-living and non-group-living in the third MCMCglmm model; and (d) all species were classified as solitary, pair-living and group-living in the fourth MCMCglmm model. We used a prior of covariance V of 1.00 and a degree of belief parameter (nu) of 0.002. We ran two MCMC chains for 1 million iterations, with 100,000 burn-in and a thinning of 500 iterations for each mode using the MCMCglmm function in the R package MCMCglmm172. Model convergence was declared when Gelman-Rubin’s convergence diagnostic was less than 1.1 using the gelman.diag function in coda package173. All effective sample sizes were set at >1000. Genes were considered statistically significant if pMCMC was less than 0.05 and the absolute value of the posterior mean was greater than the cut score. The pMCMC values were used directly because MCMCglmm implements MCMC methods for Bayesian generalized linear mixed models172,174. Within a Bayesian framework, in which parameters are estimated based on priors, multiple comparison corrections are not required. Hence, from a Bayesian viewpoint, there is no need to adjust pMCMC175,176,177,178,179. The cut score of the posterior mean was calculated using the following steps. Given that the MCMCglmm estimates a value for the posterior mean (similar to the coefficient in linear regression) for each predictor variable in the model, we first plotted a histogram of the posterior mean and fit a high probability distribution (normal or logistic distribution) to the data using the fitdist function in R package fitdistrplus180. The possible distributions were compared to obtain the best fit by computing goodness-of-fit statistics using the gofstat function of fitdistrplus. Parameters were estimated when the best fit distribution was chosen. We computed the statistics (e.g., Z-score) of the best fit distribution as the cut score, where each side of the distribution was cut at 0.025 (i.e., a significance level of 0.05 for a two-tailed test). For example, if the data fit a standard normal distribution, the cut score (Z-score) was approximately 1.96. Thus, genes whose pMCMC <0.05 and |posterior mean| > 1.96 were identified as significant genes. A gene with a positive or negative posterior mean had an up- or downregulated expression, respectively. If a significant gene was detected in more than one model, the value of the posterior mean and pMCMC were displayed using the mean and standard deviation (SD) value of these models.

Tests for relaxed and intensified selection

Multiple sequence alignment of each ortholog gene was performed using the Perl script translatorX.pl181, which calls MAFFT for alignment182 and GBlock183 for filtering unreliable regions (GBlock parameters: b1 = 0.75, b2 = 0.85, b3 = 3, b4 = 5). To test whether genes were under relaxed or intensified selection in solitary, pair-living, group-living, and long-lived species (four hypotheses tests), we ran a likelihood ratio test (LRT) on the 13,402 orthologous genes in RELAX (implemented in HYPHY65 using species tree as an input). In tests of the four hypotheses, the 94 species were divided into: (a) solitary, set as test branches, or non-solitary, set as reference branches; (b) pair-living (test) or non-pair-living (reference); (c) group-living (test) or non-group-living (reference); and (d) long-lived (>26 years, test) or short-lived (reference). We compared the model (K = 1) and the alternative model allowing K to be estimated for each hypothesis test. RELAX estimated three dN/dS rate categories and inferred the selection intensity parameter K. K < 1 indicates relaxed selection in the test branches, whereas K > 1 indicates intensified selection.

Pathway enrichment analysis

We used the pipeline of the detection of polygenic selection in genesets (polysel), which has the power to detect several genes with small effect mutations that can have a large influence on a biological pathway (see Daub et al., 2013 and 2017 for details)50. In addition to the gene sets from Daub et al.51, we also generated the gene set of the behavior (GO:0007610) and social behavior (GO:0035176) pathways from GO184,185. To detect the pathways associated with social organization or longevity, we used the posterior means (i.e., post mean) of each gene, which were generated from the MCMCglmm models as gene scores. The SUMSTAT score was calculated by summing the gene scores of genes in the set of interesting pathways. In the analysis of upregulated or positive gene sets, the gene score of downregulated or negative genes was set as 0 and vice versa. We used the cor.test function in R to run a correlation between gene score and gene length or species number; if they were significantly correlated (P < 0.05), we used RescaleBins to adjust the gene score. Overlapping genes between gene sets were removed in the pruning process. In addition, we used the same procedure of polysel to detect gene sets that were under selection. In pathway enrichment analyses of selection, the gene scores were set using the K value of each gene, which was generated from RELAX. The pathway was significant if the P-value was less than 0.05 or the absolute of the log10 of P-value was greater than 1.30.

Two-sided tests were used in all statistical analyses. We used general R packages for plotting, such as ggplot2186, dplyr187, RColorBrewer188, EnvStats189, and ggthemes190, but also some specific packages. For example, the R packages ggtreeExtra191, ggtree192, ggstar193, tibble194, and ggnewscale195 were used for the phylogenetic tree plots (Fig. 1a and Fig. 2a). The package vioplot196 was used in Fig. 1b and Fig. 1c. Venn diagrams (e.g., Fig. 2b) were plotted with the VennDiagram package197. The packages ggpmisc198, cowplot199, and ggpubr200 were used in Fig. 2d and Fig. 2e. The packages pheatmap201 and scales202 were used to plot the pathway heat maps (e.g., Fig. 2f and Fig. 3e).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.