The convergent evolution of eusociality is based on a shared reproductive groundplan plus lineage-specific sets of plastic genes

Eusociality, characterized by caste-based division of labor, has convergently evolved multiple times. However, the genomic basis of caste and degree to which independent origins of eusociality have utilized common genes remains largely unknown. To elucidate these issues, we characterized caste-specific transcriptomic profiles across development and adult body segments from pharaoh ants (Monomorium pharaonis) and honey bees (Apis mellifera), representing two independent origins of eusociality. We identified a large shared core of genes upregulated in the abdomens of queen ants and honey bees that also tends to be upregulated in female flies. Outside of this shared core, few genes are differentially expressed in common. Instead, the majority of the thousands of genes underlying the expression of the caste system are plastically-expressed, rapidly evolving, and relatively evolutionary young. Altogether our results show that the convergent evolution of eusociality involves the recruitment of a core reproductive groundplan along with lineage-specific sets of plastic genes. Significance Statement The convergent evolution of similar features in different lineages exemplifies how natural selection can produce common adaptive solutions, but whether the same genes are consistently used remains unresolved. Eusociality, defined by caste-based division of labor, convergently evolved in several insect lineages. Previous studies alternately emphasized lineage-specific genes or small sets of shared genes and pathways. We conducted an unprecedented parallel transcriptomic study and identified a large core of ancient reproductive genes that was recruited during the independent evolution of queen and worker castes in ants and honey bees. Outside this shared core, we find thousands more caste-associated genes with distinct features that are not shared, demonstrating that convergent eusocial evolution is based on a combination of shared and lineage-specific genes.


41
The degree to which convergent phenotypic evolution involves the same sets of genes or 42 pathways is a major unanswered question (1). Comparative genomic studies indicate that parallel 43 adaptive changes in the protein-coding sequences of the same genes are frequently associated 44 with the evolution of convergent phenotypes in closely related populations and species (2, 3). 45 Decades of research in evolutionary developmental biology also emphasize that changes in the 46 expression of a relatively small "toolkit" of deeply conserved genes are often associated with convergently evolved phenotypes in distantly related species (4). Alternatively, convergent 48 phenotypic evolution between lineages could involve distinct subsets of genes in each lineage. 49 Taxonomically-restricted genes, genes which are only found in certain lineages (5) have been shown to be important for lineage-specific evolutionary novelties (6), but their relative 51 contribution to the evolution of convergent phenotypes is unknown. 52 The evolution of eusociality in several insect lineages (e.g., ants, honey bees, vespid 53 wasps, termites) provides a striking example of convergent phenotypic innovation (7). Eusocial 54 insect societies are founded upon a novel caste polyphenism, in which reproductive queen and 55 non-reproductive worker female castes develop from the same genome, depending mainly on 56 socially-regulated nutritional inputs (8,9). Within the worker caste, further specialization occurs 57 as individuals age and progress through a series of tasks, including nursing and foraging (7). 58 Polyphenic traits are often thought to evolve from pre-existing developmental plasticity (10), and 59 leading hypotheses for the evolution of caste-based division of labor in social insects also stress 60 the use and modification of highly conserved developmental and physiological mechanisms (11- Here, we present to date the most comprehensive developmental transcriptomic dataset 96 investigating gene expression associated with reproductive caste and worker age-based division 97 of labor in the pharaoh ant (Monomorium pharaonis) and the honey bee (Apis mellifera), species 98 which represent two independent origins and elaborations of eusociality (41). We performed all 99 sampling, sequencing, and analysis for the two species in parallel to maximize compatibility 100 between the data sets. We leverage this extensive dataset to quantify in an unbiased manner the 101 relative contribution of differential expression of shared versus distinct genes at each life stage 102 and tissue to the convergent evolution of caste-based division of labor.

105
We constructed two large, parallel transcriptomic datasets in honey bees and pharaoh ants 106 spanning caste development as well as adult tissues separated by behavior, reproductive caste, 107 and sex. In total we sequenced 177 RNA-sequencing libraries, across 28 distinct sample types for 108 each species (Table S1). To identify genes associated with caste development and adult caste dimorphism, we performed 112 differential expression analysis between queens and workers at each developmental stage and 113 adult tissue, separately for each species. The number of differentially expressed genes (DEGs) 114 between queens and workers increased throughout development, peaking in the adult abdomen 115 differentially expressed in the other species (Fig. 1A). Similarly, the magnitude of gene-wise 117 caste bias (as measured by log 2 fold-change between queen and worker samples) was weakly  To identify genes associated with age-based worker division of labor, we performed differential 127 expression analysis between nurses and foragers in each adult tissue, separately for each species.

128
In general, there were very few behavioral DEGs in common in the two species (Fig. 1B). Gene-129 wise log 2 fold-change between nurses and foragers was significantly but weakly correlated across 130 ant and honey bee orthologs ( Fig. S2; r head = 0.070, P head < 0.001; r thorax = 0.031, P thorax = 0.008; 131 r abdomen = 0.051, P abdomen < 0.001; N = 7460 1:1 orthologs). The top enriched GO terms for 132 behavioral DEGs were dominated by metabolism and developmental processes (Table S6 and    We next tried to put the seemingly large proportion of shared abdominal caste-associated 151 DEGs (35% for ants and 29% for honey bees) into context, through comparison with the 152 proportion of genes that were differentially expressed across larval development in both species, 153 given that the molecular mechanisms of development are thought to be highly conserved (42). 154 We identified 6089 and 6225 developmental DEGs in ants and honey bees, respectively, 155 including 2544 shared DEGs, representing 42% (2544/6089) and 41% (2544/6255) of the total 156 developmental DEGs in each species (Fig. S4).
particularly important for queen abdominal expression (and presumably function), we performed 159 gene co-expression analysis, separately for each species. We identified a module of genes 160 specifically associated with queen abdominal expression in each species (N = 1006 genes in 161 module for ants, N = 1174 genes for honey bees). We identified hub genes in each module, genes 162 which are centrally connected in networks and strongly associated queen abdominal expression 163 (43). Many hub genes were clearly associated with reproduction and maternal effects (Table S8 164 and S9), including genes with known roles in caste determination such as vitellogenin (Vg  Given that our co-expression analysis indicated that many of the most important queen-173 upregulated genes are clearly associated with female reproduction, we reasoned that caste-biased 174 expression may be derived from sex-biased expression. Indeed, there was a positive correlation 175 between gene-wise log 2 fold change between queen and worker abdomens and gene-wise log 2 176 fold-change between queen and male abdomens in both honey bees and pharaoh ants ( Fig. 3 A   177 and B). Additionally, sex bias itself was correlated between species (Fig. 3C). The correlation of caste bias and sex bias was not restricted to the abdomen, as there was similar highly significant 179 effect when comparing heads and thoraces, albeit with weaker effect sizes (Fig. S5).

180
Given the link between shared caste bias and sex bias within ants and honey bees, we 181 hypothesized that these shared caste-biased genes were derived from conserved pathways that 182 also underlie sexual dimorphism in distant relatives. To test this hypothesis, we estimated the   (47)). Similarly, we defined "overall behavior bias" as the Euclidean distance of log 2 200 fold-change across all nurse/forager comparisons, separately for each species.

201
Across 1:1 orthologs, overall caste bias measured in ants was correlated to overall caste 202 bias measured in honey bees, and overall behavior bias was similarly correlated between species 203 (Fig. S7). Within species, overall caste and behavior bias were also correlated to each other (Fig.   204   S8). This indicates that plasticity in gene expression is correlated across contexts (caste versus 205 behavior) and species. GO terms associated with high caste bias were largely linked to 206 metabolism, while those associated with high behavior bias were largely linked to developmental 207 processes (Table S10).

209
Characteristics of genes associated with caste and behavior 210 We compared overall caste bias and overall behavior bias to gene age, evolutionary rate, network 211 connectivity, and tissue-specificity to understand the general features of genes commonly 212 associated with caste (queen versus worker) or behavior (nursing versus foraging). Genes with 213 younger estimated evolutionary ages tended to exhibit higher overall caste bias ( Fig. 4 A and B) 214 and behavior bias ( Fig. S9 A and B) compared in particular to ancient genes (Gamma GLM; ant 215 caste bias: χ 2 = 900.19, honey bee caste bias: χ 2 = 1412.80, ant behavior bias: χ 2 = 316.36, honey 216 bee behavior bias: χ 2 = 877.43; P < 0.001 for all cases; N = 10520 in ant, N = 10011 in honey 217 bees). Genes that were loosely connected (i.e. peripheral network elements) in co-expression 218 networks constructed across all samples tended to exhibit more caste and behavior bias in 219 comparison to highly connected genes (Fig. 4 C and D; Fig. S9 C and D). Similarly, genes with 220 high tissue-specificity across 12 honey bee tissues tended to exhibit higher values caste and behavior bias in comparison to more pleiotropic, ubiquitously expressed genes (Fig. S10), where 222 tissue specificity was calculated using available data (31). Finally, genes that were rapidly 223 evolving (i.e. with high values of dN/dS) tended to exhibit higher levels of caste and behavior 224 bias ( Fig. 4 E and F; Fig. S9 E and F). Importantly, while expression is correlated to overall 225 caste and behavior bias, our results are not driven by the effect of expression levels according to 226 partial correlation analysis (Table S11).  (Tables S6 and S7), which is consistent with the notion that the 259 transition from nurse to forager is essentially a developmental process, and that common 260 molecular pathways may provide the raw genetic material for social evolution (14, 24, 27).

261
Conserved factors or pathways clearly play important roles in aspects of caste 262 development and function as well as the transition from nursing to foraging, but our results and 263 other studies indicate that the majority of the full transcriptomic architecture associated with 264 caste and age polyethism is not shared between species (24, 30, 34-36). This lineage-specific architecture is comprised of large groups of both orthologous genes with different expression 266 patterns and taxonomically-restricted genes ( Fig. 1 A and B). In contrast the low amount of 267 context-specific overlap in differential expression, the overall degree of caste-associated plastic 268 expression across stages and tissues (overall caste bias) was correlated between species (Fig. S7   269   A and B), and expression plasticity was correlated between behavior and caste (Fig. S8). Genes 270 with high overall caste bias tended to be loosely connected (Fig. 4 C and D, i.e. downstream in 271 regulatory networks) and displayed tissue-specific expression profiles (Fig. S10). This indicates 272 that changes in their expression are unlikely to strongly influence the expression of many other 273 genes. Furthermore, we find that genes with high overall caste bias are weakly constrained in 274 terms of sequence evolution, as they tend to evolve rapidly and be evolutionarily young (Fig. 4 275 A, B, E, F), though note that evolutionary age and rate cannot be reliably disentangled (50).  Our study shows that the recruitment of a large core of conserved reproductive-associated 285 genes, which make up a reproductive groundplan, is fundamental to the convergent evolution of 286 caste-based division of labor in ants and honey bees. However, our study also reveals that the 287 bulk of the full genetic architecture underlying the expression of social insect caste-based 288 division of labor varies between lineages. This is reflected by the general biology of social 289 insects, in that independently evolved societies share reproductive division of labor, the main 290 defining feature of eusociality, but also display a wide diversity of lineage-specific adaptations 291 (7). It is likely that a relatively small number of core conserved genes exist as upstream hubs in 292 regulatory networks, and layered on top of this core is a myriad of taxonomically-restricted 293 genes as well as conserved genes with lineage-specific expression patterns (6, 31, 34, 53). This is 294 consistent with models for the evolution of hierarchical developmental gene regulatory networks, 295 whereby a relatively small number of highly conserved genes act upstream to initiate gene    (Table S1).

310
To identify caste-associated differentially-expressed genes (DEGs), we performed differential 311 expression analysis between queens and workers at each developmental stage and tissue, 312 separately for each species. We first removed lowly-expressed genes with genes with counts per 313 million (CPM) less than one in all samples. We constructed GLM-like models including replicate 314 and caste and identified genes associated with caste (FDR < 0.1) at each stage or tissue using  Coexpression Network Analysis 324 We performed plaid clustering, a non-deterministic biclustering algorithm (57). Biclustering 325 seeks to identify groups of genes that are co-expressed across a specific subset of samples (58). 326 We identified genes that were consistently associated with a queen-abdomen specific bicluster 327 across 1000 iterations, which we term queen abdominal modules (see Supplemental Methods). 328 We conservatively identified module hub genes as genes with intra-module connectivity in at 329 least the 90th percentile and abdominal log 2 fold-change values greater than 2 (representing a 4-330 fold increase in expression in queen relative to worker abdomens). We calculated connectivity of 331 each gene as the weighted sum of the Pearson correlation of expression between the given gene 332 and all other genes, where we raised each correlation to the 6th power, the default value for 333 weighted gene co-expression analysis (59). Intra-module connectivity (used in Fig. 2) represents 334 the connectivity of genes within the queen abdominal module to other genes within the module, 335 while total network connectivity (used in Fig. 4) represents the connectivity of genes across the 336 entire transcriptome, after filtering out lowly-expressed genes.  Estimation of evolutionary rate 345 We estimated evolutionary rate using dN/dS, the ratio of non-synonymous to synonymous 346 nucleotide changes. We estimated pairwise dN/dS between each focal species and a second  354 We performed partial Spearman correlations between overall bias and evolutionary/network 355 characteristics, controlling for the effect of expression.     "Head", "thorax", and "abdomen" refer to body segments of adults, while "pupa" and "larva" refer to whole bodies. "No ortholog" refers to genes for which no 1:1 ortholog exists (either due to apparent duplication or complete lack or orthology), "not shared caste/task bias" refers to genes for which 1:1 orthologs can be identified but are only differentially expressed in one species, and "shared caste/task" bias refers to genes for which 1:1 orthologs are differentially expressed in both species. Insets show the proportion of each category of gene out of all differentially expressed genes at that stage or tissue. C) Proportion of abdominal DEGs by estimated evolutionary age (shading). "Shared queen/worker" indicates genes upregulated in queen or workers of both species. *: the category "larva" represents differential expression across larvae of all stages for which caste can be identified (second to fifth larval stage).

Fig. 2.
Abdominal caste bias (log 2 fold change queen versus worker) is correlated with connectivity within the queen-abdomen module in A) ants (rho = 0.536, P < 0.001) and B) honey bees (rho = 0.617, P < 0.001). Genes upregulated in queens are in red, while genes upregulated in workers are in blue. Connectivity is proportional to the most highly connected gene in the module. Connectivity within the queen abdominal module is higher for genes found in the module for both species (shared) versus genes found in the module for only one species (not shared) in C) ants and D) honey bees. ***P < 0.001 (Wilcoxon test) ey Fig. 3. Caste bias is derived from sex bias. Abdominal caste bias (queen vs worker log 2 fold-change) is correlated to abdominal sex bias (queen vs male log 2 fold-change) in A) M. pharaonis (rho = 0.715, P < 0.001) and B) A. mellifera (rho = 0.774, P < 0.001) and abdominal sex bias is correlated between the two species (rho = 0.280, P < 0.001) (C). Red indicates shared queen-biased abdominal DEGs, while blue indicates shared worker-biased abdominal DEGs. Grey indicates genes that did not exhibit shared expression patterns or were not differentially expressed. D) Shared queen-biased abdominal DEGs tend to be female-biased in D. melanogaster while shared worker-biased abdominal DEGs tend to be male-biased in D. melanogaster (likely reflecting down-regulation in females). Fig. 4. Genes that exhibit more caste bias across tissues and developmental stages have younger estimated evolutionary ages (A,B) and tend to be loosely connected (C,D; ant: rho = -0.159, P < 0.001; honey bee: rho = -0.090, P < 0.001) and rapidly evolving (E,F; ant: rho = 0.157, P < 0.001; honey bee: rho = 0.240, P < 0.001). "Overall caste bias" combines queen/worker log 2 fold-change values across all development stages and adult body segments. Connectivity is calculated using all samples and genes and scaled proportionally to the highest value.