Introduction

Eusocial insects are characterized by reproductive division of labor. Within colonies of termites, ants, bees, or wasps, one or a few individuals specialize in reproduction, while workers (and sometimes soldiers) perform all non-reproductive tasks in the colony, such as foraging, brood care, or colony defense. Associated with this division of labor is a striking increase in the longevity of queens (and kings in termites) compared to workers and also to solitary insects1,2. How can such a division of labor evolve, and how can different castes develop? Social insect castes are prime examples of phenotypic plasticity, i.e., the expression of different phenotypes from the same genetic background. Within a colony, workers, soldiers, and new reproductives arise due to differential gene expression during ontogeny caused by epigenetic regulation or environmental triggers such as season, differential feeding by nestmates, the presence of predators, or food availability (e.g.,3,4,5,6).

Within the realm of sociogenomics (sensu7), there has been considerable progress in identifying genes and gene networks underlying caste differentiation and caste differences in social Hymenoptera (e.g., ants:8,9,10, bees:11, wasps:12; reviewed in13,14, and references therein). These results revealed that genes and gene networks from solitary insect species were co-opted for caste differentiation (reviewed in13,14,15) and these genes might be part of a genetic toolkit that underlies the evolution of caste16. During social evolution, some of these networks have become uncoupled, and their genes have come to be expressed heterochronically between castes (i.e., with a change in the timing of gene expression during development over evolutionary time). Details seem to differ between species and lineages (e.g.,9,17). However, there are recurrent genes and gene pathways that are associated with nutrient sensing (IIS: insulin/insulin-like growth factor 1 signaling; TOR: target of rapamycin pathways), endocrine (juvenile hormone, JH) regulation, and fecundity (vitellogenin/yolk protein) (e.g.,15,17,18,19,20). These genes and molecular pathways have been summarized as the TI-J-LiFe (for TOR/IIS-JH-Lifespan/Fecundity) network, which underlies life history traits in insects in general17. Strikingly, in social insects, genes linked to chemical communication (e.g., cuticular hydrocarbon/CHC synthesis and perception) seem to be important components of the TI-J-LiFe network as well17,21.

Compared to social Hymenoptera, little comparative sociogenomic data exist for termites, most of it concentrating on the development of soldiers in a handful of species (e.g., reviewed in22,23,24), with a few studies on termite reproductives21,25. However, transcriptomic studies rarely compared queens with workers and, if they did, they used whole bodies, which may mask tissue-specific signals (see Discussion) (but see21). Termites (infraorder: Isoptera) are ‘social cockroaches’, a monophyletic clade nested within the Blattodea26 that evolved eusociality and castes independently from the social Hymenoptera. This different ancestry is reflected in colony composition: termite colonies are composed of both sexes, with a queen and king heading a colony. Unlike in social Hymenoptera, termite workers are developmentally immature, and, therefore, there are no adult workers. A recent study on the drywood termite Cryptotermes secundus (Kalotermitidae) identified a module of 288 co-expressed genes from head plus prothorax tissue, the queen central module (QCM), which characterizes queens21 (Figure S1). The QCM comprises central molecular pathways that underlie a queen phenotype. It has a strong neuro-endocrine signal indicative of high JH titers in line with an upregulation of fecundity-related genes, such as vitellogenins (Figure S1). The QCM also included signs of an upregulation of the IIS pathway as well as signals of chemical communication, similar to those in many social Hymenoptera15,17.

In the current study, we first aimed to test whether the QCM exists in the Archotermopsidae as well, a termite family with a more basal phylogenetic position than the Kalotermitidae but with a similar lifestyle and caste development. Like C. secundus, Zootermopsis angusticollis is a wood-dwelling termite that nests in a piece of wood that serves both as food and shelter (one-piece nester sensu27) without ever leaving the nest to forage outside. As is typical for this life type, both species have a low level of social complexity and a linear development in which early larval instars develop into older larval instars that function as workers from which all reproductives develop (reviewed in28). Z. angusticollis larvae initiate labor, such as brood care, from the third instar onwards. Accordingly, individuals at the third instar or older are designated as workers. As in all termites, Z. angusticollis workers are always immatures; in wood-dwelling termite species with a linear development, they can transition into adults, and in doing so they become reproductives. Given that these two species have a shared linear developmental pathway but belong to different families, we can test the existence of a shared genetic toolkit that characterizes queens, without confounding factors that might arise when caste development differs. Termites not belonging to the wood-dwelling life type have a bifurcated development; i.e., there is a split into two developmental lines, one leading to wingless individuals (apterous line, mainly workers and soldiers) and the other to winged reproductives (nymphal line)28,29,30. This bifurcation of development means changes in the underlying developmental program, which complicates queen-worker comparisons across species with different developmental trajectories.

To test whether we can detect a QCM signal in Z. angusticollis, we generated and compared transcriptomes for queens and workers. We also tested how workers differed in gene expression from early instar larvae, from which they developed, by comparing transcriptomes of early instar larvae (1st and 2nd instars, which do not provide labor, hereafter larvae) with those of worker instars (≥ 3rd instar larvae, which do perform labor in the nest, hereafter workers)31.

Transcriptomes were generated for two tissues: head plus prothorax (hereafter ‘head’ for simplicity) and abdomen without gut (hereafter ‘abdomen’ for simplicity). The former tissues correspond to those used in the C. secundus study21, including neuro-endocrine signals. Note that the corpora allata, the site of juvenile hormone (JH) biosynthesis located at the posterior end of the brain, can be lost during dissection when only the head is used. Hence, we used head plus prothorax. The abdominal tissue extended the analyses to reveal, for example, stronger fecundity and fat body related signals.

Results

In total, we found 262 homologs of the 288 C. secundus QCM genes in the analysis for Z. angusticollis.

Gene expression patterns characterizing queens compared to workers

Heads

Differentially expressed genes (DEGs)

In heads, 479 genes were more highly expressed in workers than queens (Data S3) and they were mainly characterized by genes related to cuticle proteins and transcription factors. There were four homologs to genes of the C. secundus QCM (for simplicity, hereafter ‘QCM genes’) among them, and QCM genes were not enriched (Fisher’s exact test: P = 0.117) (Table S2: Enrichment results).

On the other hand, 251 DEGs were more highly expressed in queens compared to workers (Data S3); sixteen of them were QCM genes (Data S5). Thus, QCM genes were significantly enriched among the DEGs characterizing Z. angusticollis queens (Fisher’s exact test: P < 0.001) (Table S2: Enrichment results). Accordingly, many QCM genes, and associated TI-J-LiFe genes, characterized queen DEGs compared to workers. There was a strong fecundity signal with three fecundity-related termite Vgs (Fig. 1). In addition, the upper part of the IIS pathway seems to be upregulated in queens compared to workers, as an InR and three ILP genes are more highly expressed in the former (Figs. 1, 2a). However, genes further downstream of the IIS pathway were not significantly more highly expressed in queens compared to workers (Fig. 1). Interestingly, the longevity gene FOXO, which is typically negatively affected by active IIS signaling32,33, was upregulated in queens (Figs. 1, 2a).
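
The enrichment logic behind these tests can be illustrated with a minimal sketch. It uses the counts reported above (262 QCM homologs, 251 queen DEGs, 16 QCM genes among them), but the size of the gene universe is not stated in this section, so `background_genes` is a purely hypothetical value, and the one-sided hypergeometric form shown here is an assumption about how the test was set up. The P value it prints is therefore illustrative only and not a reproduction of the reported result.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn without replacement from N genes,
    K of which belong to the gene set of interest. This equals the
    one-sided (enrichment) P value of Fisher's exact test on the 2x2 table."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Counts taken from the text
qcm_homologs = 262        # QCM homologs found in Z. angusticollis
queen_degs = 251          # DEGs more highly expressed in queen heads
qcm_in_queen_degs = 16    # QCM genes among those DEGs

# ASSUMPTION: hypothetical number of genes tested; not given in the text
background_genes = 15000

expected = queen_degs * qcm_homologs / background_genes
p_enrich = hypergeom_upper_tail(qcm_in_queen_degs, qcm_homologs,
                                queen_degs, background_genes)

print(f"expected QCM genes by chance: {expected:.2f}")
print(f"observed: {qcm_in_queen_degs}, one-sided P = {p_enrich:.2e}")
```

A depletion test, as reported for the queen abdomen, would use the lower tail P(X <= k) of the same distribution instead.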

Figure 1

Results of the differential gene expression analyses for genes from the IIS and fecundity-related pathways. Shown are QCM genes. Each row represents a gene. Column 1 shows the results for the queen-worker comparison of head samples, and column 2 those of the abdomen samples. Columns 3 and 4 represent the results for the worker-larvae comparisons for heads and abdomens, respectively. The color bar reflects the log2 fold change (LFC) in gene expression, with red indicating a higher expression of the first group compared to the second group (e.g., in column 1, a higher expression in queens than workers) and blue vice versa. The value in each cell shows the adjusted P value. Q queens, W workers, H heads, A abdomens, L larvae.
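
The sign convention of the LFC can be made concrete with a small sketch. The counts below are hypothetical, and the pseudocount is an assumption added only to avoid division by zero; differential expression software usually estimates the LFC from a statistical model rather than from raw group means, so this is an illustration of the direction of the value, not of the analysis itself.

```python
from math import log2

def log2_fold_change(first_group, second_group, pseudocount=1.0):
    """log2 of the ratio of mean expression (first group / second group).
    Positive values: higher in the first group; negative: higher in the second."""
    mean_first = sum(first_group) / len(first_group)
    mean_second = sum(second_group) / len(second_group)
    return log2((mean_first + pseudocount) / (mean_second + pseudocount))

# Hypothetical normalized counts for one gene (not data from this study)
queen_heads = [480.0, 510.0, 530.0]
worker_heads = [120.0, 130.0, 125.0]

lfc = log2_fold_change(queen_heads, worker_heads)
color = "red" if lfc > 0 else "blue"   # red = higher in first group (queens)
print(f"LFC (Q vs W) = {lfc:.2f} -> {color}")
```

Swapping the two groups flips the sign, which is why the order of the groups in each column matters when reading the figure.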

Figure 2

Caste- and tissue-specific gene expression patterns with a focus on genes related to IIS, JH, and Vg/YP. Shown are expression patterns characterizing (a) queen head compared to worker head, (b) queen abdomen compared to worker abdomen, (c) worker head compared to larval head, and (d) worker abdomen compared to larval abdomen. Solid arrows denote activation, while stop bars indicate repression. Question marks highlight unusual gene expression patterns contradicting expectations. Red and orange colors signify upregulation, blue indicates downregulation, and golden color represents upregulation by trend. For details, see main text. Relationships between genes are drawn based on previous studies19,50.

The endocrine JH signal was ambiguous. A farnesol dehydrogenase, FOHSDR (1) (supposedly involved in JH biosynthesis34), the methyl farnesoate epoxidase (also known as CYP15A1_7) (catalysing the conversion of methyl farnesoate to juvenile hormone III in a cockroach35), and two takeout genes (typically encoding JH binding proteins) were more highly expressed in queens than workers (Figs. 2a, 3). However, the early response gene of JH signaling, Kr-h1 (Kruppel homolog 1), was not differentially expressed. Furthermore, as is typical for the QCM, several genes related to trehalose metabolism (e.g., three TRET genes) as well as many genes putatively involved in CHC biosynthesis (e.g., one desaturase, three elongases, and one fatty acyl-CoA reductase) were found among the queen DEGs (Figures S2, S3). In addition, some immune defense genes (lysozyme-related genes and toll-like receptor) were also upregulated in queens (Data S3).

Figure 3

Results of the differentially expressed gene analyses on JH-related genes for the queen-worker comparison (column 1 for head tissue and column 2 for abdomen tissue) and for the worker-larvae comparison (column 3 for head tissue and column 4 for abdomen tissue). Shown are QCM genes. For more information, see Fig. 1. Q queens, W workers, H heads, A abdomens, L larvae.

Network analyses

The gene co-expression network analysis revealed nine modules that were positively associated with workers compared to queens (Data S6). Module “yellow4” contained four QCM homologs and was enriched with QCM genes (Fisher’s exact test: P = 0.035). However, the four QCM homologs (Znev_02208, Znev_10922, Znev_12889, and Znev_18213) are not TI-J-LiFe genes.

Seven modules were significantly associated with queens compared to workers (Data S7). QCM genes were significantly enriched in the modules red, darkolivegreen, turquoise, and darkred (Table S3). Other modules also contained QCM genes, although they were not significantly enriched (Table S3). The module red can be functionally characterized as a fecundity-chemical communication module. It contained the three fecundity-related termite Vgs, one gene identified as insulin-like growth factor, five takeout genes, and several genes potentially related to CHC production and perception (desaturase, elongase, odorant receptor, and odorant-binding protein) (Data S7). The queen module turquoise comprised a striking combination of co-expressed genes. It included genes putatively related to JH biosynthesis (farnesol dehydrogenases, FOHSDR-l1-6 (1), FOHSDR-l1-6 (4), HMGS (1), Jheh1,2), several IIS genes (among them ILP1), and the anti-aging gene FOXO. This apparently concordant co-expression of some IIS genes with FOXO is striking. Typically, FOXO is inhibited by an upregulation of the IIS pathway in animals, and this negative association seems to be a major molecular cause of the common trade-off between longevity and fecundity32,33. A positive association between IIS and FOXO, as indicated by the queen module turquoise, might contribute to explaining the absence of this trade-off in queens. However, the phosphorylation status of FOXO needs to be studied to determine its functional activity. In addition, the turquoise module included several genes putatively related to CHC production (acsbg2, elongase, and fatty acyl-CoA reductase) and genes related to trehalose metabolism (TRET), characteristic of the QCM. It was the module most similar to the C. secundus QCM.
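
What it means for a module to be "positively associated" with a caste can be sketched as a correlation between a per-sample module summary and a caste indicator. The network method is not described in this section, so everything below is an assumption: the mean of per-gene z-scores stands in for whatever module summary (for example, an eigengene) was actually used, and the expression values and gene names are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def module_summary(expr_by_gene):
    """Per-sample mean of gene-wise z-scores (simple stand-in for an eigengene)."""
    z_rows = []
    for values in expr_by_gene.values():
        n = len(values)
        mean = sum(values) / n
        sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
        z_rows.append([(v - mean) / sd if sd > 0 else 0.0 for v in values])
    n_samples = len(z_rows[0])
    return [sum(row[i] for row in z_rows) / len(z_rows) for i in range(n_samples)]

# Hypothetical log-expression values: 3 queen samples followed by 3 worker samples
module_genes = {
    "gene_a": [9.1, 8.7, 9.4, 5.2, 5.6, 4.9],
    "gene_b": [7.8, 8.1, 7.5, 6.0, 6.3, 5.8],
    "gene_c": [10.2, 9.9, 10.5, 7.1, 6.8, 7.4],
}
is_queen = [1, 1, 1, 0, 0, 0]   # caste indicator: 1 = queen, 0 = worker

r = pearson(module_summary(module_genes), is_queen)
print(f"module-caste correlation: r = {r:.2f}")  # r > 0: associated with queens
```

A negative correlation under the same coding would mark a module as associated with workers rather than queens.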

Abdomen

DEGs

In the abdomen, 2,715 DEGs were more highly expressed in workers than queens; many of them were cuticle proteins (Data S3). Surprisingly, these DEGs included 77 QCM homologs and were enriched with QCM genes (Fisher’s exact test: P < 0.001) (Table S2).

Conversely, 2,653 DEGs were more highly expressed in queens than in workers (Data S3), among them 29 QCM genes. In contrast to the head, QCM genes were less likely to occur among the queen DEGs in the abdomen than expected by chance (Fisher’s exact test: P = 0.001) (Table S2). This might be due to the use of a different tissue, as the original QCM was identified from the heads of C. secundus.

The importance of tissue specificity is supported by the gene-specific analyses concentrating on QCM and TI-J-LiFe genes (Figs. 1, 2b, 3). Only fecundity-related Vgs and ILP4 and ILP7 were consistently more highly expressed in queens than workers across body parts (Figs. 1, 2a,b). Many IIS pathway genes (e.g., InR2, chico, and Akt) as well as JH-related genes, including the JH-signaling gene Kr-h1, were more highly expressed in workers’ abdomens (Figs. 1, 2b, 3). This suggests that in the abdomen both pathways are downregulated in queens, in contrast to the head (Fig. 2a,b).

Network analyses

The network analysis revealed ten modules that were positively associated with workers (Data S8). Among them, four modules (midnightblue, brown, yellow, salmon) were enriched with QCM genes (statistical results see Table S4). These modules contained genes that are supposedly involved in JH synthesis (e.g., JH epoxidase, HMGR, FOHSDR-l1-6 (3)), JH signalling (Kr-h1, several takeout genes) and CHC production (elongases, desaturases) and perception (ORs).

Sixteen network modules were significantly positively associated with queens compared to workers (Data S9). None of these modules were enriched with QCM genes (Table S3). The module royalblue contained two fecundity-related Vgs and many genes related to histones and histone modification including Neofem9, which is a Histone H2A, identified to be queen-specifically expressed in the termite Cryptotermes cynocephalus36.

Gene expression patterns characterizing workers compared to larvae

Head

DEGs

In the heads, 34 DEGs were more highly expressed in larvae than workers. Among them, there was one QCM homolog and QCM genes were not enriched (Fisher’s exact test: P = 0.469). On the other hand, 160 DEGs were more highly expressed in workers than larvae. These worker DEGs contained 12 QCM homologs and were enriched with QCM genes (Fisher’s exact test: P < 0.001). This implies that workers are more similar to queens than larvae are.

The similarity between queens and workers (compared to larvae) is reflected at a gene-specific level. Among the worker DEGs was one of the three fecundity-related vitellogenins, Vg2 (Figs. 1, 2c), as well as ILP1 and ILP4 and several takeout genes (Figs. 1, 2c, 3). However, there were no signs of differential expression between workers and larvae of other IIS or JH-related genes (Figs. 1, 2c, 3). Reflecting the QCM signal, some metabolic genes (two TRET genes, trehalose) and genes putatively related to chemical communication (delta(11)Desaturase, elongase, acsbg2; OBPs) were detected (Figure S2, Figure S3).

Network analyses

For larvae heads, the network analyses revealed two modules (Data S10). The module thistle3 contained two genes related to histone modification. The module green contained several gustatory and odorant receptors and many genes related to transcription regulation (e.g., RNA splicing). With four QCM homologs, it might be enriched with QCM genes (Fisher’s exact test: P = 0.050, Table S5). Yet, only one of the four QCM homologs is well annotated: regucalcin, which is involved in sexual reproduction and diapause in Drosophila but is also expressed in larval somatic tissues37,38,39,40.

Four modules characterized workers compared to larvae (Data S11), all of which were enriched for QCM genes (for statistical results, see Table S6). The module paleturquoise comprised co-expressed genes that reflected the similarity of workers with queens. It contained Vg2, ILP1 and ILP4, and some of the genes related to chemical communication. Although there was no apparent JH signal among the worker DEGs, the worker module turquoise was strongly related to JH biosynthesis and regulation. The co-occurrence of a geranyl diphosphate synthetase, three farnesol dehydrogenase-like genes (one of them a homolog of FOHSDR-l1-6 (4) from C. secundus), and the methyl farnesoate epoxidase (also known as CYP15A1_7) implies that JH biosynthesis genes are co-expressed. Furthermore, a JH epoxide hydrolase and eight takeout genes were also in this module, as were several genes potentially linked to chemical communication (several ORs, fatty acyl-CoA reductase, desaturase, elongases).

Abdomen

DEGs

In the abdomen, 62 DEGs were more highly expressed in larvae than workers; there were no QCM homologs among these DEGs, and QCM genes were not enriched (Fisher’s exact test: P = 0.632). These DEGs were characterized by genes that have been associated with development (sonic hedgehog, ecdysone induced 78C, abdominal B).

In comparison, 118 DEGs were more highly expressed in workers than larvae, and they were significantly enriched for QCM genes (Fisher’s exact test: P < 0.001). The signal was similar to that of the head in that it contained the same fecundity-related Vg2 (Figs. 1, 2d). However, in contrast to the head, two putative JH biosynthesis genes (the farnesol dehydrogenase FOHSDR (1) and the methyl farnesoate epoxidase, also known as CYP15A1_7) and CYP305a1/female JH epoxidase were more highly expressed in workers than larvae. In addition, several takeout genes were among the worker DEGs (Fig. 3).

Network analyses

For larvae abdomens, the network analysis revealed two modules (Data S12), each with one QCM homolog and no enrichment of QCM genes (Table S5). The module palevioletred2 contained several genes related to histones and growth. The module cyan contained many genes related to histones and transcriptional splicing.

For workers, one module, black (Data S13), was uncovered that was significantly associated with workers, and it was significantly enriched with QCM genes (Fisher’s exact test: P = 0.011). It contained some of the worker DEGs, including Vg2 and the methyl farnesoate epoxidase (CYP15A1_7).

Discussion

Is there a QCM signal in Z. angusticollis?

QCM genes that characterized gene expression in the head (plus prothorax) of C. secundus queens were similarly active in the head (plus prothorax) of Z. angusticollis queens. This suggests that the QCM is a conserved toolkit that gives rise to the queen phenotype in these two species from different families with shared linear development. Furthermore, the comparison of workers and larvae revealed a QCM gene enrichment in the workers. As young larvae progress into older worker instars, from which reproductives eventually emerge, the recurring QCM signal suggests a gradual increase in the expression of QCM genes (i.e., genes typical of queens) during development. This would reflect the gradual development of a hemimetabolous insect, which differs fundamentally from the holometabolous social Hymenoptera, in which a major re-structuring occurs during the pupal stage. In line with this, signals of queen differentiation increase drastically when pupal metamorphosis starts in ants41. Future research can test the hypothesis that the molecular signatures distinguishing reproductives in termites become apparent gradually, at least in wood-dwelling termite species with a linear development. This is expected to differ in termite species with a bifurcated development of an apterous and a nymphal line, in which workers that develop along the apterous line lose the potency to become winged sexuals28,42.

Furthermore, the QCM signal appears to be consistent across sexes. Given the limitations of sex differentiation based on morphology, we sampled larvae and workers randomly, encompassing both sexes. Yet, when examining the PCA, the distinctions between sexes appeared negligible as all workers and all larvae formed cohesive clusters (Figure S4-S8).

Fecundity related genes: vitellogenins

The QCM signal is not driven by the three fecundity-related Vg genes. They were more highly expressed in the abdomen of queens than workers. Yet there was no enrichment for QCM genes in the queens’ abdomen; in fact, QCM genes were even less common than expected.

The Vg signal was present in both body parts, in the queen (compared to worker) DEGs (Figs. 1, 2a,b) and worker DEGs (compared to larvae) (Figs. 1, 2c,d). In queens, all three fecundity-related Vgs (Neofem3 (Vg1) (1), Neofem3 (Vg1) (2), and Vg2) were overexpressed, while only Vg2 was more highly expressed in workers (compared to larvae) (Fig. 2). This implies that the two Vg1 in Z. angusticollis, which seem to be the result of a gene duplication of Neofem3 of C. secundus43, are reproductive-specific. This is in line with results for C. secundus, in which Neofem3 is also only upregulated in reproductives and across all body parts44. Vg genes are mainly expressed in fat bodies, which occur across a termite’s body45. In the head of queens and in the non-reproducing workers, these three Vg genes may function as storage proteins and may have additional functions such as serving as anti-oxidants like in social Hymenoptera (e.g.,46). In the abdomen of queens, the concordantly high expression of the Vg receptor (Fig. 1) reflects their role as egg yolk precursors in egg production (e.g.,3).

In addition, there was a fourth Vg in Z. angusticollis, Vg. The expression of this gene in the abdomen seems to characterize immatures as it was overexpressed in workers compared to queens with no difference between workers and larvae (Fig. 1). Vg probably functions as a storage protein or has other non-reproductive functions as in social Hymenoptera (e.g.,46) and termites47.

Distinct body-part-specific gene expression across castes

There was a strong body-part specificity of QCM gene expression. A body-part specificity of gene expression is not surprising as different tissues have different functions. Yet, the specific pattern is insightful. Only in the head (plus prothorax) tissues were QCM genes enriched among queen DEGs (compared to workers), while in the abdomen they were less common than expected among queen DEGs and enriched among worker DEGs (compared to queens). Focusing on the TI-J-LiFe network reveals that this pattern is largely due to a lower expression of genes from the JH- and IIS-pathways in the abdomen of queens (compared to workers) (Figs. 1, 2a,b, 3). It is not surprising to detect an upregulation of late JH biosynthesis genes (like methyl-farnesoate epoxidase; Fig. 3) in queens’ heads (plus prothorax) (compared to workers) but not in their abdomen. The production sites of JH, the corpora allata, are in the former and termite queens are characterized by high JH production48,49. However, it is striking that Kr-h1, the early response gene of JH signaling, is upregulated in workers’ abdomen compared to queens (Fig. 3), although we would expect it to be higher in queens if they have higher JH titers, or at least not to differ, given that Kr-h1 is not differentially expressed in the head. Several IIS genes (ILP1235, p60(PiK21b)-2, PP2A-B’, PP2A-subC), including the IIS ‘exit’ gene Akt, show a similar pattern as Kr-h1, implicating a specific lower IIS activity in the queens’ abdomen (Fig. 2b). As reduced IIS signaling is associated with prolonged longevity in multiple organisms, like the fruit fly Drosophila melanogaster and the nematode Caenorhabditis elegans50, the abdomen-specific down-regulation in queens might contribute to the long lifespan of termite queens. This is supported by a recent experimental study, in which C. secundus queens that received a protein-enriched diet had increased survival rates compared to those that did not51. Associated with increased queen survival, a similar downregulation of IIS genes (including Akt) and Kr-h1 was observed in abdominal fat bodies, while they were unaffected in the head (plus prothorax)51.

Comparing workers and larvae, we did not see a similar body-part specific lower expression of IIS- and JH-genes (Figs. 1, 2c,d, 3). This may suggest a downregulation of IIS and JH activity in the abdomen after queen differentiation. As both of these pathways have been associated with aging in many animals50,52, this body-part specific expression may contribute to the high longevity of Z. angusticollis queens of around six years2.

Unfortunately, it is difficult to compare our results in detail with other transcriptome studies that included queens, as they compared queens across age classes rather than with workers53,54,55 and/or they used whole body transcriptomes56,57. Whole body transcriptomes are problematic because tissue-specific signals may cancel each other out, as is shown in our study.

The QCM is specific to the head (plus prothorax), and failure to choose the appropriate tissues may explain why a QCM has not yet been identified in other social insects21. For example, using only the head can lead to a loss of the corpora allata, the gland of JH biosynthesis, as dissections in our laboratory have shown. At a very broad scale, there is an IIS (and ILP) signal in termite queens53,55,57, which aligns with our results and those for social insects in general15,17. However, such superficial comparisons provide little insight as this is the default to be expected for reproducing female insects. Only detailed studies, which distinguish at least between relevant body parts and which analyze (relevant) pathways in detail, will help to figure out what makes termite queens special. Allometric tissue differences between castes (i.e., tissue differences that do not scale proportionally with size) could bias gene expression differences. They might also partly influence the gene expression results of our study. Yet this effect should be minor, as obvious allometries are unknown for our species. The application of advanced technologies like single-cell sequencing could offer an ultimate solution, particularly as these technologies become more widely available.

Conclusion

We showed that the QCM that characterizes C. secundus queens is also typical for Z. angusticollis queens. This implies a conserved genetic toolkit that gives rise to the queen phenotype across two termite families with a linear development. Furthermore, QCM genes seem to become increasingly expressed during the development from larvae via workers to queens in head plus prothorax tissues. Based on these results, we hypothesize that a head-prothorax-specific QCM signal is shared, at least in termites of low social complexity characterized by totipotent workers and a linear development.

Surprisingly, in the abdomen, QCM genes were enriched in workers compared to queens. This signal is largely driven by a high expression of JH- and IIS-related genes in workers. This result stresses the importance of tissue specificity to reveal the QCM signal. In addition, the tissue-specific high expression of JH- and IIS-genes in queens, which is limited to the head-prothorax tissue and absent from the abdomen, might contribute to the long lifespan of Z. angusticollis queens, as a high expression of both pathways has been linked with aging in many animals.

Materials and methods

Termite collection and maintenance

Six mature Zootermopsis angusticollis colonies were collected from the Redwood East Bay Regional Park in Oakland, California. The colonies with their original wood/nest material were placed inside covered plastic tubs and flown to Northeastern University under a USDA permit (P526P-17-03817). The study was conducted in accordance with the Nagoya protocol and all authors followed ARRIVE guidelines. The colonies were kept in the dark at 23 to 25 °C. They were sprayed with water twice a week to maintain a 60% relative humidity. Birchwood was added as needed to provide termites with additional food/nesting resources.

Establishment of incipient colonies and maintenance

Z. angusticollis alates (winged individuals) were collected from mature colonies after molting. They were sexed under a dissecting microscope and paired inside plastic Petri dishes (60 mm diameter × 15 mm height) lined with moistened (300 µL sterile water) Whatman # 1 filter paper discs. Each pair also received ~ 2.5 mg of birch wood to build a copularium (i.e., mating chamber). Only heavily sclerotized alates with intact wings were used. This ensured that paired alates were similarly motivated (both physiologically and behaviorally) to mate58,59,60. These alates were virgins, as Z. angusticollis alates complete reproductive maturation only when separated from their parental nest61, and copulation takes place only after the pair has constructed a copularium62. The Petri dishes were stacked inside clear, covered plastic boxes lined with wet paper towels to maintain high humidity (~ 90% relative humidity). Water and birch wood chips were added as needed. We originally set up hundreds of incipient colonies. Yet, Z. angusticollis has a high failure rate during the early stages of colony foundation (~ 60% mortality58,63). Thus, six months post-pairing, we had a total of 42 intact incipient colonies, 25 headed by nestmate reproductives (i.e., originating from the same parental nest, considered inbred), and 17 headed by non-nestmate pairs (originating from different parental colonies, considered outbred). All colonies were newly established and had no soldiers. In the end, we used seven incipient colonies (2 headed by non-nestmates and 5 by nestmate reproductives) to generate transcriptome data (individual details are listed in Data S1). As we did not see any obvious differences in the gene expression profiles between out- and inbred colonies, we combined both data sets (see Figures S4-S6).

Collection of termites and storage

Given that the incipient colonies were established on different days, we controlled for the age of the incipient colony by standardizing the number of days elapsed since pairing to about 180 days post-pairing. At this point, the incipient colonies comprised the queen, the king, and a variable number of eggs, larvae, and workers63. Z. angusticollis queens have a lifespan of around six years2.

The queens were cold immobilized and decapitated under a dissecting microscope (~ 40X). Each individual (head + prothorax and abdomen) was then placed inside a PCR tube containing 200 µL of cold RNAlater and immediately stored at 4 °C for 24 h and then frozen at − 80 °C. The other individuals were similarly treated (dissected, submerged in 200 µL RNAlater, and frozen), except that all individuals of the same instar/incipient colony were pooled in the same tube. In total, we obtained 58 individual samples. We selected 36 samples with the best RNA quality for sequencing, making sure that each caste was represented per colony. Subsequently, the tubes were kept frozen in a − 80 °C freezer until shipped to Freiburg (Germany) on dry ice, where they were stored at − 20 °C until extraction.

Generation of transcriptomes

Total RNA was extracted from both tissues of single individuals (no pooling of samples) using a protocol optimized for termites as described elsewhere21. This protocol allowed us to obtain enough high-quality RNA from body parts of single individuals. In short, the tissue was homogenized with peqGOLD TriFast™ (Peqlab) for 2–3 min. Then, we added chloroform to separate the aqueous phase and, subsequently, nuclease-free glycogen (5 mg/ml) and cold isopropanol (Ambion) to precipitate the total RNA. We then washed the pellet with 75% ethanol and centrifuged the samples for 5 min at 8500 rpm and 4 °C. The washing step was repeated three times. After washing, the pellet was dissolved in nuclease-free water and kept at 4 °C for at least 3 h. DNA was digested using DNase I recombinant (Roche) and EDTA (Sigma-Aldrich). Samples were stored at −80 °C until they were shipped to BGI (Hong Kong) on dry ice. RNA quality assessment and library preparation were done by BGI with the TruSeq RNA Library Prep Kit v2 (Illumina). Transcriptome sequencing was done on an Illumina HiSeq X Ten platform (150-bp paired-end reads), resulting in ~4 gigabases of raw data and 30 to 50 million reads per sample.

Quality control, trimming, mapping

We checked the quality of the Illumina raw reads using FastQC v. 0.11.5 and trimmed reads with Trimmomatic v. 0.39, removing adapter sequences and keeping only paired-end reads with a minimum length of 120 bp64,65. As there is no sequenced genome of Zootermopsis angusticollis available, we mapped the trimmed reads with HISAT2 to the genome (v. 2.2) of the sister species Zootermopsis nevadensis43. Other strategies, such as a Trinity-based de novo transcriptome assembly, could have been employed, but they were not a good alternative for our study, given our focus on gene expression rather than on transcript variants and alternative splicing. A Trinity-based trial yielded 678,095 transcripts that, for instance, could not be unambiguously associated with the QCM genes. We therefore retained the reference-based mapping to the sister species' genome. We discarded two samples (LG1OW1H and LG3YW1H) because of low mapping rates of 14.5% and 32.9%, respectively (Data S1). We counted the reads against the reference genome using htseq-count with the mode “union”. In the end, we had a sample size of six for each investigated caste and tissue, except for the abdomen of workers and larvae, for which we had five replicates.

Annotation

To identify genes, we aligned all peptide sequences from the Z. nevadensis official gene set v. 2.2 to the non-redundant protein database (obtained July 16, 2021) using BLAST+ (NCBI, v. 2.10.0)66. We set an e-value cutoff of 1e-5, i.e., only matches with an e-value of at most 1e-5 were accepted. For genes where the first match was ‘hypothetical’ or ‘unknown’, we kept the next match if it fulfilled the 1e-5 e-value cutoff (Data S2: gene annotation list). Using the annotation results, we further identified all TI-J-LiFe genes (Data S2: TI-J-LiFe genes). Additionally, we used InterProScan (v. 5.53-87.0)67 with the analyses Pfam, PANTHER, CDD, Gene3D, HAMAP, PIRSF, PRINTS, and SMART to obtain Gene Ontology (GO) annotations. We collected all GO terms for each gene.
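The annotation rule described above can be sketched as follows. This is a minimal illustration, not the script used in the study: the function name `pick_annotation`, the input layout (a list of description/e-value pairs in BLAST rank order), and the literal reading of "next match" as the second-ranked hit are all assumptions.

```python
def pick_annotation(hits, max_evalue=1e-5,
                    uninformative=("hypothetical", "unknown")):
    """Choose an annotation from BLAST hits given in rank order.

    hits: list of (description, evalue) tuples, best hit first
          (hypothetical input layout, not a BLAST output format).
    Returns the chosen description, or None if no hit passes the cutoff.
    """
    # Keep only hits that satisfy the e-value cutoff (at most 1e-5).
    passing = [(desc, ev) for desc, ev in hits if ev <= max_evalue]
    if not passing:
        return None

    first_desc = passing[0][0]
    first_is_vague = any(word in first_desc.lower() for word in uninformative)

    # If the top hit is uninformative, fall back to the next passing hit
    # (assumption: "next match" means the second-ranked hit).
    if first_is_vague and len(passing) > 1:
        return passing[1][0]
    return first_desc


example_hits = [
    ("hypothetical protein L798_01234", 1e-50),  # made-up identifiers
    ("vitellogenin-like protein", 1e-40),
]
print(pick_annotation(example_hits))
```

If the only passing hit is uninformative, the sketch keeps it rather than leaving the gene unannotated; the text does not specify this case, so that choice is also an assumption.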

Differential gene expression analysis

Differentially expressed genes (DEGs) were identified using DESeq2 (v. 1.32.0) in R (v. 4.1.1)68. Gene expression data were first normalized using the varianceStabilizingTransformation() function from DESeq2. The normalized data were used to perform DEG analysis using the DESeq() function. P values were calculated using the Wald test and corrected for multiple testing with the false discovery rate (FDR) approach69. We defined genes as DEGs if their corrected P values were smaller than 0.05.
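The multiple-testing step can be illustrated with a short sketch. It assumes that the FDR approach is the Benjamini-Hochberg procedure, which is the default adjustment in DESeq2; the function names and the example P values are hypothetical, and DESeq2 additionally applies independent filtering that is not reproduced here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that adjusted values
    # never increase as the raw P value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_degs(gene_ids, p_values, alpha=0.05):
    """Return genes whose adjusted P value is smaller than alpha."""
    padj = benjamini_hochberg(p_values)
    return [g for g, q in zip(gene_ids, padj) if q < alpha]


# Made-up Wald-test P values for three genes.
print(call_degs(["geneA", "geneB", "geneC"], [0.001, 0.5, 0.9]))
```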

DEG analyses were performed separately for each tissue (head or abdomen) and for the following comparisons: (i) queens versus workers (Data S3) and (ii) workers versus larvae (Data S4). We also performed a principal component analysis (PCA) using the 500 genes with the greatest variance, separately for the abdomen (Figure S4) and head (Figure S6), as well as for both tissues combined (Figures S7 and S8).
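Selecting the most variable genes before a PCA can be sketched as below. The function name `top_variable_genes` and the dictionary layout are assumptions for illustration; the study performed this step in R.

```python
from statistics import variance


def top_variable_genes(expr, n_top=500):
    """Return the n_top genes with the greatest sample variance.

    expr: dict mapping gene ID -> list of normalized expression values,
          one value per sample (hypothetical layout).
    """
    ranked = sorted(expr, key=lambda gene: variance(expr[gene]), reverse=True)
    return ranked[:n_top]


# Toy data with three genes and three samples.
toy = {"g1": [1, 1, 1], "g2": [0, 5, 10], "g3": [1, 2, 3]}
print(top_variable_genes(toy, n_top=2))
```

Restricting the PCA to high-variance genes keeps the genes that contribute most to between-sample differences and reduces the influence of near-constant genes.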

Network analysis

Weighted gene co-expression network analyses (WGCNA)70,71 were applied to identify networks of co-expressed genes (modules) that characterize the phenotypes (‘traits’ in WGCNA terms) of (i) queens compared to workers and (ii) workers compared to larvae. We did these analyses separately for the head and abdomen.

Assessing data quality for WGCNA

We first normalized the gene count data using the rlogTransformation() function from DESeq2 (version 1.32.0). To guarantee high data quality, we then performed hierarchical clustering to check for outliers using the hclust() function in the R package flashClust (version 1.1.2). No outliers needed to be removed, as samples of the same phenotype clustered together. Genes with more than 50% missing values or zero variance were removed iteratively (Table S1) using the goodSamplesGenes() function from the WGCNA package (version 1.70.3).

WGCNA

Normalized gene counts of good quality were used as input to construct a signed adjacency matrix with the most suitable soft-threshold power, estimated separately for each analysis (Table S1). Average linkage hierarchical clustering analyses were performed on an adjacency-based dissimilarity matrix using the hclust() function. Modules (minimum 30 genes) were detected using the cutreeDynamic() function from the package dynamicTreeCut (version 1.63.1). Eigengenes of the modules were determined using the moduleEigengenes() function. We then calculated module-trait (i.e., phenotype) associations using the cor() function. The asymptotic Student P values of all module-trait associations were calculated using the corPvalueStudent() function. The hub gene of each module (i.e., the gene with the highest connectivity within the module) was identified with the chooseTopHubInEachModule() function. Gene–trait associations and their corresponding Student P values were calculated using the corAndPvalue() function.
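For readers unfamiliar with signed networks, the sketch below shows how a signed adjacency and the corresponding dissimilarity are commonly defined in WGCNA: the Pearson correlation is rescaled to the range 0 to 1 and raised to the soft-threshold power. The function names, the toy data, and the power of 12 are assumptions for illustration only; the powers actually used are those listed in Table S1.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Zero-variance genes would divide by zero here; they are assumed
    # to have been removed in the preceding quality-control step.
    return sxy / sqrt(sxx * syy)


def signed_adjacency(expr, power):
    """Signed adjacency: a_ij = ((1 + cor_ij) / 2) ** power."""
    genes = list(expr)
    return {
        (g, h): (0.5 * (1.0 + pearson(expr[g], expr[h]))) ** power
        for g in genes
        for h in genes
    }


def dissimilarity(adjacency):
    """Adjacency-based dissimilarity used as clustering input."""
    return {pair: 1.0 - a for pair, a in adjacency.items()}


# Toy data: genes a and b are co-expressed, gene c is anti-correlated.
toy = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 3, 2, 1]}
adj = signed_adjacency(toy, power=12)  # power 12 is a placeholder value
print(round(adj[("a", "b")], 3), round(adj[("a", "c")], 3))
```

In a signed network, strongly negatively correlated genes receive an adjacency near zero, so modules group genes that change in the same direction.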

Gene ontology (GO) enrichment analysis

Using the GO annotations obtained from InterProScan, we tested whether GO terms were overrepresented in the genes that were highly expressed in (i) queens compared to workers, and (ii) workers compared to larvae. The GO enrichment analysis was performed for all DEGs and striking WGCNA modules. The background for the enrichment analysis comprised all genes in the genome with GO annotations. We applied Fisher's exact tests, as implemented in the R package topGO (v. 2.44.0)72, to determine the significance level. The results of the GO analysis for DEGs are provided in the Supplementary materials.

QCM enrichment analysis

We tested whether the QCM genes reported in C. secundus21 were enriched among the DEGs and modules that characterized queens compared to workers (i.e., queen modules) and those that characterized workers compared to larvae. To obtain homologs of C. secundus genes in Z. nevadensis, we used all amino acid sequences from the C. secundus genome73 and performed a local BLAST search against all amino acid sequences from the Z. nevadensis genome (version 2.2; http://termitegenome.org/?q=consortium_datasets)60, using BLASTP (version 2.11.0). We took the best hits (i.e., the hit with the lowest e-value and the highest bit score; e-value of at most 1e-5). The BLAST search yielded 28,397 hits and 10,420 unique Z. nevadensis homologs, of which 262 were QCM homologs. We then used Fisher’s exact test to test whether the occurrence of QCM homologs in the DEGs/modules was significantly higher or lower than expected (i.e., higher or lower than the occurrence of QCM homologs among all expressed genes).
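The enrichment test can be sketched with the hypergeometric distribution that underlies Fisher's exact test. This is an illustration under stated assumptions: the function names are hypothetical, the counts in the example are invented, and the two one-sided P values (over- and under-representation) are only one way to express "higher or lower than expected".

```python
from math import comb


def hypergeom_pmf(k, n_background, n_qcm, n_selected):
    """P(exactly k QCM homologs in a gene set of size n_selected)."""
    return (comb(n_qcm, k)
            * comb(n_background - n_qcm, n_selected - k)
            / comb(n_background, n_selected))


def fisher_one_sided(k, n_selected, n_qcm, n_background):
    """Return (p_over, p_under) for observing k QCM homologs.

    k            : QCM homologs in the DEG set or module
    n_selected   : size of the DEG set or module
    n_qcm        : QCM homologs among all expressed genes
    n_background : number of all expressed genes
    """
    lowest = max(0, n_selected - (n_background - n_qcm))
    highest = min(n_selected, n_qcm)
    p_over = sum(hypergeom_pmf(i, n_background, n_qcm, n_selected)
                 for i in range(k, highest + 1))
    p_under = sum(hypergeom_pmf(i, n_background, n_qcm, n_selected)
                  for i in range(lowest, k + 1))
    return p_over, p_under


# Invented counts: 2 QCM homologs in a 3-gene set, 4 QCM homologs
# among 10 background genes.
p_over, p_under = fisher_one_sided(k=2, n_selected=3, n_qcm=4, n_background=10)
print(round(p_over, 4), round(p_under, 4))
```

A small p_over indicates that QCM homologs are overrepresented in the gene set, and a small p_under indicates that they are underrepresented.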