Evolutionary transition of doublesex regulation from sex-specific splicing to male-specific transcription in termites

The sex determination gene doublesex (dsx) encodes a transcription factor with two domains, oligomerization domain 1 (OD1) and OD2, and is present throughout insects. Sex-specific Dsx splicing isoforms regulate the transcription of target genes and trigger sex differentiation in all Holometabola examined to date. However, in some hemimetabolous insects, dsx is not spliced sexually and its sequence is less conserved. Here, to elucidate evolutionary changes in dsx in domain organisation and regulation in termites, we searched genome and/or transcriptome databases for the dsx OD1 and OD2 in seven termite species and their sister group (Cryptocercus woodroaches). Molecular phylogenetic and synteny analyses identified OD1 sequences of termites and C. punctulatus that clustered with dsx of Holometabola and regarded them as dsx orthologues. The Cryptocercus dsx orthologue containing OD2 was spliced sexually, as previously shown in other insects. However, OD2 was not found in all termite dsx orthologues. These orthologues were encoded by a single exon in three termites for which genome information is available; they were not alternatively spliced but transcribed in a male-specific manner in two examined species. Evolution of dsx regulation from sex-specific splicing to male-specific transcription may have occurred at an early stage of social evolution in termites.


Results
The search for doublesex orthologues in termites and a subsocial woodroach. We performed BLAST searches using the translated OD1 sequence of Blattella dsx (45 amino acids) as a query against genome and/or transcriptome databases of seven termite species and one Cryptocercus woodroach species, after which two to four OD1-containing sequences were hit in each species (Table 1). BLAST searches were then performed using the translated OD2 sequence of Blattella dsx (45 amino acids) as a query. A single sequence from the woodroach was hit (Cpun_comp8195_c0_seq1, Table 1), but no sequence was hit from any tested termites. Rapid amplification cDNA ends (RACE) PCR was performed using primers specific for Cpun_comp8195_c0_ seq1 (Table S1), and a single full-length transcript was obtained only from females. Then, using a reverse primer specific to the male-specific exon of Blattella dsx (located at terminal codon and 3′UTR, Table S1), another 3′ end of coding sequence was amplified from males. As these transcripts contained an OD1 upstream of OD2 (Fig. 1), it was designated as C. punctulatus dsx (Cpun_dsx). The amplified sequences downstream of OD2 were predicted to be the sex-specific exon(s) of Cpun_dsx. The determined nucleotide and putative amino acid sequences of the Table 1. Results of BLAST search for OD1 against the genomes and/or transcriptomes of seven termite species and their sister group (subsocial woodroach). Bold italic letters are gene names used in this study. a Previously described by Price et al. 6 . b Found by BLAST search using OD2 (but not OD1) sequences of Blattella dsx orthologues, followed by 5′ RACE. To examine the orthology of dsx in insects, we performed a phylogenetic analysis using OD1 nucleotide sequences searched in termites and a Cryptocercus woodroach, as well as those of dsx and Dmrt orthologues in other species. The phylogenetic analysis revealed that all dsx reported in holometabolous insects, the German cockroach, and a water flea belonged to a single clade (dsx clade) (Fig. 2). This monophyletic dsx clade also contained a single OD1 sequence from each of the four termites H. sjostedti (c35221 16 Table 1). We confirmed that the sequences containing these dsx-clade OD1 did not possess a OD2 sequence that was unlike Cpun_dsx. No dsx-clade OD1 sequence from Z. nevadensis was identified in this phylogenetic analysis. The other OD1 sequences from insects formed clades specific to the DMRT11, DMRT93, and DMRT99 orthologues (Fig. 2).
To confirm whether the dsx-clade sequences in termites were dsx orthologues, synteny analyses for dsx were conducted based on the proximity of the dsx and prospero (pros), which is well conserved among eight different holometabolous and hemimetabolous insect orders 9,23 . The gene encoding the dsx-clade sequence was located 83 kb, 29 kb, and 80 kb from the pros orthologues (RS012493 24 , GFG36637 22 , and MN005277 25 ) in scaffold 6 of R. speratus 20 , scaffold 506 of Coptotermes formosanus 22 and scaffold 295 of M. natalensis (Mnat_08109), respectively. Therefore, these dsx-clade genes were designated as the termite orthologues of dsx (Rspe_dsx, Cfor_dsx, and Mnat_dsx, Table 1). In C. secundus, both dsx-clade sequences, XM_023861307 and XM_023858380, and the pros orthologue (XM_023851644) were located in scaffold 829 (length: 88.5 kb), 635 (length: 1.3 Mb), and 511 (length: 2.2 Mb), respectively. Because the scaffold lengths (especially for scaffold 829) appeared insufficient for the synteny analysis based on proximity (17-245 kb 9 ), evidence of synteny in C. secundus could not be obtained. The synteny in H. sjostedti and N. takasagoensis could not be examined because their genome data are unavailable (as of 12 July 2021). Based on OD1 sequence similarity (Fig. 2), however, we designated these as dsx orthologues (hereafter referred to as Csec_dsx1, Csec_dsx2, Hsjo_dsx, and Ntak_dsx, Table 1).

Splicing patterns and expression of doublesex homologues in termites and a subsocial woodroach.
To confirm the sex-specific splicing of Cpun_dsx, expression levels of the predicted sexspecific splicing isoforms were quantified by RT-PCR using primers specific to the predicted isoforms ( Fig. 1 and Table S2). The predicted female-and male-specific isoforms was confirmed to be abundantly expressed in females (t = − 6.68, p < 0.01, generalised linear model [GLM] ) and males (t = 7.88, p < 0.01, GLM), respectively (Fig. 3), indicating that Cpun_dsx was spliced in a sex-specific manner, as shown in holometabolous insects and the German cockroach B. germanica 9,15 . Next, to obtain the splicing isoforms of Hsjo_dsx, Rspe_dsx, and Ntak_ dsx, we performed 3′-RACE PCRs using gene-specific primers (Table S2) and amplified only the single fragment downstream of OD1 for each species (Fig. 1). The amplified downstream sequences of each species (accession no. LC635717-9) were consistent with those obtained by the aforementioned BLAST search. Although dsx is composed of ca. five exons, of which two to three posterior exons are sexually spliced in almost all insects examined previously 26 , each of the predicted full-length transcripts of Rspe_dsx, Mnat_dsx, Csex_dsx1, and Csec_dsx2 was encoded by a single exon in their genome. Based on the transcriptome data in R. speratus 21 , Rspe_dsx was expressed only in male reproductives (primary kings), soldiers, and workers, but not in females, and in both The OD1 is shown as a filled box, whereas the sexually shared, female-specific, and male-specific OD2 is shown as a grey, red, and blue box, respectively. Numbers on the right of boxes are amino acid lengths of the translated dsx. Positions of primers for quantifying Cpun_dsx expression by quantitative RT-PCR are shown in arrows: "F", "Rf ", and "Rm" indicate the shared forward primer, reverse primer for the female-specific splicing variant, and reverse primer for the male-specific variant, respectively (  www.nature.com/scientificreports/ heads (t = 3.99, p < 0.01, GLM) and other body parts (referred to as just "body") (t = 10.97, p < 0.001, GLM) (Fig. 4A). Quantitative RT-PCR also showed male-specific expression patterns even in the nymphs (t = 5.69, p < 0.01, GLM, Fig. 4B) and eggs (t = 3.10, p < 0.05, GLM, Fig. 4C). In addition, Ntak_dsx was expressed only in males, regardless of caste (t = 3.70, p < 0.01, GLM, Fig. 4D). These results indicated that termite dsx was not alternatively spliced, but transcribed in a male-specific manner.
Predicted binding sites and expression patterns of putative regulatory factors for male-specific transcription of Rspe_dsx. To examine the proximate mechanism for male-specific transcription of termite dsx, we searched for putative transcription factors that bind to the promoter region of Rspe_dsx, and those that are expressed in a male-or female-specific manner. First, 1 kb upstream of the transcription start site of Rspe_dsx was extracted as a transcriptional regulatory region, as defined by Toyota et al. 27 . Next, de novo motif discovery was performed for the promoter using hypergeometric optimisation of motif enrichment (HOMER) v4.11 28 , based on the equipped motif library of known transcription factors in insects. Although 32 putative TFBSs were detected in the promoter, 20 homologues out of the 32 transcription factors existed in the Reticulitermes genome, of which five were duplicated in the genome (Table 2). Based on the transcriptome data in R. speratus 21 , however, none of them were expressed in a sex-specific manner in either heads or bodies, except for the vielfaltig (vfl) orthologue (GLM, Table 2). The expression levels of the vfl orthologue were significantly different between sexes only in the bodies of reproductives, but similar patterns were not observed in other castes (Fig. S1).

Discussion
The dsx orthologue of the subsocial Cryptocercus woodroach was spliced in a sex-specific manner, similar to that of the gregarious Blattella cockroach and holometabolous insects (Fig. 5). Surprisingly, however, termite dsx homologues did not exhibit sex-specific splicing, and their transcription was regulated in a male-specific manner.
Male-or female-biased expression of dsx has been reported in at least 8 species of non-insect arthropods, including water fleas 5,27 , red claw crayfish 29 , Chinese shrimp 30 (Crustacea), and mite 7 (Chelicerata), suggesting that sex-biased dsx transcriptions are an ancestral state in Arthropoda 9 . In contrast, dsx produces sex-specific splice isoforms in major holometabolous insect orders (Coleoptera, Hymenoptera, Lepidoptera, and Diptera) and those hemimetabolous insects examined previously 6,9,26 except for the louse Pediculus humanus and the hemipteran Bemisia tabaci. These two hemimetabolous insects have dsx homologues with non-sex-specific isoforms 8,9 . Consequently, the sex-specific alternative splicing of dsx would have been acquired early in the evolution of insects, and secondarily lost in some hemimetabolous insects 9 , including termites. Within the cockroach and termite clade, after ancestral termites diverged from the common ancestor of subsocial Cryptocercus and eusocial termites (around 140.6 MYA, ranging from 112.6 to 170.5 MYA 31,32 ) the data suggest that sex-specific dsx splicing was lost, and the male-specific dsx transcription was alternatively acquired in eusocial termites (Fig. 5). It remains unclear whether the dsx orthologue was actually lost in the genome of Z. nevadensis. However, it is interesting to note that Z. nevadensis has a genome half the size of other termites (562 Mb) 19 , suggesting the possibility that some genes may be missing. A further exhaustive search is needed. www.nature.com/scientificreports/ Although dsx consists of approximately five exons in 145 species belonging to 39 holometabolous insect families 26 , two dsx paralogs (dsx1 and dsx2) in water fleas consist of four and two exons, respectively 5 . Its homologue in Chinese shrimp consists of two exons 30 , and those in mites 7 and termites consist of only a single exon (Fig. 5). Moreover, the present study showed no homologous OD2 in termite dsx (Fig. 5), suggesting that OD2 has indeed been lost or has diverged beyond our ability to detect it 6 . OD2 is conserved in dsx orthologues with sex-specific splicing in holometabolous insects, as well as in those with sex-specific transcription in water fleas 5,27 and Chinese shrimp 30 but not in red claw crayfish 29 . Both OD1 and OD2 are related to binding target DNA as a dimeric DNA binding unit in the fruit fly D. melanogaster: OD1 enables Dsx to bind target DNA and dimerizes weakly in a DNA-dependent manner 12 , whereas OD2 is necessary for dimerization and enhances DNA recognition of OD1 33 . In addition, fly OD2 contains sex-specific spliced sequences, which may be involved in the formation of sex-specific units for transcriptional regulation of downstream target genes through either its

(C) Relative
Rspe_dsx expression in eggs produced asexually and sexually. The former are fated to develop into females, whereas the sexual fate of the latter was not determined. Total RNA was extracted from 30 eggs, and three RNA samples were derived from different colonies. (D) Ntak_dsx expression levels relative to those of NADH in each sex and in each caste of N. takasagoensis. All castes examined were collected from a single colony (i.e., siblings). In (A) and (D), the effects of sex, caste, and their interaction on gene expression levels were evaluated using GLM followed by multiple comparison using linear hypothesis testing with Tukey adjustment in the "multcomp" R package. Different letters indicate significant differences according to multiple comparison (p < 0.05). In (B) and (C), relative expression levels of Rspe_dsx were compared between sexes using GLM analyses. ** and *** indicates p < 0.01 and 0.001 in GLM, respectively. www.nature.com/scientificreports/ sex-specific interactions with the transcriptional machinery or its sex-specific DNA binding 12 . The decreasing numbers of dsx exons in some arthropods and termites and the loss (or divergence) of OD2 in termites would be associated with the evolution of dsx regulation, although it remains unknown whether such evolutionary changes in the domain and exon organisation were the cause or consequence of the loss of sex-specific splicing. Future studies that aim to characterize the dsx orthologue in Mastotermes darwiniensis, the earliest branching termite lineage, might help clarify the issue.  Figure 5. Evolutionary changes in dsx regulation associated with social evolution in termites. The presence of OD1 and OD2, numbers of exons containing coding sequences ( † ), and regulatory manner of dsx homologues were compared among the German cockroach, the subsocial woodroach, and seven termite species. "-" and "?" mean "not detected" and "unknown", respectively. *Data on B. germanica were cited from Wexler et al. 9 . The tree topology was based on that of Bourguignon et al. 57  www.nature.com/scientificreports/ The sex determination cascade has never been examined in termites. Almost all orthologues of sex determination genes reported in holometabolous insects (e.g., transformer, transformer-2, and Sex-lethal) are conserved in the genome of R. speratus 24 . However, these orthologues encode splicing factors and are unlikely to directly regulate male-specific dsx transcription. Male-specific transcription should be regulated by a transcription factor expressed in a sex-specific manner in every caste, as shown in termite dsx. Although we found 27 putative TFBSs in the Rspe_dsx promoter region, no orthologues of 27 transcription factors showed sex-specific expression patterns similar to those of Rspe_dsx. However, our motif search was based on the binding motifs found in the fruit fly D. melanogaster. In addition, unidentified binding motifs could be present out of the transcriptional regulatory region examined in this study. Further investigations are needed to determine whether the unidentified transcription factors expressed in a sex-specific manner would regulate male-specific dsx transcription in termites.
Although dsx was sexually spliced in gregarious Blattella and subsocial Cryptocercus cockroaches, a single dsx transcript was expressed only in the male of eusocial lower and higher termites. Given that cooperative brood care by parents and a generational overlap was likely acquired in the last common ancestor of Cryptocercus woodroach and termites 32 , the transition in dsx regulation in termites is associated with the evolution of eusociality, especially the acquisition of reproductive division of labor (Fig. 5). Such association between dsx expression and social traits also has been suggested in ants. The dsx orthologues were sexually spliced in all 5 ants examined, whereas their dsx was expressed in a male-biased manner in 3 species with an environmental caste determination system, but not in 2 species with a genetic caste determination system 34 . Additionally, the OD2 was present in 2 basal ant species, but absent in 17 ants belonging to the subfamilies with more advanced social traits 34 .
The male-specific transcription of termite dsx would play a role in the regulation of reproductive division of labour, particularly in soldiers, the first sterile caste that evolved in Isoptera 35 . Some species, particularly in the crown termite family Termitidae, have sex-specific or sex-biased soldier ratios and soldier differentiation pathways 36 . For example, all soldiers are females in most species of the subfamilies Termitinae and Macrotermitinae, whereas all soldiers are males in most of the examined species in Nasutitermitinae. Soldier differentiation requires high juvenile hormone titres in workers, and the strongly biased soldier-sex ratio may be rooted in differences in juvenile hormone titres (and probably related gene expression levels) between male and female workers 37,38 . Reproductive caste differentiation is also regulated in a sex-specific manner (reviewed in Oguchi et al. 39 ). "Queen genes" with high expression levels in female reproductives have been identified in some species (reviewed in Korb 40 ), and in male reproductives upregulated genes likely associated with male fertility have been identified in species with available genome sequences 15,19,24 . Termite dsx could regulate this sexually dimorphic gene expression by upregulating or downregulating the expression of its target genes in a male-specific manner, resulting in reproductive and non-reproductive division of labour.

Materials and methods
Insects. Seven  BLAST searches for dsx and dmrt homologues in termites and cockroaches. tBlastX searches were performed against genome and transcriptome databases of six termite species and one Cryptocercus woodroach species (Table 1), using SequenceServer 42 . Either the OD1 (45 amino acids) or OD2 (45 amino acids) sequences of B. germanica dsx were used as queries. Sequences with an E value less than 10 -5 were selected as candidates for dsx orthologues and used for phylogenetic analysis.

Construction of phylogenetic tree of OD1 sequences and identification of dsx orthologues.
To identify the dsx orthologue, a phylogenetic tree of OD1 sequences (135 bp with no gaps) was constructed according to previous studies 5, 43,44 . We used 84 OD1-containing genes (Table S3 'OTU'). The DMRT7 genes of vertebrates (mouse and cattle) were used as the outgroups. Phylogenetic relationships were inferred using Bayesian inference (BI) and maximum likelihood (ML) methods. For BI, the most appropriate model of sequence evolution was determined using the model selection option implemented in MEGA version 7.0.21 45 , and the GTR + G model was selected. Using MrBayes version 3.2.6 46 , a total of 100,000 trees were obtained (ngen = 10,000,000, samplefreq = 100). The first 25% of these (25,000) were discarded as burn-ins, and a 50%-majority-rule consensus tree was produced. For ML, 1000 bootstrap replicates were performed based on the same model of sequence evolution as BI in MEGA 7.0.21, with the default tree inference options. Synteny analysis. According to Wexler et al. 9 , the synteny of dsx was examined based on the proximity to the transcription factor prospero (pros). The dsx locates at the genomic position close to pros, with an intervening distance between 17 and 245 kb in eight different insect orders (Blattodea, Orthoptera, Ephemeroptera, Phthiraptera, Thysanoptera, Hymenoptera, Coleoptera, and Lepidoptera) 9,23 . To identify the pros ortholog in each termite, tBlastX searches were performed against gene models of R. speratus 24 48 , incipient colonies were established using unrelated female and male alates. A total of 30 eggs was obtained, and 3 primary queens and 3 primary kings were randomly selected from 20 colonies at 1.5 months after colony establishment. Total RNA of all the eggs was extracted and treated with DNase I using the ReliaPrep RNA Tissue Miniprep System (Promega, USA). Total RNA of the primary queens or kings was extracted from the body (except for the head parts, three individuals of each sex) using ISOGEN II (Nippongene, Japan). Total RNA of female or male alates of N. takasagoensis (using the colony collected in 2017) was extracted from the body (except for the head parts, five individuals in each sex) using ISOGEN II. Total RNA from the secondary queen or king of H. sjostedti (using the colony collected in 2015) was extracted from the body (except for the head part, one individual in each sex) using ISOGEN II. An ovary and testis of C. punctulatus were dissected from a female adult and male adult, respectively. Total RNA was extracted from each of the gonads using ISOGEN I and II, and then treated with RNase-free DNase I (Takara, Japan). The purity and quantity of the extracted RNA were measured using a NanoVue spectrophotometer (GE Healthcare Bio-Sciences, Japan). The 5′ and 3′ ends of Cpun_dsx were amplified using the SMARTer RACE 5′/3′kit (TaKaRa, Shiga, Japan) and gene-specific primers designed for OD2 ("Cpun_dsx OD2 5'RACE" and "Cpun_dsx OD2 3'RACE", Table S1). Its male-specific exon was amplified using Advantage ® 2 Polymerase Mix (TaKaRa), the gene-specific primer "Cpun_dsx OD2 3'RACE", and a reverse primer in the male-specific exon of Blattella dsx, which was located at terminal codon and 3′UTR ("Cpun_dsx male-specific exon-R", Table S1). The 3′ ends of Rspe_dsx, Hsjo_dsx, and Ntak_dsx were also amplified using the SMARTer RACE 5′/3′ Kit and newly designed gene-specific primers (Table S1). The amplified fragments were subcloned into the pGEM-T easy vector system (Promega), and the nucleotide sequences of each fragment were determined using ABI Prism Big Dye Terminator v3.1 Cycle Sequencing Kit in conjunction with a 3500 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA). The newly identified sequences of Rspe_dsx, Ntak_dsx, and Hsjo_dsx were deposited in the GenBank/EMBL/DDBJ database under accession numbers: LC635717, LC635718, and LC635719 (Table 1).

RNA-seq analysis in Reticulitermes speratus.
RNA-seq data of R. speratus were used to compare the expression levels of Rspe_dsx (Table 1) among three castes (workers, soldiers, and reproductives), two body parts (heads and the remaining parts), and two sexes (biological triplicates; NCBI BioProject Accession No. PRJDB5589 21 ). The filtered RNA-seq reads were mapped onto their genome assembly 20 using TopHat v2.1.021. Transcript abundances were then estimated using the featureCounts program of the Subread package 49 . To compare gene expression levels among castes and between sexes, first, counts per million (CPM) were calculated from the estimated transcript abundances. Genes with at least CPM of 1 in at least three samples were kept for subsequent analyses. CPM values were then normalized with the trimmed mean of M-values (TMM) algorithm in edgeR 50 . Differential gene expression analyses were performed separately for each body part using a GLM with two factors, namely, caste and sex implemented in edgeR, and then, genes with a false discovery rate (FDR) < 0.01 were identified as genes expressed in a sex-specific manner. RPKM (Reads Per Kilobase Million) values were calculated by dividing the CPM values by the length of the genes in kilobases.
Real-time quantitative PCR. Gene-specific primers were designed against Rspe_dsx, Ntak_dsx, and Cpun_dsx using Primer3Plus 51 for real-time quantitative PCR (Table S2). Total RNA of female and male nymphs of R. speratus (from the colony collected in 2019) was individually extracted from whole bodies (10 individuals in each sex) using ISOGEN II. Total RNA of male (minor and medium) workers, female (medium and major) workers, and male soldiers of N. takasagoensis (from the colony collected in 2018) was extracted from the whole body of five separate individuals using ISOGEN II. Total RNAs derived from the gonads of adult woodroaches were also used. DNase treatment was performed using the same method described above. According to previous studies 47,48 , incipient colonies (queen-king and two-queen colonies) of R. speratus were established using alates derived from three colonies collected in 2019. Eggs produced sexually in queen-king colonies and asexually in queen-queen colonies develop into both sexes and only females, respectively. Total RNA was extracted from 30 sexual and 30 asexual eggs (triplicated) and treated with DNase I using ReliaPrep RNA Tissue Miniprep System (Promega). cDNAs were synthesised from the purified RNA using a High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems). Biological replications (n = 3-5) were prepared for each category. Expression analyses were performed using Thunderbird SYBR qPCR Mix (Toyobo, Japan) with a MiniOpticon Real-time PCR system (Bio-Rad, Japan) and an Applied Biosystems 7500 Fast Real-Time PCR System (Applied Biosystems).
The expression levels were statistically analysed using GLMs with gamma error distribution and log link function. The fixed effects were sex, caste, and their interaction. Multiple comparisons were performed using general linear hypothesis testing (glht function) with Tukey adjustment in the "multcomp" R package 55 . These analyses were conducted using R ver. 4.0.3 (available at http:// cran.r-proje ct. org/). www.nature.com/scientificreports/ Search for transcription factor binding motif. The Reticulitermes genome was loaded into the database of HOMER v4.11 using the "loagGenome.pl" program. The "findMotifsGenome.pl" program was executed to search for Drosophila motif collections using the "-mcheck" option against the promoter of Rspe_dsx, which was the 1.0 kbp upstream region from its transcription start site. The nucleotide sequences of Drosophila transcription factors that were detected in their binding sites were subjected to a BlastX search for orthologues against the protein database of R. speratus 41 using BLAST + (2.10.1) 56 . The expression levels of these orthologues were extracted from their transcriptome data 21 and statistically analysed using GLMs with gamma error distribution and log link function, as mentioned above.