Abstract
Sex-limited morphs can provide profound insights into the evolution and genomic architecture of complex phenotypes. Inter-sexual mimicry is one particular type of sex-limited polymorphism in which a novel morph resembles the opposite sex. While inter-sexual mimics are known in both sexes and a diverse range of animals, their evolutionary origin is poorly understood. Here, we investigated the genomic basis of female-limited morphs and male mimicry in the common bluetail damselfly. Differential gene expression between morphs has been documented in damselflies, but no causal locus has been previously identified. We found that male mimicry originated in an ancestrally sexually dimorphic lineage in association with multiple structural changes, probably driven by transposable element activity. These changes resulted in ~900 kb of novel genomic content that is partly shared by male mimics in a close relative, indicating that male mimicry is a trans-species polymorphism. More recently, a third morph originated following the translocation of part of the male-mimicry sequence into a genomic position ~3.5 mb apart. We provide evidence of balancing selection maintaining male mimicry, in line with previous field population studies. Our results underscore how structural variants affecting a handful of potentially regulatory genes and morph-specific genes can give rise to novel and complex phenotypic polymorphisms.
Main
Sexual dimorphism is one of the most fascinating forms of intra-specific phenotypic variation in animals. Sexes often differ in size and colour, as well as the presence of elaborated ornaments and weaponry. Theoretical and empirical studies over many decades have developed a detailed framework of sexual selection and sexual conflict, explaining why these differences arise and how they become encoded in sex differentiation systems1,2,3. However, a growing number of examples of inter-sexual mimicry4,5,6,7 suggest that sexual dimorphism can be evolutionarily fragile and quite dynamic. Inter-sexual mimicry has evolved in several lineages, when individuals of one sex gain a fitness advantage, usually frequency- or density-dependent, due to their resemblance to the opposite sex. For example, males who mimic females, as seen in the ruff (Calidris pugnax) and the Melanzona guppy (Poecilia parae), forgo courtship and ‘sneak’ copulations from dominant males4,5, while females who mimic males, in damselflies and hummingbirds, avoid excessive male-mating harassment6,8. Inter-sexual mimicry thus requires the evolution of a novel sex-mimicking morph in a sexually dimorphic ancestor. The occurrence of inter-sexual mimicry may be an intermediate step in the evolution of sexual monomorphism, it may be an ephemeral state or it may be maintained as a stable polymorphism. In any case, sexual mimics harbour genetic changes that attenuate or prevent the development of sex-specific phenotypes, and can therefore provide insights into the essential building blocks of sexual dimorphism9.
Considerable research effort has been devoted to uncover the genetic basis of discrete phenotypic polymorphisms, such as those associated with alternative reproductive or life-history strategies10,11,12,13,14. Together, these studies highlight a vast diversity of mechanisms used by evolution to package complex phenotypic differences into a single locus that is protected from the eroding effects of recombination. At one extreme, phenotypic morphs may evolve via massive insertions, deletions or inversions that lock together dozens to hundreds of genes into supergenes15,16,17. At the other end, much smaller structural variants (SVs), confined to a few thousand base pairs, can modulate the expression of one or a few regulators of pleiotropic networks, resulting in markedly different morphs11,12,18. We are clearly only starting to get a glimpse of the major themes among these genetic mechanisms. For example, it is not known whether genomic architecture determines the type and breadth of co-varying traits or the likelihood of polymorphisms evolving in specific lineages19.
A few of these studies have focused on sex-limited polymorphisms, where one of the morphs shares the overall appearance, such as the colour pattern, of the opposite sex10,14,20. Such sex-limited morphs may illustrate novel origins of sexual dimorphism, driven by either sexual selection in males14 or natural selection in females18,21. Alternatively, sex-limited polymorphisms may arise with the evolution of inter-sexual mimicry. Crucially, empirical support for the evolution of inter-sexual mimicry demands both a macroevolutionary context for the polymorphism, showing that sexual dimorphism is ancestral, and a documented advantage of sexual mimics in at least some social contexts. There is therefore a need to integrate genomic, microevolutionary and phylogenetic evidence into our understanding of the evolutionary dynamics of sexual dimorphism and inter-sexual mimicry. This integrative approach has been rare overall, and applied mostly to the study of alternative male reproductive strategies18,22. Yet, female mimicry of males may be more common than historically appreciated23, and the genetic basis of such mimicry remains largely unexplored24,25,26.
The common bluetail damselfly Ischnura elegans (Odonata) has three female-limited morphs (namely O, A and I) that differ in colouration, whereas males are always monomorphic27. O females display the colour pattern and developmental colour changes inferred as ancestral in a comparative analysis of the genus Ischnura28 (Fig. 1). Male-like (A) females are considered male mimics, who experience a frequency-dependent advantage of reduced male mating and premating harassment due to their resemblance to males6. Finally, the I morph shares its stripe pattern and immature colouration with the A morph27 (Fig. 1), but develops a yellow-brown background colouration with age, eventually resembling the O morph upon sexual maturation29. I females are only known in I. elegans and a few close relatives28 (Fig. 1), and their evolutionary relationship to A and O females remains unresolved. The behaviour, ecology and population biology of I. elegans have been intensely investigated for over two decades, making it one of the best understood female-limited polymorphisms, in terms of how morphs differ in fitness-related traits and how alternative morphs are maintained sympatrically over long periods30,31,32,33. Nonetheless, the molecular basis of this polymorphism remains unknown.
To advance our understanding of the evolution of complex phenotypes, such as sexual dimorphism and sex-specific morphs, we identify the genomic region responsible for the female-limited colour polymorphism in I. elegans. Using a combination of reference-based and reference-free genome-wide association studies (GWAS), upon morph-specific genome assemblies, we revealed two novel regions adding up to ~900 kb that are associated with the evolutionary origin of the male-mimicking A morph. These SVs, probably generated and expanded by transposable element (TE) activity, are partly shared by male-mimicking females of the tropical bluetail damselfly (Ischnura senegalensis), indicating that male mimicry is a trans-species polymorphism. We also show that the novel I morph evolved via an ectopic recombination event, where part of the A-unique genomic content was translocated into an O genomic background. Finally, we examined the evolutionary dynamics of the colour morph locus and explored expression patterns of genes located in this region. Together, our results indicate that structural variation affecting a handful of genes and maintained by balancing selection provides the raw material for the evolution of a male-mimicking phenotype in pond damselflies.
Results
Male mimicry is encoded by a locus with a signature of balancing selection
We started by conducting three reference-based GWAS, comparing all morphs against each other in a pairwise fashion (Extended Data Fig. 1). We used an A morph genome assembly (Supplementary Text 1) as mapping reference because SV analyses revealed that A females harbour genomic content that is absent in the other two morphs (see ‘Female morphs differ in genomic content’). The draft assembly was scaffolded against the Darwin Tree of Life (DToL) reference genome to place the contigs in a chromosome-level framework34. The DToL reference genome contains the O allele (see Supplementary Text 2) and is assembled with chromosome resolution, except for chromosome 13, which is fragmented and consists of one main and several unlocalized scaffolds.
All pairwise GWAS between morphs pointed to one and the same unlocalized scaffold of chromosome 13 as the causal morph locus (Fig. 2a). Closer examination of this scaffold revealed two windows of elevated divergence between morphs (Fig. 2b). First, a narrow region near the start of the scaffold (~50 kb–0.2 mb) captures highly significant single nucleotide polymorphisms (SNPs) in both A versus O and I versus O comparisons (Fig. 2b). Thereafter, and up to ~1.5 mb, an abundance of SNPs differentiates A females from both O and I females, especially between ~0.6 and ~1.0 mb (Fig. 2b). These results are mirrored by genetic differentiation (FST) values across both regions (Fig. 2c).
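As a rough illustration of how a windowed FST value can be obtained from allele frequencies, the sketch below uses Hudson's estimator computed as a ratio of averages across SNPs. This is an assumption made for illustration only: this section does not state which estimator or software produced the values in Fig. 2c, and the function name `hudson_fst` and its arguments are hypothetical.

```python
def hudson_fst(freqs_1, freqs_2, n1, n2):
    """Hudson's FST for one window, as a ratio of summed numerators and denominators.

    freqs_1, freqs_2: per-SNP allele frequencies in the two groups (e.g. two morphs).
    n1, n2: number of sampled chromosomes in each group.
    Illustrative sketch only; not the pipeline used in the study.
    """
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_1, freqs_2):
        # Squared frequency difference, corrected for finite sample size.
        numerator += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        # Probability that two alleles drawn from different groups differ.
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    return numerator / denominator if denominator > 0 else float("nan")
```

With 19 diploid samples per morph (38 chromosomes), a window made up of fixed differences gives a value of 1, whereas identical allele frequencies in both groups give a value at or slightly below 0 because of the sample-size correction.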
Next, we investigated whether the morph locus carries a signature of balancing selection, as suggested by previous field studies of morph-frequency dynamics31. The larger genomic window that uniquely distinguishes A females from both I and O females displays a signature of balancing selection, indicated by highly positive values of Tajima’s D, exceeding the 95th percentile of genome-wide estimates (Fig. 2d). Conversely, values of both Tajima’s D and nucleotide diversity (π) in the narrower window that differentiates O females from both A and I females (~50 kb–0.2 mb) fall within the 95th percentile of genome-wide estimates (Fig. 2d–e).
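To make the logic of this test concrete, the following minimal sketch computes Tajima's D for one genomic window from per-site allele counts, using the standard formulation of the statistic. It is not the software used in the study; the function name `tajimas_d` and its inputs are hypothetical. An excess of intermediate-frequency variants, as expected under balancing selection, yields positive values, whereas an excess of rare variants yields negative values.

```python
import math


def tajimas_d(derived_counts, n):
    """Tajima's D for one window (illustrative sketch, not the study's pipeline).

    derived_counts: copies of one allele observed at each site.
    n: number of sampled chromosomes (must be at least 4).
    """
    segregating = [k for k in derived_counts if 0 < k < n]
    s = len(segregating)
    if s == 0 or n < 4:
        return float("nan")
    # Mean number of pairwise differences (pi), summed over sites.
    pi = sum(2 * k * (n - k) / (n * (n - 1)) for k in segregating)
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    # Difference between pi and Watterson's estimator, scaled by its standard deviation.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

In a genome scan, such a function would be applied to each window, and a window would be flagged when its value exceeds a chosen percentile (here the 95th) of the genome-wide distribution.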
Female morphs differ in genomic content
Previous studies have found that complex phenotypic polymorphisms are often underpinned by SVs, arising from genomic rearrangements such as insertions, deletions and inversions10,13,15,20. As these variants can be difficult to detect in a reference-based analysis, we employed a k-mer based GWAS approach35 (Extended Data Fig. 1), which enables reference-free identification of genomic divergence between morphs. Significant k-mers in these analyses could represent regions that are present in one morph and absent in the other (that is, insertions or deletions), or regions that are highly divergent in their sequence (as in a traditional GWAS).
First, we investigated the divergence associated with the male-mimicking A morph. Pairwise analyses revealed 568,039 and 508,031 k-mers (length = 31 bp) significantly associated with the A versus O and A versus I comparisons, respectively. To determine whether the associated k-mers represent differences in genomic content or sequence between the morphs, we mapped these k-mers to morph-specific reference genomes. If the associated k-mers are owing to novel sequences found in one morph but not the other, we would expect a vast majority of the significant k-mers to be found in only one of the two morphs in a pairwise comparison. If the significant k-mers are instead owing to point mutations in high-identity sequences, there should be morph-specific k-mers in both morphs.
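The expectation above can be sketched as a simple presence test: for a set of significant k-mers, compute the fraction that occurs in each of two assemblies. The sketch below uses exact matching of canonical k-mers on toy strings and is only an illustration. The study mapped k-mers to morph-specific assemblies, and all names here (`presence_fractions`, `kmer_set`) are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmer_set(sequence, k):
    """All canonical k-mers present in a sequence."""
    return {canonical(sequence[i:i + k]) for i in range(len(sequence) - k + 1)}


def presence_fractions(significant_kmers, assembly_1, assembly_2, k):
    """Fraction of significant k-mers found in each of two assemblies."""
    significant = {canonical(kmer) for kmer in significant_kmers}
    set_1 = kmer_set(assembly_1, k)
    set_2 = kmer_set(assembly_2, k)
    found_1 = sum(kmer in set_1 for kmer in significant)
    found_2 = sum(kmer in set_2 for kmer in significant)
    return found_1 / len(significant), found_2 / len(significant)


# Toy example: assembly_a carries extra sequence that assembly_b lacks.
assembly_a = "ACGTTGCATGCCGTAAGGCT"
assembly_b = "ACGTTGCA"
fractions = presence_fractions(["GCCGT", "CGTAA", "TAAGG"], assembly_a, assembly_b, 5)
```

Fractions close to (1, 0) point to content present in only one morph, whereas comparable fractions in both assemblies point to sequence divergence in a shared region.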
Most (>98%) of the mapped k-mers in the A versus O and A versus I comparisons aligned perfectly to a single ~1.5 mb region of the unlocalized scaffold 2 of chromosome 13, in the A-morph assembly (Fig. 3a and Extended Data Table 1). This is the same region of the A-morph assembly that was previously identified in the standard GWAS (Fig. 2). In contrast, only ~0.3% of the associated k-mers in the A versus O comparison were found anywhere in the O assembly and, similarly, only ~0.2% of the significant k-mers in the A versus I analysis mapped to the I assembly (Extended Data Table 1). These results thus suggested that a large region of genomic content is unique to the A haplotype.
Given that A and I females share their immature colour pattern29,36, we then tested for k-mer associations that would distinguish both A and I females from O females and found 85,134 such k-mers (Extended Data Table 1). When mapped to the A assembly, a majority of these k-mers were found near the start of the unlocalized scaffold 2 of chromosome 13 (Fig. 3a), where we previously reported pronounced divergence of O females (Fig. 2b,c). However, when mapped to the I assembly, most of the significant k-mers were found in a different region of the same scaffold, separated by approximately 3.5 mb (Fig. 3b). These results thus suggested that A and I females share genomic content that is absent in O. However, in the I haplotype this content occupies a different chromosomal location.
To further investigate the distribution of genomic content among morphs, we plotted the standardized number of mapped reads (read depths) along the ~1.5 mb region of the A assembly that included most of the significant k-mers (Extended Data Fig. 1). Here, we expected read depth values around 0.5 (heterozygous) or 1.0 (homozygous) for all A samples, whereas I and O samples should have read depths of 0, if genomic content is uniquely present in the A allele (because I and O individuals lack the A allele; Fig. 1). Read depths confirmed that male-mimicking A females are differentiated by genomic content. Specifically, there are two windows of the A assembly (of ~400 kb and ~500 kb) where no I or O data maps to the assembly after filtering repetitive sequences (Fig. 3c), and that are therefore uniquely present in A females. These two windows of A-specific content are separated by a region between ~0.6 and ~1.0 mb that is shared among all morphs (Fig. 3c), and highly divergent in SNP-based comparisons involving the A morph (Fig. 2b). Finally, the region including most significant k-mers in the A and I versus O comparison is present in all A and I samples but absent in all O samples, except for one individual (Fig. 3c and Supplementary Text 3). As noted in the k-mer GWAS, this region of genomic content shared by A and I individuals is located in different regions, separated by ~3.5 mb, in the two assemblies (Fig. 3d).
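The read-depth expectations described above can be sketched as follows, assuming that depth in a window is standardized by the sample's median depth across background regions. The exact standardization used in the study is not given in this section, so both this choice and the tolerance threshold are assumptions, and the function names are hypothetical.

```python
from statistics import median


def standardized_depth(window_depth, background_depths):
    """Depth in a window relative to the sample's median background depth."""
    return window_depth / median(background_depths)


def inferred_copies(std_depth, tolerance=0.2):
    """Map a standardized depth to 0, 1 or 2 copies of the region.

    Expected values: 0.0 (region absent), 0.5 (heterozygous), 1.0 (homozygous).
    Returns None when the depth is not close to any expected value.
    The tolerance is an arbitrary illustrative threshold.
    """
    for expected, copies in ((0.0, 0), (0.5, 1), (1.0, 2)):
        if abs(std_depth - expected) <= tolerance:
            return copies
    return None


# Toy example: a sample with ~30x background depth and 15x depth in the window
# would be consistent with carrying a single copy of the region.
depth = standardized_depth(15, [28, 30, 32])
copies = inferred_copies(depth)
```

Under this scheme, I and O samples are expected to yield 0 copies in the A-specific windows, while A samples yield 1 or 2 copies depending on genotype.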
By combining reference-based GWAS, reference-free GWAS and read-depth approaches, we have identified three haplotypes controlling morph development in the common bluetail. The A and I haplotypes share ~150 kb that are absent in O. The A haplotype has two additional windows of unique genomic content, adding up to ~900 kb. In the A haplotype, a single ~1.5 mb window (hereafter the morph locus) thus contains the regions of unique genomic content, the region exclusively shared between A and I, and the SNP-rich region present in all morphs. In the I haplotype the region exclusively shared with A occupies a single and different locus separated by about 3.5 mb (Fig. 4a). These large and compounded differences in genomic content between haplotypes suggest that multiple structural changes on a multi-million base-pair region were responsible for the evolution of novel female morphs in Ischnura damselflies.
TE propagation and recombination probably explain the origins of novel female morphs
Based on previous inferences of the historical order in which female morphs evolved (Fig. 1), we hypothesized that genomic divergence first occurred between O and A females, with some genomic content being then translocated from A into an O background, leading to the evolutionary origin of I females. We analysed SVs between morphs to test this hypothesis (Extended Data Fig. 1 and Supplementary Text 4) and uncovered evidence of a ~20 kb sequence in the O haplotype that is duplicated and inverted in tandem in derived morphs (A and I; Fig. 4b and Extended Data Fig. 2). An investigation of the reads mapping to the inversion breakpoints suggested that additional duplications in the A genome, presumably via TE proliferation, may be related to the evolution of inter-sexual mimicry (Fig. 4b and Extended Data Fig. 3). Interestingly, TE content is enriched and recombination is reduced not just in the vicinity of the morph locus, but across the entire chromosome 13 (Extended Data Figs. 4 and 5, and Supplementary Text 4). Finally, evidence of a translocation of an A-derived genomic region back into an O background (Extended Data Fig. 6 and Supplementary Text 4) implied that the I morph evolved from an ectopic recombination event between A and O morphs (Fig. 4b). This scenario is also consistent with our previous k-mer GWAS and read-depth results, where we found that the only region differentiating both A and I females from O females is located ~3.5 mb away in the I haplotype.
Male mimicry is a trans-species polymorphism
Ancestral state reconstruction of female colour states had previously pointed to an ancient origin of male mimicry in the clade that includes I. elegans and several other widely distributed Ischnura damselflies28 (Fig. 1). We investigated whether male mimicry is in fact a trans-species polymorphism using de novo genome assemblies from the closely related tropical bluetail (I. senegalensis; Extended Data Fig. 1). I. senegalensis shares a common ancestor with I. elegans about 5 Myr ago28, and has both a male-mimicking A morph and a non-mimicking morph, which resembles the O females of I. elegans28,37 (Fig. 5a).
We reasoned that if morph divergence is ancestral, the genomic content that is uniquely present in A females or shared by A and I females in I. elegans should be at least partly present in A females of I. senegalensis, but absent in the alternative O-like female morph (Supplementary Text 5). This prediction was supported by differences in standardized read depths between the A and O-like pool of I. senegalensis, specifically at the morph locus of I. elegans (Fig. 5b and Supplementary Text 5). A shared genomic basis of inter-sexual mimicry for the two species was also supported by the same ~20 kb inversion signature in the A pool against an O assembly, as detected in A and I females of I. elegans (Extended Data Fig. 7). Finally, assembly alignments between O-like and A haplotypes of I. senegalensis showed that the A-specific genomic region of I. elegans is partly present in the A but not the O-like assembly of I. senegalensis (Fig. 5c).
Shared and morph-specific genes reside in the morph locus
Finally, we examined gene content and expression patterns in the morph locus. As female morphs differ in genomic content as well as sequence, the phenotypic effects of the morph locus could come about in at least three non-exclusive ways. First, entire gene models may be present in some morphs and absent in others. Second, genes present in all morphs may differ in expression patterns. Third, genes may encode different amino acid sequences in different female morphs. We used newly generated and previously published38 RNA sequencing (RNA-seq) data to investigate these questions (Extended Data Fig. 1), and capitalized on the annotations of the reference genome of I. elegans34, as well as transcripts assembled de novo in our A-morph genome assembly. Because the genetic basis of inter-sexual mimicry is shared between I. elegans and I. senegalensis (Fig. 5), we focus on genes that are expressed in both species in at least one individual (Fig. 6a).
Three transcripts (from two predicted genes) in the morph locus are expressed in A females of I. senegalensis, and in A and I females of I. elegans, but never in O or O-like females (Fig. 6b). Only one of these gene models (Afem.4094) could be functionally annotated, and appears to encode a long interspaced nuclear element (LINE) retrotransposon in the clade Jockey (Supplementary Text 6). This gene also exhibited expression changes in I females that reflect their colour development trajectory of initial resemblance to A females, followed by an overall appearance similar to O females upon sexual maturation (Supplementary Text 6). Notably, RepeatModeler and RepeatMasker detected signatures of the Jockey family at the same locus as the mapping locations of the A reads that had suggested a propagation of TEs in our SV analyses (Fig. 6a and Extended Data Fig. 3). Thus, these results further support that TEs are responsible for the evolution and expansion of the male-mimicry allele.
We also identified three gene models that are shared by all haplotypes and expressed in both species. The three predicted genes encode zinc-finger domain proteins (Fig. 6b and Supplementary Text 6), which are known to participate in transcriptional regulation39. However, we found no conclusive evidence of differential expression, nor evidence of non-synonymous substitutions between morphs shared by both I. elegans and I. senegalensis (Supplementary Text 6). While genes with a potentially regulatory function reside in the morph locus, understanding their role in morph development will probably require higher temporal and spatial resolution of gene expression data.
Discussion
Sexual dimorphism, where males and females have markedly distinct colour patterns, has led to multiple evolutionary origins of female-limited polymorphisms and potential male mimicry in Ischnura damselflies28. Here, we present a genomic glance into how these morphs evolve, setting the stage for future functional work to unravel the reversal of sexual phenotypes in damselfly sexual mimicry. Male mimicry in the common bluetail is controlled by a single genomic region in chromosome 13 (Figs. 2 and 3). Our data suggest that this morph locus probably evolved with the accumulation of novel and potentially TE-derived sequences in the male mimicry haplotype (Fig. 4), which is shared by male-mimicking females of species diverging more than 5 Myr ago (Fig. 5). More recently, a rare recombination event involving part of the novel A genomic content has triggered the origin of a third female morph (Fig. 4), which shares its sexually immature colouration and patterning with A females, and shares its sexually mature overall appearance with O females27. The morph locus contains a handful of genes, some of which may have evolved with TE propagation in the A haplotype, and are therefore absent from O individuals (Fig. 6). However, existing annotations provide only a hint on how these genes may influence morph development. Our results thus echo recent calls for a broader application of functional validation tools, in order to understand how lineage-specific genes contribute to phenotypic variation in natural populations40.
This study underscores two increasingly recognized insights in evolutionary genomics. First, there is mounting evidence that SVs abound in natural populations and often underpin complex and ecologically relevant phenotypic variation41, such as discrete phenotypic polymorphisms10,13,15,20. Nonetheless, traditional GWAS approaches based on SNPs can easily miss SVs, as these approaches are contingent on the genomic content of the reference assembly42. Among other novel approaches to tackle this problem42, a reference-free k-mer-based GWAS, as implemented here, is a powerful method to identify variation in genomic content and sequence, especially when the genomic architecture of the trait of interest is initially unknown35. In this study, we did not know a priori which of the three morphs, if any, would harbour unique genomic content. Had we ignored differences in genomic content between morphs and based our GWAS analysis solely on the DToL (O) reference assembly, we would have failed to identify SNPs between I and O morphs (Extended Data Fig. 8), and the origin of I females via a translocation of A content would have been obscured.
Second, a role for TEs in creating novel and even adaptive phenotypic variation is increasingly being recognized43,44. Here, we found that a ~400 kb region of unique genomic content, possibly driven by LINE transposition, is associated with the male-mimicry phenotype in at least two species of Ischnura damselflies. TE activity can contribute to phenotypic evolution by multiple mechanisms. For instance, TEs may modify the regulatory environment of genes in their vicinity, by altering methylation45 and chromatin conformation patterns46, or by providing novel cis-regulatory elements47. The male-mimicry region in I. elegans is located between two coding genes with putative DNA-binding domains, and that may thus act as transcription factors. However, our expression data do not provide unequivocal support for differential regulation of either of these genes between female morphs. Importantly, currently available expression data come from adult specimens, as female morphs are not visually discernible in aquatic nymphs. Yet, the key developmental differences that produce the adult morphs are probably directed by regulatory variation during earlier developmental stages. Now that the morph locus has been identified, future work can address differential gene expression at more relevant developmental stages, before colour differences between morphs become apparent.
TEs can also contribute to phenotypic evolution if they become domesticated, for example, when TE-encoded proteins are remodelled through evolutionary change to perform adaptive host functions48. We found two transcripts located in A-specific or A/I-specific regions that are probably derived from LINE retrotransposons and are actively expressed in the genomes that harbour them (Fig. 6b). It is therefore possible that these transcripts participate in the development of adult colour patterns, which are initially more similar between A and I females than between either of these morphs and O females27,29. Yet, functional work on these transcripts is required to ascertain their role in morph determination. Finally, TEs can become sources of novel small RNAs that play important regulatory roles49, including in insect sex determination50. Thus, future work should also address non-coding RNA expression and function in the morph locus.
Our results also provide molecular evidence for previous insights, gained by alternative research approaches, on the micro- and macroevolution of female-limited colour polymorphisms. A wealth of population data in southern Sweden has shown that female-morph frequencies are maintained by balancing selection, as they fluctuate less than expected due to genetic drift31. Behavioural and field experimental studies indicate that such balancing selection on female morphs is mediated by negative frequency-dependent male harassment51,52. We add to these earlier results by showing a molecular signature consistent with balancing selection in the genomic region where A females differ from both of the non-mimicking morphs. Sexual conflict is expected to have profound effects on genome evolution, but there are few examples of sexually antagonistic traits with a known genetic basis, in which predictions about these genomic effects can be tested53,54. Here, the signature of balancing selection on the morph locus matches the expectation of inter-sexual conflict resulting in negative frequency-dependent selection and maintaining alternative morph alleles over long periods.
Similarly, a recent comparative study based on phenotypic and phylogenetic data inferred a single evolutionary origin of the male-mimicking morph shared by I. elegans and I. senegalensis28. Our present results strongly support this common origin. This is because A females in both species share unique genomic content, including associated transcripts, and an inversion signature against the ancestral O morph (Fig. 5 and Extended Data Fig. 7). Nonetheless, these data are consistent with both an ancestral polymorphism and ancestral introgression being responsible for the spread of male mimicry across the clade. A potential role for introgression in the evolution of male mimicry is also suggested by rampant hybridization between I. elegans and its closest relatives55, and by the fact that I. elegans and I. senegalensis can hybridize millions of years after their divergence, at least in laboratory settings56. The identification of the morph locus in I. elegans enables future comparative genomics studies to disentangle the relative roles of long-term balancing selection and introgression in shaping the widespread phylogenetic distribution of female-limited polymorphisms in Ischnura damselflies.
Finally, our results open up new lines of enquiry on how the genomic architecture and chromosomal context of the female polymorphism may influence its evolutionary dynamics. Our data are consistent with the evolution of a third morph due to an ectopic recombination event that translocated genomic content from the A haplotype back into an O background. Ectopic recombination can occur when TE propagation generates homologous regions in different genomic locations57,58, and may be facilitated by the excess of TE content in chromosome 13 (Extended Data Fig. 4). The male reproductive morphs in the ruff (Calidris pugnax) are one of few previous examples of a novel phenotypic morph arising via recombination between two pre-existing morph haplotypes10. In pond damselflies, female polymorphisms with three or more female morphs are not uncommon, and in some cases female morphs exhibit graded resemblance to males59. It is therefore possible that recombination, even if rare, has repeatedly generated diversity in damselfly female morphs.
While recombination might have had a role in generating the novel I morph, we observe reduced recombination over the morph locus in comparison to the rest of the genome of I. elegans (Extended Data Fig. 5). However, this reduction in recombination is not limited to the morph locus and its vicinity, but rather pervasive across chromosome 13 (Extended Data Fig. 5). This unexpected finding suggests two alternative causal scenarios. First, selection for reduced recombination at the morph locus, following the origin of sexual mimicry, could have spilled over chromosome 13, facilitating a subsequent accumulation of TEs and further reducing recombination60. Second, TE enrichment and reduced recombination may have preceded the evolution of female morphs, and facilitated the establishment and maintenance of the female polymorphism by balancing selection.
Both historical scenarios are compatible with a morph locus reminiscent of a supergene, which is defined by tight genetic linkage of multiple functional loci61. However, an alternative and parsimonious explanation is that the novel sequences in A and I females and their flanking genes may not code for anything important for the male-mimicking phenotype as such, but simply disrupt a region of chromosome 13 that is required for the development of ancestral sexual differentiation. The observation that I females carry part of the sequence that originated in A in a different location of the scaffold (Fig. 4b), and still develop some male-like characters (for example, black thoracic stripes), could come about if insertions anywhere on a larger chromosomal region disrupt female suppression of the male phenotype, although with variable efficacy depending on the exact location or insertion size.
Concluding remarks
Recent years have witnessed an explosion of studies uncovering the loci behind complex phenotypic polymorphisms in various species. An emerging outlook is that not all polymorphisms are created equal, with some governed by massive chromosomal rearrangements15,16,17, and others by a handful of regulatory sites11,12,18. Our results contribute to this growing field by uncovering a single causal locus that features structural variation and morph-specific transcripts in the female-limited morphs of Ischnura damselflies. These morphs not only differ in numerous morphological and life-history traits32,62,63 and gene expression profiles24,25, but they include a male mimic that is maintained by balancing frequency-dependent selection. Our findings enable future studies on the developmental basis of such male mimicry, with consequences for a broader understanding of the evolutionary dynamics of sexual dimorphism and the cross-sexual transfer of trait expression64.
Methods
I. elegans samples
Samples for morph-specific genome assemblies of I. elegans were obtained from F1 individuals with genotypes Ao, Io and oo (one adult female of each genotype). In June 2019, recently mated O females were captured in field populations in southern Sweden. These females oviposited in the lab within 48 h, and their eggs were then released into outdoor cattle tanks seeded with Daphnia and covered with synthetic mesh. Larvae thus developed under normal field conditions and emerged as adults during the summer of 2020. Emerging females were kept in outdoor enclosures until completion of adult colour development25,65. Fully mature females were phenotyped, collected in liquid nitrogen and kept at −80 °C. Because all of these females carry a copy of the most recessive allele o, individuals of the A and I morph are heterozygous, with genotypes Ao and Io, respectively.
A total of 19 resequencing samples of each female morph of I. elegans were also collected from local populations in southern Sweden, within a 40 × 40 km area (Supplementary Table 1). Samples were submerged in 95% ethanol and stored in a −20 °C freezer until extraction. Additionally, 24 individuals (six adult females of each morph and six males) were collected for RNA-seq analysis in a natural field population (Bunkeflostrand) in southern Sweden, in early July 2019. These samples were transported on dry ice and stored at −80 °C until extraction.
I. senegalensis samples
Adults of I. senegalensis (30 adult females of each morph) were collected for pool sequencing from a population on Okinawa Island in Japan (26.148° N, 127.795° E) in May 2016. Sex and morph were determined visually, and samples were stored in 99% ethanol until extraction. Samples for morph-specific genome assemblies of I. senegalensis were obtained from a population in Clementi Forest, Singapore (1.33° N, 103.78° E). Because the A allele is recessive in I. senegalensis, all females with the A phenotype are homozygous. To obtain a homozygous O-like sample, we developed primers (forward: CGCGGTATGATATGGTCCGA, reverse: GGCTGCTTACACCAATGCAA) for an A-specific sequence that is shared by A females of the two species (318,131–318,213 bp on the A haplotype of I. elegans). We used the mapped pool-seq data to identify fixed SNPs between species and tailor the primer sequences accordingly. We then tested the primers in 20 A females of I. senegalensis using a 328 bp fragment of the histone H3 gene (forward: ATGGCTCGTACCAAGCAGACGGC, reverse: ATATCCTTGGGCATGATGGTGAC)66 as a positive control for the polymerase chain reaction. Once validated, we utilized these primers to identify O-like females lacking the A allele and selected one of these samples for whole genome sequencing.
DNA extraction, library preparation and sequencing
High molecular weight (HMW) DNA was extracted from one I. elegans female of each genotype (Ao, Io, oo), using the Nanobind Tissue Big Extraction Kit (NB-900-701-01, Circulomics Inc. (PacBio)). HMW DNA was isolated from homozygous females of each morph of I. senegalensis, using the Monarch HMW DNA Extraction Kit for Tissue (T3060S, New England BioLabs Inc.). DNA from resequencing samples was isolated using either a modified protocol for the DNeasy Blood and Tissue Kit (19053, Qiagen) or the KingFisher Cell and Tissue DNA Kit (Cat no. N11997, ThermoFisher Scientific). I. senegalensis DNA was extracted from muscle tissues in thoraxes using Maxwell 16 LEV Plant DNA Kit (AS1420, Promega). Details on extraction and library preparation protocols are provided in Supplementary Text 1.
Sequencing libraries were constructed from each HMW DNA sample for the Nanopore LSK-110 ligation kit (Oxford Nanopore Technologies). Adapter ligation and sequencing of I. elegans samples were carried out at the Uppsala Genome Centre, hosted by SciLife Lab. Each sample was sequenced on a PromethION R10.4 with one nuclease wash and two library loadings. Library preparation and sequencing of I. senegalensis samples were carried out by the Integrated Genomics Platform, Genome Institute of Singapore, A-STAR, Singapore. Each sample was sequenced on a PromethION R9.4 flow cell, with two nuclease washes and three library loadings.
RNA extraction and sequencing
Whole-thorax samples were ground into a fine powder using a TissueLyser and used as input for the Spectrum Plant Total RNA Kit (STRN50, Sigma Aldrich), including DNase I treatment (DNASE10, Sigma Aldrich). Library preparation and sequencing were performed by SciLife Lab at the Uppsala Genome Centre. Sequencing libraries were prepared from 300 ng of RNA, using the TrueSeq stranded mRNA library preparation kit (20020595, Illumina Inc.) including polyA selection and unique dual indexing (20022371, Illumina Inc.), according to the manufacturer’s protocol. Sequencing was performed on the Illumina NovaSeq 6000 SP flowcell with paired-end reads of 150 bp.
De novo genome assembly
Bases in raw Oxford Nanopore Technologies reads from I. elegans were called using Guppy v.4.0.11 (Ao and Io data) and Guppy v.5.0.11 (oo data) (https://nanoporetech.com/). Low-quality reads (q-score <7 for v.4.0.11 and <10 for v.5.0.11) were subsequently discarded. High quality reads were assembled using the Shasta long-read assembler v.0.7.067. Each assembly was conducted under four different configuration schemes, which modified the June 2020 nanopore configuration file (https://github.com/chanzuckerberg/shasta/blob/master/conf/Nanopore-Jun2020.conf) in alternative ways (Supplementary Table 3). Assembly metrics were compared among Shasta configurations for each morph using AsmQC68 (https://sourceforge.net/projects/amos/) and the stats.sh script in the BBTools suite (https://sourceforge.net/projects/bbmap). The assembly with greater contiguity (that is, highest contig N50, highest average contig length and highest percentage of the main genome in scaffolds >50 kb) was selected for polishing and downstream analyses.
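The contiguity criteria used to choose among Shasta configurations can be illustrated with a minimal Python sketch. It assumes contig lengths have already been read from an assembly; the function names are hypothetical and are not part of AsmQC or BBTools, whose exact definitions may differ in detail.

```python
def contig_n50(lengths):
    """Return the N50: the largest length L such that contigs of
    length >= L together span at least half of the assembly."""
    if not lengths:
        raise ValueError("no contig lengths supplied")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def mean_contig_length(lengths):
    """Average contig length."""
    return sum(lengths) / len(lengths)


def fraction_in_long_sequences(lengths, min_len=50_000):
    """Fraction of assembled bases in sequences longer than min_len
    (50 kb in the text)."""
    return sum(l for l in lengths if l > min_len) / sum(lengths)
```

Under this sketch, an assembly would be preferred when all three values are higher than those of the alternatives.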
Bases in raw Oxford Nanopore Technologies reads from I. senegalensis samples were called using Guppy v.6.1.5. Reads with quality score <7 were subsequently discarded. High quality reads were assembled using the Shasta long-read assembler v.0.7.067 and the configuration file T2 (Supplementary Table 3), which was also selected for the Io and oo assemblies of I. elegans.
Morph-specific assemblies of I. elegans were first polished using the Oxford Nanopore Technologies reads mapped back to their respective assembly with minimap2 v.2.22-r111069, and the PEPPER-Margin-DeepVariant pipeline r0.470. Alternative haplotypes were subsequently filtered using purge_dups v.0.0.371, to produce a single haploid genome assembly for each sample. The I. elegans draft assemblies were then polished with short read data from one resequencing sample (TE-2564-SwD172_S37; Supplementary Table 1), using the POLCA tool in MaSuRCA v.4.0.472. For every draft and final assembly of I. elegans, we computed quality metrics as mentioned above and assessed the completeness of conserved insect genes using BUSCO v.5.0.073 and the ‘insecta_odb10’ database (Supplementary Fig. 1). For I. senegalensis, we report quality metrics of the final assemblies (Supplementary Fig. 2).
Scaffolding with the DToL super assembly
During the course of this study, a chromosome-level genome of I. elegans was assembled by the DToL Project34, based on long-read (PacBio) and short-read (Illumina) data, as well as Hi-C (Illumina) chromatin interaction data. Of the total length of this assembly, 99.5% is distributed across 14 chromosomes, one of which (no. 13) is fragmented and divided into a main assembly and five unlocalized scaffolds.
We used RagTag v.2.1074 to scaffold each of our morph-specific assemblies based on the DToL reference (Supplementary Text 2). Scaffolding was conducted using the nucmer v.4.0.075 aligner and default RagTag options. Morph-specific scaffolded genomes were also aligned to each other using nucmer and a minimum cluster length of 100 bp. Alignments were then filtered to preserve only the longest alignments in both reference and query sequences, and alignments of at least 5 kb. These assembly alignments were then used to visualize synteny patterns across morphs, in the region uncovered in our association analyses (Extended Data Fig. 1), using the package RIdeogram v.0.2.276 in R v.4.2.277.
Reference-based (SNP) GWAS
We first investigated genomic divergence between morphs using a standard GWAS approach based on SNPs (Extended Data Fig. 1). Initially, we conducted preliminary analyses using different morph assemblies as mapping reference. Once the A-specific genomic region was confirmed, we designated the A assembly as the mapping reference for the main analyses. Short-read data were mapped using bwa-mem v.0.7.1778. Optical and polymerase chain reaction duplicates were then flagged in the unfiltered bam files using GATK v.4.2.0.079. Variant calling, filtering and sorting were conducted using bcftools v.1.1280, excluding the flagged reads. We retained only variant sites with mapping quality >20, genotype quality >30 and minor allele frequency >0.02 (that is, the variant is present in more than one sample). To avoid highly repetitive content, we filtered variants that had a combined depth across samples >1,360 (equivalent to all samples having ~50% higher than average coverage), and variants located in sites annotated as repetitive in either RepeatMasker v.1.0.9381 or Red v.0.0.182. The final variant calling file was analysed in pairwise comparisons (A versus O, A versus I, I versus O) using PLINK v.1.983 (https://zzz.bwh.harvard.edu/plink/). We report the −log10 of P values for SNP associations in these pairwise comparisons.
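The site filters above can be summarized in a short Python sketch. The thresholds are those quoted in the text; the sample count (19 females of each of three morphs) is taken from the sampling description. The predicate itself and its argument names are hypothetical, and how genotype quality was aggregated across samples is an assumption here; the actual filtering was done in bcftools.

```python
# Thresholds quoted in the text; 19 females x 3 morphs = 57 samples.
N_SAMPLES = 57
MAX_COMBINED_DEPTH = 1360


def implied_mean_depth(max_combined_depth=MAX_COMBINED_DEPTH,
                       n_samples=N_SAMPLES, excess=0.5):
    """Per-sample mean depth implied by a combined-depth cap set at
    (1 + excess) times the average coverage of every sample."""
    return max_combined_depth / (n_samples * (1.0 + excess))


def keep_site(mapping_quality, genotype_quality, minor_allele_freq,
              combined_depth, in_repeat):
    """Hypothetical predicate combining the filters listed in the text.
    genotype_quality is treated as a single per-site value (assumption)."""
    return (mapping_quality > 20
            and genotype_quality > 30
            and minor_allele_freq > 0.02
            and combined_depth <= MAX_COMBINED_DEPTH
            and not in_repeat)
```

With these numbers, the depth cap corresponds to an average coverage of roughly 16x per sample.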
Reference-free (k-mer) GWAS
We created a list of all k-mers of length 31 in the short-read data (19 females per morph; Extended Data Fig. 1) following ref. 35, and counted k-mers in each sample using KMC v.3.1.084. The k-mer list was filtered by the minor allele count; k-mers that appeared in fewer than five individuals were excluded. k-mers were also filtered by per cent canonized (that is, the per cent of samples for which the reverse complement of the k-mer was also present). If at least 20% of the samples including a given k-mer contained its canonized form, the k-mer was kept in the list. The k-mer list was then used to create a table recording the presence or absence of each k-mer in each sample. A kinship matrix for all samples was calculated from this k-mer table, and was converted to a PLINK83 binary file, where the presence or absence of each k-mer is coded as two homozygous variants. In this step, we further removed k-mers with a minor allele frequency below 5%.
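The construction of a k-mer presence/absence table can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the KMC-based pipeline of ref. 35: reads are held in memory, every k-mer is collapsed to its canonical form (so the per cent canonized filter is not reproduced), and the function and parameter names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmers(read, k=31):
    """Set of canonical k-mers found in one read."""
    return {canonical(read[i:i + k]) for i in range(len(read) - k + 1)}


def presence_table(sample_reads, k=31, min_samples=5, min_maf=0.05):
    """Map each retained k-mer to a 0/1 vector over samples.

    sample_reads: dict of sample name -> list of read strings.
    A k-mer is kept if it occurs in at least min_samples samples and
    the rarer of its two states (present/absent) has frequency >= min_maf.
    """
    names = list(sample_reads)
    per_sample = {
        name: set().union(*(kmers(read, k) for read in sample_reads[name]))
        for name in names
    }
    table = {}
    for kmer in set().union(*per_sample.values()):
        row = [1 if kmer in per_sample[name] else 0 for name in names]
        freq = sum(row) / len(names)
        if sum(row) >= min_samples and min(freq, 1 - freq) >= min_maf:
            table[kmer] = row
    return table
```

Each row of the resulting table plays the role of a biallelic marker, which is what allows presence or absence to be coded as two homozygous genotypes downstream.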
Because a single variant, be it an SNP or SV, will probably be captured by multiple k-mers, significance testing of k-mer associations requires a method to control for the non-independence of overlapping k-mers. We followed the approach developed by ref. 35, which uses a linear mixed model genome-wide association analysis implemented in GEMMA v.0.98.585, and computes P value thresholds for associated k-mers based on phenotype permutations. We thus report k-mers below the 5% false-positive threshold as k-mers significantly associated with the female polymorphism in I. elegans. We conducted three k-mer-based GWAS analyses: (1) comparing male mimics to the putatively ancestral female morph (A versus O); (2) comparing male mimics to the most derived female morph (A versus I); and (3) comparing both derived female morphs (A and I) to the ancestral O females. For every analysis, we then mapped the significant k-mers to all reference genomes using Blast v.2.22.2886 for short sequences, and removed alignments that were below 100% identity or below full length. The mapped k-mers thus indicate the proportion of relevant genomic content present in each morph and how this content is distributed across each genome (Extended Data Table 1).
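The logic of a permutation-based threshold can be sketched as follows. This is a generic illustration and an assumption about the procedure, not the implementation of ref. 35 or GEMMA: it supposes that the association scan has been rerun on permuted phenotypes and that the smallest P value from each permutation has been recorded. Function names are hypothetical.

```python
import math


def permutation_threshold(best_p_per_permutation, alpha=0.05):
    """P value cutoff such that a fraction alpha of permuted scans
    would yield at least one k-mer at or below it."""
    if not best_p_per_permutation:
        raise ValueError("at least one permutation is required")
    ranked = sorted(best_p_per_permutation)
    # Small epsilon guards against floating-point error in alpha * n.
    k = max(1, math.floor(alpha * len(ranked) + 1e-9))
    return ranked[k - 1]


def significant(pvalues, threshold):
    """Indices of k-mers whose P value is at or below the threshold."""
    return [i for i, p in enumerate(pvalues) if p <= threshold]
```

Because the threshold is derived from the best hit of each permuted scan, it accounts for the correlation among overlapping k-mers without requiring an explicit count of independent tests.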
Read-depth analysis
To validate the k-mer GWAS results of unique genomic content in A females relative to both I and O females, we plotted read depth across our region of interest (the unlocalized scaffold 2 of chromosome 13; see ‘Results’) in the A assembly (Extended Data Fig. 1). Short-read data (19 samples per morph) were mapped to the assembly with bwa-mem v.0.7.1778 and reads with mapping score <20 were filtered, using Samtools v.1.1487. Long-read data (one sample per morph) were also mapped to the assembly using minimap2 v.2.22-r111069, and quality filtering was conducted as above. Read depth was then averaged for each sample across 500 bp, non-overlapping windows using mosdepth v.0.2.888. We also annotated repetitive content in the reference genome using RepeatMasker v.1.0.9381 and Red v.0.0.182, and filtered windows with more than 10% repetitive content under either method.
To account for differences in overall coverage between samples, we conducted the same procedure on a large (~15 mb) non-candidate region in chromosome 11 and calculated a ‘background read depth’ as the mean read depth across the non-repetitive windows of this region. We then expressed read depth in the candidate region as a proportion of the background read depth. Values around 1 thus indicate that a sample is homozygous for the presence of the sequence in a window. Values around 0.5 suggest that the sample only has one copy of this sequence in its diploid genome (that is, it is heterozygous). Finally, values of 0 imply that the 500 bp reference sequence is not present in the sample (that is, the window is part of an insertion or deletion).
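The normalization described above amounts to dividing each window's depth by the sample's mean depth in the background region. A minimal Python sketch follows; the classification cutoffs (0.25 and 0.75) are arbitrary illustrative values, not thresholds used in the study, and the function names are hypothetical.

```python
def relative_depth(window_depths, background_depths):
    """Express per-window depth as a proportion of the mean depth
    across non-repetitive windows of the background region."""
    background = sum(background_depths) / len(background_depths)
    return [depth / background for depth in window_depths]


def copy_state(relative, low=0.25, high=0.75):
    """Coarse interpretation of a relative depth value.
    The cutoffs are illustrative assumptions only."""
    if relative < low:
        return "absent"
    if relative < high:
        return "one copy"
    return "two copies"
```

For example, a sample with a background depth of 20 and window depths of 20, 10 and 0 would yield relative values of 1, 0.5 and 0, read as homozygous presence, heterozygosity and absence of the reference sequence, respectively.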
We also investigated read-depth coverage on the I assembly, specifically across the region that was identified in the k-mer based GWAS as capturing content that differentiated both A and I females from O females (Fig. 3b and Extended Data Fig. 1). To do so, we followed the same strategy as above, except here we used a 15 mb region from chromosome 1 to estimate background read depth.
Population genetics
We investigated the evolutionary consequences of morph divergence by estimating between-morph FST and population-wide Tajima’s D and π. For these analyses, we used the A assembly as mapping reference and the same variant-calling approach as described for the SNP-based GWAS, but applied different filtering criteria (Extended Data Fig. 1). Specifically, invariant sites were retained and we only filtered sites with mapping quality score <20 and combined depth across samples >1,360 (equivalent to ~50% excess coverage in all samples). FST and π were estimated in pixy v.1.2.589 across 30 kb windows. FST was computed using the Hudson estimator90. Negative FST values were converted to zero for plotting. Tajima’s D was estimated across 30 kb using vcftools v.0.1.1791. In all analyses, windows with >10% repetitive content according to either RepeatMasker v.1.0.9381 or Red v.0.0.182 annotation were excluded.
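For reference, the Hudson estimator of FST can be written as a ratio of sums over sites within a window. The Python sketch below follows the commonly used sample-size-corrected form and is an illustration only, not the pixy implementation; it assumes biallelic sites, allele frequencies already computed for each morph, and n1 and n2 given as the number of sampled allele copies.

```python
def hudson_fst(freqs1, freqs2, n1, n2):
    """Windowed Hudson FST as a ratio of averages.

    freqs1, freqs2: per-site allele frequencies in the two groups.
    n1, n2: number of sampled allele copies per group (assumption).
    """
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        numerator += ((p1 - p2) ** 2
                      - p1 * (1 - p1) / (n1 - 1)
                      - p2 * (1 - p2) / (n2 - 1))
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    if denominator == 0:
        return float("nan")  # no variable sites in the window
    return numerator / denominator


def clamp_for_plot(fst):
    """Negative estimates are set to zero for plotting, as in the text."""
    return max(fst, 0.0)
```

The sample-size correction terms explain why slightly negative values arise when two groups have near-identical allele frequencies.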
SVs
We used two complementary approaches to identify SVs overlapping the genomic region uncovered by both k-mer-based and SNP-based GWAS. First, we mapped the raw data from each long-read sample to the assemblies of alternative morphs (for example Ao data mapped to Io and oo assemblies), and called SVs using Sniffles v.1.0.1092 (Extended Data Fig. 1). These SV calls may represent fixed differences between morphs, within-morph polymorphisms or products of assembly error. We therefore used Samplot v.1.3.093 and our short-read samples (n = 19 per morph) to validate morph-specific SV calls (Extended Data Fig. 1). Samplot identifies and plots reads with discordant alignments, which can result from specific types of SVs. For example, if Sniffles called a 10 kb deletion in the Ao and Io long-read samples relative to the oo assembly, we then constructed a Samplot for this region using short-read data, and expected to find support for such deletion in I and A samples, but not in O samples. We complemented this validation approach with a scan of the region of interest in each assembly, in windows of 250 and 500 kb, again using Samplot and the short-read data. If an SV appeared to be supported by the majority of short-read samples from an alternative morph, we zoomed in on this SV and recorded the number of samples supporting the call in each morph.
Linkage disequilibrium and TEs
To estimate linkage disequilibrium (LD), we used the same variant-calling file as for the SNP-based GWAS, which included only variant sites and was filtered by mapping quality, genotype quality, minor allele frequency and read depth, as described above (Extended Data Fig. 1). The file was downsampled to every 100th variant using vcftools v.0.1.1791, prior to LD estimation. We estimated LD using PLINK v.1.983, and recorded R2 values >0.05 for pairs up to 15 mb apart or with 10,000 or fewer variants between them. We estimated LD for the unlocalized scaffold 2 of chromosome 13, which contains the morph loci and is ~15 mb in the A assembly. For comparison, we also estimated LD across the first 15 mb of the fully assembled chromosomes (1–12 and X), the main scaffold of chromosome 13, and the unlocalized scaffolds 1, 3 and 4 of chromosome 13.
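The two steps above, thinning the variant set and computing pairwise R2, can be illustrated with a small Python sketch. It treats R2 as the squared Pearson correlation between allele counts (0, 1 or 2) at two variants, which is a simplifying assumption about the statistic rather than a reproduction of the PLINK computation; function names are hypothetical.

```python
def thin(variants, step=100):
    """Keep every step-th variant, starting with the first."""
    return variants[::step]


def r_squared(genotypes_a, genotypes_b):
    """Squared Pearson correlation between two allele-count vectors."""
    n = len(genotypes_a)
    mean_a = sum(genotypes_a) / n
    mean_b = sum(genotypes_b) / n
    cov = sum((a - mean_a) * (b - mean_b)
              for a, b in zip(genotypes_a, genotypes_b))
    var_a = sum((a - mean_a) ** 2 for a in genotypes_a)
    var_b = sum((b - mean_b) ** 2 for b in genotypes_b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # monomorphic variant: R2 undefined
    return cov * cov / (var_a * var_b)
```

Only pairs with values above the reporting cutoff (0.05 in the text) would then be retained.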
We used the TE annotations from RepeatModeler v.2.0.1, RepeatMasker v.1.0.9381 and ‘One code to find them all’94 to quantify TE coverage in chromosome 13 in comparison to the rest of the genome. We divided each chromosome into 1.5 mb windows, and computed the proportion of each window covered by each TE family.
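The windowed TE coverage can be sketched as follows. This Python example assumes that annotations for one TE family are available as 0-based, half-open (start, end) intervals that do not overlap one another and lie within the chromosome; the function name is hypothetical.

```python
def window_coverage(intervals, chrom_length, window=1_500_000):
    """Proportion of each fixed-size window covered by the intervals.

    intervals: iterable of (start, end), 0-based half-open, assumed
    non-overlapping and contained in [0, chrom_length).
    The last window may be shorter than the others.
    """
    n_windows = -(-chrom_length // window)  # ceiling division
    covered = [0] * n_windows
    for start, end in intervals:
        first = start // window
        last = (end - 1) // window
        for w in range(first, last + 1):
            w_start = w * window
            w_end = min(w_start + window, chrom_length)
            covered[w] += max(0, min(end, w_end) - max(start, w_start))
    sizes = [min((w + 1) * window, chrom_length) - w * window
             for w in range(n_windows)]
    return [c / s for c, s in zip(covered, sizes)]
```

Running the function once per TE family and chromosome gives the per-window proportions that can then be compared between chromosome 13 and the rest of the genome.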
Evidence of a trans-species polymorphism
We used pool-seq data from the closely related tropical bluetail damselfly (I. senegalensis) to determine whether male mimicry has a shared genetic basis in the two species (Extended Data Fig. 1). First, we aligned the short-read data from the two I. senegalensis pools (A and O-like) to the A-morph assembly of I. elegans using bwa-mem v.0.7.1778, and filtered reads with mapping score <20, using Samtools v.1.1487. We then quantified read depth as for the I. elegans resequencing data (see ‘Read-depth analysis’). To confirm that the higher read-depth coverage of the A pool is specific to the putative morph locus, we also plotted the distribution of read-depth differences between O-like and A pools across the rest of the genome and compared it to the morph locus (Supplementary Text 5). Next, we determined if the ~20 kb SV that characterizes A and I females of I. elegans is also present in A females of I. senegalensis. To do this, we mapped the pool-seq data to the O assembly of I. elegans as above, and scanned the region at the start of the scaffold 2 of chromosome 13 for SVs using Samplot v.1.3.093. Finally, we aligned the morph-specific assemblies of I. senegalensis to the A assembly of I. elegans using nucmer v.4.0.075, preserving alignments >500 bp with identity >70% (Extended Data Fig. 1). We visualized synteny patterns across the morph locus using the package RIdeogram v.0.2.276 in R v.4.2.277.
Gene content and expression in the morph locus
We assembled transcripts in the A morph genome (Extended Data Fig. 1) to identify potential gene models unique to the A or A and I morphs, which would therefore be absent from the I. elegans reference annotation (based on the O haplotype). First, all raw RNA-seq data from I. elegans samples were mapped to the A assembly using HISAT2 v.2.2.195 and reads with mapping quality <60 were filtered using Samtools v.1.1987. Transcripts were then assembled in StringTie v.2.1.496 under default options, and merged into a single gtf file. Transcript abundances were quantified using this global set of transcripts as targets, and a transcript count matrix was produced using the prepDE.py3 script provided with StringTie. Mapped RNA-seq data from I. senegalensis were also used to assemble transcripts (Extended Data Fig. 1), but this time the HISAT2 assembly was guided by the annotation based on I. elegans data, while allowing the identification of novel transcripts. Transcript abundances were quantified as for I. elegans.
We analysed differential gene expression using the package edgeR v.3.3697 in R v.4.2.277. Transcripts with fewer than one count per million in more than three samples were filtered. Library sizes were normalized across samples using the trimmed mean of M-values method98, and empirical Bayes tagwise dispersion99 was estimated prior to pairwise expression analyses. Differential expression of genes in the morph loci was tested using two-tailed exact tests100, assuming negative binomially distributed transcript counts and applying the Benjamini and Hochberg’s algorithm to control the false discovery rate101.
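Two of the steps above, the counts-per-million filter and the false discovery rate correction, can be illustrated with a short Python sketch. It is not the edgeR implementation: counts per million are computed here from raw library sizes (an assumption), and the function names are hypothetical.

```python
def keep_transcript(counts, library_sizes, min_cpm=1.0, max_low_samples=3):
    """Drop a transcript when more than max_low_samples samples have
    fewer than min_cpm counts per million."""
    cpm = [count / size * 1e6 for count, size in zip(counts, library_sizes)]
    low = sum(1 for value in cpm if value < min_cpm)
    return low <= max_low_samples


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A transcript would be called differentially expressed when its adjusted P value falls below the chosen false discovery rate.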
Nucleotide sequences of all transcripts mapped to the 1.5 mb morph locus in the A assembly were selected. Coding sequences in these transcripts were predicted using Transdecoder v.5.5.0 (https://github.com/TransDecoder/TransDecoder). Predicted coding sequences and peptide sequences were read from the assemblies using the genome-based coding region annotation produced with Transdecoder and gffread v.0.12.7102. We investigated whether any inferred peptides or transcripts were unique to A or A and I by comparing these sequences to the DToL reference transcriptome and proteome (based on the O haplotype). We then searched for homologous and annotated proteins in other taxa in the Swissprot database using Blast v.2.9.086. We found three gene models that are protein-coding and present in both A and O females (see ‘Results’ and Fig. 6). We scanned these protein sequences for functional domains using InterProScan103 and searched for orthologous groups and functional annotations in eggNOG v.5.0104.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Sequencing data from this study have been submitted to the NCBI Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra/) under BioProject PRJNA940276. For details, please see Supplementary Tables 1 and 2. Morph-specific genome assemblies and intermediate output files required to reproduce the figures in the main text and supporting material are available on Zenodo105. Source data are provided with this paper.
Code availability
All code necessary to reproduce the results of this study can be found on Zenodo at https://doi.org/10.5281/zenodo.8304055 and Github at https://github.com/bwillink/Morph-locus.
References
Mank, J. E. Sex chromosomes and the evolution of sexual dimorphism: lessons from the genome. Am. Nat. 173, 141–150 (2009).
De Lisle, S. P. Understanding the evolution of ecological sex differences: integrating character displacement and the Darwin–Bateman paradigm. Evol. Lett. 3, 434–447 (2019).
Hopkins, B. R. & Kopp, A. Evolution of sexual development and sexual dimorphism in insects. Curr. Opin. Genet. Dev. 69, 129–139 (2021).
Jukema, J. & Piersma, T. Permanent female mimics in a lekking shorebird. Biol. Lett. 2, 161–164 (2006).
Hurtado-Gonzales, J. L. & Uy, J. A. C. Alternative mating strategies may favour the persistence of a genetically based colour polymorphism in a pentamorphic fish. Anim. Behav. 77, 1187–1194 (2009).
Gosden, T. P. & Svensson, E. I. Density-dependent male mating harassment, female resistance, and male mimicry. Am. Nat. 173, 709–721 (2009).
Falk, J. J., Rubenstein, D. R., Rico-Guevara, A. & Webster, M. S. Intersexual social dominance mimicry drives female hummingbird polymorphism. Proc. R. Soc. B 289, 20220332 (2022).
Falk, J. J., Webster, M. S. & Rubenstein, D. R. Male-like ornamentation in female hummingbirds results from social harassment rather than sexual selection. Curr. Biol. 31, 4381–4387 (2021).
Mank, J. E. Sex-specific morphs: the genetics and evolution of intra-sexual variation. Nat. Rev. Genet. 24, 44–52 (2022).
Lamichhaney, S. et al. Structural genomic changes underlie alternative reproductive strategies in the ruff (Philomachus pugnax). Nat. Genet. 48, 84–88 (2016).
Andrade, P. et al. Regulatory changes in pterin and carotenoid genes underlie balanced color polymorphisms in the wall lizard. Proc. Natl Acad. Sci. USA 116, 5633–5642 (2019).
Kim, K.-W. et al. Genetics and evidence for balancing selection of a sex-linked colour polymorphism in a songbird. Nat. Commun. 10, 1852 (2019).
Woronik, A. et al. A transposable element insertion is associated with an alternative life history strategy. Nat. Commun. 10, 5757 (2019).
Hendrickx, F. et al. A masculinizing supergene underlies an exaggerated male reproductive morph in a spider. Nat. Ecol. Evol. 6, 195–206 (2022).
Tuttle, E. M. et al. Divergence and functional degradation of a sex chromosome-like supergene. Curr. Biol. 26, 344–350 (2016).
Sanchez-Donoso, I. et al. Massive genome inversion drives coexistence of divergent morphs in common quails. Curr. Biol. 32, 462–469 (2022).
Villoutreix, R. et al. Inversion breakpoints and the evolution of supergenes. Mol. Ecol. 30, 2738–2755 (2021).
Tunström, K. et al. Evidence for a single, ancient origin of a genus-wide alternative life history strategy. Sci. Adv. 9, eabq3713 (2023).
Gutiérrez-Valencia, J., Hughes, P. W., Berdan, E. L. & Slotte, T. The genomic architecture and evolutionary fates of supergenes. Genome Biol. Evol. 13, evab057 (2021).
Sandkam, B. A. et al. Extreme Y chromosome polymorphism corresponds to five male reproductive morphs of a freshwater fish. Nat. Ecol. Evol. 5, 939–948 (2021).
Kunte, K. et al. Doublesex is a mimicry supergene. Nature 507, 229–232 (2014).
Feiner, N. et al. A single locus regulates a female-limited color pattern polymorphism in a reptile. Sci. Adv. 8, eabm2387 (2022).
Neff, B. D. & Svensson, E. I. Polyandry and alternative mating tactics. Phil. Trans. R. Soc. B 368, 20120045 (2013).
Takahashi, M., Takahashi, Y. & Kawata, M. Candidate genes associated with color morphs of female-limited polymorphisms of the damselfly Ischnura senegalensis. Heredity 122, 81–92 (2019).
Willink, B., Duryea, M. C., Wheat, C. & Svensson, E. I. Changes in gene expression during female reproductive development in a color polymorphic insect. Evolution 74, 1063–1081 (2020).
Takahashi, M., Okude, G., Futahashi, R., Takahashi, Y. & Kawata, M. The effect of the doublesex gene in body colour masculinization of the damselfly Ischnura senegalensis. Biol. Lett. 17, 20200761 (2021).
Cordero, A., Carbone, S. S. & Utzeri, C. Mating opportunities and mating costs are reduced in androchrome female damselflies, Ischnura elegans (Odonata). Anim. Behav. 55, 185–197 (1998).
Blow, R., Willink, B. & Svensson, E. I. A molecular phylogeny of forktail damselflies (genus Ischnura) reveals a dynamic macroevolutionary history of female colour polymorphisms. Mol. Phylogenet. Evol. 160, 107134 (2021).
Henze, M. J., Lind, O., Wilts, B. D. & Kelber, A. Pterin-pigmented nanospheres create the colours of the polymorphic damselfly Ischnura elegans. J. R. Soc. Interface 16, 20180785 (2019).
Svensson, E. I., Abbott, J. & Härdling, R. Female polymorphism, frequency dependence, and rapid evolutionary dynamics in natural populations. Am. Nat. 165, 567–576 (2005).
Le Rouzic, A., Hansen, T. F., Gosden, T. P. & Svensson, E. I. Evolutionary time-series analysis reveals the signature of frequency-dependent selection on a female mating polymorphism. Am. Nat. 185, E182–E196 (2015).
Svensson, E. I., Willink, B., Duryea, M. C. & Lancaster, L. T. Temperature drives pre-reproductive selection and shapes the biogeography of a female polymorphism. Ecol. Lett. 23, 149–159 (2020).
Willink, B., Duryea, M. C. & Svensson, E. I. Macroevolutionary origin and adaptive function of a polymorphic female signal involved in sexual conflict. Am. Nat. 194, 707–724 (2019).
Price, B. W. et al. The genome sequence of the blue-tailed damselfly, Ischnura elegans (Vander Linden, 1820) [version 1; peer review: awaiting peer review]. Wellcome Open Res. 7, 66 (2022).
Voichek, Y. & Weigel, D. Identifying genetic variants underlying phenotypic variation in plants without complete genomes. Nat. Genet. 52, 534–540 (2020).
Parr, M. J. The terminology of female polymorphs of Ischnura (Zygoptera: Coenagrionidae). Int. J. Odonatol. 2, 95–99 (1999).
Takahashi, Y. & Watanabe, M. Male mate choice based on ontogenetic colour changes of females in the damselfly Ischnura senegalensis. J. Ethol. 29, 293–299 (2011).
Okude, G. et al. Molecular mechanisms underlying metamorphosis in the most-ancestral winged insect. Proc. Natl Acad. Sci. USA 119, e2114773119 (2022).
Laity, J. H., Lee, B. M. & Wright, P. E. Zinc finger proteins: new insights into structural and functional diversity. Curr. Opin. Struct. Biol. 11, 39–46 (2001).
Gudmunds, E., Wheat, C. W., Khila, A. & Husby, A. Functional genomic tools for emerging model species. Trends Ecol. Evol. 37, 1104–1115 (2022).
Chakraborty, M., Emerson, J. J., Macdonald, S. J. & Long, A. D. Structural variants exhibit widespread allelic heterogeneity and shape variation in complex traits. Nat. Commun. 10, 4872 (2019).
Quan, C., Lu, H., Lu, Y. & Zhou, G. Population-scale genotyping of structural variation in the era of long-read sequencing. Comput. Struct. Biotechnol. J. 37, 1104–1115 (2022).
Schrader, L. & Schmitz, J. The impact of transposable elements in adaptive evolution. Mol. Ecol. 28, 1537–1549 (2019).
Catlin, N. S. & Josephs, E. B. The important contribution of transposable elements to phenotypic variation and evolution. Curr. Opin. Plant Biol. 65, 102140 (2022).
Zhang, J. et al. Autotetraploid rice methylome analysis reveals methylation variation of transposable elements and their effects on gene expression. Proc. Natl Acad. Sci. USA 112, E7022–E7029 (2015).
Diehl, A. G., Ouyang, N. & Boyle, A. P. Transposable elements contribute to cell and species-specific chromatin looping and gene regulation in mammalian genomes. Nat. Commun. 11, 1796 (2020).
Sundaram, V. & Wysocka, J. Transposable elements as a potent source of diverse cis-regulatory sequences in mammalian genomes. Phil. Trans. R. Soc. B 375, 20190347 (2020).
Jangam, D., Feschotte, C. & Betrán, E. Transposable element domestication as an adaptation to evolutionary conflicts. Trends Genet. 33, 817–831 (2017).
McCue, A. D. & Slotkin, R. K. Transposable element small RNAs as regulators of gene expression. Trends Genet. 28, 616–623 (2012).
Kiuchi, T. et al. A single female-specific piRNA is the primary determiner of sex in the silkworm. Nature 509, 633–636 (2014).
Van Gossum, H., Stoks, R. & De Bruyn, L. Frequency-dependent male mate harassment and intra-specific variation in its avoidance by females of the damselfly Ischnura elegans. Behav. Ecol. Sociobiol. 51, 69–75 (2001).
Takahashi, Y., Kagawa, K., Svensson, E. I. & Kawata, M. Evolution of increased phenotypic diversity enhances population performance by reducing sexual harassment in damselflies. Nat. Commun. 5, 4468 (2014).
Rowe, L., Chenoweth, S. F. & Agrawal, A. F. The genomics of sexual conflict. Am. Nat. 192, 274–286 (2018).
Sayadi, A. et al. The genomic footprint of sexual conflict. Nat. Ecol. Evol. 3, 1725–1730 (2019).
Wellenreuther, M. et al. Molecular and ecological signatures of an expanding hybrid zone. Ecol. Evol. 8, 4793–4806 (2018).
Okude, G., Fukatsu, T. & Futahashi, R. Interspecific crossing between blue-tailed damselflies Ischnura elegans and I. senegalensis in the laboratory. Entomol. Sci. 23, 165–172 (2020).
Montgomery, E. A., Huang, S. M., Langley, C. H. & Judd, B. H. Chromosome rearrangement by ectopic recombination in Drosophila melanogaster: genome structure and evolution. Genetics 129, 1085–1098 (1991).
Delprat, A., Negre, B., Puig, M. & Ruiz, A. The transposon Galileo generates natural chromosomal inversions in Drosophila by ectopic recombination. PLoS ONE 4, e7883 (2009).
Andrés, J. A. & Cordero, A. The inheritance of female colour morphs in the damselfly Ceriagrion tenellum (Odonata, Coenagrionidae). Heredity 82, 328–335 (1999).
Kent, T. V., Uzunović, J. & Wright, S. I. Coevolution between transposable elements and recombination. Phil. Trans. R. Soc. B 372, 20160458 (2017).
Thompson, M. J. & Jiggins, C. Supergenes and their role in evolution. Heredity 113, 1–8 (2014).
Willink, B. & Svensson, E. I. Intra- and intersexual differences in parasite resistance and female fitness tolerance in a polymorphic insect. Proc. R. Soc. B 284, 20162407 (2017).
Abbott, J. K. & Gosden, T. P. Correlated morphological and colour differences among females of the damselfly Ischnura elegans. Ecol. Entomol. 34, 378–386 (2009).
West-Eberhard, M. J. Developmental Plasticity and Evolution (Oxford Univ. Press, 2003).
Svensson, E. I., Abbott, J. K., Gosden, T. P. & Coreau, A. Female polymorphisms, sexual conflict and limits to speciation processes in animals. Evol. Ecol. 23, 93–108 (2009).
Colgan, D. J. et al. Histone H3 and U2 snRNA DNA sequences and arthropod molecular evolution. Aust. J. Zool. 46, 419–437 (1998).
Shafin, K. et al. Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes. Nat. Biotechnol. 38, 1044–1053 (2020).
Phillippy, A. M., Schatz, M. C. & Pop, M. Genome assembly forensics: finding the elusive mis-assembly. Genome Biol. 9, R55 (2008).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Shafin, K. et al. Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads. Nat. Methods 18, 1322–1332 (2021).
Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics 36, 2896–2898 (2020).
Zimin, A. V. et al. Hybrid assembly of the large and highly repetitive genome of Aegilops tauschii, a progenitor of bread wheat, with the MaSuRCA mega-reads algorithm. Genome Res. 27, 787–792 (2017).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Alonge, M. et al. Automated assembly scaffolding elevates a new tomato system for high-throughput genome editing. Genome Biol. 23, 258 (2022).
Delcher, A. L., Phillippy, A., Carlton, J. & Salzberg, S. L. Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Res. 30, 2478–2483 (2002).
Hao, Z. et al. RIdeogram: drawing SVG graphics to visualize and map genome-wide data on idiograms. PeerJ Comput. Sci. 6, e251 (2020).
R Core Team R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2021).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).
Van der Auwera, G. A. et al. From FastQ data to high-confidence variant calls: the genome analysis toolkit best practices pipeline. Curr. Protoc. Bioinform. 43, 11.10.1–11.10.33 (2013).
Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).
Smith, A. F. A., Hubley, R. & Green, P. RepeatMasker Open-4.0 http://www.repeatmasker.org (2013–2015).
Girgis, H. Z. Red: an intelligent, rapid, accurate tool for detecting repeats de-novo on the genomic scale. BMC Bioinform. 16, 227 (2015).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Kokot, M., Długosz, M. & Deorowicz, S. KMC 3: counting and manipulating k-mer statistics. Bioinformatics 33, 2759–2761 (2017).
Zhou, X. & Stephens, M. Genome-wide efficient mixed-model analysis for association studies. Nat. Genet. 44, 821–824 (2012).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinform. 10, 421 (2009).
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Pedersen, B. S. & Quinlan, A. R. Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics 34, 867–868 (2018).
Korunes, K. L. & Samuk, K. pixy: unbiased estimation of nucleotide diversity and divergence in the presence of missing data. Mol. Ecol. Resour. 21, 1359–1368 (2021).
Hudson, R. R., Slatkin, M. & Maddison, W. P. Estimation of levels of gene flow from DNA sequence data. Genetics 132, 583–589 (1992).
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods 15, 461–468 (2018).
Belyeu, J. R. et al. Samplot: a platform for structural variant visual validation and automated filtering. Genome Biol. 22, 161 (2021).
Bailly-Bechet, M., Haudry, A. & Lerat, E. ‘One code to find them all’: a perl tool to conveniently parse RepeatMasker output files. Mob. DNA 5, 13 (2014).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
Robinson, M. D. & Oshlack, A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 11, R25 (2010).
Robinson, M. D. & Smyth, G. K. Moderated statistical tests for assessing differences in tag abundance. Bioinformatics 23, 2881–2887 (2007).
Robinson, M. D. & Smyth, G. K. Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics 9, 321–332 (2008).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).
Pertea, G. & Pertea, M. GFF utilities: GffRead and GffCompare. F1000Research 9, 304 (2020).
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
Huerta-Cepas, J. et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 47, D309–D314 (2019).
Willink, B. et al. Data from: The genomics and evolution of inter-sexual mimicry and female-limited polymorphisms in damselflies. Zenodo https://zenodo.org/records/8304153 (2023).
Takahashi, Y. Testing negative frequency-dependent selection: linking behavioral plasticity and evolutionary dynamics. Bull. Kanto Branch Ecol. Soc. Jpn. 59, 8–14 (2011).
Acknowledgements
We thank H. Dort, R. A. Steward, J. Haushofer, P. de Sessions and the Monteiro Lab for suggestions and helpful discussions. We are also grateful to M. P. Celorio-Mancera and H. M. Low for invaluable technical support. A. Monteiro hosted B.W. at the National University of Singapore during part of the duration of this study. Computation and data handling were enabled by resources provided by the Swedish National Infrastructure for Computing (SNIC) through the Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX), under projects 2022/6-230 and 2022/5-419 awarded to E.I.S. Specimens of I. senegalensis were collected in Singapore, under a research permit (NP/RP22-015b) from the National Parks Board, Singapore. This work was supported by an International Postdoc Grant from the Swedish Research Council (VR) (grant no. 2019-06444 to B.W.). Funding was also provided by the Swedish Research Council (VR) (grant no. 2017-04386 to C.W.W. and grant no. 2016-03356 to E.I.S.), by the Stina Werners Foundation (grant no. 2018-017 to E.I.S.) and Erik Philip Sörensens Stiftelse (grant no. 2019-033 to E.I.S.). S.N. was funded by a scholarship grant for Master’s students from Sven and Lily Lawski’s Foundation.
Funding
Open access funding provided by Stockholm University.
Author information
Contributions
B.W. conceived the study with input from C.W.W. E.I.S. organized field work in the long-term population study of I. elegans during 2019 and 2020, and planned and prepared the outdoor rearing experiments. E.I.S. and S.N. collected DNA and RNA samples of I. elegans. M.T. and Y.T. collected samples for pool sequencing of I. senegalensis, and B.W. collected samples of I. senegalensis in Singapore. S.N., B.W. and K.T. conducted laboratory work on I. elegans. M.T., Y.T. and B.W. conducted laboratory work on I. senegalensis. B.W. analysed the data with input from C.W.W., K.T., R.C. and T.L. B.W. wrote the paper with contributions from all authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Ecology & Evolution thanks Riddhi Deshmukh, Elina Immonen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Outline of data and analyses used in this study.
For our main study species Ischnura elegans, we obtained short-read genomic data from 19 field-caught females per morph, and long-read genomic data from three females with genotypes Ao, Io, and oo. The long-read samples were used to assemble morph-specific genomes, scaffolded against the Darwin Tree of Life reference assembly. We obtained whole-thorax RNAseq data from females of each morph in both sexually immature and sexually mature colour phases (n = 3 of each morph and colour phase). Immature and mature males (n = 3 of each) were also sampled for whole-thorax RNAseq data. We used short-read pool-seq data (n = 30 individuals of each morph per pool) of the close relative Ischnura senegalensis to investigate whether the female polymorphisms in both species share a genomic basis. We also analysed expression levels of candidate genes in this species, using samples from a previously published study38, which produced transcriptomic data from four body parts (head, thorax, wing and abdomen) from each of A females, O females and males (n = 2), sampled at adult emergence and two days thereafter. The k-mer based GWAS is reference-free, but significant k-mers were mapped to the morph-specific assemblies to determine their chromosomal context. Damselfly images adapted from ref. 25 under a Creative Commons licence CC BY 4.0.
Extended Data Fig. 2 An inversion signature differentiates A and I individuals from the O morph.
Read mapping and sample coverage at the start of the scaffold 2 of chromosome 13 in a our O assembly and b the DToL reference assembly, showing a signature of a ~20 kb inversion in A and I samples. A single O sample also exhibited this signature but was excluded here for clarity (see Supporting Text 3). Note that the first 415 kb of the reference DToL assembly are absent in our scaffolded O assembly, and therefore the x-axis is shifted by 415 kb in b.
Extended Data Fig. 3 The A and I reads mapped to inversion break points on the O assembly (see Extended Data Fig. 2) map to multiple locations on the A assembly.
a Reads from the first inversion breakpoint. b Reads from the second inversion breakpoint. Each row represents a sample and each circle an individual read. The x-axis corresponds to coordinates on the A assembly.
Extended Data Fig. 4 Proportion of TE content in non-overlapping 1.5 mb regions.
The gray dots correspond to genomic windows outside chromosome 13. The main assembly and the unlocalized scaffolds of chromosome 13 are depicted with different colours. The dashed line marks the 95th percentile of TE coverage across all windows.
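As a rough illustration of the quantity plotted in this figure, the following Python sketch computes the proportion of each non-overlapping window covered by TE annotations, together with a percentile threshold across windows. It is not the authors' pipeline. The function names, the half-open (start, end) interval convention, the assumption that annotations do not overlap one another, and the nearest-rank percentile definition are all illustrative assumptions.

```python
from math import ceil


def te_proportion_by_window(intervals, seq_len, window):
    """Proportion of each non-overlapping window covered by TE intervals.

    intervals: iterable of (start, end) pairs, 0-based and half-open,
               assumed not to overlap one another (hypothetical input format).
    seq_len:   length of the sequence in bp.
    window:    window size in bp (e.g. 1_500_000).
    """
    n_windows = ceil(seq_len / window)
    covered = [0] * n_windows
    for start, end in intervals:
        end = min(end, seq_len)  # clip annotations running past the sequence end
        pos = max(start, 0)
        while pos < end:
            w = pos // window
            # Stop at the window boundary or the interval end, whichever is first.
            stop = min((w + 1) * window, end)
            covered[w] += stop - pos
            pos = stop
    proportions = []
    for w in range(n_windows):
        # The last window may be shorter than the nominal window size.
        size = min((w + 1) * window, seq_len) - w * window
        proportions.append(covered[w] / size)
    return proportions


def nearest_rank_percentile(values, q):
    """Nearest-rank percentile for an integer q in 1..100 (one of several definitions)."""
    ordered = sorted(values)
    rank = max(1, ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


# Toy example with made-up coordinates: a 20 bp sequence in 10 bp windows.
props = te_proportion_by_window([(0, 5), (8, 12)], seq_len=20, window=10)
threshold = nearest_rank_percentile(props, 95)
```

Windows whose proportion exceeds `threshold` would sit above the dashed line in a plot of this kind. Overlapping annotations would need to be merged first, otherwise bases would be counted twice.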
Extended Data Fig. 5 Linkage disequilibrium (LD) in the genome of Ischnura elegans.
LD estimates are shown for the first 15 mb of each chromosome and all unlocalized scaffolds of chromosome 13. The morph locus is found in the first ~1.5 mb of the unlocalized scaffold 2 of chromosome 13, which has a total size of ~15 mb. Each dot represents the squared correlation coefficient (R2) between two variant sites on the x axis, separated by the number of sites indicated on the y axis.
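The LD statistic described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline. It assumes genotypes are coded as alternate-allele counts (0, 1 or 2) per individual, and the function names and toy data are hypothetical.

```python
import math


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length genotype vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("genotype vectors must have equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        # A monomorphic site has no variance, so the correlation is undefined.
        return math.nan
    return sxy * sxy / (sxx * syy)


def r2_at_separation(sites, k):
    """r^2 for every pair of sites that are k positions apart in the ordered list."""
    return [r_squared(sites[i], sites[i + k]) for i in range(len(sites) - k)]


# Toy data: three variant sites genotyped in four individuals (made-up values).
sites = [
    [0, 1, 2, 1],
    [0, 1, 2, 1],  # identical to the first site
    [2, 1, 0, 1],  # perfectly anti-correlated with the first site
]
adjacent = r2_at_separation(sites, 1)
two_apart = r2_at_separation(sites, 2)
```

Because the coefficient is squared, perfectly correlated and perfectly anti-correlated sites both give a value of 1, which is why the sign of the association does not appear in plots of this kind.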
Extended Data Fig. 6 Evidence of a translocation between A and I haplotypes.
Mapping and coverage of long reads from an Io sample across the first 5.6 mb of the unlocalized scaffold 2 of chromosome 13 in the A assembly, showing a signature consistent with either a 5.54 mb inversion or a translocation of inverted A content. Absence of morph divergence beyond ~1.5 mb on the A assembly supports the translocation scenario.
Extended Data Fig. 7 Structural variants between A and O-like females of I. senegalensis along the morph locus identified in I. elegans.
a Read mapping and sample coverage of I. senegalensis pool-seq data at the start of the unlocalized scaffold 2 of chromosome 13 in the O assembly of I. elegans. The same ~20 kb inversion signature is found in A and I samples of I. elegans (see Extended Data Fig. 2). b-c The A-pool reads mapped to break points on the O assembly map to multiple locations on the A assembly. b Reads from the first breakpoint. c Reads from the second breakpoint. Each row represents a pool of I. senegalensis and each circle an individual read. The x-axis corresponds to the A assembly of I. elegans.
Extended Data Fig. 8 Morph divergence using the DToL assembly (O haplotype) as mapping reference.
a SNP-based genome-wide associations in all pairwise analyses between morphs. Genomic DNA from 19 wild-caught females of each colour morph and of unknown genotype was extracted and sequenced for these analyses. Illumina short reads were aligned against the DToL reference assembly. b A closer look at the SNP associations on the unlocalized scaffold 2 of chromosome 13, which contained all highly significant SNPs. The y axis in a and b indicates unadjusted -log10 P values calculated from chi-squared tests. c Fst values averaged across 30 kb windows for the same pairwise comparisons as in the SNP-based GWAS. The dashed line marks the 95th percentile of all non-zero Fst values across the entire genome. The red double arrow shows the region of elevated divergence between A and both O and I samples. Population-level estimates of d Tajima’s D, and e nucleotide diversity (π) averaged across 30 kb windows. The shaded area contains the 5th–95th percentiles of all genome-wide estimates.
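For readers unfamiliar with the y axis in panels a and b, the following Python sketch shows one way an unadjusted -log10 P value can be obtained from a chi-squared test on allele counts at a single SNP. It is an illustration under stated assumptions, not the authors' pipeline: it assumes a 2 × 2 allelic test with one degree of freedom and no continuity correction, and the function names and allele counts are made up. With 19 diploid females per morph, each group contributes 38 alleles.

```python
from math import erfc, inf, log10, sqrt


def allelic_chi2(g1_ref, g1_alt, g2_ref, g2_alt):
    """Pearson chi-squared test (1 d.f.) on a 2 x 2 table of allele counts.

    Returns (chi2, p). No continuity correction is applied.
    """
    row1 = g1_ref + g1_alt
    row2 = g2_ref + g2_alt
    col1 = g1_ref + g2_ref
    col2 = g1_alt + g2_alt
    denom = row1 * row2 * col1 * col2
    if denom == 0:
        # An empty row or column (e.g. a monomorphic site) is uninformative.
        return 0.0, 1.0
    n = row1 + row2
    chi2 = n * (g1_ref * g2_alt - g1_alt * g2_ref) ** 2 / denom
    # Upper tail of the chi-squared distribution with one degree of freedom.
    p = erfc(sqrt(chi2 / 2))
    return chi2, p


def neg_log10(p):
    """Unadjusted -log10 P; infinite if P underflows to zero."""
    return -log10(p) if p > 0 else inf


# Made-up allele counts for two morphs, 38 alleles each.
chi2, p = allelic_chi2(30, 8, 8, 30)
score = neg_log10(p)
```

Values computed this way are not corrected for multiple testing, which is why genome-wide plots of this kind are usually read against a stringent significance threshold.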
Supplementary information
Supplementary Information
Supplementary Texts 1–6, Supplementary Tables 1–6 and Supplementary Figs. 1–14.
Source data
Source Data Fig. 1
Tree file.
Source Data Fig. 2
Statistical Source Data.
Source Data Fig. 3
Statistical Source Data.
Source Data Fig. 4
Statistical Source Data.
Source Data Fig. 5
Statistical Source Data.
Source Data Fig. 6
Statistical Source Data.
Source Data Extended Data Fig. 3
Statistical Source Data.
Source Data Extended Data Fig. 4
Statistical Source Data.
Source Data Extended Data Fig. 7
Statistical Source Data.
Source Data Extended Data Fig. 8
Statistical Source Data.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Willink, B., Tunström, K., Nilén, S. et al. The genomics and evolution of inter-sexual mimicry and female-limited polymorphisms in damselflies. Nat Ecol Evol 8, 83–97 (2024). https://doi.org/10.1038/s41559-023-02243-1
DOI: https://doi.org/10.1038/s41559-023-02243-1