Introduction

The spongy moth Lymantria dispar L., formerly known as the gypsy moth, is native to Europe and Asia. It is considered to be one of the most destructive forest defoliators over much of its range1,2. The spongy moth was accidentally introduced into North America in Medford, Massachusetts in the 1860s3. Since then, the spongy moth has spread throughout much of the northeastern seaboard of the United States and adjacent parts of Canada. Spongy moth larvae feed on more than 300 species of trees, causing defoliation in coniferous and deciduous forests as well as residential areas4,5,6,7.

The spongy moth has been further classified by Pogue and Schaefer8 into three subspecies: the European subspecies L. dispar dispar L., the Asian subspecies L. dispar asiatica Vnukovskij, and the Japanese subspecies L. dispar japonica Motschulsky. For regulatory purposes, moths of the latter two subspecies are grouped into a biotype formerly referred to as the Asian gypsy moth, along with Lymantria umbrosa and Lymantria postalba9. This biotype is defined by the capacity of females to fly, in contrast to the European/North American spongy moth, whose females are largely flightless. This delineation of the spongy moth into subspecies and biotypes has been supported by comparative analyses of the mitochondrial genomes of different spongy moth populations and by a genotyping-by-sequencing analysis involving 2327 single-nucleotide polymorphisms, although these studies have also revealed differences among populations of a subspecies10,11. Subspecies of the Asian spongy moth biotype (hereafter abbreviated as ASM) are mainly distributed from the Ural Mountains east to China, South Korea, Japan, and the Russian Far East. The European spongy moth biotype (ESM) is distributed in Europe, eastern North America, western and central Asia, northern Africa, northern India, Pakistan and Afghanistan. Compared with ESM, some populations of ASM may also require a shorter time to break the diapause of their eggs12 and may be able to better adapt to some North American plants than the established ESM13. These properties suggest that ASM may cause more damage and loss than ESM if it becomes established in North America.

Female adult flight capability is the only criterion for classification of spongy moth populations as ASM8. While ESM females in North American populations are generally not capable of any kind of flight14,15, variability in female flight capability and activity has been observed among other strains of ESM from Europe and among populations of ASM16,17,18,19. Correlations between variations in female spongy moth flight capacity and variations in wing size and dimensions, flight muscle tissue, and wing load (mass/wing area) have been reported15,16,17,20.

Analysis of crosses between ASM and ESM moths indicates that flight capability has a significant genetic basis14,15. However, while broad geographic groups of L. dispar can be distinguished with mitochondrial and nuclear genetic markers10, alleles of these markers were often not completely fixed in the regions where they occur and thus could not serve as unambiguous indicators of female flight capability16,21,22. In addition to biotype- and strain-dependent physiological factors influencing flight, mated ASM adult females were found to have significantly reduced flight capability after oviposition23, but not before17. Spongy moth adults are capital breeders that rely on resources accumulated as larvae to carry out flight and reproduction24. The effect of oviposition on ASM flight may represent the need for resorption of oocytes to supply fuel for prolonged flight activity, a resource that is lost upon oviposition.

A previous analysis and comparison of genomic sequence data derived from ESM and ASM samples detected genetic divergence between the two biotypes in a selection of genes enriched in gene ontology (GO) categories presumed to be involved in flight, such as “skeletal muscle adaptation” (GO:0043501) and “ionotropic glutamate receptors” (GO:0035235)25. Some of the divergent genes encoded homologs of genes that control wing size in Drosophila melanogaster. Results of this analysis also indicated that a greater degree of sequence divergence may exist in the regulatory regions of ESM and ASM genomes, suggesting that differences in gene expression may also contribute to differences in flight capacity. A published comparative analysis of ESM and ASM female antennal and larval head capsule transcriptomes identified differences in olfaction-related gene expression among three representative strains of the two biotypes26. This finding is consistent with the concept that differences in gene expression may account for differences in flight capacity.

In this study, our goals were to test this concept and (1) identify key genes expressed in adult moths that potentially affect the flight ability and wing development of female spongy moth, (2) explore their expression characteristics and differences in distinct geographical populations, and (3) provide a basis for further research on the molecular mechanism of flight ability. We used transcriptome sequencing to analyze the female adults of two different populations each from the United States and China in order to identify differences in gene expression between ESM and ASM that are consistently observed. Eight libraries prepared from RNA harvested from adult females of these four strains before mating and after mating and oviposition were constructed and sequenced, and differentially expressed genes that might affect their flight activities were analyzed and assessed.

Results

Qualitative description for assembly and annotation of transcriptomes

We assembled and compared genome-wide transcription profiles of ESM and ASM virgin adult females and females after mating and oviposition. Eight independent RNA-Seq analyses were performed with different populations.

A total of 205.96 Gb of processed reads were obtained by sequencing (> 6.98 Gb/replicate). The percentage of Q30-filtered bases was more than 91.85%. Assembly of the sequence data from all libraries resulted in identification of 129,286 unigenes (Table 1).

Table 1 Transcriptome assembly results for all samples.

The length distribution of the unigenes is shown in Fig. 1. The percentage of unigenes with a length not greater than 500 bp was over 70%, with a further 17% ranging between 501 and 1000 bp. Relatively few unigenes were found in the 4001–4500 bp category.

Figure 1

Length distribution of unigenes/transcripts among all spongy moth samples.

All female adult transcriptome unigenes were annotated against the NR, Swiss-Prot, Pfam, COG, GO and KEGG databases. The number of annotated unigenes was 39,584, accounting for 30.62% of the total. Among the databases, queries of the NR database yielded the most annotations, accounting for 26.28% of the total. The COG database yielded the fewest annotations, 6,221 (4.81%) (Table 2).
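The annotation percentages in this paragraph follow directly from the unigene counts. As a minimal check, the Python sketch below recomputes them from the 129,286 assembled unigenes. The NR count of 33,971 is taken from the NR annotation results, and the helper name `percent_of_total` is illustrative only, not part of the original analysis.

```python
# Recompute the annotation percentages from the reported unigene counts.
TOTAL_UNIGENES = 129_286  # unigenes assembled from all libraries

# Counts reported in the text (the NR count comes from the NR annotation section).
annotated = {
    "any database": 39_584,
    "NR": 33_971,
    "COG": 6_221,
}


def percent_of_total(count, total=TOTAL_UNIGENES):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


percentages = {name: percent_of_total(n) for name, n in annotated.items()}
for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

The recomputed values agree with the reported 30.62%, 26.28% and 4.81%.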

Table 2 Unigenes annotation profiles of all spongy moth samples.

NR annotation

BLASTX comparison with NR database sequences was performed to identify the similarity between the transcript sequences of spongy moth and those of related species, and to obtain functional information from homologous sequences. A total of 33,971 unigenes were successfully matched with known genes (E < 10⁻⁵). Of these, 4716 (14.03%) showed high sequence similarity with Spodoptera litura sequences, followed by 4467 (13.29%) and 3666 (10.91%) unigenes with top BLAST matches to Helicoverpa armigera and Heliothis virescens sequences, respectively. Matches with other species returned by BLAST showed low sequence similarity with spongy moth sequences. However, 11,927 unigenes (35.49%) were unique transcripts of spongy moth (Fig. 2).

Figure 2

Species distribution of BLAST results from female spongy moth transcriptomes.

COG/NOG annotations

COG (Clusters of Orthologous Groups) is a database of protein lineages for general function prediction, while NOG (Non-Supervised Orthologous Groups) builds on COG to expand genomic coverage and provide more detailed orthologous group (OG) analysis. Comparison against these databases yields a COG functional classification for each gene or transcript. The most prevalent COG function class annotated in the transcriptomes of adult spongy moth was "translation, ribosomal structure and biogenesis", while the "function unknown" class was the most prevalent annotation in NOG (Fig. 3).

Figure 3

COG/NOG functional annotation of adult spongy moth transcriptomes.

GO annotation

A total of 22,408 unigene sequences were annotated with 94,399 GO entries, including 48 subclasses in 3 categories: 20 biological_process subclasses, 14 cellular_component subclasses, and 14 molecular_function subclasses. The largest number of annotations was for biological_process (37,112; 39.3%), and the smallest was for molecular_function (26,424; 28%) (Fig. 4).

Figure 4

GO functional annotation of adult female spongy moth transcriptomes.

KEGG pathway

Biological functions of transcripts in the spongy moth transcriptomes were identified with the assistance of KEGG (Kyoto Encyclopedia of Genes and Genomes), a large knowledge base for analyzing gene functions and linking genomic information with functional information. A total of 21,260 unigenes mapped to six top-level KEGG pathway categories: Metabolism, Genetic Information Processing, Environmental Information Processing, Cellular Processes, Organismal Systems, and Human Diseases. Among them, the largest number of unigenes (3118) mapped to the Signal Transduction subgroup under Environmental Information Processing27,28,29 (Fig. 5).

Figure 5

Mapping of KEGG pathway functions of transcripts in adult female spongy moth transcriptomes.

Strain-specific differences in gene expression

A total of 6692 unigenes were found to be differentially expressed in pairwise comparisons of ASM and ESM transcriptomes, including 5371 differentially expressed genes (DEGs) that were up-regulated and 1321 that were down-regulated in ASM relative to ESM (Supplementary Table 1). Approximately two orders of magnitude more DEGs were observed in pairwise comparisons of ASM and ESM transcriptomes (JGS vs. CT, JGS vs. NJ, ZY vs. CT, ZY vs. NJ) than in pairwise comparisons of transcriptomes from the same biotype (JGS vs. ZY, CT vs. NJ). The numbers of DEGs are shown in Table 3.

Table 3 The number of up-regulated and down-regulated DEGs between four geographic spongy moth strains.

In the four pairwise comparisons of ASM transcriptomes with ESM transcriptomes (i.e., JGS vs. CT, JGS vs. NJ, ZY vs. CT, and ZY vs. NJ), 306 DEGs consistently exhibited up-regulation and 2,309 DEGs consistently exhibited down-regulation in ASM transcriptomes relative to ESM transcriptomes (Supplementary Table 2). Table 4 lists the 40 DEGs with the greatest degree of up- or down-regulation in ASM-ESM pairwise comparisons. The 20 DEGs with the greatest degree of down-regulation in ASM relative to ESM included cytochrome c oxidase (COX) subunits I, II & III; cytochrome b; NADH dehydrogenase subunits 1 & 4; ATP synthase subunit 6; glucose dehydrogenase; myelin protein P0 isoform L-MPZ precursor; myelin basic protein isoform X3; myosin-4; and moricin. Among the 20 DEGs up-regulated to the greatest extent in ASM transcriptomes were pancreatic triacylglycerol lipase-like, alkaline C trypsin, calphotin-like, serine protease 1-like, non-specific lipid-transfer protein, L-serine dehydratase/L-threonine deaminase, trypsin precursor AiT6, actin cytoskeleton-regulatory complex protein PAN1-like, NADH dehydrogenase subunit 1, vitellogenin 7 precursor, vitellogenin 2 isoform 1 precursor, and chitin deacetylase 1 & 8.
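The notion of a "consistently" regulated DEG used here amounts to a set intersection across the four ASM-ESM comparisons. The Python sketch below illustrates that logic under stated assumptions: the function name `consistent_degs`, the input layout, and the toy gene IDs are hypothetical placeholders, not the authors' pipeline or real unigene identifiers.

```python
def consistent_degs(calls):
    """Return gene IDs up- or down-regulated in every comparison.

    `calls` maps a comparison name to a dict holding two collections of
    gene IDs under the keys "up" and "down" (ASM relative to ESM).
    """
    up_sets = [set(c["up"]) for c in calls.values()]
    down_sets = [set(c["down"]) for c in calls.values()]
    return set.intersection(*up_sets), set.intersection(*down_sets)


# Hypothetical toy calls; the IDs are placeholders, not real unigenes.
toy_calls = {
    "JGS_vs_CT": {"up": {"g1", "g2", "g3"}, "down": {"g7", "g8"}},
    "JGS_vs_NJ": {"up": {"g1", "g2"}, "down": {"g7", "g8", "g9"}},
    "ZY_vs_CT": {"up": {"g1", "g3"}, "down": {"g7", "g9"}},
    "ZY_vs_NJ": {"up": {"g1", "g2", "g4"}, "down": {"g7"}},
}

up_all, down_all = consistent_degs(toy_calls)
print(sorted(up_all), sorted(down_all))
```

A gene is retained only if it carries the same direction of change in all four comparisons, which is why the consistent sets are much smaller than the per-comparison DEG lists.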

Table 4 Genes that were consistently differentially expressed in all comparisons in four ASM and ESM transcriptomes.

Within-population differences in gene expression before mating and after mating and oviposition

Because adult female spongy moth flight was found to be significantly reduced after oviposition23, we also prepared transcriptomes from RNA of virgin adult females before mating and adult females after mating and oviposition and examined differences in gene expression. Table 5 shows the number of DEGs up-regulated and down-regulated in moths before mating relative to after mating and oviposition for all four strains. Noticeably more DEGs were identified for the ESM strains compared to the ASM strains.

Table 5 The number of DEGs up-regulated and down-regulated before mating (BM) relative to after mating and oviposition (AM).

No DEGs were identified that were consistently up-regulated or down-regulated in comparisons of the before-mating transcriptomes with the after-mating and oviposition transcriptomes of both ASM strains. In contrast, 58 DEGs were found to be consistently up-regulated or down-regulated in these comparisons for the two ESM strains (Supplementary Table 2). All but one of these were up-regulated in the before-mating transcriptomes. Table 6 lists the single DEG that was down-regulated in ESM before-mating transcriptomes and the ten DEGs up-regulated in before-mating transcriptomes with the greatest degree of difference. The up-regulated DEGs included lysocardiolipin acyltransferase 1-like and actin muscle-type A2. One of the up-regulated DEGs in ESM comparisons was also found to be up-regulated in NJ before-mating transcriptomes (Table 5, TRINITY_DN53843_c6_g1).

Table 6 Genes that were consistently differentially expressed (DEGs) in all comparisons of ASM and ESM transcriptomes developed from moths before mating (BM) and after mating and oviposition (AM).

Discussion

The L. dispar genome, at approximately 1.0 Gb31,32, dwarfs the majority of sequenced Lepidoptera genomes that, on average, range from 250 to 500 Mb in size25. Genetic analysis has shown that ASM populations are more genetically diverse than ESM populations33. These features pose challenges to the identification of genes whose expression may account for the distinctive flightworthiness of ASM females. In an attempt to overcome these challenges, we carried out a comparative analysis among transcriptomes developed from adult females of two different ASM populations as well as two different North American ESM populations, and also included a comparative analysis of transcriptomes from virgin females and mated females after oviposition. Comparisons of differentially expressed genes identified genes which were consistently up- or down-regulated in adult ASM RNA samples. Some of the DEGs with the greatest differences in expression level between ASM and ESM populations (Table 4) appeared to bear some relevance to aspects of flight activity, such as flight muscle function and energy production. These DEGs included the following:

Cytochrome oxidase (COX) subunits I, II, and III, and cytochrome b (CytB) (TRINITY_DN55289_c1_g2, TRINITY_DN45473_c0_g2, TRINITY_DN38334_c0_g1, and TRINITY_DN38260_c0_g1, respectively; Table 4)

Cytochromes were independently discovered in insect systems by Charles MacMunn and David Keilin. Keilin observed that “among all organisms examined, the highest concentration of cytochrome is found in the thoracic muscles of flying insects”34,35. Indeed, the high amount of cytochromes in insect flight muscle suggests that they play a role in biological oxidation and energy transmission, which is consistent with the large energy demand that flight activity places on this special tissue36,37. In mosquitoes, reductions in flight muscle mitochondrial metabolism triggered by a blood meal direct spare nutrients from flight muscle to the ovaries in support of oogenesis38,39. This is part of the "flight-oogenesis syndrome," a physiological process in which some migratory insects switch between two energy-intensive states: migration and reproduction40. Evidence shows that flight metabolism and dispersal potential are tightly linked to cytochrome oxidase (COX) function. For example, long-distance migratory butterfly species have higher COX content and activity than short-distance fliers, and recently established populations of Melitaea cinxia butterflies have higher COX activity and dispersal potential than old ones41. This means that the relationship between dispersal potential and COX activity can also be observed within the same flying insect species. Given these observations, the finding that cytochrome b (CytB) and COX subunit I, II, and III DEGs were significantly down-regulated in ASM compared to ESM appears counterintuitive. However, mitochondrial COX genes showed evidence of relaxed selection in flightless as compared with flying lineages, as demonstrated by significantly higher dN/dS ratios in flightless lineages42. If there is also a lack of purifying selection pressure on the COX and CytB alleles of ESM, then the relevance of up-regulation of these genes for spongy moth female flight is unclear.

Vitellogenin 7 precursor (TRINITY_DN59741_c2_g1), vitellogenin 2 isoform 1 precursor (TRINITY_DN59741_c3_g2), vitellogenin-like (TRINITY_DN47254_c0_g1)

In many oviparous species, vitellogenin (Vg) is the precursor protein of egg yolk vitellin (Vn)43, which acts as an energy store. Vg is involved in oocyte maturation and development, making it a crucial protein in insect reproduction. A study of Harmonia axyridis revealed that Vg expression leads to increased egg production44. Three of the most up-regulated DEGs in ASM relative to ESM matched with Vg genes. While vitellogenin synthesis occurs primarily during the last larval instar in spongy moth, a Northern blot study detected relatively small steady-state quantities of vitellogenin RNA in adults, suggesting that completion of oogenesis initiated during the larval and pupal stages may require some small degree of vitellogenin synthesis early during the adult stage45. It is generally accepted that female spongy moths produce 500–1000 eggs, but there are no data indicating a consistently significant difference in egg production between ASM and ESM. There has been no formal study on the applicability of flight-oogenesis syndrome to ASM females, but some recent research suggests that there is not always an obvious trade-off between insect migratory flight and reproduction46.

NADH dehydrogenase subunit 1 (TRINITY_DN57574_c0_g1, TRINITY_DN50963_c0_g2), NADH dehydrogenase subunit 4 (TRINITY_DN42442_c0_g1)

NADH dehydrogenase is involved in aerobic respiration and ATP synthesis47. Solitary locusts have higher initial flight speeds and shorter flight distances than gregarious locusts, and exhibit higher mitochondrial energetic storage (Acetyl-CoA and NADH), energy metabolic gene-expression levels, and metabolic enzyme activities in their flight muscles than their gregarious counterparts48. While NADH dehydrogenase subunit DEGs (for subunits 1 and 4) were found to be down-regulated in ASM, one DEG with matches to different NADH dehydrogenase subunit 1 sequences was up-regulated in ASM, suggesting either sequence divergence at this locus or two alleles with biotype-specific differences in their regulation.

Myosin (TRINITY_DN58997_c0_g1, TRINITY_DN100147_c0_g1, TRINITY_DN22477_c0_g1, etc.), Actin (TRINITY_DN6120_c0_g1, TRINITY_DN59230_c5_g1, TRINITY_DN47814_c0_g1, etc.)

Actin, a filamentous protein (42 kDa) involved in muscle contraction in both smooth and striated muscle, also serves as an important structural molecule for the cytoskeleton of many eukaryotic cells. It is the main constituent of the thin filaments of muscle fibers. Actin participates in many important cellular processes, including muscle contraction, cell motility, cell division and cytokinesis, vesicle and organelle movement, cell signaling, and the establishment and maintenance of cell junctions and cell shape. Actin filaments, usually in association with myosin, are responsible for many types of cell movements. Myosin is a type of molecular motor that converts chemical energy released from ATP into mechanical energy. This mechanical energy is then used to pull the actin filaments along, causing muscle fibers to contract and, thus, generating movement49. Actin and myosin are found in every type of muscle tissue, where thick myosin filaments and thin actin filaments work together to generate contraction. We found multiple myosin and actin genes down-regulated in ASM relative to ESM (Supplementary Table 2). Given that ASM female adults have strong flight ability while ESM females have none, the relevance of higher expression of these genes in ESM for spongy moth female flight is unclear. In comparisons within ESM populations, expression of some actin genes was also found to be significantly higher before than after mating, which may reflect reduced muscle activity after mating.

Comparison of gene expression before mating and after oviposition

Since ASM females are the ones in this study that are flightworthy, it was anticipated that meaningful differences in expression of genes before mating and after mating and oviposition would be observed in the transcriptomes of the JGS and ZY ASM strains and not in the flightless CT and NJ strains. However, few DEGs were identified in comparisons of ASM transcriptomes before mating and after oviposition, and no DEGs were found to be consistently up- or down-regulated in comparisons of these ASM transcriptomes. Thus, the results suggest that gene expression differences might not be the principal basis for the reported reduction in flight capacity of ASM females after oviposition17. It is interesting that comparisons of ESM transcriptomes before mating and after oviposition disclosed many more DEGs than the corresponding ASM comparisons, though the significance of this is unclear.

In conclusion, the results in this paper represent the first transcriptomic examination of gene expression in adult spongy moths. While DEGs with functions relevant to moth flight activity were identified in adult ASM and ESM transcriptomes, the trends in the differences in expression of these genes did not appear to be consistent with the differences in flight capabilities of ASM and ESM. DEGs expected to be up-regulated in flight-worthy L. dispar strains (such as cytochrome oxidase subunits, NADH dehydrogenase subunits, myosins and actins) often were found to be down-regulated instead. These results may reflect the possibility that the differences in gene expression relevant to female flight were subtle and hard to detect under the conditions in which the adults were sampled. Alternatively, differences in gene expression in ASM and ESM adults may affect flight capability by an unknown mechanism. It may also be the case that differences in gene expression of direct relevance for flight capability do not occur in adults of ASM and ESM. In addition, an examination of DEGs in adult females before mating and after mating and oviposition yielded no clues as to why ASM female flight is reduced after mating and oviposition, but did reveal significant differences in gene expression before mating and after oviposition among ESM adults of two populations.

In a previous study, adult female wing size and wing load (body mass/wing area) were found to differ significantly among geographic strains of different biotypes as well as the same biotype, with larger wing sizes and lower wing loads observed in strains with greater flight capability20. This observation suggests that differences in the expression of genes controlling wing and body morphogenesis during development may account for strain-specific flight capability. A comparison of pupal transcriptomes may reveal differences in transcription during wing and wing muscle development in the pupal stage that may help to unlock the mystery of differential flight ability between the two spongy moth biotypes.

Materials and methods

Insect materials and RNA extraction

We analyzed four strains of Lymantria dispar from colonies derived from different geographic populations. Specimens of two ASM strains (L. dispar asiatica) from China were obtained from the Plant Quarantine Laboratory of Beijing Forestry University in 2019. The egg masses of JGS and ZY were collected in the wild from host trees, usually larches. After being brought back to the laboratory, they were reared on artificial diet until pupation. Specimens of two ESM strains (L. dispar dispar) were obtained from the U.S. Department of Agriculture Animal and Plant Health Inspection Service Plant Protection and Quarantine50 program. Samples were processed at the Beijing Forestry University Plant Quarantine Laboratory in 2021 (Table 7). Eggs of all strains were hatched and the larvae reared on an artificial diet in a greenhouse under controlled conditions (temperature: 28 ± 0.5 °C)51. For pre-mating RNA samples, male and female pupae were separated prior to eclosion, and virgin females were harvested 24 h after eclosion, because flight activity peaked when females were one day old and decreased thereafter. For post-oviposition RNA samples, females were mated with males and harvested within 1 h after oviposition to avoid post-oviposition mortality. All the samples were frozen in liquid nitrogen and immediately stored at −80 °C.

Table 7 Location information for the four sampling sites of the spongy moth, Lymantria dispar.

RNA-Seq

RNA-seq data were generated from three biological replicates (five specimens/replicate) for each L. dispar strain, pre-mating and post-oviposition. Moths of each replicate were homogenized separately in two 2.0 mL tubes containing Lysing Matrix A (MP Biomedicals, Solon, OH, USA) and Lysis/Binding Solution from the mirVana™ miRNA Isolation Kit (ThermoFisher Scientific, Waltham, MA, USA) using a FastPrep-24™ Tissue and Cell Homogenizer (MP Biomedicals, Solon, OH, USA) set at 4.0 m/s and run for 40 s. Insoluble material was pelleted by centrifugation, and total RNA was recovered from the supernatants with the mirVana™ miRNA Isolation Kit. The RNA obtained was treated with DNase I (Invitrogen), and magnetic beads with oligo(dT) were used to isolate poly(A)+ messenger RNA (mRNA), which was sheared into short fragments using a fragmentation buffer. First-strand cDNA was synthesized from the mRNA template by reverse transcriptase primed with six-base random primers (random hexamers), followed by second-strand synthesis to form a stable double-stranded structure. Termini of the double-stranded cDNA were blunted, and a terminal adenosine was added to the 3' end to facilitate library construction. The samples were submitted to the Majorbio Technologies company (Beijing, China) for quality assessment, construction of a non-stranded library using random hexamer priming, and 2 × 150 bp paired-end sequencing on an Illumina NovaSeq 6000 instrument. RNA-seq data are available at the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) under BioProject accession numbers PRJNA789495 and PRJNA788963.

Assembly and annotation of the transcriptome

Trinity (v2.13.2)52 was used for initial de novo assembly of Illumina sequence reads. The results of assembly with Trinity were then optimized for filtering and re-evaluated using TransRate (v1.0.3)53, CD-HIT (v4.8.1)54, and BUSCO (v5.2.2, using the arthropoda_odb10 database)55. The TGI Clustering Tool (v2.1) was employed to assemble the transcripts into unigenes56. The unigene assembly set is publicly available at the Open Science Framework (OSF) repository at https://doi.org/10.17605/OSF.IO/PME7K. All unigenes obtained were compared with six databases (NR, Swiss-Prot, Pfam, COG, GO and KEGG) to provide annotation for the sequences in each database, and the annotation of each database was statistically analyzed. All unigenes were searched against these databases using BLAST (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.29/) (e-value < 10⁻⁵). Protein function was predicted according to the most similar proteins annotated in these databases. Principal Component Analysis (PCA) was performed using the built-in prcomp function of R, which uses singular value decomposition and generally provides better numerical accuracy than eigendecomposition of the covariance matrix.
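To illustrate what an SVD-based PCA of the kind performed by prcomp computes, the following Python sketch applies NumPy's singular value decomposition to a column-centered matrix. It is not the R code used in this study: the function name `pca_svd` and the toy matrix are hypothetical, and the sketch assumes centering without scaling, which matches the prcomp defaults.

```python
import numpy as np


def pca_svd(X, n_components=2):
    """PCA of a samples-by-features matrix via SVD of the centered data."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                    # center each feature (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # sample coordinates
    variances = s ** 2 / (X.shape[0] - 1)      # variance along each component
    ratio = variances / variances.sum()        # proportion of variance
    return scores, Vt[:n_components], ratio[:n_components]


# Hypothetical toy matrix: 4 samples x 2 features, perfectly collinear.
toy = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])
scores, components, ratio = pca_svd(toy)
print(scores.shape, ratio)
```

Because the SVD works on the centered data directly, the covariance matrix is never formed explicitly, which is the usual reason given for its better numerical behavior.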

Differential gene expression analysis

Differentially expressed genes (DEGs) were identified using DESeq2 (v1.18.1)57 together with salmon (v0.11.3), a read abundance quantification tool, operating in its quasi-mapping mode58. The R package tximport (v1.20.0)59 was used to prepare gene-level counts from transcript-level counts. Differential analysis was performed on these inputs using DESeq2’s DESeq function. Genes determined by salmon to have non-zero expression levels were flagged as differentially expressed by DESeq if they exhibited at least a two-fold difference in expression levels between the statistical factors being compared; furthermore, these were required to exhibit an adjusted p-value of 0.05 or less (alpha = 0.05, lfcThreshold = log2(2), altHypothesis = “greaterAbs”). Expression was also estimated at both the transcript and gene levels with RSEM (v1.2.24)60 using results from the bowtie2 short read aligner (v2.3.4.1)61 as input. RSEM-estimated abundances were expressed using the transcripts per million measure (TPM)62. Transcript sequences were aligned against the 2 February 2022 version of the NCBI NR protein database using DIAMOND (v0.9.22)63 in its BLASTX-like mode with default parameter settings. The top hit per query, if any, was recorded; if multiple best-scoring hits were encountered, a representative match was arbitrarily selected.
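For readers unfamiliar with the TPM measure, the Python sketch below shows the standard calculation: each count is divided by the transcript's effective length, and the resulting rates are rescaled to sum to one million. This is a simplified illustration, not RSEM's implementation, which estimates counts and effective lengths internally; the function name `tpm` and the toy values are hypothetical.

```python
def tpm(counts, effective_lengths):
    """Convert per-transcript read counts to transcripts per million.

    counts: estimated reads assigned to each transcript.
    effective_lengths: effective transcript lengths, in the same order.
    """
    # Reads per base of transcript: corrects for transcript length.
    rates = [c / length for c, length in zip(counts, effective_lengths)]
    total = sum(rates)
    # Rescale so that the values of all transcripts sum to one million.
    return [rate / total * 1e6 for rate in rates]


# Hypothetical toy values: counts proportional to length give equal TPM.
toy_tpm = tpm([10, 20, 30], [1000, 2000, 3000])
print(toy_tpm)
```

Because the values always sum to one million within a sample, TPM allows the relative abundance of a transcript to be compared across libraries of different sequencing depth.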