Antennal transcriptome sequencing and identification of candidate chemoreceptor proteins from an invasive pest, the American palm weevil, Rhynchophorus palmarum

For decades, the American palm weevil (APW), Rhynchophorus palmarum, has been a threat to coconut and oil palm production in the Americas. It has recently spread towards North America, endangering ornamental palms, and the expanding date palm production. Its behavior presents several parallelisms with a closely related species, R. ferrugineus, the red palm weevil (RPW), which is the biggest threat to palms in Asia and Europe. For both species, semiochemicals have been used for management. However, their control is far from complete. We generated an adult antennal transcriptome from APW and annotated chemosensory related gene families to obtain a better understanding of these species' olfaction mechanism. We identified unigenes encoding 37 odorant-binding proteins (OBPs), ten chemosensory proteins (CSPs), four sensory neuron membrane proteins (SNMPs), seven gustatory receptors (GRs), 63 odorant receptors (ORs), and 28 ionotropic receptors (IRs). Noticeably, we find out the R. ferrugineus pheromone-binding protein and pheromone receptor orthologs from R. palmarum. Candidate genes identified and annotated in this study allow us to compare these palm weevils' chemosensory gene sets. Most importantly, this study provides the foundation for functional studies that could materialize as novel pest management strategies.


Scientific Reports
| (2021) 11:8334 | https://doi.org/10.1038/s41598-021-87348-y www.nature.com/scientificreports/ rogen, Life Technologies), and the RNA integrity was further confirmed using a 2100 Bioanalyzer (Agilent Technologies). After confirming the quality and the characteristic 'hidden break' in 28S RNA profile using 2100 Bioanalyzer, we proceeded to paired-end cDNA library preparation using TruSeq Stranded mRNA preparation Kit (Illumina Inc.) following manufacturer's protocols, which include the following steps: purification and fragmentation of total RNA, first and second-strand cDNA synthesis, 3′ end adenylation, adapter ligation and purification. Finally, the purified and PCR-enriched products were used for cDNA library preparation. The cDNA libraries were validated and quantified by Qubit 2.0 Fluorometer. The HiSeq Illumina sequencing was performed at the core sequencing facility of the King Abdullah University of Science and Technology (KAUST), Jeddah, Saudi Arabia. Image deconvolution and quality value calculations were performed using Illumina GAPipeline1.3.
Data processing, assembly, and gene annotation. Data processing, transcriptome assembly, and functional annotation were carried out by following the method described previously 28 , and contigs were identified and annotated based on local Blast search using R. ferrugineus chemosensory protein sequences, using Geneious v7.1.5 (http:// www. genei ous. com). The top blast hit transcript clusters were extracted from the male and female assembled transcriptomes with an in-house command-line script. Relevant sequences were translated and manually selected based on the following criteria: e-value score below 1.0E−5, similarity to other chemosensory proteins, and with an ORF with at least 50% the average length traditionally observed for each gene family. Selected sequences were numbered according to their estimated expression, which was obtained by RPKM values 45 . To assess the transcriptomes' completeness, an Arthropoda BUSCO database, consisting of 1066 core genes that are highly conserved single-copy orthologues 46,47 , was used to query the transcriptome fasta files. For this process, the gVolante web server (https:// gvola nte. riken. jp/) was utilized with the following parameters: min_length_of_seq_stats: 1, assembly_type: trans, Program: BUSCO_v2/v3, selected reference_gene_set: Arthropoda.
Phylogenetic analysis of the candidate chemosensory proteins. To Figure S1a and S1b). Top-blast hits were substantially distributed to S. oryzae (12,881 and 12,730 in male and female, respectively), likely reflecting the degree of relatedness between these two species in the context of genetic information that is available in the NCBI database. BLASTx and InterProScan results were utilized for functional GO annotations. For both transcriptomes, a majority of GO annotations were derived from Interpro and Uniprot databases ( Figure S1a and S1b). For the male transcriptome, Blast2Go (B2G) and InterProScan analyses were performed on 56,786 transcripts. Blast hits were identified for 24,106 transcripts, of which 4041 yielded no GO annotation, 4504 were assigned GO terms but could not be functionally annotated, and 15,561 were B2G annotated. ( Figure S1a). For the female transcriptome, B2G and InterProScan analyses were performed on 53,791 transcripts. Blast hits were identified for 23,455 transcripts, of which 3873 yielded no GO annotation, 4115 were assigned GO terms but could not be functionally annotated, and 15,467 were B2G annotated ( Figure S1b).
For both male and female transcriptomes, level 2 GO term distributions were consistently ranked. Within the "biological processes" GO ontology, the "cellular process", "metabolic process", and "biological regulation" terms were the most abundantly assigned to the transcripts. In the "molecular function category", "binding" and "catalytic activity" were the most abundant assignments. In the "cellular components" category, "cellular, the anatomical entity" and "intracellular" were the most abundant assignments ( Figure S1a and S1b).
RPKM based transcript abundance calculation in each male and female transcriptome revealed several uncharacterized proteins as most abundant transcripts in both, followed by heat shock 68-like proteins (Supplementary Table S1). Within the chemosensory gene families, OBPs were the most abundant group, with RpalOBP2 being the most abundant transcript in both males (RPKM: 5658.70) and female (RPKM: 11,523.09) transcriptome.

Odorant-binding proteins (OBPs).
We identified 37 candidate OBPs in the antennae of APW adults through our qualitative transcriptome analysis, 22 of which presented full-length open reading frames (ORFs). The average sequence length of the annotated OBPs was 853 bp. Four candidates OBPs (RpalOBP8, 9, 12, and 16.1) were not found in the female transcriptome, whereas only one (RpalOBP16.2) was absent from the male transcriptome. Except for OBP2 and paralogs (4.1 and 4.2), all other OBPs were named accordingly to the relative transcript abundance, as reported in the case of R. ferrugineus 42 , in which OBP2 and OBP1 were the most highly abundant transcripts followed by OBP3 and OBP4 paralogs. In contrast, all the other predicted OBPs were at least four times less abundant than these four (Table S2). The phylogenetic analysis confirmed the closeness between RpalOBP2, RpalOBP4.1, and RpalOBP4.2 and their orthologs from R. ferrugineus, RferOBP107, RferOBP23, and RpalOBP3213, respectively, with all of them clustering together in the so-called "antennal binding proteins II (ABPII)" subfamily of OBPs (Fig. 1). The RpalOBP4.1 R. ferrugineus ortholog, RferOBP23, is a highly expressed OBP previously classified as a candidate PBP as its silencing slightly impaired ferrugineol detection 42 . Both RpalOBP4.1 and RferOBP23 belong to a phylogenetic cluster that includes other well studied pheromone-binding proteins like PjapPBP 52 . RpalOBP1, being the most highly expressed OBP, clustered with a previously characterized antennal-specific RPW PBP, RferOBP1768 42 . They defined another phylogenetic OBP cluster within the Minus-C family, although they shared less than 25% sequence identity. Furthermore, several other OBPs from R. palmarum clustered together in this group along with other RferOBPs. Within this cluster, RpalOBP10 shared close sequence similarity (84% identical residues) with the PBP RferOBP1768. RpalOBP21 was the only RpalOBP that grouped in the CRLBP subfamily, sharing 87.02% identity with RferOBP14025. We observed that around two thirds of the predicted RpalOBPs belonged to Minus-C subfamily with the characteristic absence of C2 and C5 cysteine residues ( Figure S2), five were ABP-II subfamily members (RpalOBP2, 5, 4.1, 4.2, 11) four were classic OBPs (RferOBP6, 15, 31, 18) and remaining were classified as GOBPs/PBPs (Fig. 1).

Chemosensory proteins (CSPs).
A total of 10 CSPs were identified in the antennal tissue of APM adults, eight of which with full-length ORFs. All the detected candidate genes were found in both male and female transcriptomes. For the predicted CSPs, an average sequence length of 950 bp was obtained. This number of detected RpalCSPs is similar to what has been observed in its closest relative, R. ferrugineus (12 CSPs), or in other antennal transcriptomes in more divergent Coleoptera, such as Tenebrio molitor (12), Rhyzoperta dominica (8), and D. ponderosae (6) 22,23,28,53 . Except for RpalCSP7, all of these CSPs contained the conserved pattern of four cysteine residues (Cys-X6-Cys-X18-Cys-X 2-Cys, where X represents any amino acid) ( Figure S3). The most highly expressed contig (Table S3), RpalCSP1 (with 2239.41 RPKM), shares 91.5% identity with the RferCSP213 from R. ferrugineus, whereas the second most abundant, RpalCSP2, shares 92.0% with RferCSP2115. These sequences, in addition to RpalCSP4, RpalCSP5 and RpalCSP6, clustered together with several coleopterans specific CSPs, a feature also observed with some RPW CSPs. Most RpalCSPs are related to sequences from D. ponderosae or R. ferrugineus (Fig. 2) www.nature.com/scientificreports/ Regarding the different CSPs found within R. palmarum antennae, the percentage of identical residues ranged from 17.4 to 62%, indicating how divergent this gene family is. However, as it is characteristic of this gene family, predicted APW CSPs were very well conserved between species, with at least five out of ten CSPs corresponding to closely related RPW orthologs with percentages of identical residues 75-94%. Still, several of the other RpalCSPs clustered with CSPs from D. ponderosae and A. plannipenis rather than with those from RPW (Fig. 2).

Sensory neuron membrane proteins (SNMPs).
The group of proteins known as SNMPs belongs to the CD36 superfamily of proteins in insects, which are functionally associated to signal facilitation in response to external stimuli, in the case of SNMPs meaning environmental chemical stimuli 54 . Our analysis determined four candidate SNMPs in R. palmarum adults' antennae in both male and female transcriptomes, with only one candidate showing a full-length ORF. The sequences found presented an average length of 2594 bp. These sequences were highly divergent among them (15.3-28.3% identical residues). High percentages of identical residues and strong bootstrap support imply the proximity between the orthologs of both APW and RPW species (Fig. 3). The most abundant contig (Table S4), RpalSNMP1 (RPKM: 882.57), the shared identity of 88.3% with its putative ortholog in R. ferrugineus, RferSNMPu1. Both APW and RPW SNMP1 sequences clustered in the subfamily of SNMPs known as SNMP1 proteins, which is divided into SNMP1a and SNMP1b. Whereas RpalSNMP1 belongs to the "a" division, RpalSNMP2 likely belongs to the "b" division of the SNMP1 subfamily. The second most abundant transcript, RpalSNMP2, shared 75.6% identity with RferSNMP928. RpalSNMP3 and 4 showed 79 and Node colours represent bootstrap support (n = 100) with gradient green to red representing bs 40 to 100. OBP subfamilies Minus-C, ABPII, CRLBP (Red), and classic/GOBPs/PBPs are labeled. Sequence names of R. ferrugineus and R. palmarum have been colored in red and blue font, respectively. Two clusters with wellcharacterized pheromone-binding OBPs (PBPs) and one CRLBP clade are marked in red. Gustatory receptors (GRs). Whereas 15 GRs have been previously identified in the RPW antennal transcriptome, only seven-candidate GRs were found expressed in the antennae of APW adults, with only three of them showing full-length ORFs. The average length was 1537 bp. Interestingly, one single candidate GR (RpalGR2) was found only in the female transcriptome, whereas the rest were present in both sexes. The conserved C-terminal motif of "TYhhhhhQF", characteristic of GRs, was found in five of the seven predicted GRs ( Figure S4). The most abundant transcript (Table S5) from this family of receptors (RpalGR1, with 25.98 RPKM) clustered together with receptors tuned presumably to bitter compounds. Other predicted receptors such as RpalGR3, RpalGR4, and RpalGR7 were grouped in the same clade. RpalGR2 clustered within a group of conserved candidate sugar receptors, while RpalGR6 clustered with what has been mentioned as a fructose receptor in D. ponderosae. RpalGR5 clustered in a very conserved and strongly bootstrap-supported group of receptors putatively tuned to CO 2 (Fig. 4). Interestingly, RpalGR2 and RpalGR6 were the only receptors for which www.nature.com/scientificreports/ direct orthologs were not identified in the antennal transcriptome of R. ferrugineus, whereas RpalGR1, RpalGR3, RpalGR4, RpalGR5, and RpalGR7 presented identity percentages with RferGRs that range between 23.9 and 95.8%. Despite the fact that the low number of GRs identified, the phylogeny showed that the RpalGRs are well distributed among the different functional clades, with at least one GR found in each type (Fig. 4).  55 , and the previous classification 28 , the predicted RpalORs distributed among all the families except for families III, IV, and VI (Fig. 5), as RferORs did. Within the 63 APW ORs, including Orco, most of the predicted receptors clustered in four subfamilies: 13 ORs in subfamily I, 18 ORs in subfamily II, one (RpalOR62) in subfamily V and the remaining 30 ORs in subfamily VII. www.nature.com/scientificreports/ When looking at proposed pheromone receptors (PRs) from R. ferrugineus 28 , it becomes clear that APW shared some closely related orthologs: the sequence RferOR43 and RpalOR43 shared 85% identity, RferOR44 and RpalOR44 shared 78% identity, and RferOR63 and RpalOR62 shared 69% identity. However, none of these proposed RPW pheromones ORs have yet been functionally characterized, but a recent report identified RferOR1 43 as the aggregation pheromone receptor in the RPW. Our phylogeny identified a potential functional ortholog in APW, RpalOR1 that shared 82.24% amino acid identity with RferOR1. RpalOR20 was also found in the same phylogenetic cluster as RpalOR1 and RferOR1, potentially defining a palm weevil pheromone receptor clade (Fig. 5).

Ionotropic receptors (IRs).
We identified a total of 28 putative IRs from the antennal transcriptome of R. palmarum, 10 of them with full-length ORFs. The average sequence length of the putative receptors was 2412.5 bp. The most highly abundant contig was named RpalIR8a with 32.82 RPKM (Table S7) since it was closer to DponIR8a and DmelIR8a than to other candidates IRs. Similarly, RpalIR25a was named after its percentage of identical residues with DmelIR25a (78.75%) and DponIR25a (92.52%), and RpalIR76b was named after its GR subfamilies are named and colour coded as follows: purple for candidate CO 2 tuned receptors, pink for putative fructose receptors, green for putative conserved general sugar receptors, and dark green for putative bitter receptors. Sequence names of R. ferrugineus and R. palmarum have been colored in red and blue font, respectively.  3% identical residues). These three receptors are putatively broadly expressed co-receptors. At least ten predicted sequences were categorized as ionotropic glutamate receptors (iGluRs), three of them grouped with RPW candidates to be N-methyl-D-aspartame (NMDA) receptors, whereas the rest were classified as non-NMDA iGluRs. One single predicted receptor was classified in the group of divergent IRs, whereas the rest were classified as antennal IRs (Fig. 6).

Discussion
Palm weevils R. ferrugineus and R. palmarum are invasive pests responsible for millions of economic losses. One of the current management approaches is based on placing pheromone blends (ferrugineol, rhynchophorol) and kairomones (ethyl acetate, ethanol, acetoin, and sugar ferments) on traps that allow capturing as many insects as possible, a strategy called mass trapping 30,[35][36][37][38][39][40][56][57][58] . Fortunately, behavioral and ecological similarities between these closely related species have permitted the development of dual management technologies 33 . However, misuse of mass trapping sometimes fails to protect all the palms in a plantation, and oviposition by these insects usually associates with immediate palm loss. In this sense, the possibility to develop new management strategies based on "reverse" chemical ecology by targeting proteins with chemosensory function holds  www.nature.com/scientificreports/ promise as a futuristic and elegant way to disable insect host and mate finding. Furthermore, understanding the molecular machinery of chemoreception can also help understand insect behavioral specializations based on genetic divergence 59,60 . A total of 37 OBPs, 10 CSPs, 4 SNMPs, 7 GRs, 63 ORs, and 28 IRs have identified in our APW antennal transcriptomes. Except for two OBPs, all the sequences identified are entirely novel. Compared to the antennal transcriptome analysis provided for R. ferrugineus, our transcriptome analysis for R. palmarum presented a larger number of reads and contigs identified (Table 1). Interestingly, only five OBPs, one GR, and two ORs were differentially expressed between sexes. Since this species utilizes an aggregation pheromone blend detected by both males and females, it is expected that both sexes share common grounds for most of their life cycles. However, the existence of differentially expressed chemosensory transcripts may indicate the detection of important cues for sex-specific activities such as oviposition. However, due to the lack of the functional characterization of any of these transcripts, the meaning of such difference remains unknown.
In the APW antennal transcriptome, the most abundant chemosensory transcripts were determined as putative OBPs, particularly RpalOBP2 and RpalOBP1. Within RpalOBPs, we pinpointed putative APW PBPs, based on their high expression (RpalOBP2) and/or phylogenetic position close to characterized coleopteran PBPs (RpalOBP4. 1, 4.2, 10, 12, and 14). The most promising candidate consisted of RpalOBP10, as it is the closest ortholog of a functionally confirmed RPW PBP, RferOBP1768 42 , and both share 84.07% identical residues. Both OBPs are members of the Minus-C subfamily, which has been proposed in the past to contain pseudogenes, non-olfactory OBPs, or a novel type of OBPs characteristic from Coleoptera 61 . The functional characterization of RferOBP1768 demonstrates that at least its corresponding gene is not a pseudogene. Another interesting APW PBP candidate consists of RpalOBP6, as it is closely related to one OBP of the bamboo snout beetle Cyrtrotrachelus buqueti, which has been shown to strongly binds to the pheromone analog dibutyl phthalate, and much less to general plant volatiles 62 . Further functional characterization work with these candidates is needed, considering that OBPs and ORs are candidates to be used as targets for management techniques 42,43 . www.nature.com/scientificreports/ Looking at CSPs, it is interesting to note that APW lacks some CSPs in RferCSP-defined clades. It is possible that we missed these CSPs in the APW transcriptome, but given this transcriptome coverage compared to the RPW one, a more plausible interpretation is that both species have evolved different sets of CSPs. Hypothesizing these proteins' specific function is daunting and probably impossible, especially considering that CSPs' functional characterization has been demonstrated only through fluorescent binding assays of CSPs from species of other insect orders [63][64][65][66] . Furthermore, evidence indicates that these proteins may have different non-olfactory roles in embryonic development and leg regeneration 67,68 .
Our transcriptome analysis found four SNMPs, all of them presenting closest homology to R. ferrugineus SNMPs (Fig. 3). Several works on the SNMP subfamily showed that SNMP1 proteins are usually highly expressed in pheromone receptor neurons, indicating their putative involvement in pheromone detection, a function later evidenced in D. melanogaster and more recently in B. mori 10,69,70 . However, whether the weevil SNMP1s are indeed involved in detecting pheromone components remains to be investigated, especially considering that the function of SNMPs has not been evidenced in any Coleoptera. RpalSNMP3 and RpalSNMP4 clustered in the SNMP2 subfamily, which leads us to hypothesize that they may be expressed in supporting cells surrounding OSNs in antennae, legs, and wings, as it has been observed in other insects [71][72][73] , and are probably involved in different functions than pheromone detection.
Our transcriptome analysis focused only on antennal expression when it comes to gustatory receptors, which is why we found solely seven GRs. In contrast, genome analyses of other Coleoptera such as the emerald ash borer and the mountain pine beetle identified 30 and 60 GRs, respectively 61 . The number of predicted RpalGRs was lower than that in the RPW antennal transcriptome (15) 28 . The highest percentage of identical residues between RPW and APW GRs was observed between RpalGR5 and RferGR29259, both clustering in the putatively CO 2 tuned GR clade, a highly conserved clade across insect orders. These receptors' suggestion may be responsive to CO 2 is relevant according to these palm weevils' ecology. Several works on this genus have shown that fermenting compounds from plant tissue synergize the behavioral response to different Rhynchophorus species' pheromone, probably because these compounds indicate damaged palms as suitable places to find female partners for males and oviposition sites for females 74 . As a blatant product of fermentation, carbon dioxide could be part of the chemical signatures that these insects use to find the host. Sugars may also be important cues for host identification, and our transcriptome identified at least 2-candidate sugar/fructose GRs in APW, although more have been annotated in RPW. However, speculation on the function of these and other GRs is difficult, as no coleopteran GRs have been characterized to date.
Looking at ORs, we identified RpalORs in all the protein subfamilies that contained RferORs (Fig. 5). The RpalOR distribution in these subfamilies was similar to that of RferORs, D. ponderosae ORs, and I. typographus ORs, with an abundance of ORs subfamilies I, II, and VII, which may indicate some OR expansion or specialization in these species. It is important to notice that despite the phylogenetic relatedness between both palm weevils, we evidenced at least four ORs that may represent independent expansions in R. palmarum (RpalOR69a and 17a) (Fig. 5). This is not wholly unexpected since, despite the similarities between both species, they are distributed in different geographical locations whose characteristics may act as drivers of genetic differences in chemoreception. More information will come from further OR functional characterization. Indeed, only a small number of Coleoptera OR functional studies have been conducted, identifying mainly pheromone receptors (Fig. 5). Anyhow, this can serve as a basis to propose candidate PRs in the APW. The most promising APW PR candidate is RpalOR1, closely related to the sole Rhynchophorus OR characterized to date, namely RferOR1. Further confirmation of the RpalOR1 function in the APW pheromone detection would be an exciting finding since the APW and RPW do not share the same pheromones. APW responds to rhynchophorol (2(E)-6-methyl-2-hepten-4-ol), whereas RPW utilizes a blend of ferrugineol (4-methyl-5-nonanol) and ferrugineone (4-methyl-5-nonanone) 30,31,56,75 . RferOR1 has been shown to be tuned towards ferrugineol and ferrugineone, but not to rhynchophorol 43 . If RpalOR1 appears to have opposite tuning, this will open the way to structure-function relationship studies, the ~ 20% amino acid difference altering the pheromone-binding. In R. palmarum, electrophysiological work has shown the existence of some OSNs whose responses to pheromone are synergized by acetoin stimulation 76 . Further functional work on RpalOR1 would allow a better understanding of some kairomones' synergistic effects on pheromone detection at the OR level.
Ionotropic receptors have been traditionally associated with acid and amine detection. Our analysis identified at least 14 putative antennal IRs (Fig. 6). Considering that fermentation products seem to be essential for aggregation and oviposition of palm weevils, it is expected that some of such IRs may be tuned to compounds of yeast and bacterial origin, especially those more closely related to deorphanized D. melanogaster IRs receptors such as IR75a and IR75d, tuned to acetic acid and pyrrolidine, respectively 77 . A comparison between APW and RPW IRs revealed that most APW IRs have a direct orthologous IR from RPW. Within the antennal IR, however, subfamily, there are several cases of divergence between IRs from the two species. Whether these IRs perform different functions in each palm weevil species is unknown since it is challenging to assume any particular function for the IRs identified.
This comprehensive analysis of chemosensory related proteins provides a fundamental resource to better understand the APW chemoreception and serve as a basis for comparative studies with the APW Asian counterpart, the RPW. We highlighted differences and commonalities between these closely related species of weevils that share palms as their hosts but occupy different, non-overlapping areas globally. Further functional studies will better understand how olfactory gene evolution correlates with their host range and ethology 78 . The advent of higher computational power and machine-learning technologies will increase the capacity to find new ligands when the receptors are known 79 . Detecting conserved genetic traits and their functional bases emerges as an essential tool to predict and identify pheromone component and kairomone responses and, consequently, deepen our knowledge of pest insect chemical senses that could improve their management.