Introduction

The olfactory system is essential to insects because it underlies behaviors such as locating suitable hosts, avoiding predators, identifying oviposition sites, and finding sexual partners1. The antenna is the major organ for insect chemosensation, especially olfaction. Mounting evidence suggests that diverse olfactory genes are involved in the signal recognition process, including odorant receptors (ORs), ionotropic receptors (IRs), odorant binding proteins (OBPs), gustatory receptors (GRs), chemosensory proteins (CSPs), and sensory neuron membrane proteins (SNMPs)2,3,4. Olfactory signal transduction can be summarized as follows. First, hydrophobic chemical compounds enter the sensillar lymph through pores, where they are recognized and bound by OBPs5,6 or CSPs7. Second, the OBPs or CSPs are thought to transport the odorants through the sensillar lymph to ORs, a family of integral membrane proteins located on the dendrites of olfactory receptor neurons (ORNs)8,9,10. Additionally, SNMPs11, IRs12,13 and GRs14 have also been proposed to play a role in insect olfaction.

Although insect ORs are seven-transmembrane domain (TMD) proteins, they have a reversed membrane topology (intracellular N-terminus)15,16 and do not belong to the G protein-coupled receptors. In the transduction process, ORs appear to be the primary means by which insects detect volatile chemicals, converting the chemical message into an electrical signal and thus acting as biological transducers17,18. It is generally thought that each ORN expresses a highly conserved OR co-receptor (Orco protein) and a divergent, conventional ORx, such that the Orco-ORx heterodimer forms an ion channel and mediates odorant-binding specificity19,20,21. ORs are broadly tuned to a variety of volatile chemicals, including pheromones, plant volatiles, and odor molecules present in the environment17,22,23.

Coleoptera species account for approximately 25% of all known animal species24, and almost 40% of all described insect species are beetles25. However, compared with Lepidoptera, the olfactory genes of Coleoptera are poorly known. To date, olfactory genes have been identified in only about 20 species of Coleoptera, such as Tribolium castaneum25,26, Megacyllene caryae27, Leptinotarsa decemlineata28, Phyllotreta striolata29, Colaphellus bowringi30, Pyrrhalta maculicollis31, P. aenescens31, Ambrostoma quadriimpressum32 and Galeruca daurica33. Thus, much work is needed to better understand olfaction and its molecular basis in other species of Coleoptera. Ophraella communa LeSage (Coleoptera: Chrysomelidae) originated in North America and is considered a potential biological control agent of common ragweed, Ambrosia artemisiifolia L. (Asteraceae)34. Both adults and larvae feed on the leaves of common ragweed, resulting in severe defoliation35. Since its first discovery in 2001 in Nanjing, Jiangsu province (China)36, the beetle has been reported widely in eastern and central China, where it has significantly suppressed the population of common ragweed37. Zhou et al.38 reported that when the olfaction of male O. communa was hindered by covering their antennae with paint, the males spent significantly more time seeking mates in the arena, indicating that olfaction is important to the mating process. However, the molecular mechanism of olfactory recognition in this insect is still unknown. In this study, we performed a transcriptome analysis of the adult antennae of O. communa and identified 105 candidate chemosensory genes, including 30 ORs, 25 OBPs, 11 CSPs, 18 IRs, 17 GRs, and four SNMPs. Furthermore, we conducted a comparative phylogenetic analysis and examined the expression profiles of 19 genes using quantitative real-time polymerase chain reaction (qPCR). These results could help us better understand the olfactory signal transduction mechanisms in this insect.

Results

Transcriptome overview

Using an Illumina HiSeq 2000™ platform, we obtained 31.1 million and 34.8 million raw reads from the male and female antennal libraries, respectively. After removing low-quality and adaptor reads, 29.4 million and 32.9 million clean reads remained. The sequence data totaled approximately 4.4 and 4.9 gigabases for the male and female samples, respectively. Overall, 153,276 transcripts were generated, and we identified 92,259 unigenes by clustering and redundancy filtering. The mean length of unigenes was 1,229 nt and the N50 length reached 2,068 nt. In total, 35,508 unigenes were longer than 1,000 nt, comprising 38.5% of all unigenes (Table 1). Homology searches of all unigenes against other insect species showed that the highest percentage of unigenes matched T. castaneum (47.5%), followed by Dendroctonus ponderosae (12.8%), Lasius niger (3.7%), Acyrthosiphon pisum (3.2%), and Plutella xylostella (2.3%). The remaining 30.7% of the sequences showed similarity with the sequences of other insects (Fig. 1).
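The summary statistics above (mean length, N50, share of unigenes longer than 1,000 nt) can be reproduced from a list of unigene lengths. The following is a minimal Python sketch, not the pipeline used in this study; the function names and the toy lengths are hypothetical and chosen only for illustration.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def assembly_summary(lengths, long_cutoff=1000):
    """Summarize an assembly: count, mean length, N50, and the fraction
    of sequences strictly longer than long_cutoff (in nt)."""
    n_long = sum(1 for x in lengths if x > long_cutoff)
    return {
        "count": len(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "n50": n50(lengths),
        "fraction_long": n_long / len(lengths),
    }


# Toy unigene lengths (hypothetical, not data from this study).
toy_lengths = [200, 300, 500, 1200, 2500, 4000]
print(assembly_summary(toy_lengths))
```

In practice the lengths would be read from the assembled FASTA file; the toy list only demonstrates the calculation.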

Table 1 Assembly summary of O. communa antennal transcriptome. nt = nucleotides, Gb = gigabases.
Figure 1

All unigene sequences (92,259) that had BLAST annotations against the nr database with a cut-off E-value of 10⁻⁵ were analyzed for species distribution.

OBPs

We identified 25 different sequences encoding odorant binding proteins in the O. communa antennal transcriptomes. Sequence analysis showed that 20 unigenes had a putative full-length open reading frame (ORF) and 19 unigenes had predicted signal peptide sequences. The best Blastx hits of all candidate OBP sequences were known Coleoptera OBPs. The putative full-length OcomOBPs ranged from 119 to 198 amino acids in length. Compared to ORs, insect OBPs were highly conserved: twenty-one of the 25 putative OBPs had more than 50% similarity with OBPs from G. daurica, P. maculicollis, and P. aenescens. In the phylogenetic analysis, OcomOBPs fell on various branches and formed small subgroups together with OBPs from other beetles. These groups were strongly supported by high bootstrap values. Notably, OcomOBP19, a putative pheromone binding protein (PBP), clustered with other Coleoptera PBPs in one clade (Fig. 2). Information on all 25 OBPs, including unigene reference, length, and best Blastx hit, is listed in Table 2.

Figure 2

Neighbor joining phylogenetic tree of candidate OcomOBPs with known Coleopteran OBP sequences. Tcas, Tribolium castaneum (N = 47); Pmac, Pyrrhalta maculicollis (N = 33); Paen, Pyrrhalta aenescens (N = 31); Gdua, Galeruca daurica (N = 29); Agla, Anoplophora glabripennis (N = 2); Hele, Hylamorpha elegans (N = 1). Candidate OcomOBPs were indicated by red circles.

Table 2 The Blastx match of O. communa candidate odorant binding proteins.

CSPs

We identified 11 unigenes encoding candidate chemosensory proteins in the O. communa antennal transcriptome. Notably, sequence analysis predicted a putative full-length ORF and a signal peptide for every candidate. The putative full-length OcomCSPs ranged from 118 to 261 amino acids in length. In addition, all of the OcomCSPs followed the highly conserved pattern of four cysteines with the exact spacing C1-X6-C2-X18-C3-X2-C4, where X is any amino acid (Fig. 3). Insect CSPs are more conserved than ORs or OBPs, and all OcomCSP amino acid sequences had more than 65% similarity with CSPs from P. maculicollis, P. aenescens, G. daurica, and C. bowringi. Phylogenetic analysis showed that the OcomCSPs were distributed on different branches throughout the dendrogram, supported by high bootstrap values (Fig. 4). Information on all 11 CSPs, including unigene reference, length, and the best Blastx hit, is listed in Supplementary Material S3.
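The four-cysteine spacing described above can be checked programmatically. The sketch below is an illustration only, assuming the pattern is read as one cysteine, six residues, a cysteine, eighteen residues, a cysteine, two residues, and a final cysteine; the function name and the toy sequence are hypothetical and are not OcomCSP sequences.

```python
import re

# C1-X6-C2-X18-C3-X2-C4, where each X is any single residue.
CSP_MOTIF = re.compile(r"C.{6}C.{18}C.{2}C")


def find_csp_motif(protein):
    """Return (start_index, matched_segment) for the first occurrence of
    the four-cysteine CSP spacing, or None if the spacing is absent."""
    match = CSP_MOTIF.search(protein.upper())
    if match is None:
        return None
    return match.start(), match.group()


# Toy sequence built to contain the motif (hypothetical, for illustration).
toy = "MKL" + "C" + "A" * 6 + "C" + "G" * 18 + "C" + "PS" + "C" + "KK"
print(find_csp_motif(toy))
```

A sequence in which any spacing differs, for instance 17 instead of 18 residues between the second and third cysteines, returns None.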

Figure 3

Alignment of candidate OcomCSP amino acid sequences. The conserved cysteine residues are marked with red boxes.

Figure 4

Neighbor joining phylogenetic tree of candidate OcomCSPs with known Coleopteran CSP sequences. Tcas, Tribolium castaneum (N = 19); Pmac, Pyrrhalta maculicollis (N = 10); Paen, Pyrrhalta aenescens (N = 9); Gdua, Galeruca daurica (N = 10); Cbow, Colaphellus bowringi (N = 12). Candidate OcomCSPs were indicated by red circles.

SNMPs

We identified four SNMP genes in the antennal transcriptome (Fig. 5). All candidate OcomSNMPs were longer than 500 amino acids, and three of them were predicted to have a putative full-length ORF. Furthermore, all OcomSNMPs had more than 50% identity with SNMPs of P. aenescens, P. striolata, and C. bowringi. Information on all four SNMPs, including unigene reference, length, and the best Blastx hit, is listed in Supplementary Material S3.

Figure 5

Neighbor joining phylogenetic tree of candidate OcomSNMPs with known Coleopteran SNMP sequences. Tcas, Tribolium castaneum (N = 5); Pmac, Pyrrhalta maculicollis (N = 2); Paen, Pyrrhalta aenescens (N = 2); Cbow, Colaphellus bowringi (N = 4); Ldec, Leptinotarsa decemlineata (N = 2); Pstr, Phyllotreta striolata (N = 2). Candidate OcomSNMPs were indicated by red circles.

ORs

Thirty putative OR transcripts were identified in the O. communa antennal transcriptome. The OcomOR1 (OcomORco) gene was easily identified because it had an intact ORF and seven transmembrane domains, which are characteristic of typical insect ORs. The amino acid sequence of OcomOR1 shared 91% identity with the co-receptor of C. bowringi. Apart from OcomORco, 13 putative ORs were predicted to have a full-length ORF, encoding proteins of more than 335 amino acids. The complete proteins encoded by the putative OcomOR transcripts were predicted to have three to seven transmembrane domains. Ten OcomORs were highly divergent, with low levels of identity (<50%) to other reported insect ORs. In the phylogenetic analysis, the OR sequences clustered into several subgroups (Fig. 6). OcomOR1 clustered with the ORco sequences of other insects, including PstrORco, CbowORco, and TcasORco. In addition, OcomOR5, OcomOR9, OcomOR24, and OcomOR26 were grouped into the same clade. Interestingly, OcomOR2 and OcomOR4 clustered with McarOR3 and McarOR5 in the same clade, and OcomOR12 and OcomOR28 grouped together with McarOR20. Because McarOR3, McarOR5, and McarOR20 have been demonstrated to be tuned to the male-produced pheromone chemicals of M. caryae, OcomOR2, OcomOR4, OcomOR12, and OcomOR28 may play a role in pheromone recognition. Information on all 30 ORs, including unigene reference, length, and the best Blastx hit, is listed in Table 3.

Figure 6

Neighbor joining phylogenetic tree of candidate OcomORs with known Coleopteran OR sequences. Tcas, Tribolium castaneum (N = 70); Pmac, Pyrrhalta maculicollis (N = 18); Paen, Pyrrhalta aenescens (N = 23); Cbow, Colaphellus bowringi (N = 30); Pstr, Phyllotreta striolata (N = 36); Mcar, Megacyllene caryae (N = 34). Candidate OcomORs were indicated by red circles.

Table 3 The Blastx match of O. communa candidate odorant receptors.

GRs

We found 17 candidate GR transcripts in the O. communa antennal transcriptome (Fig. 7). Most candidate OcomGRs were partial fragments (only four were predicted to have a putative full-length ORF), encoding overlapping but distinct sequences. Eleven OcomGRs had more than 50% identity with GRs of P. aenescens, P. striolata, C. bowringi, Monochamus alternatus39, Anomala corpulenta40, and Anoplophora glabripennis41. Information on all 17 GRs, including unigene reference, length, and the best Blastx hit, is listed in Supplementary Material S3.

Figure 7

Neighbor joining phylogenetic tree of candidate OcomGRs with known Coleopteran GR sequences. Tcas, Tribolium castaneum (N = 44); Pmac, Pyrrhalta maculicollis (N = 6); Paen, Pyrrhalta aenescens (N = 12); Pstr, Phyllotreta striolata (N = 12). Candidate OcomGRs were indicated by red circles.

IRs

We identified 18 transcripts encoding candidate IRs in the O. communa antennal transcriptome. Of these, eight OcomIRs contained a putative full-length ORF, with three to four TMDs. Based on the Blastx results, all OcomIRs had high levels of identity (>58%) with other reported insect IRs, indicating that IRs are relatively conserved in Coleoptera. In the phylogenetic analysis, OcomIRs were grouped into different clades with high bootstrap values. OcomIR3 and OcomIR7 clustered with the IR8a/IR25a clades (including TcasIR8a, TcasIR25a, CbowIR6, CbowIR8a, PstrIR19, and PstrIR49), indicating that they may be the co-receptors of OcomIRs (Fig. 8). Information on all 18 IRs, including unigene reference, length, and the best Blastx hit, is listed in Supplementary Material S3.

Figure 8

Neighbor joining phylogenetic tree of candidate OcomIRs with known Coleopteran IR sequences. Tcas, Tribolium castaneum (N = 23); Pmac, Pyrrhalta maculicollis (N = 6); Paen, Pyrrhalta aenescens (N = 8); Pstr, Phyllotreta striolata (N = 26); Cbow, Colaphellus bowringi (N = 6). Candidate OcomIRs were indicated by red circles.

Fluorescence quantitative real-time PCR

To verify the expression of olfactory genes in male and female antennae and to characterize the expression profiles of chemosensory genes in different body parts (male antennae, female antennae, heads, legs, and the remainder of the body), 15 ORs and four OBPs were selected for qPCR. The qPCR results showed that all 15 OcomORs were predominantly expressed in the antennae, suggesting that their function is related to olfaction. Although we did not find apparent sex-specific OR genes in O. communa, OcomOR4, OcomOR19, and OcomOR2 had significantly higher expression levels in the male antennae, whereas OcomOR8 had a significantly higher expression level in the female antennae. Furthermore, OcomOBP19, OcomOBP10, and OcomOBP20 were specifically expressed in the antennae, whereas OcomOBP2 was expressed not only in the antennae but also, at low levels, in the head, body, and leg. Importantly, OcomOBP19 and OcomOBP20 were expressed at significantly higher levels in male antennae than in female antennae (Fig. 9).

Figure 9

Relative expression levels of 15 ORs and four OBPs in adult antennae, head, leg and the rest of body using qPCR. M-T, male antennae; F-T, female antennae. The relative expression level is indicated as mean ± SE (N = 3). Different capital letters mean significant difference between tissues (P < 0.05, ANOVA, LSD).

Discussion

Compared with Diptera and Lepidoptera, the molecular underpinnings of the olfactory system of Coleoptera are poorly understood. Using deep RNA sequencing, we analyzed the antennal transcriptome of O. communa. Among the 92,259 unigenes identified, only 47% of the translated sequences shared significant similarity with entries in the NCBI non-redundant (nr) protein database, and only 36% could be annotated with one or more gene ontology (GO) terms; this is similar to what has been reported in other Coleoptera species42,43. The unigenes without GO terms may be species-specific or rapidly evolving genes of O. communa. Nevertheless, the N50 of the O. communa antennal transcriptome reached 2,068 bp, longer than those in P. aenescens31, P. maculicollis31, C. bowringi30, L. decemlineata28, and A. quadriimpressum32. The high quality of our transcriptome sequencing lays a foundation for olfactory annotation and further exploration of the molecular chemosensory mechanism of O. communa. On the basis of the O. communa transcriptome, we identified 105 candidate chemosensory genes, including 30 ORs, 25 OBPs, 11 CSPs, 18 IRs, 17 GRs, and four SNMPs, and this analysis substantially extends our knowledge of olfactory-related genes in Coleoptera. Moreover, we characterized the expression profiles of 15 ORs and four OBPs in different tissues of O. communa by qPCR, which will facilitate the exploration of the function of these olfactory genes.

OBPs play an important role in odor processing by insects, facilitating the transport of odorant molecules through the sensillar lymph and serving as the liaison between the external environment and ORs1,44. In our study, we identified 25 transcripts encoding OBP genes in the O. communa antennal transcriptome. This number is lower than in T. castaneum (49 OBPs)24, G. daurica (29 OBPs)32, P. aenescens (31 OBPs)30, and P. maculicollis (36 OBPs)30, similar to the number in C. bowringi (26 OBPs)30 and L. decemlineata (26 OBPs)28, and higher than that in A. quadriimpressum (16 OBPs)32. One likely reason for the difference from T. castaneum is that genome data provide a more comprehensive list of olfactory genes than the antennal transcriptomes of other Coleoptera species. In addition, some genes may have been missed in our transcriptome because they are expressed in tissues other than antennae45 or at different life history stages46,47. Relatively low RNA-seq coverage may also cause olfactory genes with low transcript abundance to be missed. In the phylogenetic analysis, OcomOBP19 grouped together with HelePBP and AglaPBP1, indicating that OcomOBP19 may be involved in the pheromone recognition process. Further, the expression level of OcomOBP19 in the male antennae was significantly higher than that in the female antennae, which supports the idea that OcomOBP19 may function in pheromone recognition in O. communa. Similarly, OcomOBP20 expression in male antennae was significantly higher than in female antennae, indicating that OcomOBP20, like OcomOBP19, may be related to sex pheromone recognition or male-specific behaviors.

CSPs are another class of soluble proteins abundantly expressed in the sensillum lymph48. A total of 11 candidate CSP genes were found in our transcriptome data. All of them were predicted to have a putative full-length ORF, with lengths ranging from 118 to 261 amino acids. In addition, the high similarity found in the Blastx best-hit results indicated that CSPs are highly conserved among insects. The number of CSP genes in O. communa was lower than in T. castaneum (20 CSPs)24 and L. decemlineata (15 CSPs)27, and similar to that in P. aenescens (nine CSPs)30, P. maculicollis (ten CSPs)30 and A. quadriimpressum (ten CSPs)31. Thus, the number of CSP genes identified in our study was comparable with that of previous reports on these latter three beetles. SNMPs were first identified in pheromone-sensitive neurons of Lepidoptera49 and their function was thought to be related to pheromone detection50. Generally, SNMPs are classified into two families, SNMP1 and SNMP2. Two SNMPs were identified in P. aenescens30, P. maculicollis30 and A. quadriimpressum31, whereas three and four SNMPs were identified in the C. bowringi30 and L. decemlineata28 transcriptomes, respectively. In this study, four SNMPs were identified in the O. communa antennal transcriptome as well.

ORs are central to the insect olfactory system: they determine the sensitivity and specificity of odorant reception and are the centerpiece of peripheral olfactory reception in insects1. The number of putative OR-encoding transcripts identified in O. communa is close to the numbers reported in the antennal transcriptomes of P. aenescens (26 ORs)31, P. maculicollis (22 ORs)31, C. bowringi (43 ORs)30, L. decemlineata (37 ORs)28, and A. quadriimpressum (34 ORs)32, but much lower than the number in the T. castaneum genome (341 OR-encoding genes, including 79 pseudogenes)24, suggesting that the antennal transcriptome data may have missed some OR genes. OcomOR1 grouped with PstrORco, CbowORco, and TcasOR1 and formed a specific co-receptor lineage, indicating that OcomOR1 could be the ORco of O. communa. Similar to the reported OR genes of T. castaneum, M. caryae, and A. corpulenta, a species-specific expansion of ORs (OcomOR5/9/24/26) was also found in O. communa, which may reflect that these species inhabit different ecological niches. OR gene function in beetles was first characterized in M. caryae27. McarOR3, McarOR5, and McarOR20 were sensitive to three compounds of the male-produced pheromones of M. caryae, indicating that the function of these three ORs may be related to pheromone recognition27. In the phylogenetic analysis, OcomOR2 and OcomOR4 clustered with McarOR3 and McarOR5 in the same clade, and OcomOR12 and OcomOR28 grouped together with McarOR20, indicating that these four ORs in O. communa may function in pheromone recognition, similar to lepidopteran pheromone receptors (PRs)51. In addition, qPCR results revealed that the expression levels of OcomOR2, OcomOR4, and OcomOR12 were higher in male antennae than in female antennae, and the differences for OcomOR2 and OcomOR4 reached statistical significance. This evidence further suggests that OcomOR2, OcomOR4, and OcomOR12 may play a role in pheromone recognition in O. communa. OcomOR8 expression levels in female antennae were significantly higher than in male antennae; therefore, OcomOR8 may be related to female-specific behaviors, such as detecting oviposition cues or male-produced courtship pheromones. The sex-specific functions of these OcomORs need to be further investigated.

Furthermore, 18 putative transcripts encoding IRs were identified in O. communa. The number of IRs in O. communa was greater than in most Coleoptera species studied, for example nine IRs in C. bowringi30, ten IRs in L. decemlineata28, eight IRs in P. aenescens31, seven IRs in P. maculicollis31, three IRs in Dendroctonus valens43, and four IRs in A. glabripennis41, and was similar to the 20 IRs in A. quadriimpressum31. Like ORco, both IR8 and IR25 are considered to act as co-receptors because they are co-expressed with other IRs. In the phylogenetic tree of IRs, IR8 and IR25 formed a conserved IR clade, in agreement with the results for C. bowringi30. OcomIR3 and OcomIR7 clustered into the conserved IR25a/IR8a clades, indicating that they belong to this co-receptor group. Furthermore, because IRs in insects are more conserved than ORs, we predict that the function of IRs is probably conserved among Coleoptera. We identified 17 candidate GRs in O. communa. This number was greater than most of those previously reported in beetles, and we expect that more GRs are expressed in other tissues, such as maxillary palps, proboscises, and legs. In a previous study of the P. aenescens antennal transcriptome, PaenGR12 was predicted to be the CO2 receptor31. In the phylogenetic tree of GRs, OcomGR16 grouped together with PaenGR12, so we predict that OcomGR16 acts as a CO2 receptor in O. communa.

Conclusion

Using next-generation sequencing technology, we report for the first time large-scale olfactory gene information for O. communa, identifying 30 ORs, 25 OBPs, 11 CSPs, 18 IRs, 17 GRs, and four SNMPs. This large set of chemosensory genes provides a molecular basis for studying the olfactory system of O. communa and will advance our understanding of olfactory mechanisms in Coleoptera. In addition, homology analysis and qPCR were performed to characterize the tissue- and sex-specific expression patterns of these chemosensory genes, which can help predict their function. Further functional studies are needed to explore the roles of these genes.

Materials and Methods

Insects

O. communa adults were collected from Laibin City, Guangxi province, southern China in June 2017, pooled, and reared together on common ragweed plants in cages in an insect breeding room at 26 °C, under a 14 h light:10 h dark cycle and 70%-80% humidity. After the beetles laid eggs, the adults were removed from the cages and the next generation was reared on common ragweed plants in the same breeding room. After eclosion, the male and female adults were separated under a microscope and kept in separate cages. The antennae of unmated male and female individuals were collected two days after eclosion. The antennae were removed with tweezers by grasping them at the base and were subsequently transferred to Eppendorf tubes. For the study of gene expression profiles in different tissues, male antennae (M-T), female antennae (F-T), heads, legs, and the rest of the body were collected. All samples were immediately frozen in liquid nitrogen and stored at −80 °C until RNA extraction.

RNA extraction, cDNA library construction and Illumina sequencing

Total RNA was extracted using TRIzol reagent (Invitrogen, Carlsbad, CA, USA) following the manufacturer’s instructions, including a DNase I digestion step to avoid genomic DNA contamination. RNA quality was checked with a spectrophotometer (NanoDrop™ 1000, Thermo Fisher Scientific, USA) and 1% agarose gels, and its concentration was measured using the Qubit® RNA Assay Kit with a Qubit® 2.0 Fluorometer (Life Technologies, CA, USA). The complementary DNA (cDNA) library construction and Illumina sequencing methods followed Li et al.30. Briefly, the mRNA samples were purified and fragmented using the TruSeq PE Cluster Kit v3-cBot-HS (Illumina) according to the manufacturer’s instructions. Random hexamer primers were used to synthesize the first-strand cDNA, followed by synthesis of the second-strand cDNA using buffer, dNTPs, RNase H, and DNA polymerase I; end repair and adaptor ligation were then performed. The cDNA library was created by amplifying the products using polymerase chain reaction (PCR) and was precisely quantified using the QIAquick PCR Purification Kit (Qiagen, Valencia, CA, USA). The cDNA library was sequenced on the HiSeq 2000™ platform.

De novo assembly and gene annotation

The de novo assembly and gene annotation methods followed Li et al.30. All raw reads were processed to remove low-quality and adaptor sequences. The clean reads were then assembled with Trinity (v2.3.1)52,53 using the default parameters to generate unigenes. The unigenes were annotated by Blastx searches (http://www.ncbi.nlm.nih.gov) against the nr, Swiss-Prot, KEGG, and COG protein databases (E-value < 10⁻⁵). The Blast2GO program54 was used to obtain the GO annotation and WEGO software55 was used to obtain the GO functional classification of these unigenes.
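The E-value cut-off used for annotation can be illustrated with a short filter over BLAST results. This is a sketch under stated assumptions, not the annotation pipeline of this study: it assumes the hits are available in the standard 12-column tabular layout of BLAST+ (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score), and the function name and identifiers are hypothetical.

```python
def best_hits(tabular_lines, max_evalue=1e-5):
    """Keep, for each query, the hit with the highest bit score among
    hits whose E-value is strictly below max_evalue.

    Assumes 12 tab-separated columns with the E-value in column 11
    and the bit score in column 12 (an assumption about the input)."""
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue >= max_evalue:
            continue  # fails the E-value cut-off
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Toy hits (hypothetical identifiers, not results from this study).
example_lines = [
    "uni1\tsubjA\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-30\t200",
    "uni1\tsubjB\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-50\t300",
    "uni2\tsubjC\t40.0\t50\t30\t0\t1\t150\t1\t50\t0.01\t30",
]
print(best_hits(example_lines))
```

Here uni2 is dropped because its only hit has an E-value above the cut-off, and uni1 keeps the higher-scoring of its two hits.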

Sequence analysis and phylogenetic analysis

The ORF finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html) and the NCBI-BLAST network server (http://blast.ncbi.nlm.nih.gov/) were used, respectively, to identify the ORFs and to perform similarity searches of the candidate chemosensory genes. TMHMM Server Version 2.0 (http://www.cbs.dtu.dk/services/TMHMM) was used to predict the TMDs of ORs, GRs, and IRs. The signal peptides of the OBP, CSP, and SNMP protein sequences were predicted by SignalP 4.1 (http://www.cbs.dtu.dk/services/SignalP/)56 with default parameters.

The amino acid sequences of the candidate OBPs, CSPs, ORs, GRs, and SNMPs from O. communa and other insect species were aligned using the ClustalW method57 implemented in the Mega v6.0 software package58. The OBP dataset contained 33 sequences from P. maculicollis, 31 from P. aenescens, 29 from G. daurica, and 47 from T. castaneum. The CSP dataset contained 10 sequences from P. maculicollis, nine from P. aenescens, 12 from C. bowringi, and 19 sequences from T. castaneum. The SNMP dataset contained two sequences from P. maculicollis, two from P. aenescens, four from C. bowringi, two from L. decemlineata, two from P. striolata, and five sequences from T. castaneum. The OR dataset contained 18 sequences from P. maculicollis, 23 from P. aenescens, 30 from C. bowringi, 36 from P. striolata, 34 from M. caryae, and 70 sequences from T. castaneum. The GR dataset contained six sequences from P. maculicollis, 12 from P. aenescens, 12 from P. striolata, and 44 sequences from T. castaneum. The IR dataset contained six sequences from P. maculicollis, eight from P. aenescens, 26 from P. striolata, six from C. bowringi, and 23 sequences from T. castaneum. All amino acid sequences of O. communa and other insects used in the phylogenetic analyses are listed in Supplementary Material S1. The phylogenetic trees were constructed using the neighbor-joining (NJ) method59 with the p-distance model and pairwise deletion of gaps in the Mega v6.0 software package58, and the dendrograms were colored in the FigTree v1.4.3 software package. The reliability of the tree structure and node support was assessed using a bootstrap procedure based on 1,000 replicates. To improve the accuracy of the analyses and ensure that the analyzed transcripts corresponded to individual genes, incomplete transcripts without sufficient overlap in the alignments and protein sequences shorter than 100 amino acids were excluded from the phylogenetic analyses. The six groups of candidate chemosensory genes were named “OcomOBP,” “OcomCSP,” “OcomSNMP,” “OcomOR,” “OcomGR,” and “OcomIR,” each followed by a numeral in descending order of coding region length.
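The distance measure named above can be made concrete with a small example. The following Python sketch shows the general idea of a p-distance with pairwise deletion of gaps: for each pair of aligned sequences, sites where either sequence has a gap are skipped, and the distance is the proportion of the remaining sites that differ. It is an illustration only and not the MEGA implementation; the function names and toy sequences are hypothetical.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences,
    ignoring every site where either sequence has a gap
    (pairwise deletion)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # this site is deleted for this pair only
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    return differing / compared


def distance_matrix(aligned):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    names = sorted(aligned)
    return {
        (x, y): p_distance(aligned[x], aligned[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Toy alignment (hypothetical sequences, not from this study).
toy_alignment = {"seq1": "ACDE-G", "seq2": "ACNEFG", "seq3": "AC-EFG"}
print(distance_matrix(toy_alignment))
```

A matrix of this kind is the input a neighbor-joining algorithm works from; bootstrap support would be obtained by resampling alignment columns and rebuilding the tree for each replicate.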

Quantitative real-time PCR validation

We selected 15 ORs and four OBPs for expression profiling because of their relatively high abundance in the antennal transcriptome, as measured by fragments per kilobase of exon per million reads mapped (FPKM). Their expression profiles were analyzed using qPCR. Total RNA was isolated from the five tissues as described above. The concentration of each RNA sample was standardized to 1 μg/μL and the cDNA was synthesized using a first-strand cDNA synthesis kit (Transgen Biotech, Beijing, China) according to the manufacturer’s protocol. Ribosomal protein (RL4) was used as an internal control and its specific primer sequences were RL4-F: “TGTGGTAATGCTGTGGTAT” and RL4-R: “TCTAGCACTGCATGAACA”. The qPCR was performed on an ABI 7500 (Thermo Scientific, Waltham, MA, USA) with TransStar Tip Top Green qPCR Supermix (Transgen Biotech, Beijing, China). The PCR program was 30 s at 94 °C, followed by 40 cycles of 94 °C for 5 s and 60 °C for 34 s. All qPCR primers were designed using Primer Premier 5.0 (PREMIER Biosoft International) and the efficiency of these primers was validated before gene expression analysis. All primer sequences are listed in Supplementary Material S2. Each qPCR reaction was performed using three technical replicates and three biological replicates.
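For readers unfamiliar with the abundance measure used to choose genes for qPCR, the standard FPKM definition can be written as a short calculation. The sketch below uses the usual formula (fragments mapped to a transcript, scaled per kilobase of transcript length and per million mapped fragments); the function names, gene labels, and counts are hypothetical and do not come from this study.

```python
def fpkm(fragments, transcript_length_nt, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_nt <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and total fragments must be positive")
    return fragments * 1e9 / (transcript_length_nt * total_mapped_fragments)


def rank_by_fpkm(counts, lengths, total_mapped_fragments):
    """Return gene names sorted from highest to lowest FPKM."""
    values = {
        gene: fpkm(counts[gene], lengths[gene], total_mapped_fragments)
        for gene in counts
    }
    return sorted(values, key=values.get, reverse=True)


# Toy numbers (hypothetical genes and counts, for illustration only).
toy_counts = {"geneA": 500, "geneB": 500, "geneC": 50}
toy_lengths = {"geneA": 2000, "geneB": 1000, "geneC": 1000}
print(rank_by_fpkm(toy_counts, toy_lengths, 25_000_000))
```

With equal fragment counts, the shorter transcript (geneB) ranks above the longer one (geneA) because FPKM normalizes for length.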

Statistical analysis

Relative expression was calculated using the 2^(−ΔΔCT) method, and data were analyzed using SAS 9.0 (SAS Institute Inc., Cary, NC, USA). Statistical significance was assessed by analysis of variance (ANOVA) followed by Tukey's multiple comparison test. A value of P < 0.05 was considered statistically significant. Figures were made using OriginPro 9.1 (Northampton, Massachusetts, USA).
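The relative quantification step can be illustrated with a worked calculation. In the comparative CT method, ΔCT is the target-gene CT minus the reference-gene CT (here the reference would be RL4), ΔΔCT is the sample ΔCT minus the calibrator ΔCT, and the relative expression is 2 raised to −ΔΔCT. The sketch below is a minimal illustration assuming roughly 100% primer efficiency; which tissue served as the calibrator is not stated in this section, so the calibrator values, function names, and CT numbers are hypothetical.

```python
import math
import statistics


def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Relative expression by the comparative CT method.

    ct_target, ct_ref: CT values of the target and reference gene in the sample.
    ct_target_cal, ct_ref_cal: the same two CT values in the calibrator sample
    (the choice of calibrator is an assumption made for this example)."""
    delta_ct_sample = ct_target - ct_ref
    delta_ct_calibrator = ct_target_cal - ct_ref_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


def mean_and_se(values):
    """Mean and standard error of the mean across replicates."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se


# Toy CT values for three biological replicates (hypothetical numbers).
replicates = [
    relative_expression(22.0, 18.0, 25.0, 18.0),
    relative_expression(22.5, 18.0, 25.0, 18.0),
    relative_expression(21.5, 18.0, 25.0, 18.0),
]
print(mean_and_se(replicates))
```

A sample whose target gene crosses threshold three cycles earlier than in the calibrator, with the reference gene unchanged, is therefore estimated to have eightfold higher expression.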

Data deposition

All Illumina sequencing data of the antennal transcriptome in this study have been deposited in the NCBI SRA database under accession numbers SRR8372148 (O. communa male antennae) and SRR8372149 (O. communa female antennae).