Identification of putative chemosensory receptor genes from yellow peach moth Conogethes punctiferalis (Guenée) antennae transcriptome

The yellow peach moth, Conogethes punctiferalis, is an extremely important polyphagous insect in Asia. The chemosensory systems of moth play an important role in detecting food, oviposition sites and mate attraction. Several antennal chemosensory receptors are involved in odor detection. Our study aims to identify chemosensory receptor genes for potential applications in behavioral responses of yellow peach moth. By transcriptomic analysis of male and female antennae, 83 candidate chemosensory receptors, including 62 odorant receptors, 11 ionotropic receptors and 10 gustatory receptors were identified. Through Blast and sequence alignment, the highly conserved co-receptor Orco was annotated, eight unigenes clustered into pheromone receptors, and two clustered as sugar receptor. Among the IRs, one unigenes was similar with co-receptors IR25a. Expression levels of 50 odorant receptors were further evaluated by quantitative real-time PCR in antennae. All the ORs tested were detected in antennae and some of which were associated with sex-biased expression. The chemosensory receptors identified in C. punctiferalis provide a foundational resource for further analysis on olfaction for behavior. The expression profiles of ORs in antennae indicated variant functions in olfactory recognition, and our results provided the possibility for the potential application of semiochemical to control this pest moth.

BmOR47 respond to benzoic acid, 2-phenylethanol, and benzaldehyde respectively 29 . While PRs (BmOR1 and BmOR3) respond specifically to bombykol (E10Z12-16: OH) and bombykal (E10Z12-16: Ald) respectively, which are major and minor sex pheromone components in B. mori. 30,31 . Although we classified receptors on the basis of their responses to chemicals, their functions are still unknown. PRs are more conserved compared with other ORs 32,33 . Orco is the only highly conserved OR protein among divergent insect species, which suggests the crucial role in chemical recognition 34,35 . The GRs are generally expressed in gustatory receptor neurons (GRNs) within gustatory organs. Differently, several GRs are usually expressed in one neuron. Most GRs sense non-volatile compounds, including sugar and bitter compounds 36 . Interestingly, Some GRs expressed in antennae cannot identify volatile odorants, but respond to carbon dioxide 37 . IRs belongs to a variant subfamily of ionotropic glutamate receptors (iGluRs), involving in detection of chemical signals 38,39 . Based on the expression location, two subfamily of IRs are identified: conserved antennal IRs and divergent IRs, indicating the function of IRs in olfactory and taste 39 . In antennal IRs, IR8a and IR25a appear to act as co-receptors and form heteromeric complexes with odor-specific receptors 40,41 . In contrast with ORs, many IR-expressing neurons expressed two or three IR genes 17 .
The yellow peach moth Conogethes punctiferalis (Guenée) (Lepidoptera: Crambidae), is a serious polyphagous insect of crop (sunflower, maize, sorghum, cotton, etc.) and fruit (peach, plum, durian, etc.). It's hard to control by traditional method, due to larvae bored the inside fruit or stem 42,43 . Few olfactory receptor genes have been reported in this moth 44 , however more receptor genes investigated allows a better understanding of the molecular basis of olfaction. Identification of the chemosensory genes can lead to comprehensive understanding of how to  recognize and locate host for this moth. In the present study, the antennal transcriptome of C. punctiferalis were sequenced on the Illumina HiSeq 4000 platform. In total 83 chemoreceptor genes including 62 ORs, 11 IRs and 10 GRs were identified. The expression levels of ORs, which may play important roles in chemoreception of C. punctiferalis were analyzed by using of quantitative real-time PCR (qRT-PCR). The results assist with the identification of genes involved in C. punctiferalis olfactory, help better understand the gene expression in sex, and provide the molecular basis for further study of pheromone and host volatile recognition.

Results
Illumina sequencing. To identify the chemosensory protein genes from C. punctiferalis, the cDNA from male and female antennae were sequenced using the Illumina HiSeqTM 4000 sequencing platform. A total of 59,375,582 and 57,353,054 raw reads were obtained from female and male antennae, respectively. After removing adaptor sequences, low quality sequences and containing N sequences, 58,053,354 and 55,592280 clean reads were generated from the antennae of female and male raw data, respectively. After assembled, the two datasets result in 112,933 contigs with a mean length of 902 bp and an N50 of 1750 bp. From these datasets, 76,486 unigenes were obtained with a mean length of 727 bp and an N50 of 1499 bp (  1). GO annotation was used to classify the function of transcripts according to the GO terms ( Fig. 2A). In the biological process terms, cellular, metabolic and single-organism were the highest classified. In the cellular component terms, cell, cell part and organelle were the most abundant. In the molecular function category, the genes expressed in the antennae were mostly related to binding, catalytic activity and transporter activity. KOG annotation was based on the relationship between orthologous genes from different species (Fig. 2B). Using KOG functional annotation, "General function prediction only", "Signal transduction", and "Posttranslational modification, protein turnover, chaperones" were three largest groups assigned to. For KEGG annotation, 10,298 unigenes were classified into five groups, Cellular processes,   environmental information processing, genetic information processing, metabolism and organismal systems (Fig. 2C). Transport and catabolism, signal transduction, translation, carbohydrate metabolism, endocrine systems were the main pathways in each group, respectively.
Putative chemoreceptor genes. All the unigenes were searched by BLAST to identified chemoreceptor genes. Total 62 OR genes, 11 IR genes and 10 GR genes were identified through bioinformatics analysis. All the chemosensory receptor genes of C. punctiferalis were submitted to the GenBank (accession numbers: KX084452-KX084521 and KX096203-KX096215). Of these genes, 45 unigenes (39 ORs and 6 IRs) showed sequence identity to the previously identified in C. punctiferalis which are already submitted in GenBank. Among the identified ORs, 38 were found to represent full-length open reading frames (ORF) with 4-7 transmembrane domains ( Table 3). Half of those sequences were annotated against homologs of Ostrinia furnacalis. As expected, Orco/OR2 was identified with high identity compared with the conserved insect OR2/Orco. The IRs and GRs sequences in C. punctiferalis antennal transcriptome were identified according to the similarity with known insect IRs and GRs. Seven of eleven putative IRs were similar with IRs of O. furnacalis. Six unigenes with full-length ORF, contained 3~4 transmembrane domains. Of GR genes, 3 putative GR genes contained full-length ORF with 6 transmembrane regions, while 7 sequences were partial (Table 4).
Phylogenetic tree. Phylogenetic tree of ORs was constructed using the sequences of 162 ORs from B. mori, O. furnacalis and C. punctiferalis (Fig. 3). The hits with a length of less than 200 amino acids were ignored. The OR sequences were clustered into PRs, Orco and other divergent ORs. The 8 amino acid sequences, OR1 and OR3-9 clustered into a clade with pheromone receptors of B. mori and O. furnacalis. Orco/OR2 was highly conserved and clustered with Orco/OR2 family. In the neighbor-joining tree of IRs (Fig. 4), the IRs were clustered into ionotropic glutamate receptors (iGluRs), IR25/IR8a and other divergent IRs. Among the nine IRs in C. punctiferalis, CpunIR25a grouped with the highly conserved IR25a/IR8a. Alignment analysis revealed that the similarity of IR25a was higher than 76.3% in D. melanogaster, B. mori, Tribolium castaneum and C. punctiferalis (Fig. 5). CpunIR4, CpunIR6, CpunIR3, CpunIR7 shared identity with IR76b, IR21a, IR68a, IR41a from D. melanogaster, B. mori and T. castaneum, respectively. Unfortunately, we cannot obtain the sequences belong to iGluRs. In phylogenetic tree of the GRs, GRs from B. mori, Manduca sexta and C. punctiferalis were analyzed (Fig. 6), CpunGR5 and CpunGR4 were respectively classified into BmorGr64a and MexGR6 subgroup, which were known as sugar receptors. CpunGR3 was cluster into a clade of fructose receptors.

Expression level of OR genes in female and male antennae. Quantitative real-time PCR (qPCR)
was used to analyze the expression levels of 50 OR genes in the antennae of male and female moths. The results indicated that all the tested genes were detected in antennae. Among the OR genes, OR8, OR39, OR48, OR4 and OR30 were highly enriched in female, with 182.6, 93.9, 84.4, 15.6, and 13.5 times to male, respectively. PR subfamily members, OR1, OR3, OR5 and OR7 were highly expressed in male antennae, as well as the odorant receptor OR62 (Fig. 7) (Table 3). As co-receptor, Orco/OR2 was most abundant in female and male antennae. Because OR4, OR39, OR48 and OR53 were only detected in female antenna, so we cannot obtain the FPKM value (Table 3).

Discussion
With the rapidly development of next generation sequencing technology, a large number of unigenes were identified. RNA-seq effectively increased the depth of sequencing compared to the previous sequence data from cDNA library [46][47][48] 49 , suggested that the olfactory related genes we identified were equivalent to other moths. Total number of identified OR genes varied in different orders or species due to sequencing methods and depth, and/or sample preparations. In Lepidopteran, B. mori, totally 66 ORs were reported in genomic data 50 . In Dipteran, there were 79 candidate OR genes from Anopheles gambiae and in Hemipteran, 170 OR genes from Apis mellifera were annotated 18,21 . Especially in Nasonia vitripennis, total number of ORs was up to 301 51 . However, identified OR genes were only obtained from the antennae, ORs expressed in other tissues might be difficult to identify and the physiological states perhaps influence the amount of ORs in the antennae 22 . It seemed that more OR genes in Hymenopteran insect, especially the parasitoid wasp, might be due to need more receptors were needed to identify and locate the host. In Lepidopteran insects, the PRs are more conserved and clustered in a subgroup within the OR family 9 . In our study, OR1 and OR3-9 of C. punctiferalis were clustered with pheromone receptors of B. mori and O. furnacalis, and we inferred that some or all of them might be putative pheromone receptors. But the expression profiles of these sequences showed that not all of them were male-specific. Koenig considered that at least one of the putative PR should exhibit male-specific expression associated with trichoid sensilla 9 . Recent studies showed that some PR genes expressed in both sexes. The PR1, 2 and 4 identified from the antennae of C. suppressalis have been detected in both male and female antennae 49 . From the antennal transcriptome of Spodoptera littoralis, four unigenes were enriched in male antennae, only two of which were annotated as putiative PRs and found to be expressed in antennae of both sexes 48 . In B. mori, Bmor19, Bmor45 and Bmor47 were also reported as female-specific or highly female-biased receptors, which responded to linalool, benzoic acid and benzaldehyde, suggesting the potential roles in detecting oviposition site and male-produced sex pheromone 29 . In our study, five unigenes were enriched in the female antennae than in the male, which may have provide important function during the host plant seeking.
IRs are derived from ionotropic glutamate receptors and comprise a novel family of chemosensory receptors. In D. melanogaster, 15 of total 66 IR genes were specifically expressed in antennae 38 that was similar with our study that only 11 IRs were identified from the antennae transcriptomes of C. punctiferalis. Phylogenetic analysis showed that IRs from C. punctiferalis were not closely related to the iGluRs. Animal iGluRs are highly conserved family of ligand-gated ion channels 39 , while IRs are extremely divergent exclusive of IR8a and IR25a 38 . IR25a and IR8a are most similar primary sequence to iGluRs, suggesting that they are the IR genes most similar to the ancestral IRs 9,39,52 . In our study, CpunIR25a clustered with the highly conserved IR25a from B. mori, D. melanogaster and T. castenum, while in N. vitripennis and Microplitis mediator, two putative IR25a orthologues were identified 22,39 . Therefore, the congruent function of IR25a orthologues is still not clear. In Lepidoptera insect, IR8a were identified from the antennal transcriptome of Helicoverpa armigera, H. assulta and C. suppressalis 26,49 . Surprisingly, IR8a was not found in our antennal transcriptome. We speculated that it maybe need more deeply sequencing and increasing the samples, suggesting that IR8a maybe not expressed in our present samples.
In the previous studies, 56 and 77 candidate GR genes were respectively identified from the Drosophila and Acyrthosiphon pisum genome 14,53 . In B. mori, 65 GR genes were annotated from the silkworm genome 54 . However, only 10 GRs were identified from the antennal transcriptome of C. punctiferalis. It might be that GRs have a low expression level in the antennae and mainly expressed in gustatory organs (proboscis, legs, wings and genitalis) 55,56 . GRs have been divided into four classes: non-fructose sugar, bitter, CO 2 , and fructose receptors 9,54 . In phylogenetic tree of GRs, the gustatory receptor genes, GR3-5 were clustered into sugar and fructose receptors, suggesting the potential sugar detection function in the antennae of this moth. However, the recent report showed a wide range of non-gustatory sensory functions of GRs 57 , indicating that GRs may have more divergent functions in insect antennae.
In conclusion, the main purpose of this study was to identify the chemosensory receptor genes involved in the chemoreception. Based on the transcriptomic analysis, 83 CRs were identified from the antennae of C. punctiferalis. Our method was successful in identifying CR genes with low-expressing levels and the results made it possible for further studies on the molecular level and behaviors, providing the possibility to applicate potential target genes for controlling this pest.

Materials and Methods
Insects rearing and antennae collection. C. punctiferalis larvae were collected from the infested sunflower at Langfang Experimental Station of Chinese Academy of Agricultural Sciences, Hebei Province, China. In laboratory, the larvae were raised on fresh corn ear under conditions 27 ± 1 °C, with 70-80% relative humidity (RH) and a photo period of 16:8 h light: dark (L:D). After eclosion, adults were fed with 10% honey solution. The antennae (70 pairs of each sex) from adults within three days after eclosion were separately collected and frozen immediately in liquid nitrogen. RNA extraction. Total RNAs were separately extracted and purified using Trizol Reagent (Invitrogen, Carlsbad, CA, USA) following the product manuals. The integrity of total RNA was assessed with Agilent 2100 (Agilent Technologies Inc., CA, USA) and the purity was detected on a NanoDrop 2000 spectrophotometer (NanoDrop products, Wilmington, DE, USA). The concentration was accurately measured by Qubit. Illumina sequencing was performed at Novogen Co., Ltd. Beijing, China, The cDNA libraries were sequenced using Illumina HiSeq 4000 system under effective concentration. In order to obtain the high-quality transcriptome sequence data, the raw reads were filtered to eliminate adaptor sequences and low quality reads. Reads with uncertain nucleotides larger than 10% of the fragment sequence were removed. Trinity de novo program with a default k-mer was used to assemble the clean reads 45 . Redundant sequences were removed to obtain unigenes sequences by means of selecting longest transcript contigs.
Unigenes annotation and classification. To obtain comprehensive information of gene functions, assembled unigenes were searched against Nr (NCBI non-redundant protein sequences), Nt (NCBI nucleotide), Pfam (Protein family), KOG (euKaryotic Ortholog Groups), SwissProt, KEGG (Kyoto Encyclopedia of Genes and Genomes), GO (Gene Ontology) database using BlASTX with an E-value < 10 −5 . CDS (coding sequence) were predicted in two-step process, the unigenes were firstly aligned using NR and SwissProt protein database to obtain ORF. If the sequences mismatch with the two databases, estscan (version 3.0.3) software was used to predict the ORF to obtain the nucleic acid and amino acid sequence. FPKM (fragments per kilobase pairs per million mapped reads) was used for gene expression analysis.
Identification of chemosensory receptor genes. Homology searches of OR, GR and IR sequence were performed using BLAST (http://blast.ncbi.nlm.nih.gov/blast.cgi), and the ORFs were predicted using ORF finder (http://www.ncbi.nlm.nih.gov/gorf/gorf.html). Transmembrane domain was predicted using TMHMM (http://www.cbs.dtu.dk/services/TMHMM/). Phylogenetic tree was constructed by MEGA 5.2 software using the neighbor-joining method with the Bootstrapping model by 1000 replication. The evolutionary distances were computed by using the Poisson correction method. The phylogenetic tree image was further created by Figtree software.
Quantitative real-time PCR. The expression levels of ORs were examined in male and female antennae using qRT-PCR. Total RNA was extracted as described above. First-strand cDNAs were synthetize followed One-Step gDNA removal and cDNA Synthesis kit (TransGen Biotech, China) with oligo dT-primer following the kit manual. The primers for qPCR were designed using Primer premier5.0 program (Table S1). qRT-PCR were performed on ABI 7500 fast real-time PCR system (Applied Biosysterm, USA) in a reaction volume of 20 μ L using SYBR Premix Ex Taq II (Tli RNaseH Plus) master mix (Takara-Bio, Shiga, Japan), according to the manufacturers' instructions. Two-step program were performed as follows, 95 °C for 30 s, followed by 40 cycles of 95 °C for 3 s and 60 °C for 30 s. For each gene, three biological replications were performed with each biological replication measured in three technique replications. Relative quantification was analyzed using the comparative 2 −ΔΔCT method, with the housekeeping genes β-actin (accession number JX119014) as the reference gene.