Transcriptome analysis of molecular mechanisms responsible for light-stress response in Mythimna separata (Walker)

Light is an important environmental signal for most insects. The Oriental Armyworm, Mythimna separata, is a serious pest of cereal crops worldwide, and is highly sensitive to light signals during its developmental and reproductive stages. However, molecular biological studies of its response to light stress are scarce, and related genomic information is not available. In this study, we sequenced and de novo assembled the transcriptomes of M. separata exposed to four different light conditions: dark, white light (WL), UV light (UVL) and yellow light (YL). A total of 46,327 unigenes with an average size of 571 base pairs (bp) were obtained, among which 24,344 (52.55%) matched to public databases. The numbers of genes differentially expressed between dark vs WL, dark vs UVL, dark vs YL, and UVL vs YL were 12,012, 12,950, 14,855, and 13,504, respectively. These results suggest that light exposure altered gene expression patterns in M. separata. Putative genes involved in phototransduction-fly, phototransduction, circadian rhythm-fly, olfactory transduction, and taste transduction were identified. This study thus identified a series of candidate genes and pathways potentially related to light stress in M. separata.


Results
Illumina sequencing and de novo assembly. Four cDNA libraries were prepared from M. separata exposed to dark (dark), white light (WL), UVA light (UVL), and yellow light (YL), respectively, and subjected to Illumina deep sequencing. Illumina paired-end sequencing generated a total of 235.0 million raw reads. After cleaning and quality checks, 212.5 million clean reads (15.08 Gb) were obtained, with an average of 53.1 million reads (~3.8 Gb) per sample. The dark, WL, UVL, and YL cDNA libraries generated 54,402,458, 55,437,108, 50,414,324 and 52,197,036 clean reads, respectively ( Table 1). The percentages of Q20 sequences (with an error probability of 0.01; high-quality indicator) in the four libraries were 98.12%, 97.04%, 96.93%, and 98.23%, respectively. These clean reads were de novo assembled into unigene sequences using Trinity software. These unigenes, encompassing 26,465,921 nucleotides (nt), were finally summarized into 46,327 All-unigenes, including 11,130 clusters and 35,197 singletons, with N50 at the length of 795 base pairs (bp) and an average length of 571 bp ( Table 2). Among these All-unigenes, 6175 were exclusive to the WL group, and 6916, 6935, and 2563 were exclusive to the dark, UVL, and YL groups, respectively. The length distributions of All-unigenes are shown in Fig. 1. The most abundant unigenes were < 300 bp (>19,000) and the least abundant were ≥ 3000 bp (579). Sequences ≥ 3000 bp were grouped together. Regarding the consensus sequences, many unigenes (69.50%) were 200-500 bp long, 13,906 were > 500 bp, 5881 (12.69%) were > 1000 bp, and 3.85% were > 2000 bp. The number of sequences decreased as the length increased.
To evaluate the completeness of these transcriptome libraries and the effectiveness of our annotation process, we searched the annotated sequences for genes involved in COG classifications. The 6444 assembled All-unigenes were divided into 25 categories (Fig. 4 posttranslational modification, protein turnover, chaperones (13.98%), and carbohydrate transport and metabolism (12.37%), while extracellular structures (20, 0.31%) and nuclear structure (2, 0.09%) were the least represented.
Sequence orientations were determined according to the best hit in the database. For protein coding region prediction, All-unigenes were aligned by blast search against the nr, Swiss-Prot, KEGG, and COG databases, using an E-value cut-off of 10 −5 . A total of 21,166 coding sequences (CDSs) mapped to the protein database, and there were 1591 predicted CDSs (ESTscan-CDSs) ( Supplementary Fig. S1). Among All-unigenes with CDSs, 13,292 (62.8%) were < 500 bp, while 7260 (34.3%) were 500-2000 bp and 144 were > 3000 bp. We performed open reading frame (ORF) predictions using getorf software from the EMBOSS (v. Possible genes related to phototransduction-fly and circadian rhythm-fly pathways. We investigated the presence of phototransduction and circadian rhythm pathways in the M. separata transcriptome and mapped the transcripts to pathways in the KEGG database (Supplementary Figure S2). The putative identified genes required for phototransduction-fly, such as Gq, PLCβ, PKC, TRP, TRPL, INAD, arr2, NINAC and CamkII, and those for circadian rhythm, such as Dbt, Per, Tim, Sgg, Cyc, dCLK, Vri and Pdp, respectively, were first identified in M. separata (Supplementary Tables S6 and S7). TRP, Arr2, PLCβ, Dbt, Per, Tim, and Vri had full ORFs. We identified three opsins, UV wavelength-sensitive opsin (Unigene9668_All), long-wavelength opsin (Unigene13996_All), and blue-sensitive visual pigment (CL2771.Contig1_All, CL2771.Contig2_All, CL2771. Contig4_All and CL2771.Contig5_All), all with amino acids homologies of 100% with KF539458, KF539446, and KF539428, respectively 45 . Meanwhile, a putative new opsin (Unigene10271_All) with an amino acid homology > 76% with the predicted UV-sensitive-like opsin from Amyelois transitella was also identified. In this study,   Annotation analysis of DEGs. To clarify the functions of DEGs involved in response to light treatment, we mapped all the DEGs to terms in the GO database and compared the results with the whole transcriptome background. The DEGs had a GO ID and could be categorized into small functional groups in three main GO categories of biological process, cellular component, and molecular function. Based on sequence homology, we identified significant DEGs annotated by the GO database. We categorised 53, 52, 54, and 52 functional groups in dark-vs-WL, dark-vs-UVL, dark-vs-YL, and UVL-vs-YL, respectively (Supplementary File 4, Fig. S3). Among these groups, "cellular process" and "single-organism process" were dominant within the "biological process" category, "cell" and "cell part" categories were dominant in the "cellular component" category, and "binding" and "catalytic activity" were dominant in the "molecular function" category.
We investigated the biochemical pathways of these DEGs further by mapping all the DEGs to terms in the KEGG database and comparing this with the whole transcriptome background. The DEGs had a KO ID and could be categorized into small pathways (Supplementary File 5, Fig. S4). Of the 12,012, 12,950, and 14,855 DEGs between the dark-vs-WL, dark-vs-UVL, and dark-vs-YL groups, 1938, 2482, and 2925 unigenes had KO IDs and could be categorized into 247, 248, and 252 pathways, respectively. Genes involved in metabolic pathways were the most significantly enriched.

Validation of expression profiles by RT-qPCR.
To confirm the expression profile data, we further examined the relative expression levels of eight selected genes and quantified the relative expression levels between dark and three light-treated samples by RT-qPCR. Three heads for each sample were used for each gene validation. Among the eight analyzed genes, seven were detected to have similar fold changes to the expression profile data (Fig. 7). These comparative analyses revealed a concordance rate > 87.0% between RT-qPCR analysis and expression profile data, thus validating the accuracy and reliability of the sequencing data.

Discussion
Light is an important environmental signal for most insects 19,27,46 . Light stress refers to the detrimental effect that exposure to insufficient or excess levels of light can have on an organism's function and development. Mortality of second instar larvae of M. separata was shown to be higher than in the third and fourth instar larvae after UV-A radiation 34 , and increased UV-A radiation of pupae could decrease the rate of adult emergence in this species 34 . As a major abiotic force, light has a wide range of biological effects on insects 19,24,47 , which are known to be highly sensitive to artificial light 23 . Light can influence the physiology, behaviour, and reproduction of insects, especially nocturnal insects [22][23][24][25] , and these influences may vary depending on the wavelength of the light 19,27 . Numerous studies have discussed the ecological impacts of light on insects, including disturbances of biological rhythms, orientation, and migration, and of basal activities such as foraging and mating behaviours, as well as effects on reproductive success 18,24,27 . However, the effects of light on nocturnal insects at the molecular level are largely unknown because of a lack of genomic information.
M. separata is an important nocturnal pest. Although full-genome information is lacking for M. separata, our collection of unique transcripts may represent a significant proportion of the functional genes in this species. In this study, we used the heads of M. separata exposed to different environmental light conditions to perform a genome-wide investigation by transcriptional sequencing and gene expression profile analysis. The results of this study provide the first transcriptomic analysis of M. separata in relation to light stress, and also provide useful information for studying the molecular mechanisms of responses to light stress in nocturnal insects.
In this study, we assembled 46,327 unigenes from the heads of the nocturnal moth M. separata, with or without light exposure. Among our transcriptome data, 21,328 unigenes were annotated and showed specific Swiss-Prot matches. The annotated unigenes were further used for functional annotation and classification using the GO and KEGG databases, which provide information about probable biological functions and biosynthesis pathways. Functional annotation of these unigenes demonstrated that they covered most biological processes. Of the unigenes with significant hits in the nr database, 10,621 were categorized into 57 sub-categories, and 13,394 unigenes were mapped into 258 significant KEGG pathways. Regarding secondary metabolite pathways, genes for sensory system, environmental adaptation, and digestive system were strongly represented. These categories were considered to be major components of the response to environmental light stress. However, up to 47.62% of unigenes were not annotated by BLAST, possibly because of limited information about the genomes or transcriptomes of the moth and its related species 48,49 . These unmatched unigenes might be candidates for novel gene discoveries.
We investigated 88, 68, 52, 72, and 36 unigenes involved in phototransduction-fly, phototransduction, circadian rhythm-fly, olfactory transduction, and taste transduction, respectively, by transcriptomic analysis. Many genes involved in sensory system and environmental adaptation were found to undergo expression changes through comparative transcriptomic analyses. Phototransduction and circadian rhythm are both regulated by light and represent important biological processes in living organisms. TRP channels have important functions in phototransduction in adult insect visual systems 50 . These channels can detect and respond to light changes in the environment and thereby influence many insect behaviours 51 . Previous studies have indicated that many important proteins are involved in phototransduction and circadian rhythms. However, our current understanding of insect phototransduction and circadian rhythm has mainly been derived from D. melanogaster 36,39 . Rhodopsin, G(alpha) q, PLC, PKC, TRP, and TRPL are important transduction proteins in phototransduction processes, and genes encoding these proteins were identified in M. separata in this study. Several well-known Drosophila circadian clock genes, such as period, timeless, Clock, vrille and Pdp1, have been also found in this species. Opsins are membrane proteins, and play key roles in insect phototransduction by absorbing light and activating G-protein 52 . In addition to the three kinds of opsins previously identified in M. separata 45 , we also found a putative new opsin. The unigenes required for phototransduction-fly and circadian rhythm identified in M. separata showed shared descent with those of D. melanogaster and other Lepidoptera. INAD and NINAC were first identified in Lepidoptera in this study. These findings have thus greatly enriched our current knowledge of M. separata gene expression profiles, and provided molecular bases for further studies of the response mechanisms of M. separata to light stress.
The current transcriptomic analysis identified many DEGs between insects treated with different wavelengths of light, with different numbers of DEGs in GO and KEGG terms and pathways. The regulated unigenes covered a variety of functions, and analysis of GO categories revealed enrichment of genes involved in stress response, circadian rhythm, phototransduction, olfactory transduction, taste transduction, and DNA repair. Many genes were differentially up-regulated or down-regulated in light-treated compared with dark-treated M. separata. Comparison between the UVL and YL libraries identified 4,914 up-regulated and 8590 down-regulated unigenes. Compared with the dark-treated group, 15 genes encoding heat shock proteins were up-regulated or down-regulated, while genes such as cytochrome P450 (Unigene13937_All) was repressed in WL, YL and UVL groups. Antennal binding protein (Unigene4550_All) was up-regulated in the UVL and YL but down-regulated in the WL group in the olfactory transduction pathway. Overall, more genes were down-regulated than up-regulated in the three treated groups compared with the dark control, suggesting that light largely inhibited gene expression, with possible implications for impaired biological functioning. However, further studies are needed to determine the significance of the altered expression of specific genes or pathways.
In summary, we obtained 46,327 unigenes from M. separata by transcriptomic sequencing, and identified numerous light stress-associated DEGs and signal pathways by gene expression profiling. These data suggest that the moth might respond to light stress at the transcriptomic level by decreasing or increasing expression levels of genes related to metabolism, phototransduction, olfactory transduction, and taste transduction. Overall, these results provide the first informative reference dataset for future studies on global and specific responses of insects to light at the molecular level, and will facilitate further gene discoveries and biomarker identification in M. separata and other nocturnal insects. Our work may also provide some new insights into the development of light traps for forecasting the outbreaks and controlling the activities of M. separata.

Materials and Methods
Insect rearing and light treatments. M. separata larvae were fed on maize seedlings in the laboratory at 25 ± 1 °C and 70% ± 5% relative humidity, with a photoperiod of 14 h:10 h (light:dark). Pupae were sexed and kept separately until eclosion. M. separata used in all experiments were obtained from a colony maintained at the Institute of Plant Protection, Henan Academy of Agricultural Sciences, Zhengzhou, China. Adult moths were fed on a 10% honey solution.
Three-day-old adult moths (female:male 1:1) were collected at 06:00 and then treated with white light (400-750 nm and 1500 lux) for 6 h. The moths were then divided randomly into a group (WL) maintained under white light, while the other moths were further divided into three groups: dark, moths maintained in complete darkness for 3 h after constant white light for 6 h; UVL, moths maintained under UV light (365 nm, 110 lux) for 3 h after constant white light for 6 h and dark for 3 h; YL, moths maintained under yellow light (589 nm, 110 lux) for 3 h after 6 h white light and 3 h dark treatment. Experiments were performed a total of three times. For each treatment, at least 60 adult heads were used and the collected samples were frozen immediately in liquid nitrogen after sampling and stored at − 80 °C until further use.
RNA extraction, cDNA library construction, and Illumina sequencing. Total RNA was extracted using TRIzol reagent (Invitrogen Carlsbad, CA, USA) according to the manufacturer's instructions. For transcriptome sequencing, equal amounts of total RNA from four samples were pooled and treated with DNase. The concentration and quality of DNase-treated RNA was evaluated using a NanoDrop 2000 UV-vis Spectrophotometer (Thermo Scientific, Waltham, MA, USA) and 1% formaldehyde gel electrophoresis. RNA integrity was assessed on an Agilent 2100 Bioanalyzer (Agilent Technologies, Inc., CA, USA). mRNA was isolated from total RNA using magnetic beads with oligo(dT) and sheared into short fragments using fragmentation buffer. cDNA was then synthesized from the mRNA fragments using SuperScript III Reverse Transcriptase (Invitrogen), according to the manufacturer's manual. The short fragments were then connected with sequencing adapters and analyzed by agarose gel electrophoresis. Suitable fragments were enriched by PCR amplification to construct the cDNA library. Four pooled cDNA libraries were constructed using an mRNA-Seq assay for paired-end transcriptome sequencing and sequenced on Illumina HiSeq2000 system at Beijing Genomics Institute (BGI, Shenzhen, China) to generate 100 base single-end reads. The raw data from Illumina deep-sequencing have been deposited in the Sequence Read Archive (NCBI) (accession numbers: SRR3658912-SRR3658915).
Functional annotations and ORF prediction of All-unigenes. The raw reads generated by Hiseq 2000 were filtered to remove low-quality reads (reads containing adaptor, reads containing > 5% unknown nt "N", and reads with > 20% quality value ≤ 10). Clean reads were then de novo assembled using Trinity software (version release-20130225) to generate a collection of non-redundant unigenes without a reference genome 53 . In this study, Trinity generated transcriptome assemblies from short-read sequences using the de Bruijn graph algorithm, and K-mer was set at 25 bp. Trinity first combined clean reads to form longer contigs, and then assembled contigs into unigenes. Finally, unigenes from each sample's assembly were further processed by sequence splicing and redundancy removal with sequence-clustering software to acquire non-redundant unigenes that were as long as possible, to form global transcriptome data for M. separata heads. After clustering, the unigenes were divided into clusters and singletons.
Assembled unigenes were subjected to blastx (BLAST, the basic local alignment search tool) alignment (E value < 1e-5) and several protein databases, including nt, Swiss-Prot, KEGG, COG, and Interpro 54,55 . We performed GO functional classification for All-unigenes using Blast2GO, according to molecular function, biological process, and cellular component 56,57 . The genes' complex biological behaviours were further examined by pathway annotation using KEGG identifiers (http://www.genome.jp/kegg/). These blasted unigenes were then used to extract CDS and translated into peptide sequences. Unigenes without BLAST hits were predicted using ESTScan (version v3.0.2) and then translated into peptide sequences 58 . The ORFs of unigenes were predicted using getorf software (http://www.bioinformatics.nl/cgi-bin/emboss/getorf) 59 . For a coding sequence with more than two ORFs, the longest one was identified as the sequence of interest.
Comparative transcriptomics. The transcriptome datasets of H. armigera (SRX518090) and H. assulta (SRX733617 and SRX707455) were downloaded from the NCBI Sequence Read Archive and de novo assembled using Trinity (version release-20130225). All-unigenes of M. separata were mapped to the transcriptomes of H. armigera and H. assulta. The BLASTn algorithm was used to test sequence similarities, with a threshold E-value < 1e −5 . The sequences were compared with the longest contig from each of the transcripts identified in the M. separata transcriptome.
Identification of DEGs. DEGs were determined based on their expression abundances in the different treatment groups using SOAP (version 2.21) (http://soap.genomics.org.cn/soapaligner.html). FPKM values were used to evaluate expression abundances of genes and to quantify transcript levels among the different samples 59 . DEGs were based on differences in expression abundances among the four cDNA libraries, with FDR ≤ 0.001 and the absolute value of log 2 Ratio ≥ 1 as the threshold for significant differential gene expression 60 . DEGs were also annotated using the GO database 57 , and the numbers of DEGs in each GO term were calculated. KEGG pathway analysis of the DEGs was also performed to identify the associated biochemical and signal transduction pathways 54 .
RT-PCR validation. The expression patterns of selected genes were analyzed by RT-qPCR using a Thermal Cycler Dice Real Time System Lite (TaKaRa, Japan). Each reaction contained 12.5 μ l of 2 × SYBR Premix Ex Taq II (TaKaRa), 2 μ l of cDNA, and 1 μ M of gene-specific primers in a final volume of 25 μ l. The PCRs were performed under the following conditions: 95 °C for 5 min, followed by 40 cycles of 95 °C for 5 s, 60 °C for 30 s and 72 °C for 20 s. Relative expression levels of each gene were calculated using the 2 −ΔΔCt algorithm 61 by normalizing to expression of M. separata arginine kinase gene, which was used as an internal control. Three technical replicates were used for each sample and the data were shown as means ± standard errors. The primer sequences are listed in Supplementary Table S8 .