RNA-Seq analysis and de novo transcriptome assembly of Cry toxin susceptible and tolerant Achaea janata larvae

Larvae of most lepidopteran insect species are known to be voracious feeders and important agricultural pests throughout the world. Achaea janata larvae cause serious damage to Ricinus communis (Castor) in India resulting in significant economic losses. Microbial insecticides based on crystalline (Cry) toxins of Bacillus thuringiensis (Bt) have been effective against the pest. Excessive and indiscriminate use of Bt-based biopesticides could be counter-productive and allow susceptible larvae to eventually develop resistance. Further, lack of adequate genome and transcriptome information for the pest limit our ability to determine the molecular mechanisms of altered physiological responses in Bt-exposed susceptible and tolerant insect strains. In order to facilitate biological, biochemical and molecular research of the pest species that would enable more efficient biocontrol, we report the midgut de novo transcriptome assembly and clustering of susceptible Cry toxin-exposed and Cry toxin tolerant Achaea janata larvae with appropriate age-matched and starvation controls.


Background & Summary
Bacillus thuringiensis (Bt) insecticidal proteins used in sprayable formulations and transgenic crops are the most promising alternatives to synthetic insecticides. However, the evolution of resistance in the field, as well as laboratory insect populations is a serious roadblock to this technology. Achaea janata, a major castor crop pest in India, is controlled using Bt-based formulation 1 comprising of Cry1 (Cry1Aa, Cry1Ab, and Cry1Ac) and Cry2 (Cry2Aa and Cry2Ab) genes 2 . Recent studies from our group reported extensive changes at the cellular and molecular level in the midgut of A. janata exposed to a sublethal dose of the Bt formulation [3][4][5] . Since a decade, reports of resistance against Bt toxins and their mechanisms have been emerging 6,7 . Long term exposure to Cry toxin formulations promotes tolerance in larvae which eventually leads to resistance [6][7][8] . Development of Bt resistance could be due to alterations in proteolytic cleavage of the Cry toxin, altered receptor binding or enhanced midgut regeneration responses [9][10][11] . With the advent of next generation sequencing technology it is now possible to characterize the entire repertoire of transcripts under different conditions and predict pathways involved in various molecular mechanisms. The RNA sequencing study presented here generated the first de novo transcriptome assembly of castor semilooper, Achaea janata (Noctuidae: Lepidoptera), and compared gene expression signatures between toxin-exposed susceptible and tolerant larvae. This article, is a first step in determining the primary basis for Cry tolerance in the pest, which could facilitate new long term management strategies.
Methods toxin administration and sample preparation. Wild population of A. janata larvae, unexposed to pesticides, was field collected from the Indian Institute of Oil Seed Research, Hyderabad, India. Further, the larvae were reared and maintained on castor leaves at 27 ± 2 °C, 14:10 h (light: dark) photoperiod and 60-70% relative humidity for three generations at the insectary of School of Life Sciences, University of Hyderabad, India. In the present de novo transcriptome analysis for the sublethal toxin exposure, 1/10 of LD 50 was used (Group ii) (Fig. 1), (2019) 6:159 | https://doi.org/10.1038/s41597-019-0160-0 www.nature.com/scientificdata www.nature.com/scientificdata/ while for the generation of a tolerant population (Group iv) ( Fig. 1) an LD 50 dose of DOR Bt-1 formulation was administered 1 . Larval batches (n = 100) designated as Cry-susceptible larvae and control larvae were exposed to toxin-coated and distilled water coated leaves respectively. The midgut was isolated from 15 randomly selected surviving larvae from each batch after every 12 h till 48 h. In earlier study we noticed that larvae probably sense the toxin and avoid feeding on toxin coated leaves after a short exposure. Hence, to eliminate any effect induced by starvation, an additional batch (Group iv) of 3 rd instar larvae was maintained on moist filter paper and collected for the midgut isolation every 12 h till 48 h. All the midgut dissections were carried out in ice-cold insect Ringer solution (130 mM NaCl, 0.5 mM KCl, and 0.1 mM CaCl 2 ). The experiment was performed in duplicates. For the Cry tolerant larval population, larvae (n = 100) in each generation were exposed to LD 50 dose and the surviving insects were maintained for larval development, pupal molting, adult emergence and egg laying. The larvae hatched from the eggs were collected and reared till 3 rd larval instar larvae and exposed to LD 50 Bt dose once again. This schedule was carried out for fifteen generations. The batch (n = 100) of Cry tolerant larvae thus generated were exposed to toxin-coated leaves and the midguts were isolated from randomly selected fifteen larvae after 24 h. Total RNA was isolated from the midgut samples using Trizol-based method. The RNA was quantified using NanoDrop TM 8000 spectrophotometer and the quality was assessed using 1% formaldehyde denaturing agarose gel. Library preparation. Illumina 2 × 150 pair end libraries were prepared as follows. Briefly, mRNA was enriched from isolated total RNA and fragmented. The fragmented mRNA was used for first-strand cDNA synthesis, followed by second-strand generation, A-tailing and adapter ligation. Adapter ligated products were purified and PCR amplification was carried out. PCR amplified cDNA libraries were assessed for quality and quantity using DNA High Sensitivity Assay Kit (Agilent Technologies).
Quality assessment prior to cluster generation and sequencing. The amplified libraries were analyzed using Bioanalyzer 2100 and High Sensitivity DNA chip (Agilent Technologies). After obtaining the Qubit concentration for each of the libraries, it was loaded on Illumina platform (2 × 150 bp chemistry) for cluster generation and sequencing. Data was generated on Illumina HiSeq. 2500 system and paired-end sequencing allowed the template fragments to be sequenced in both the forward and reverse directions. The library molecules bind to complementary adapter oligos on paired-end flow cell. The adapters were designed to allow selective cleavage of the forward strands after re-synthesis of the reverse strand during sequencing. The copied reverse strand was then used to sequence from the opposite end of the fragment. Total RNA was subjected to pair-end library preparation with Illumina TruSeq Stranded Total RNA Library Preparation Kit. The mean size of the libraries was between 357 bp to 567 bp for the 28 samples. The libraries were sequenced and high quality data was generated for ~ 3.05 GB data per sample (Online-only Table 1). Sequence analysis. Illumina 2 × 150 pair end libraries were prepared using the Illumina TruSeq stranded mRNA Library Preparation Kit and as per the firm's protocol (Illumina Inc.). The amplified libraries were analyzed on the Bioanalyzer 2100 with a High Sensitivity DNA chip (Agilent Technologies). The de novo master assembly was generated using "SOAP-denovo-Trans (v1.03)" assembler (Short Oligonucleotide Analysis Package) 12 . For each data set, raw quality was assessed and filtered with Trimmomatic (v.0.36) 13 . Transcripts were clustered using the CD-HIT (Cluster Database at High Identity with Tolerance) package 14 . The predicted proteins www.nature.com/scientificdata www.nature.com/scientificdata/ of CDS (Coding sequence) were subjected to similarity search against NCBI's non-redundant (nr) database using the BLASTP (Basic Local Alignment Search Tool) algorithm.

Description
Master Assembly  www.nature.com/scientificdata www.nature.com/scientificdata/ technical Validation SOAPdenovo-Trans assembler was used to generate de novo transcriptome assembly from four experimental sets of midgut samples viz. Group (i) susceptible larvae exposed to medium (water), Group (ii) susceptible larvae exposed to 1/10 of LD 50 dosage of DOR Bt-1 formulation, Group (iii) susceptible larvae subjected to starvation and Group (iv) tolerant larvae exposed to LD 50 dosage of DOR Bt-1 formulation (reared for 15 generations) (Fig. 1). A total of 1,74,066 transcripts were generated for master assembly with a transcriptome length of 10,02,47,510 bps (base pairs). A total of 1,36,818 unigenes were reported using CD-HIT and 35,559 coding sequences were predicted by Transdecoder. The top-hit species distribution revealed that majority (23%) of the CDS aligned with Spodoptera litura followed by Helicoverpa armigera and Heliothis virescens all of which belong to family Noctuidae in the Lepidoptera order. transcriptome assembly. The de novo master assembly of high quality reads of 28 processed samples was accomplished using "SOAP-denovo-Trans (v1.03)" assembler 12 . For each data set, raw quality (phred40) was assessed and filtered with Trimmomatic (v.0.36) using the parameters ILLUMINACLIP:adapter.fasta:2:30:8 MINLEN:40 to remove adaptor sequence and filter by quality score 13 . An average of 19 million clean reads were obtained. Statistics of high quality reads with total reads, base count and data size are summarized in Online-only Table 1 and statistics of assembled transcripts as well as length distribution is presented in Table 1.

Description Metrics
Total number of cds 35,559 Total size of all cds in bps 25,527,927 Average cds length in bps 717 Maximum cds length in bps 8,595

Metrics CDS
Length >5000 32 Table 3. Statistics and length distribution of the predicted CDS. www.nature.com/scientificdata www.nature.com/scientificdata/ Clustering. To filter the redundancy or the noise, it was required to select one representative transcript for transcripts clusters. Transcripts were clustered using CD-HIT (Cluster Database at High Identity with Tolerance) package 14 . CD-HIT-EST v4.6.1 was used to remove the shorter redundant transcripts when they were 100% covered by other transcripts with more than 90% identity. The non-redundant clustered transcripts were then designated as unigenes (Table 2). CDS were predicted from the unigene sequences with Transdecoder at default parameters which resulted in the identification of 35,559 CDS (Table 3).
annotation. The predicted proteins of CDS were subjected to similarity search against NCBI's non-redundant (nr) database using the BLASTP algorithm. Out of total 35,559 proteins, 32,561 proteins were captured with hits and 2,998 with no hits (Annotation of each transcript of the assembled transcriptome) 17 . The top-hit species distribution revealed that majority of the hits were found to be against the species Spodoptera litura followed by Helicoverpa armigera and Heliothis virescens (Fig. 2). Simultaneously, all protein sequences were searched for similarity against NR, UniProt (Universal Protein Resources), KOG (EuKaryotic Orthologous Groups) and Pfam database using BLASTP with an e-value threshold of 1e −5 . The BLAST result of four databases has resulted in Fig. 3.