Transcriptome sequence analysis and mining of SSRs in Jhar Ber (Ziziphus nummularia (Burm.f.) Wight & Arn) under drought stress

Ziziphus nummularia (Burm.f.) Wight & Arn., a perennial shrub that thrives in the arid regions, is naturally tolerant to drought. However, there are limited studies on the genomics of drought tolerance in Ziziphus sp. In this study, RNA-sequencing of one month old seedlings treated with PEG 6000 was performed using Roche GS-FLX454 Titanium pyrosequencing. A total of 367,176 raw sequence reads were generated, and upon adapter trimming and quality filtration 351,872 reads were assembled de novo into 32,739 unigenes. Further characterization of the unigenes indicated that 73.25% had significant hits in the protein database. Kyoto encyclopedia of genes and genomes database (KEGG) identified 113 metabolic pathways from the obtained unigenes. A large number of drought-responsive genes were obtained and among them differential gene expression of 16 highly induced genes was validated by qRT-PCR analysis. To develop genic-markers, 3,425 simple sequence repeats (SSRs) were identified in 2,813 unigene sequences. The data generated shall serve as an important reservoir for the identification and characterization of drought stress responsive genes for development of drought tolerant crops.


Material and Methods
Plant materials and stress conditions. The seeds of Z. nummularia were procured from the Central Institute for Arid Horticulture (ICAR-CIAH), Bikaner, India. After removing the hard coat, the seeds of Z. nummularia were sown in 15 cm diameter pots containing soilrite and allowed to grow in growth chambers at 30 °C with 16 h light/8 h dark photoperiod at the National Phytotron Facility, Indian Agricultural Research Institute, New Delhi, India. Four-week-old seedlings were used to initiate drought stress. Fifteen whole seedlings (3 for each hour of treatment) were treated with 30% Polyethylene glycol M.W. 6000 (PEG 6000) in 0.5x Murashige and Skoog media (w/v) (Himedia, India) supplemented with 1 mM 2-(N-morpholino) ethanesulfonic acid (MES) (w/v) (Himedia, India) to introduce drought stress at 0.1 Mpa (osmotic potential). The seedlings (whole plant including roots, stem and leaves) were harvested after 6 h, 12 h, 24 h, 48 h and 72 h of treatment and three untreated (control) seedlings were also harvested simultaneously for further downstream processing. All the harvested seedlings were immediately frozen in liquid nitrogen and stored in a deep freezer (−80 °C) until use.
The osmotic potential was calculated as follows: where, C denotes the molar concentration of solute in molL −1 , R is the universal gas constant (R = 8.314 kPa L/ mol K) and T is the absolute temperature (i-e, experimental temperature 30 °C + absolute temperature).
RNA extraction and transcriptome pyrosequencing. The total RNA from both treated and control seedlings was extracted using Spectrum TM Plant Total RNA kit (Sigma-Aldrich, St. Louis, MO 63103, USA) as described by the manufacturer and DNA contamination was removed by on column DNase I Digestion set (Sigma-Aldrich). The integrity and concentration of RNA were checked by electrophoresis on 1.2% agarose/ EtBr gel in 1X TAE buffer, UV spectrophotometry (Thermo scientific, Waltham, MA, USA) and a Bioanalyzer Chip RNA7500 series II (Agilent Technologies, USA). The mRNA was purified from 3 µg total RNA using Oligotex mRNA mini kit (Qiagen, Germany) as per manufacturer's instructions. Equal concentrations of purified mRNA (200 ng) from both control and treated samples were used as template for random hexamer-primed synthesis of first strand cDNA, followed by double strand cDNA synthesis using cDNA Synthesis System Kit (Roche, Switzerland). The double strand cDNA was purified by PCR purification kit (Qiagen) and concentration of the purified cDNA was determined using Bioanalyzer 2100 (Agilent Technologies). The purified double strand cDNAs were sheared via nebulization into small fragments, followed by ligation to adapters using GS FLX Titanium Rapid Library MID Adaptors kit (Roche, Switzerland). The libraries were pooled before sequencing and loaded on to Titanium PicoTiter Plate followed by pyrosequencing on Roche GS-FLX454 Titanium platform.
Differential expression analysis of drought-responsive genes. In order to validate the robustness of the obtained transcriptome profile read analysis, 16 putative drought-responsive genes were selected and their expression patterns were studied at different intervals of PEG6000-induced drought stress by qRT-PCR analysis. Primers were designed by using Integrarted DNA technology (IDT) software. The description of the genes selected and their corresponding primer sequences are given in Table 1; the related BLAST annotations of these genes is provided in supplementary Table S1. The total RNA was isolated from control and stressed samples, as described above and cDNA was synthesized by SuperScript ® III first strand synthesis system (Invitrogen, USA) using oligo (dT) primer. Ziziphus elongation factor1 (Zjef1) [GenBank: EU916202] gene was used as internal control 31 . The reaction mixtures of qRT-PCR were set up in a total volume of 10 µl containing 100 ng template cDNA, 200 nM of each primer and 1 µl of 10X KAPA SYBR FAST qPCR master mix buffer (KAPA Biosystems, USA

Results
De novo assembly, annotation and Functional characterization. To gain insights into the transcriptome profile of Z. nummularia in response to drought stress, a pooled cDNA library at different intervals of PEG 6000 induced drought stress was constructed and sequenced. A total of 367,176 raw reads were generated and after removing the low-quality reads, 351,872 (95.83% of all the reads) high quality (HQ) reads were used for subsequent analysis. Among these HQ reads, control (C) and stress (S) library contained 1,14,012 and 2,37,860 sequences, respectively ( Table 2). The de novo assembly was carried out using iAssembler v1.0 (beta) tool with default or optimized parameters. A total of 351,872 high quality reads were assembled into 32,739 unigenes (Supplementary Table S1 For annotation and functional characterization, non-redundant (NR) databases of NCBI and TAIR were used. All the unigenes were aligned with unigenes of other plant species using BLASTx program with an E-value cut-off of 10 -06 . Out of 32,739 unigenes annotated, 20,291 (61.97%) showed significant BLAST matches and it was observed that many unigenes (two or more) were annotated against the same protein. A total of 9,487 (28.97%) unigenes did not show any hits with the known protein, so they may be considered as putative novel unigenes. The number of annotated sequence showing similarity to Ziziphus jujuba was the largest (74.36%), followed by Morus notabilis (2%), Glycine max (1.58%), Vitis vinifera (1%) and Citrus sinensis (0.89%) (Supplementary Figure S1, Supplementary Table S2).

Differential expression analysis of genes involved in drought tolerance.
Based on the qRT-PCR data analysis, the selected genes displayed different expression profiles, with chaperone protein ClpB3 and secretory carrier membrane protein encoding genes showing significant down regulation compared to the control, and genes encoding calcium-dependent protein kinase and heat shock protein 20 showing expression at par with the control (Fig. 4). All other genes displayed significant up-regulation with time compared to the control, with LEA gene showing the maximum up-regulation (56.2-fold) at 48 h of PEG 6000 treatment (Fig. 4). The qRT-PCR expression patterns of all the candidate genes were in agreement with the transcriptome data, and therefore validated the results of the first de novo transcriptome data.

Discussion
Z. nummularia, an inherently drought-tolerant non-model plant, is an important untapped genetic resource to study the genetic and metabolic control of adaptation to drought stress 7 . Next-generation sequencing technology (NGS) has been extensively used to study the model plant transcriptomes and due to accuracy, rapidity and low cost, its use has also been extended to unearth the unique genetic characteristics in the non-model plant systems 33 . The sequences generated were de novo assembled using iAssembler software. The iAssembler generates initial assemblies by MIRA and CAP3 assemblers and makes correction in two common types of transcriptome assembly errors: i) ESTs from different transcripts (mainly alternatively spliced transcripts or paralogs) are incorrectly assembled into same contigs; and ii) ESTs from same transcripts fail to be assembled together to deliver highly accurate de novo assemblies of EST sequences 34 . In spite of the development of several bioinformatics tools for data assembly and analysis, de novo assembly of short reads without a reference genome, is very challenging task 35,36 . It is very important to choose a suitable assembler and parameters to assemble large-scale ESTs into consensus sequences with significantly high accuracy.
In the present study, the average GC content of Z. nummularia transcripts was observed to be 42.93% which is in agreement with the previously reported value in other dicots viz., 44.6% for Picrorhiza kurrooa 37 , 33.41% for genome of Z. jujuba 18 , and 41.0% for Nicotiana benthamiana 38 . The GC content represents the stability of DNA as well as of the genes and genomic composition including evolution, gene structure and regulation 39 , and the changing GC content may reveal the adaptability to different climatic conditions.
The functional annotation results showed that a total of 23,984 (73.25%) unigenes have significant BLAST matches, which is higher than the previously reported data, 54.9% in bamboo 40 , and 54.31% in Ziziphus jujuba 5 . This higher value indicates the extensive coverage of the Z. nummularia transcriptome in this study. A total of 8,112 (24.77%) unigenes did not match with any of the known proteins, this is a common feature associated with de novo sequencing and varying percentages like 45.73%, 40.70%, 62.48%, 42.78% by Dang et al. 41 , Long et al. 42 , Wu et al. 43 and Ma et al. 6 , respectively of such unigenes has been reported previously. The presence of these un-identified unigenes can be partly ascribed to the dearth of genome and EST information of Z. nummularia in the public domain and also to the short read length commonly associated with the next-generation sequencing technology 6 . The gene ontology results revealed that a large number of unigenes were involved in cellular process, metabolic process and response to stimulus (Fig. 2). The functional GO results indicated the dominance of gene  regulation, signal transduction and enzymatically active processes in transcriptome, as maximum unigenes were found to be belonging to the binding activity, catalytic activity, transport activity and transcription regulator activity categories (Fig. 2). The various categories identified by the GO analysis classified the unigenes as, those involved in response to abiotic and biotic stimulus. Among these, the unigenes were further found to be involved in heat acclimation, cellular response to hydrogen peroxide, osmotic stress, oxidative stress, unfolded protein, water deprivation, drought recovery, endoplasmic reticulum unfolded protein response, hydrogen peroxide catabolic process, starvation, osmosensory signaling pathway, priming of cellular response to stress and cation stress. All these categories are reported to participate in drought tolerance mechanisms 44,45 . Drought tolerance mechanism acts in combination with drought avoidance and tolerance periods; it also involves many genes that play a major role in osmotic and redox homeostasis processes and participate in the readjustment process. Some of the unigene categories identified were found to be involved in response to salt, cold and heat stresses, hyperosmotic response, and hyperosmotic salinity response, indicating that various abiotic stresses follow multiple pathways for stress perception and signaling, which cross-talk at various points 46 . These results suggested that transcriptome sequencing was successful in collecting the information about cytosolic response to drought stress in Z. nummularia and it would contribute to our understanding about the drought tolerance mechanism in Z. nummularia. The KEGG pathway mapping resulted in the identification of 114 biological pathways from 1,140 annotated unigenes (Fig. 3, Supplementary Table S3). Li et al. 5 reported 16,693 Z. jujuba annotated unigenes, consisting of 6,485 isotigs and 10,117 singletons, which were assigned to 319 KEGG pathways. Some of the pathways, identified in our study, that are involved in biosynthesis of genes responsive to drought stress are purine metabolism 47 , starch and sucrose metabolism 48 , fructose and mannose metabolism 49 , arginine and proline metabolism 50 , aminobenzoate degradation 43 , thiamine metabolism 51 , and phenylalanine metabolism 52 . These analyses showed that the diversity and variation in Z. nummularia genome provide precious information for investigating specific processes, functions and pathways and thereby making possible the discovery of novel genes which are responsive to drought stress in Z. nummularia.
In transcriptome of Z. nummularia, a total of 3,425 SSRs were identified with a frequency of 8.59%, which is lower than that observed in the fruit transcriptome of the related species Z. jujuba (10.1%) 5 , and floral transcriptome of Z. celata (17%) 53 . A total of 3,425 SSRs were recognised in 2,813 sequences, with one SSR observed for every 3.93 kb of the analysed sequences. Previously, one SSR locus for every 0.94 kb (kb/SSR) was detected in Triticum aestivum 54 , 3.87 kb/SSR in Z. jujuba 5 , 3.55 kb/SSR in Camellia sinensis 55 , 6.22 kb/SSR in Arachis hypogaea 56 , 6.69 kb/SSR in Epimedium sagittatum 57 , and 17.56 kb/SSR for Cymbidium ensifolium 58 transcriptomes. The incidence of SSRs in plant genomes is governed by the repeat length and the standard used for searching SSRs in databases 59 . Moreover, the depth of sequencing coverage also influences the detection rate and since 454-pyrosequencing gives less depth compared to other contemporary sequencing platforms, the detection could be low in the present case. Among all the nucleotide repeat motifs, AT/AT, AG/CT and AAG/CTT were the most abundant SSR motifs and CG/CG was the least frequent. Our results are consistent with those observed in Sesamum indicum 60 , Epimedium sagittatum 57 , and other plant species 61,62 . These observations suggest that the transcriptome sequencing data is an ideal resource for mining the SSRs in Z. nummularia and its cross transferability will help in studying different species of this family.
Drought tolerance is a complex trait and various adaptive mechanisms are involved at multiple levels like cellular, molecular and morphological to cope with water-deficit conditions. In our study, a large number of drought-responsive genes were obtained in the transcriptome of Z. nummularia and their expression patterns were studied by qRT-PCR analysis. Among them, the cell wall associated hydrolase gene has been reported to play an important role in accumulation of compatible extra-cellular solutes through altering cell wall extensibility and other cell wall modifications under drought and salt stress. High expression levels of the cell wall associated hydrolase gene was reported in Ammopiptanthus mongolicus and may help the specific structural adaptations to cope with severe drought in deserts 63 . Glycine-rich RNA-binding protein harboring RNA chaperon activity has importance in RNA metabolism to prevent the adverse effect of environmental stresses on crop yield. Transgenic rice over-expressing Arabidopsis thaliana glycine-rich RBP (AtGRP or AtGRP7) has shown much higher stress-recovery rates and grain yields in comparison to the wild-type plants under drought stress conditions and has confirmed the importance of post-transcriptional regulation of RNA metabolism in plant under abiotic stress conditions 64 . ABC transporter gene is a membrane-intrinsic gene that has shown high expression level and plays an important role in physiological processes to adapt the plant in changing environments and cope with biotic and abiotic stresses 65 . Based on the qRT PCR analysis, the maximum upregulation was observed for Lea gene (Fig. 4), many studies have reported that over-expression of LEA proteins improve abiotic stress tolerance in transgenic plants. For example, the expression of barley LEA protein (HVA1) conferred drought tolerance in transgenic wheat and rice 66,67 , and wheat LEA proteins PMA80 and PMA1959 enhanced dehydration tolerance in transgenic rice 68 . The involvement of WRKY proteins in various abiotic stresses, such as salinity, drought, and cold has been reported previously 69,70 . Wrky gene from T. aestivum and Oryza sativa have been identified to play an important role in drought tolerance mechanism 71,72 . The enzyme catalase is a main component of anti-oxidative machinery 73 and its expression and activities are always triggered by the environmental stresses [74][75][76] . The upregulation of catalase gene in Z. nummularia at 72 h has suggested that abiotic stresses enhance the transcription of catalase and subsequently control homeostasis in plant cell as described earlier 77 . The qRT-PCR expression patterns of all the candidate genes were found in agreement with the transcriptome data, therefore, the high-throughput RNA sequencing data generated could be considered fairly conclusive.

Conclusion.
Z. nummularia is categorized as a xerophyte plant species with inherent tolerance to drought stress and hence is a rich genetic resource to investigate drought-responsive genes and mechanism of drought tolerance. In this study, we have used a combination of de novo transcriptome sequencing, differential gene expression profiling and SSR marker identification to perform comprehensive analysis of the mechanism underlying drought stress and also for identification of drought-responsive genes in Z. nummularia. Computational analysis has confirmed that the maximum portion of Z. nummularia transcriptome is sequenced, and numerous genes and pathways are involved in tolerance to drought stress. The maximum number of ESTs showing similarity to the known databases and a substantial number of ESTs remaining un-identified do endorse the discovery of putative novel genes from the transcriptome data. Various SSR molecular markers have been characterized, which shall be useful in genotyping and breeding studies in Ziziphus and related species. The data generated can serve as a significant resource for bioprospecting drought-responsive candidate genes. Our findings will complement the available genomic resources and shall further provide the momentum to the ongoing efforts to ameliorate drought tolerance in crop plants.