Comparative de novo transcriptome analysis identifies salinity stress responsive genes and metabolic pathways in sugarcane and its wild relative Erianthus arundinaceus [Retzius] Jeswiet

Erianthus arundinaceus [Retzius] Jeswiet, a wild relative of sugarcane has a high biomass production potential and a reservoir of many genes for superior agronomic traits and tolerance to biotic and abiotic stresses. A comparative physiological, anatomical and root transcriptome analysis were carried out to identify the salt-responsive genes and metabolic pathways associated with salt-tolerant E. arundinaceus genotype IND99-907 and salinity-sensitive sugarcane genotype Co 97010. IND99-907 recorded growth of young leaves, higher proline content, higher relative water content, intact root anatomical structures and lower Na+/K+, Ca2+/K+ and Mg2+/K+ ratio as compared to the sugarcane genotype Co 97010. We have generated four de novo transcriptome assemblies between stressed and control root samples of IND99-907 and Co 97010. A total of 649 and 501 differentially expressed genes (FDR<0.01) were identified from the stressed and control libraries of IND99-907 and Co 97010 respectively. Genes and pathways related to early stress-responsive signal transduction, hormone signalling, cytoskeleton organization, cellular membrane stabilization, plasma membrane-bound calcium and proton transport, sodium extrusion, secondary metabolite biosynthesis, cellular transporters related to plasma membrane-bound trafficking, nucleobase transporter, clathrin-mediated endocytosis were highly enriched in IND99-907. Whereas in Co 97010, genes related to late stress-responsive signal transduction, electron transport system, senescence, protein degradation and programmed cell death, transport-related genes associated with cellular respiration and mitochondrial respiratory chain, vesicular trafficking, nitrate transporter and fewer secondary metabolite biosynthetic genes were highly enriched. A total of 27 pathways, 24 biological processes, three molecular functions and one cellular component were significantly enriched (FDR≤ 0.05) in IND99-907 as compared to 20 pathways, two biological processes without any significant molecular function and cellular components in Co 97010, indicates the unique and distinct expression pattern of genes and metabolic pathways in both genotypes. The genomic resources developed from this study is useful for sugarcane crop improvement through development of genic SSR markers and genetic engineering approaches.


Results
Physiological and root anatomical assay. Sugarcane wild relative E. arundinaceus genotype IND99-907 was collected from the salinity affected area of Padiyattu Kadavu, Ernakulum, India under germplasm exploration programme during 1999 and adopted to high salinity stress with normal plant growth 29 and a potential source for harnessing the salt stress-responsive genes. On the other hand, the sugarcane genotype Co 97010 is highly sensitive to salinity and has been used as a salt-sensitive standard in many studies 30,31 . The 90 days old plants of  and Co 97010 were imposed with salinity stress by irrigating with 175 mM saline water. The leaf elongation rate (LER) was significantly higher in IND99-907 under salinity stress and growth was ceased in   Table S1). The pre-processing of raw reads by removal of adapter sequences, rRNA sequences, and ambiguous bases with high-quality sequences with a Phred score of ≥ 30 (Q30) resulted in 226, 182, 196, and 198 million clean reads from stressed and control libraries of IND99-907 and Co 97010 respectively. We obtained more than 74% of the high-quality clean reads after pre-processing of the raw data resulting in 802 million clean reads which were good enough to perform the RNASeq analysis.
De novo assembly of transcripts and unigenes clustering. The high-quality reads of stressed and control libraries of IND99-907 and Co 97010 were assembled into four de novo assemblies for 907C, 907S, 97010C and 97010S using Trinity v2.8.5. The quality assessment and comparison of assemblies were performed using Quast and BUSCO tools 34,35 Table S1). The high quality clustered assemblies of IND99-907 and Co 97010 were used for transcript quantification and annotation processes.
Transcript abundance estimation and differential gene expression. The Table S4) and no significant enrichment at FDR ≤ 0.05 was observed for cellular components and molecular functions.
KEGG annotation and enrichment. KEGG Table 2. Dehydrin DHNs are a group of LEA proteins involved in membrane stabilization during abiotic stresses 38 was upregulated in both IND99-907 and Co 97010 and the same was reconfirmed in qRT-PCR. NADH dependent malic enzyme upregulated in response to abscisic acid signalling and drought 39

Discussion
Sugarcane is a glycophyte and grows poorly in saline soils 6 . Salinity stress is a main restrictive feature and significantly affect the growth and biomass accumulation 42,43 . Sugarcane wild relative Erianthus arundinaceus is a potential crop for the production of high biomass and reservoir of many genes related to agronomic, biotic and abiotic stresses. We have isolated and characterized many genes from E. arundinaceus 14,15,[19][20][21][22] and many of genes are still unexplored. Besides, transcriptome studies were reported for many traits such as cold transcriptome 23 , sucrose 24 , agronomic traits 25 , lignin 28 , red stripe disease 26 , smut disease 27 except for salinity stress. Hence, we performed the comparative transcriptome analysis to identify the total salt-responsive genes and metabolic pathways associated with salt tolerance in E. arundinaceus genotype IND99-907 and salt-sensitive sugarcane genotype Co 97010. Physiological analyses like leaf elongation rate, relative water content, proline accumulation, root anatomical structure and Na + /K + ratio revealed the salinity tolerance level of IND99-907. The growth of young leaves was ceased in Co 97010 under salinity stress as compared to growing young leaves in IND99-907. The growth of young leaves in tolerant genotypes was also reported in other crops such as maize 44 and sorghum 45 . The accumulation of proline was very high in IND99-907 under salinity stress as compared to Co 97010. Proline acts as an osmolyte and protects the sub-cellular structures by scavenging of free-radicals and optimizing the osmotic potential during abiotic stresses 46  www.nature.com/scientificreports/ stress as compared to Co 97010 and this is in confirmation with previous reports maintaining the higher relative water content in tolerant genotypes 19 . Soil-salinity exhibits higher osmotic potential in soil and creates physiological drought for plants 5 . The root protoxylem and metaxylem thickness also change during exposure to abiotic stresses 47 . In our study, Co 97010 showed a clear lignification of cell wall around the metaxylem constituting to the cell-wall thickening, vacuolization of the cortex and damaged protoxylem in Co 97010 under salinity stress as compared to normal root structures in IND99-907. The higher concentration of sodium in tissues interfere with the metabolic processes and causes cellular death 5 . The tolerant genotypes minimize the cytotoxic effect of sodium and other cations through ionic exclusion, compartmentalization of ions and maintenance of turgor potential through increased uptake of K + ions 5 . In our study, both EDAX and ICP-OES analysis showed the lower Na + /K + ratio in IND99-907 and the lower Na + /K + ratio is essential for the maintenance of optimal physiological activities such as stomatal movement, photosynthesis and transpiration under salinity stress 48 .
The genome-wide expression of genes for agronomic traits, biotic and abiotic stresses are studied through NGS based transcriptome sequencing in many crops including non-model crops 13 . The comparative transcriptome analysis was performed to identify the salt-responsive genes and metabolic pathways associated with the salt-tolerant E. arundinaceus genotype IND99-907 and sugarcane salt-sensitive genotype Co 97010. We have generated 107.71 GB of raw data and 80.28 GB of clean data from IND99-907 and Co 97010. We assembled the high-quality clean reads (Q30) into four different assemblies with the N 50 values of 1727 bp, 1506 bp, 1389 bp, and 1359 bp respectively for 907S, 907C, 97010S and 97010C, which is found to be superior or on par with the previous transcriptome studies in sugarcane such as cold transcriptome [23][24][25][26][27][28] . In our study, we identified 315,611 and 426,820 unigenes for both IND99-907 and Co 97010 respectively. The stringent computational protocol such as p-value adjustment or false discovery rate (FDR ≤ 0.01) based on the Benjamini-Hochberg method was adopted to control the false positives during differential gene expression analysis 36 . A total of 649 and 501 DEGs (FDR < 0.01) were identified from IND99-907 and Co 97010 respectively reveals the salt-responsive genes and metabolic pathways associated with salt tolerance in E. arundinaceus and sugarcane.
Transcription factors are the major driving factors in the regulation of gene expression and downstream activation of the cascade of metabolic pathways in biotic and abiotic stresses 49 . In our study, DREB1C was upregulated (log 2 FC = 8.70) in IND99-907 and, its overexpression in tobacco and Arabidopsis has enhanced the tolerance levels to salt and freezing stresses 50 . The DREB1H was upregulated (log 2 FC = 8.52) in IND99-907 and it is in confirmation with other crops for cold and salt stresses 51 . The overexpression of G-box-binding factor 3-like conferred the tolerance to salt stress in Arabidopsis 52 . MADS26, associated with meristem response and growth 53  The GO enrichment analysis is a statistical approach to identify the enriched or over-represented and depleted group of genes and to compare the functional profile of DEGs and metabolic pathways 79 . In our study, 'GO term: Response to stress' is significantly enriched in both IND99-907 (FDR = 1.60E-07) and Co 97010 (FDR = 0.031). It is in confirmation with the previous report of GO term: response to stress under salinity stress 80 . In salt-tolerant IND99-907, most of the enriched genes are associated with regulation of cell wall and cytoskeleton reorganization such as (i) Actin (log 2 FC = 21.48) which dynamically regulates the membrane permeability and cytoskeleton organization 81 ; (ii) Plasma membrane-type calcium-transporting ATPase 10 (log 2 FC = 20.56) regulating the calcium fluxes, plant immunity and developmental process 82 ; (iii) cellulose synthase A7 regulating the secondary cell wall synthesis 83 (iv) plasma membrane ATPase 1 (log 2 FC = 9.15) regulating the active proton transport and sodium extrusion under salt-stress and its upregulation has been observed to boost the salt tolerance in transgenic events 84 . We have validated two DEGs namely Plasma membrane-type calcium-transporting ATPase 10 and plasma membrane ATPase 1 through qRT-PCR in IND99-907. Whereas, the GO enriched genes in saltsensitive Co 97010 are majorly representing the metabolic pathways viz., (i) cytochrome c maturase subunit C (log 2 FC = 12.14) associated with electron transport system 85 , (ii) 2,3-bisphosphoglycerate-independent phosphoglycerate mutase-like (log 2 FC = 11.00) associated with glycolysis 86 ; (iii) cysteine proteinase 1 (log 2 FC = 9.52) associated with senescence, protein degradation and apoptosis 87 and (iv) Histidine kinase 4 associated with cytokinin signalling 88  (v) stress-inducible protein coi6.1 (log 2 FC = 8.84) associated with cold stress tolerance 94 (vi) ras-related protein RABE1d-like (log 2 FC = 8.77) is a cellular cargo located on golgi bodies and plasma membrane 95 . In our study, DEGs such as nucleobase-ascorbate transporter 3, clathrin interactor EPSIN-2, stress-inducible protein coi6.1 and ras-related protein RABE1d-like were validated through qRT-PCR in IND99-907. Whereas, the enriched genes in Co 97010 were more related to metabolic pathways such as (i) NADH dehydrogenase subunit 6 (log 2 FC = 13.99) associated with cellular respiration and mitochondrial respiratory chain 96 ; (ii) nitrate regulatory gene2 protein (log 2 FC = 9.51) regulating the nitrate transporter 97  www.nature.com/scientificreports/ that endomembrane and stress-related transport genes were enriched in IND99-907 and basic metabolic pathway related transport genes were enriched in Co 97010. KEGG enrichment analysis identifies the statistically significant enriched pathways from a set of differentially expressed genes from RNAseq data 58 . In this study, the KEGG enrichment analysis showed the significant enrichment of MAPK signalling pathway in IND99-907 (FDR = 0.0023) as compared to non-significant expression in Co 97010 (FDR = 0.2042). The sessile plants perceive the stresses by the cascades of mitogen-activated kinase-related pathways 59, 60 and activate the downstream regulatory genes. A total of 15 mitogen-activated kinases (MAPKs), five mitogen-activated kinase kinase (MAPKKs) and 16 mitogen-activated kinase kinase kinase (MAP3Ks) were regulating the biotic and abiotic stresses in sugarcane commercial varieties (Saccharum spp. hybrids) 61 . In our study, ABA signalling genes viz., MAP3K1 (log 2 FC = 8.04), and protein phosphatase-2C (PP2C) (log 2 FC = 7.54) were highly enriched in IND99-907. PP2C negatively regulates the ABA signalling through inhibition and phosphorylation of SNRKs 62 . MAP3K regulates the intra-and inter-cellular genes associated with the homeostasis of reactive oxygen species 63 and activation of ABA signalling pathways through phosphorylation of SNRK2 kinases 64 . The RAN1 copper-transporting P-type ATPase was downregulated in IND99-907, which is linked to the ethylene-response pathway 65 and negative regulation of ethylene response enhances the plant growth 66 . Whereas in Co 97010, the late responsive genes such as MAPK (log 2 FC = 3.36), serine/threonine-protein kinase SAPK1 (log 2 FC = 8.2), and PR1 (log 2 FC = − 2.06) are significantly enriched. These genes were previously reported for their expression under abiotic stresses in many cultivated crops 67,68 .
The KEGG enrichment pathway: Biosynthesis of secondary metabolites was significantly enriched in both salt-tolerant IND99-907 (FDR = 3.56E−10) and salt-sensitive Co 97010 (FDR = 8.40E−05). Secondary metabolites are essential for physiological adaptive responses in various stimuli 69 . Secondary metabolites such as anthocyanins, flavones, and phenolics inactivate the cytotoxic ions by binding to them, thus protecting the cells from ionic oxidative damage 70 . In our study, highly expressed genes involved in the pathway biosynthesis of secondary metabolites in IND99-907 were (i) DLAT2 of pyruvate dehydrogenase complex (log 2 FC = 20.43) regulating glycolytic and citric acid pathways, and its upregulation conferring tolerance to abiotic stresses is in confirmation with previous reports 71 ; (ii) glutamate decarboxylase (log 2 FC = 10.87) confer tolerance to multiple stresses by regulating the γ-amino butyrate content in plants 72 (iii) 3-ketoacyl-CoA synthase 11-like (log 2 FC = 10.85) gene regulates the elongation of fatty acid and confer tolerance to salt stress 73 (iv) aldose reductase (log 2 FC = 4.31) involved in biosynthesis of osmo-compatible compounds such as sorbitol 74 (v) lysine-ketoglutarate reductase/saccharopine dehydrogenase1 (log 2 FC = 4.41) regulating the signalling of abscisic acid, jasmonate, sugar and starch starvation related pathways 75 was enriched in IND99-907 and these genes are validated through qRT-PCR. The enriched genes in Co 97010 were also stress-related genes such as (i) acetyl-CoA acetyltransferase, cytosolic 1-like isoform X2 (log 2 FC = 8.60) associated with drought tolerance 76 ; (ii) chloroplastic solanesyl-diphosphate synthase 3 (log 2 FC = 8.52) associated with prenylquinones 77 ; (iii) UDP-glycosyltransferase (log 2 FC = 8.31) involved in glycosylation or inactivation of aglycones such as hormones, secondary metabolites, and its overexpression confer tolerance to abiotic stress tolerance 78

Conclusion
In summary, a comparative analysis of transcriptome was carried out between two genotypes Erianthus arundinaceus genotype IND99-907 and salt-sensitive sugarcane genotype Co 97010 to elucidate the salt-responsive genes and metabolic pathways associated with salt-tolerance. Our findings showed clear and distinct metabolic pathways and genes involved in pathways such as MAPK signal transduction, plant hormone signalling and biosynthesis, secondary metabolite biosynthesis and hormones, biosynthesis of amino acid, transporters, and many other pathways were significantly enriched. The gene ontology enrichment analysis also revealed enrichment of many stress-related biological processes such as response to abiotic stimulus, stress, stimulus and endogenous stimulus, generation of precursor metabolites and energy, and many other biological processes. Our study has shown the metabolic pathways and genes linked with salt tolerance in IND99-907 at the transcriptome level. Further studies are required to overexpress the differentially expressed candidate genes from IND99-907 in the background of sugarcane cultivars through biotechnological approaches and development of functional markers associated with salinity tolerance.

Materials and methods
Plant materials and treatment. Salinity-tolerant Erianthus arundinaceus genotype IND99-907 collected from the saline lands from Ernakulum, India 29 and salinity-sensitive sugarcane genotype Co 97010 30,31 were used in this study. Single buds from each genotype with five replications were grown in pots containing river sand: red soil: farmyard manure in 1:1:1 proportion. The experiment was conducted in the polyhouse facility with a 16-h photoperiod, 30/25 °C day/night, 380 − 400 μmolm −2 s −1 . Salt stress was imposed during the formative stage on the 90 th day after planting by irrigating with salinized water containing NaCl, CaCl 2, and Na 2 SO4 in the ratio of www.nature.com/scientificreports/ 2:2:1 30 . Salinity stress was imposed by adding 25 mM of saline water on the first day followed by incrementally adding 25 mM on every day till reaches 175 mM 100 .
Physiological assay of IND99-907 and Co 97010 under salinity stress. The young leaf elongation was measured at an interval of 3, 6, 9, 12 and 15th days after imposing the salinity stress 19 . Samples were collected, cut into 10 cm segments and leaf fresh weight was weighed. The leaves were soaked overnight in distilled water. After soaking, excess water on the leaf surface was removed and turgid weight was quantified. The leaves were oven-dried at 37 °C and dry weight was recorded and RWC was calculated 101 . Proline was estimated by the colourimetric method from the root samples 102 .
To assess the cellular changes in the root cortical and vascular tissues, root samples of 5 cm from the root tip were sampled from the 16 th day of stressed and control treatments. The hand sections were made, mounted on the slide and the observed image was captured from the light microscope under 4.5X and 40X resolutions. The ultrastructure of cellular changes in selected genotypes was analysed using a Scanning Electron Microscope (FEIQuanta250, Icon analytical, FEI, USA). Root sections of 4-5 mm thick were gold coated with a sputter coater and mounted on round aluminum stubs with the aid of double side adhesive tape. The samples were scanned and selected regions were photographed. The elemental distribution of Na + , K + , Ca 2+ and Mg 2+ in the root sections were analysed using a line scan of EDX.
For ionic analysis in roots, the oven-dried and finely grounded roots (100 mg each) were digested with 10 ml of HNO 3 :HClO 4 (3:1) di-acid mixture for overnight. The flasks were gently heated on a hot plate till the development of intense white fumes and the process continued till the digested mixture remains transparent. Leftover digested material made the final volume of 50 ml by adding distilled water. This solution was filtered with Whatman filter paper no. 42 and the extract were used for estimation of Na + , K + , Ca 2+ and Mg 2+ content by ICP-OES (ICP 9000, Shimadzu, Japan).
Transcriptome sequencing. The gene expression patterns are differing with respect to the duration and concentration of exposure to salt stress. The isolation of short-term salt responsive gene requires few hours of exposure to salt stress and long-term salt stress gene isolation requires exposure longer duration for about a week or more. The gene expression patterns are also differing for salt shock and gradual incremental increase in salt stress. The maximum number of genes are expressed within a few hours under salt shock and after a week under gradual exposure to salt stress 100 . Hence, our study is aimed at isolation of long-term salt responsive genes under gradual exposure to salinity stress and hence, we opted incremental increase of salt stress of 25 mM till the final concentration reaches 175 mM and the total RNA from stressed and control samples of IND99-907 and Co 97010 were sampled after a week.
Total RNA was isolated from control and salt-stressed roots sampled on 10 th day of salinity stress from IND99-907 and Co 97010 with three independent biological replicates. Samples were prepared using Tru-Seq RNA Sample Preparation Kit v2 (Illumina Catalog No RS-122-2001) as per the manufacturer instructions. The RNA is fragmented into small pieces using divalent cations, 3' ends were adenylated to prevent the reads from selfligation and both 5' and 3' adapters were ligated. PCR amplification was carried out to generate the final cDNA libraries. The cDNA libraries of all twelve samples were quantified using Qubit and Nanodrop and validated by Agilent 2200 TapeStation System. The samples with RNA integrity number more than seven were sequenced by using Illumina HiSeq2500 with 2 × 100 bp in paired-end format.
Bioinformatics analysis. Raw transcriptome sequences were quality checked using the tool FastQC (http:// www. bioin forma tics. babra ham. ac. uk/ proje cts/ fastqc/). Poor quality reads were removed using Trimmomatic 103 with a threshold Phred score of ≥ 30 (Q30) and a minimum length of ≥ 80 following the removal of the adapter sequences 104 . SILVA database (release 132) (https:// www. arb-silva. de/) was used to remove the rRNA present in the sample by alignment using bowtie2 105 . For de-novo assembly, high-quality, adapter-free Q30 reads of all three biological replicates of four conditions were pooled together to generate four clustered de novo assemblies for both control and stressed samples of IND 99-907 and Co 97010 by using Trinity assembler v2.11.0 (https:// github. com/ trini tyrna seq/ trini tyrna seq/ wiki) 106 .
The quality parameters of de novo assemblies were checked by using two approaches, (i) Quast 5.0.2 107 to obtain assembly statistics and (ii) BUSCO v4.1.2 (Benchmarking Universal Single-Copy Orthologs) analysis. Unigenes clustering was done based on 95% identity threshold using CD-HIT v4.8.1 [108][109][110] with parameters "cdhit-est -c 0.95, -n 8" using the following ways, (i) for all four assemblies (containing isoforms) separately (ii) Control and Stressed unigene datasets were merged to generate a single dataset for every genotype (two unigene datasets for IND 99-907 and Co 97010) (iii) Further, unigenes clustering was performed to remove duplicate sequences to generate two reference unigene datasets for IND 99-907 and Co 97010 respectively. TransDecoder tool (https:// github. com/ Trans Decod er/ Trans Decod er/ wiki) was utilized to recognize the coding DNA sequences (CDS) and the long ORFs in all four unigene assemblies.
Validation using qRT-PCR. A total of 43 candidate DEGs were selected based on their involvement in the enriched stress-related pathways and biological processes in both stressed and control transcriptome data of IND99-907 and Co 97010 were used for validation by qRT-PCR (Supplementary Table S7). PrimerQuest tool 117 was used to design the primers for conserved domains identified by Pfam domain search by SMART tool 118 . The total RNA was reverse transcribed by using PrimeScript 1 st strand cDNA synthesis Kit (Takara, Catalog No.: 6110A). The 1.0 µg of total RNA from each sample taken from stressed and control samples was used for cDNA synthesis and total cDNA was synthesized as per the manufacturer instructions. The qRT-PCR validation was carried out using StepOne Real-Time PCR System (Applied Biosystems, Canada) and each gene was validated using three biological and two technical replicates. GAPDH gene expression was used as the internal control 23 . The qRT-PCR Reaction Mix was prepared using TB Green® Premix Ex Taq™ (Tli RNase H Plus) Master Mix (Takara, Catalog No.: RR420A). The 20µL qRT-PCR mix consisted of SYBR green master mix (10µL), ROX reference dye (0.4µL), cDNA (1.0µL or 50 ng), forward and reverse primers (0.40 µl each) and molecular grade sterile water 23,119 . The initial denaturation at 95 °C for 10 min, followed by 40 cycles of denaturation at 95 °C for 15 s and, annealing and extension at 60 °C for 60 s. Relative fold changes of gene expressions were determined using the 2 −ΔΔCT method 120 .

Statement of compliance. Experimental material involves the research between sugarcane wild relative
Erianthus arundinaceus and modern sugarcane genotype (Saccharum spp) is in compliance with institutional, national, and international guidelines and legislation. www.nature.com/scientificreports/