De novo transcriptome analysis of the critically endangered alpine Himalayan herb Nardostachys jatamansi reveals the biosynthesis pathway genes of tissue-specific secondary metabolites

The study is the first report on de novo transcriptome analysis of Nardostachys jatamansi, a critically endangered medicinal plant of alpine Himalayas. Illumina GAIIx sequencing of plants collected during end of vegetative growth (August) yielded 48,411 unigenes. 74.45% of these were annotated using UNIPROT. GO enrichment analysis, KEGG pathways and PPI network indicated simultaneous utilization of leaf photosynthates for flowering, rhizome fortification, stress response and tissue-specific secondary metabolites biosynthesis. Among the secondary metabolite biosynthesis genes, terpenoids were predominant. UPLC-PDA analysis of in vitro plants revealed temperature-dependent, tissue-specific differential distribution of various phenolics. Thus, as compared to 25 °C, the phenolic contents of both leaves (gallic acid and rutin) and roots (p-coumaric acid and cinnamic acid) were higher at 15 °C. These phenolics accounted for the therapeutic properties reported in the plant. In qRT-PCR of in vitro plants, secondary metabolite biosynthesis pathway genes showed higher expression at 15 °C and 14 h/10 h photoperiod (conditions representing end of vegetative growth period). This provided cues for in vitro modulation of identified secondary metabolites. Such modulation of secondary metabolites in in vitro systems can eliminate the need for uprooting N. jatamansi from wild. Hence, the study is a step towards effective conservation of the plant.

Nardostachys jatamansi (D.Don) DC. of family Caprifoliaceae is an important but critically endangered medicinal herb inhabiting specific ecological niches of mostly inaccessible locales of alpine and sub-alpine Himalayas (2200 to over 4800 m amsl at 40-70° incline). The plant propagates vegetatively through rhizomes in nature 1,2 . The rhizome not only supports flowering and sexual reproduction at the end of growth period but also readies the plant for opportunistic growth, immediately, after the return of favourable season. Importantly, it helps the plant persist in its specific ecological niche of alpine habitats. Aerial parts comprised of a rosette of elongated leaves emerge directly from the rhizomes every year during favourable seasons to subsequently form a dense clump of mature clonal plants. Flower bearing shoots emerge from rhizomes and set seeds during late August (month representing the end of favourable season). After flowering and wind dispersal of seeds to new locations during September, the aerial parts of the plant undergo senescence but the perennating rhizome continues to grow and fortify itself. Even during this time, new shoot bud meristems develop from the rhizomes as preparedness for fresh vegetative growth in the next or the following season of favourable conditions. Apparently, diverse functions of growth and fortification occur simultaneously in the rhizomes during the end of favourable growth period. Surely, an underlying complex machinery govern all these processes leading to the biosynthesis of an array of valuable metabolites. However, there is no information available on N. jatamansi to this effect. Hence, de novo transcriptome analysis of the plant parts at flowering stage was undertaken to understand the genetic Results Transcriptome sequencing and de novo sequence assembly. The Illumina GA IIx sequencing platform was used for de novo transcriptome of N. jatamansi at flowering stage. After sequencing, 13.92 and 9.53 million raw reads (a total of 23.45 million raw reads) were obtained from leaves and rhizomes, respectively. After filtering out the adapter sequences, the reads of low quality and the reads of short length, a total of 13.87 and 9.27 (23.14) million reads of high quality were obtained from the leaves and rhizomes, respectively (Supplementary Table 1). When these were assembled de novo using Trinity (default k-mers i.e., 25) and clustered using CD-HIT, a total of 35,777 and 32,491 transcripts from leaves and rhizomes, respectively were obtained. Finally, these were assembled de novo into 48,411 unigenes. The average contig length of the reads was 742.6 bp ± 590.4 and 731.2 ± 631.1 and their average N50 value was 1110 and 1101 bp in leaves and rhizomes, respectively. The statistical summary of the data is provided in Supplementary Table 1, and the length distribution of the clustered transcripts in Supplementary Fig. 1.
After similarity search of sequences, a total of 74.45% transcripts were annotated functionally using UNI-PROT database. However, a number of unigenes could not be annotated functionally. The e-value distribution of the annotated transcripts is given in Supplementary Fig. 2. Functional annotation of unigenes. The unigenes obtained above could be categorized into biological process, cellular component and molecular function according to GO analysis. Out of 74.76 and 74.14% transcripts of leaves and rhizomes, respectively, a total of 24,292 and 26,527 were assigned to either one of the GO categories. Of these, 14,116 unigenes of the two tissues were assigned to biological process term (1496), 14,310 unigenes under 440 cellular component term and 26,565 unigenes under 1332 molecular function term. The top 20 classes from each category are represented in Fig. 1. Most of the genes belonged to regulation of transcription, metabolic process, carbohydrate metabolic process, transport related terms, defence response and signal transduction under the 'biological process' . Under 'cellular component' , integral component of membrane, nucleus, cytoplasm, ribosome, intracellular and plasma membrane were the predominant categories. Under 'molecular function' however, ATP, zinc ion, DNA, nucleic acid and metal ion binding as well as protein kinase activity and RNA binding were amongst the top 10 major classes.
On homology search for species similarity, the transcripts of N. jatamansi showed maximum similarity with Vitis vinifera followed by Daucus carota subsp. sativus, Coffea canephora and Cynara cardunculus var. scolymus. The top 20 species having highest similarity with N. jatamansi are represented in Supplementary Fig. 3. Interestingly, only 11 transcripts were common to both N. jatamansi and Valeriana officinalis, the other genera of family Caprifoliaceae.
Among the sequences searched against KEGG pathway database, a total of 4985 and 4971 transcripts were assigned to 137 and 140 pathways in leaves and rhizomes, respectively (Supplementary Table 3). The most abundant pathways in the leaves and rhizomes were ribosome comprising of 258 and 261 genes followed by carbon metabolism (172 genes of each tissue), spliceosome (172 and 181), biosynthesis of amino acids (169 and 171 genes) and protein processing in endoplasmic reticulum (146 and 136 genes). In the top 20 KEGG pathways, significantly higher number of unigenes of ribosome, spliceosome, carbon metabolism, biosynthesis of amino acids, protein processing in endoplasmic reticulum, RNA transport, purine metabolism, plant-pathogen interaction, plant hormone signal transduction and oxidative phosphorylation were recorded ( Supplementary Fig. 4). In addition, transcripts of 41 and 43 different primary and secondary metabolite biosynthesis pathways were identified in the leaves and rhizomes, respectively. The distribution of the transcripts in the different metabolic pathways of leaves and rhizomes is depicted in Supplementary Table 3. SSR identification. After SSR prediction analysis using MISA perl script, a total of 7240 and 7458 SSRs were recorded in 5800 (22.28%) and 6032 (20.84%) transcripts of leaves and rhizomes, respectively. Based on SSR distribution and mining of transcripts, a total of 1138 and 1123 transcripts having more than one SSR were identified in leaves and rhizomes, respectively. The most abundant SSR repeat types included the mononucleotides (35.73 and 33.66%) followed by dinucleotides (32.59 and 33.45%) in leaves and rhizomes, respectively ( Supplementary Fig. 5). www.nature.com/scientificreports/ Abundance analysis of tissue specific differentially expressed genes (DEGs). The transcripts having log 2 fold change (FC) value above 2 and less than − 2, and p-value below 0.5 in edgeR software were considered (Supplementary Table 4). On analysing the differential expression in pair-wise comparison, 4177 and 4810 transcripts were upregulated in leaves and rhizomes, respectively. Among these however, 1403 from leaves and 1784 from rhizomes could not be annotated functionally. Moreover, 1612 and 1632 were uncharacterized in leaves and rhizomes, respectively (Supplementary Fig. 6). In rhizomes, the top enriched classes in 'biological process' belonged to organonitrogen compound metabolic process (GO: 1901564) and related terms, organonitrogen compound biosynthetic process (GO: 1901566), peptide metabolic process (GO: 0006518), cellular amide metabolic process (GO: 0043603), amide biosynthetic process (GO: 0043604), peptide biosynthetic process (GO: 0043043), cellular carbohydrate metabolic process (GO: 0044275), cellulose catabolic process (GO: 0030245), beta-glucan catabolic process (GO: 0051275), betaglucan metabolic process (GO: 0051273) and cellular polysaccharide metabolic process (GO: 044247). In addition, peptide biosynthetic process, response to oxidative stress (GO: 006979), cell wall organization or biogenesis, secondary metabolic processes (GO: 0019748), phenylpropanoid and lignin metabolism (GO: 0009698, GO: 0009808) were significantly enriched.
Identification of transcription factors. In Plant TFDB Homology search (E value cut off of 10 -5 ), a total of 9982 and 10,655 transcripts belonging to 57 TF families were recorded in leaves and rhizomes, respectively. Of these, the bHLH family was the most dominant (10.45 and 10.27%). This was followed by NAC (7.12 and 7.27%), MYB-related (6.59 and 6.86%), ERF (4.68 and 5.04%) and C 2 H 2 (5.31 and 4.87%) families in leaves and rhizomes, respectively. The top 20 families of transcription factors are represented in Fig. 2  Protein-protein interaction (PPI) network. A total of 318 proteins encoded by DEGs related to plant growth and stress response were identified in transcriptome analysis. These were mapped against STRING database of Arabidopsis thaliana and a network of 179 nodes and 472 edges at PPi enrichment p-value < 1.0e −16 were recorded. The interaction network revealed significant enrichment of GO terms with 177 under biological  Table 6). A strong interaction between the light harvesting genes and the genes of photosynthesis, secondary metabolism (i.e., terpenoid and phenylpropanoid) and genes of stress response i.e., dehydration responsive element binding transcription factor (DREB), heat shock proteins (HSPs) was recorded. Red coloured module represented interactions mainly, in two large clusters followed by 3-4 small clusters of interaction networks. Cluster one showed strong interaction between DAPs of light harvesting complex with DAPs of photosystems. Cluster two revealed multiple strong interactions with the DAPs like 1-deoxy-d-xylulose 5-phosphate reductoisomerase (DXR), mevalonate kinase (MK), phytoene synthase (PSY), omega-6-fatty acid desaturase (FAD6), cinnamate 4-hydroxylase (C4H), geranylgeranyl reductase (GGR), geranylgeranyl pyrophosphate synthase (GGPS1), 9-cis-epoxycarotenoid dioxygenase 4 (NCED4), carotenoid cleavage dioxygenase 7 (CCD7), ABA1 involved in terpenoid, carotenoid and ABA biosyntheses. Blue colored module revealed strong interactions between the 18 DAPs which included sedoheptulose-1,7-biphosphatase (SBPASE), malate dehydrogenase, pyruvate dehydrogenase, phosphoribulokinase, phosphoglycerokinase, phosphofructokinase, fructose biphosphate aldolase, lipoxygenase, aldehyde dehydrogenase (ALDH), glucose 6-phosphate dehydrogenase 4 (G6PD4), and HSPs. Green colored module revealed strong interactions between 22 DAPs of mainly phenylpropanoid biosynthesis pathway and included hydroxycinnamoyl transferase (HCT), phenylalanine ammonia-lyase 1 (PAL1), FAD, C4H, caffeoyl-CoA O-methyltransferase (CCOAMT), chalcone-flavonone isomerase (CHI), laccase, omega-3 desaturase, DREB and ribulose biphosphate carboxylase (RBCL). The RBCL of green module showed strong multiple interactions with ascorbate peroxidase of the same nodule and also with DAPs of photosynthesis of red and phosphoribulokinase of blue module. Lipoxygenase of blue colored module showed interaction with allene oxide cyclase (AOC), jasmonate O-methyltransferase (JMT) and 12-oxophytodienoate reductase (OPR) of red module. The AOC further showed strong interaction with JMT on one hand and OPR on the other. Based on the differentially expressed genes identified in the leaves and rhizomes of N. jatamansi (Supplementary Table 7), a probable genetic machinery underlying various processes (photosynthesis, carbon metabolism, transport of assimilates, flowering, rhizome growth and fortification, stress response and secondary metabolite biosynthesis) occurring in the plant during the end of growth period was proposed (Fig. 4).
qRT-PCR validation of RNA-sequencing data. The reliability of the RNA-seq data was confirmed through qRT-PCR expression analysis of 15 genes (Fig. 5A). The genes of light harvesting such as those encoding PSI, PSII and CAB protein, the water stress related, 44 kDA dehydrin, the flowering related CO2a and KOS, and TPS9 and terpene cyclase of terpenoids biosynthesis were up-regulated in the leaves. The expression pattern of these genes matched perfectly with that of RNA-seq data confirming the simultaneous operation of various processes like light harvesting, flowering and biosynthesis of secondary metabolites during August, the month when the vegetative growth ended. Similarly, perfect matches of the upregulated expression of the bidirectional SWEET transporter, the stress related, dehydrin 2 and the secondary metabolite genes, DXS1, FPS, COMT, α-terpineol and phytoene synthase confirmed the transport of leaf photosynthates to rhizomes and consequent biosynthesis of phenolics, carotenoids and terpenoids in the organ.
Validation of tissue-specific-expression-of-genes involved in growth reproduction and secondary metabolite synthesis in in vitro grown plants. Environment and tissue specific expression of various genes involved in growth, reproduction and secondary metabolite synthesis was evident from qRT-PCR analysis of in vitro plants growing at two different photoperiods and temperature regimes (representing the peak and end of growth periods). While Photosynthesis II protein D1 and bHLH010 were significantly up-regulated in leaves, COMT, Patatin, Feronia, AP2-ERF and Dehydrin 2 were up-regulated in below-ground parts (Fig. 5B).  UPLC-PDA analysis of phenolic acids. A total of 6 and 5 phenolics were recorded in the roots of in vitro raised plants at 15 and 25 °C, respectively. Only 3 phenolics i.e., gallic acid, p-coumaric acid and rutin were recorded in the leaves at both 15 and 25 °C (Table 1 and Supplementary Fig. 8).
With respect to temperature and plant part, the phenolic contents of roots were higher at 25 °C. Interestingly, the contents of p-coumaric and cinnamic acids were very high at 15 °C but they remained undetected at 25 °C. In case of leaves, the contents of gallic acid (16.98 ± 0.03 µg/100 mg FW) and rutin (27.33 ± 0.62 µg/100 mg FW) were significantly higher at 15 ºC, while p-coumaric acid content was at par at both 15 and 25 ºC (7.62 ± 0.01 and 7.49 ± 0.00 µg/100 mg FW, respectively). Irrespective of temperature, caffeic, ferulic and cinnamic acids, and kaempferol remained undetected in the leaves.

Discussion
As envisaged, the de novo transcriptome analysis of leaves and rhizomes of flowering plants of N. jatamansi confirmed the simultaneous operation of various processes like photosynthesis, carbon metabolism, transport of assimilates, flowering, rhizome growth and fortification, stress response and also secondary metabolite biosynthesis during August, the end of growth period (Fig. 4). In the study, DEGs of photosynthesis, generation of precursor metabolites and energy metabolism in leaves; but DEGs of organonitrogen metabolism, cellular carbohydrate metabolism, cellulose and glucan catabolism, cellular polysaccharide metabolism, lignin and peptide biosynthesis, stress response and biosynthesis of metabolites related to phenylpropanoid pathway in rhizomes were significantly enriched in GO analysis ( Fig. 1 and Supplementary Fig. 7a-f). Even KEGG analysis revealed the enrichment of carbon and amino acid metabolism, signal transduction and stress response ( Supplementary  Fig. 4). The dominance of TF families related to growth, flowering, stress response and secondary metabolism also indicated the simultaneous operation of all these processes during the end of growth period. It was quite evident from these results that the assimilates produced in the leaves were being transported to the below-ground parts to support various processes required for the adaptation of N. jatamansi to freezing stress during winter. In PPi network also, a strong interaction between the proteins of metabolic pathways like photosynthesis, carbon and fatty acid metabolism and secondary metabolite biosynthesis was recorded.
In RNA-seq data as well as qRT-PCR expression analysis, several genes of light harvesting (chlorophyll a-b binding and also photosystems I and II reaction centre proteins) were highly up-regulated in the leaves (Fig. 5A). It was evident that even during the end of growth period in late August, when the air temperatures and the daylight hours started declining, the leaves harvested maximum light for carbohydrate biosynthesis, transport and  www.nature.com/scientificreports/ allocation of sucrose to plant parts, and finally the conversion of sugars into storage reserves in below-ground parts. In addition to light harvesting, genes involved in carbohydrate synthesis (sedoheptulose-1,7-bisphosphatase, ribulose phosphate 3-epimerase and sucrose synthase) and metabolism (glucose-1-phosphate adenylyl www.nature.com/scientificreports/ transferase and galactan beta-1,4-galactosyl transferase) were up-regulated in the leaves. The up-regulation of the bidirectional SWEET transporter, the sugar transporter, ERD6-like 16 and glucan synthase like-3, invertase and trehalose 6-phosphate phosphatase in the rhizomes indicated transport of sugars to below-ground parts probably for their accumulation, storage as well as fortification during August, the end of growth period.
Since the plants tested in the present study were in a state of flowering, a number of key unigenes related to all the four flowering pathways (i.e., GA regulated, photoperiodic, autonomous and vernalization) were up-regulated in N. jatamansi plant parts. This indicated the participation of both the leaves and rhizomes in the process of flowering. In this regard, Gibberellin regulated protein 3, Gibberellin beta hydroxylase and Gibberellin 2-beta dioxygenases of gibberellin mediated and Phytochrome A of photoperiodic regulation of autonomous pathways were upregulated in the rhizomes. While gibberellin 2-oxidase involved in GA biosynthesis was upregulated in both leaves and rhizomes, putative CONSTANS interacting protein 2a-like (CO) and the zinc finger CONSTANS-LIKE 2 protein of photoperiodic pathway were upregulated in the leaves only. GA related genes interact with floral integrators, while CO activates FT 5 . Each of these separately interact with the floral meristem identity genes namely, AP2, Sepallata and Agamous for flowering [6][7][8] . Amongst these however, Sepallata 4 (SEP4) and Agamous like-15 (MADS-box protein) were upregulated in the leaves, whereas, only AP2 was upregulated in rhizomes. Moreover, the Flowering Promoting Factor 1, bZIP transcription factor 11 and Feronia in rhizomes but BEL1-like homeodomain (BLH/BELL) and YABBY in leaves were upregulated.
Despite supporting the costly process of flowering, leaves appeared to nurture rhizome growth and fortification. In this regard, the MADS-box transcription factors known to promote storage organ development 9,10 and the BEL1-like homeodomain (BLH/BELL) known to govern meristem formation, tuberization, rhizome growth, flowering and also ovule morphology 11,12 were upregulated in leaves. Genes of fatty acid metabolism i.e., omega-6 fatty acid desaturase and jasmonic acid methyl transferase in leaves, 3-ketoacyl-CoA synthase, allene oxidase cyclase and 12-oxophytodienoate reductase, fatty acyl-CoA reductase, glycerol-3-phosphate 2-o-acyltransferase 4 and glycerol-3-phosphate dehydrogenase in rhizomes, and omega-3 fatty acid desaturase and lipoxygenase in both leaves and rhizomes were up-regulated. Fatty acids and lipids are usually stored in perennating belowground parts of alpine plants for adaptation to cold stress and also for opportunistic growth during the following favourable season after winter [13][14][15] . Many of these genes, particularly, 12-oxophytodienoate reductase participate in signalling via jasmonic acid for acclimation of plants under cold stress 16 . Signalling mediated adaptation to alpine habitat was also evident from the up-regulation of Tir-nbs resistance protein in N. jatamansi rhizome. Stress responsive genes such as HSP20, 70, 90, 24, expansin, patatin and Dof 2.1 were also up-regulated in rhizomes. The heat shock proteins, expansin, patatin and also the transcription factors namely, Dof 2.1, ERF, C2H2 and DREB are known to impart cold/freezing stress tolerance in plants [17][18][19] .
Partitioning of a part of the leaf photosynthates (erythrose 4 phosphate) towards the biosynthesis of shikimic acid and phenyl propanoids was indicated by the upregulated unigenes of these pathways in both leaves and rhizomes. The up-regulation of the unigenes of glycolysis and those of MEP and MVA pathways indicated a distinct diversion of the end products of glycolysis towards the biosynthesis of fatty acids, terpenoids, phytohormones (abscisic acid and gibberellins) and the signalling molecule, jasmonic acid (Supplementary Table 7).
Interestingly, genes of only monoterpene biosynthesis in leaves (> 10 log2 fold change), but di and sesquiterpene in rhizomes were highly up-regulated (> 7 log2 fold change). Genes of squalene epoxide (SE1), PSY, neoxanthin epoxidase (NXS), NCED1 and also those of polyphenols, flavan-3-ols and anthocyanins biosynthesis in rhizomes, but PSY, zeaxanthin epoxidase (ZEP) and violaxanthin de epoxidase (VDE) in leaves were upregulated ( Fig. 4; Supplementary Table 7). SE1 is known for its role in the biosynthesis of saponins, sterols and brassinosteroids 20,21 , whereas, those of PSY leads to carotenoid biosynthesis 22 . Moreover, this, along with ZEP, VDE, NXS and NCED lead to the biosynthesis of ABA 23,24 . Based on these observations, a scheme on simultaneous utilization of leaf photosynthates for various processes was developed. The qRT-PCR analysis of in vitro plants growing at two different temperatures and photoperiods (representative of peak and end of growth periods) validated the RNA-seq data. It also revealed invariably higher gene expression during declining air temperature and photoperiod (Fig. 5C). Moreover, environment-and tissue-specific expression of genes related to growth, reproduction, stress response and secondary metabolite biosynthesis were upregulated during the end of growth period (Fig. 5B).
UPLC-PDA analysis of the phenolic acids also revealed their tissue and temperature specific distribution (Tables 1 and 2). Invariably, higher number of phenolics were recorded in the roots at 15 and 25 ºC as compared to leaves, thereby, indicating a higher inclination of secondary metabolites towards underground parts as compared to aerial parts. In case of gene expression analysis (RNA-seq data) also, higher number of genes related www.nature.com/scientificreports/ to phenylpropanoid pathway were upregulated in below-ground parts. The phenolics recorded in the study are known to impart protection against UV radiation and pathogens in plants 25 . Their upregulation in N. jatamansi is probably an important requisite for their growth and survival in the plant's native alpine habitats. Thus, p-coumaric and cinnamic acids known for their role in lignin biosynthesis were significantly high in the roots at 15 °C. Nardostachys jatamansi is known to have a wide array of medicinal properties that had remained unaccounted for till date 3 . Up to now, terpenoids mainly, sesquiterpenes were considered to be the major group of secondary metabolites responsible for the various medicinal properties of N. jatamansi. However, the phenolic compounds which were identified in the plant in the present study are known to have a number of health endowing properties in humans (Table 2). Thus, it is apparent from these findings that many of the therapeutic properties of N. jatamansi are probably attributable to the phenolic compounds recorded in this study. Probably, these phenolic compounds account for the various pharmacologically important properties reported in N. jatamansi.

Conclusion
The study is the first report on the de novo transcriptome analysis of N. jatamansi. The assumption that even during the costly process of flowering at the end of growth period, leaf photosynthates are utilized for the fortification of rhizomes as adaptive preparation for winter was confirmed. The study also revealed several tissue-specific-secondary-metabolites that have not been reported till date in N. jatamansi. The biosynthesis pathway genes of important secondary metabolites identified in the study, showed invariably higher expression at lower temperature (15 °C) and lesser light hours. UPLC-PDA analysis also revealed tissue and temperature specific differential distribution pattern of phenolics in leaves and roots at different temperatures, with a clear bias towards the underground parts. In addition, the phenolics recorded in the plant accounted for a range of medicinal properties of N. jatamansi. These findings provided cues for higher secondary metabolite synthesis in alternative in vitro systems. Since, it can eliminate the need for uprooting N. jatamansi from wild, the study is a step towards effective conservation of this high value, critically endangered alpine herb.

Materials and methods
Plant material. Nardostachys jatamansi plants growing at Tandi, Lahaul & Spiti, Himachal Pradesh, India (latitude 32° 33′ 53″ N; longitude 76° 58′ 05″ E and altitude 3040 m amsl) were selected for the study. Leaf as well as rhizome samples of the plant were collected during August (peak flowering) and frozen immediately in liquid nitrogen. These were then stored at − 80 °C for further use.
RNA library preparation. The iRIS method of Ghawana et al. 26 was used to extract total RNA from the above-mentioned samples (three biological replicates of each of leaf and rhizome sample). While the quality and quantity of the total RNA were analyzed using Nanodrop spectrophotometer, a bioanalyzer (Agilent 2100 technologies, USA) was used for further evaluation.
Transcriptome sequencing. Equal amounts of RNA from leaf and rhizomes (biological samples) in triplicate were pooled together. Finally, 5.0 µg RNA from the pool was processed for library preparation using TruSeq RNA sample Prep Kit v2 (Illumina Incorporation, USA) according to manufacturer's instructions manual. The libraries were quantified using Qubit dsDNA BR assay kit for Qubit 2.0 Fluorometer (Life technologies, USA). After checking the insert size of the libraries using a bioanalyzer DNA 1000 chip, 10 pM of each library was loaded onto the flow cell using TruSeq PE Cluster Kit v5 on cluster station (Illumina Inc., USA). The flow cell containing the clonally amplified clusters was then loaded onto the Genome Analyzer IIx (Illumina) for paired- www.nature.com/scientificreports/ end (PE) (2 × 72) sequencing. The GERALD base-calling (a CASAVA package total provided by Illumina) was used to transform the raw sequencing data into 'single end' (SE) 72 bp reads. The resulting sequence reads were stored in FASTQ format and the raw sequences submitted to NCBI SRA database (Accession no. PRJNA608834).
After checking the quality of the sequence reads by Fast QC 27 , the quality reads were filtered using the Cutadapt-1.8.3 tool 28 . While the adapter sequences, reads of low quality and also very short lengths were removed, the low quality bases were trimmed at 3′-end.
De novo data assembly. Multiple assembly tools were checked for de novo assembly of transcriptome data but the Trinity based assembly with K-mer of 25 was selected on the basis of N50 value, number of contigs and read usage (> 200 bp) 29 . The assembled transcripts were then clustered on the basis of 95% similarity between sequences. The reads obtained from each sample were merged using CD-HIT 4.5.4 30 . Unigenes were generated on the basis of best balance between the number of contigs produced, average coverage, N50 value of total assembly and average sequence length. Redundant sets of sequences were removed by filtering, and similar sequence stretches merged using CD-HIT-EST based sequential and hierarchical clustering. CD-HIT-EST with 95% similarity cut-off was also used for the reduction of redundancy between similar types of contigs.
Functional annotation of the assembled transcriptome. Quality transcripts obtained as above were functionally annotated using multiple databases like Uniprot 31 , Gene Ontology (GO), KAAS 32 and MISA 33 . BLAST-2.5.0 30 was also used for annotations of transcripts. Homology search was performed against Viridiplanteae datasets from Uniprot database. E-value distribution and species similarities were also performed. Considering the dicot model organisms as 'reference' for pathway identification, the KAAS server was used for KEGG orthology assignment of transcripts.

SSR prediction.
From among the assembled sequences, the status of SSRs in the transcript sequences were predicted using MISA Perl 33 because of the software's ability to predict 'perfect SSRs' as well as 'compound SSRs with spacer sequences, un-interruptedly' . The Fasta files containing the assembled sequences were then used as input files. The minimum number of repeats for microsatellites (unit size/minimum number of repeats) were specified as 2/6, 3/5, 4/5, 5/5 and 6/5; and a variable of 100 bp was set to specify the maximum length of the spacer sequences.

Differential expression analysis.
Unigenes representing the assembled transcripts of each biological sample was generated by CD-HIT clustering, and sequences with < 200 nucleotides were omitted by filtering. Bowtie2 was used to separately map the reads from each plant part onto unigenes 34 . The expression of unigenes were also quantified. The sequencing bias among the biological samples was avoided by edgeR in pair-wise comparisons 35 . The differentially expressed genes between rhizomes and leaves were finally analysed and genes having log 2 fold change value of more than '2' were considered as upregulated. Genes with log 2 fold change less than '2' were considered as down-regulated.
GO enrichment analysis. GO  Quantitative real time PCR. Based on the above findings, genes related to growth and development, secondary metabolite biosynthesis and adaptation were selected for the validation of RNA-seq data. Total RNA from triplicate samples of each of leaf and rhizome were isolated as described above. The Verso cDNA synthesis kit (Thermo Scientific, USA) was used to synthesise the first stand cDNAs. Primers for each of the selected genes were designed using primer express 3.0 software (Supplementary Table 8 www.nature.com/scientificreports/ during flowering and end of vegetative growth period). After 30 days, total RNA was isolated from 100 mg tissue samples of each of roots and leaves in triplicate. These were then used for cDNA synthesis as described above and validated by qRT-PCR. Primer pairs of genes related to photosynthesis, rhizome formation, flowering and secondary metabolite biosynthesis were used. The details of primer sequence for each gene is given in Supplementary Table 8.

UPLC-PDA analysis of phenolic acids. Sample preparation. Fresh tissues (200 mg) of leaves and roots
of in vitro plants growing at two different temperatures and photoperiods, i.e., (i) 16 h light/ 8 h dark at 25 ºC; and (ii) 14 h light/10 h dark at 15 ºC were homogenized in liquid nitrogen to form a fine powder. The samples were extracted with 2.0 ml of 70% methanol by centrifugation for 10 min at 7000 rpm. Extracts were transferred to a fresh centrifuge tube and additional 1.5 ml of 70% methanol was used to recover the left-over sample, twice. Extracts were pooled together and the final volume was made up to 5.0 ml. These were then centrifuged at 7000 rpm for 15 min at 4 ºC. The samples were either stored at 4 ºC or used for UPLC-PDA analysis after filtering through 0.22-µm filter (Millipore, USA).
Standard preparation. Standards of phenolic acids namely, gallic acid, caffeic acid, p-coumaric acid, rutin, ferulic acid, quercetin, cinnamic acid and kaempferol were prepared at 1:1 mg/ml (w/v) concentration in HPLC grade methanol. The standards were purchased from Sigma Aldrich, India.
UPLC-PDA analysis. Quantification of selected phenolic acids and flavonoids was performed using Waters HSS-T3 C18 column (2.1 mm × 100 mm, 1.8 µm), operated by Waters Acquity UPLC, H-class system (Waters, Milford, MA, USA). The mobile phase consisted of A (water containing 0.1% formic acid) and B (acetonitrile (ACN) containing 0.1% formic acid) in gradient elution. The gradient comprised of 10% B at 0-1 min; and 1-8 min of linear gradient from 10% B to 45% B; 95% B at 8-10 min followed by 10% B at 10-13 min. Detection wavelength was set at 270 nm and the elution was performed using an injection volume of 2.0 µl at a solvent flow rate of 0.25 ml/min. Data was calculated by plotting the calibration curve of selected standards and using regression equation. Data was statistically analysed using SPSS v 14.0 software (SPSS Inc., Chicago, Illinois, USA). The statistical significance between the mean values was assessed by Duncan's Multiple Range Test (DMRT) at a probability/significance level of P ≤ 0.05.

Data availability
All data generated or analysed during this study are included in this published article and its supplementary information files. All the sequencing data have been deposited in the NCBI SRA database having accession no. PRJNA608834.
Received: 23 April 2020; Accepted: 11 September 2020 Scientific Reports | (2020) 10:17186 | https://doi.org/10.1038/s41598-020-74049-1 www.nature.com/scientificreports/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creat iveco mmons .org/licen ses/by/4.0/.