De novo sequencing and comparative analysis of leaf transcriptomes of diverse condensed tannin-containing lines of underutilized Psophocarpus tetragonolobus (L.) DC

Condensed tannin (CT) or proanthocyanidin (PA) is a unique group of phenolic metabolite with high molecular weight with specific structure. It is reported that, the presence of high-CT in the legumes adversely affect the nutrients in the plant and impairs the digestibility upon consumption by animals. Winged bean (Psophocarpus tetragonolobus (L.) DC.) is one of the promising underutilized legume with high protein and oil-content. One of the reasons for its underutilization is due to the presence of CT. Transcriptome sequencing of leaves of two diverse CT-containing lines of P. tetragonolobus was carried out on Illumina Nextseq 500 sequencer to identify the underlying genes and contigs responsible for CT-biosynthesis. RNA-Seq data generated 102586 and 88433 contigs for high (HCTW) and low CT (LCTW) lines of P. tetragonolobus, respectively. Based on the similarity searches against gene ontology (GO) and Kyoto encyclopedia of genes and genomes (KEGG) database revealed 5210 contigs involved in 229 different pathways. A total of 1235 contigs were detected to differentially express between HCTW and LCTW lines. This study along with its findings will be helpful in providing information for functional and comparative genomic analysis of condensed tannin biosynthesis in this plant in specific and legumes in general.

The presence of CT in over 8-10% of dietary dry matter, markedly reduces the nutritive value of the browse shrub mulga (Acacia aneura) 17 . However, the present study focuses on the role of various genes or contigs in the biosynthesis of CT in this underutilized P. tetragonolobus. The present study and its finding will open up new avenue for understanding the molecular basis of biosynthesis of CT for further improvement of this crop.
In recent years, several studies have successfully reported the generation of transcriptome data for gene discovery in non-model plants for which no reference genome sequences are available 18 . Due to the availability of quick, low cost sequencing and high quality annotation and assembly tools, it has become possible to analyze and understand the genome of non-model plants. As a non-model legume P. tetragonolobus is carried out for transcriptome sequencing. Condensed tannin is considered as an anti-nutrient. Effective and optimum reduction of CT concentration from the plants will promote the lesser-known underutilized legumes for further consumption where there is an ever increasing demand for food and nutrition security. So, in order to genetically-improve this plant, it is needed to identify and understand the metabolic pathways and the underlying factors or genes regulating the biosynthesis of CT.
Most tracheophytes synthesize condensed tannins in the vacuole as uniformly stained deposits and are termed tannin accretions, these appear as the lining in the inner face of the tonoplast and are formed in the thylakoid-derived organelles in the name "tannosomes". These are packed in membrane-bound shuttles, it has been suggested that these shuttles agglomerate into tannin accretions. Further, the tannosomes travel from the plastid towards the vacuole in multiple membrane-bound shuttles. So, leaf is the centre of condensed tannin biosynthesis in plants. So, the leaf-tissues of P. tetragonolobus are targeted for transcriptome sequencing.
Recently, de novo assembled trancriptomes of four underutilized legumes namely: Lablab purpurea (Malwi origin), Lathyrus sativus (Indian origin), Psophocrapus tetragonolobus (Nigeria origin) and Vigna subterranean (Africa origin) were carried out and 1139 SSRs out of 33,042 genes and 49,280 contigs of P. tetragonolobus were identified 19 . Recently the report identifying SSRs and SNPs among the de novo assembled transcriptome data of two Sri Lankan accessions of P. tetragonolobus had been published 20 . However, most of their studies related to the detection of SSR and SNP molecular markers from transcriptome data of P. teragonolobus from Nigerian and Sri Lankan origin. It was carried out on Roche 454 sequencing platform. But, the present study focuses on the biochemical screening of P. tetragonolobus for condensed tannin and mining contigs or genes responsible for biosynthesis of condensed tannin in P. tetragonolobus. This is the first report of transcriptome study of phenylpropanoid biosynthesis pathway responsible for condensed tannin biosynthesis in P. tetragonolobus. The study may pave path for further manipulation of genes or transcription factors for reducing the content of condensed tannin in the plant to make it more acceptable.

Result
Quantitative and qualitative estimation of condensed tannin in P. teragonolobus. Quantitative estimation of CT through vanillin assay was carried out among 100 accessions of P. tetragonolobus collected from different parts of the world (Supplementary Table 1). The range of CT varies between (3.59 ± 0.31)mg/g DW and (0.27 ± 0.04) mg/g DW (Fig. 1C) and accordingly denoted as high condensed tannin-containing winged bean (HCTW) for accession (EC-38956-1 from Papua New Guinea) and low condensed tannin-containing winged bean (LCTW) for accession (EC-178268 from Malaysia). Histochemical staining of the plant-parts with dimethylamminocinnamaldehyde (DMACA) provided a clear picture of its deposition in different tissues as it developed a deep-blue colour when stained with DMACA (Fig. 1B). Flavan-3-ol (catechin) is inferred to be the basic monomeric unit as per the analysis of the plant-tissues on HPLC-platform and reported the presence of pelargonidin (0.62 ± 0.1 mg/g DW), delphinidin (0.83 ± 0.08 mg/g DW) and cyanidin (0.38 ± 0.06 mg/g DW) in HCTW and pelargonidin (0.32 ± 0.04 mg/g DW), delphinidin (0.13 ± 0.03 mg/g DW), cyanidin (0.08 ± 0.04 mg/g DW) in LCTW P. tetragonolobus (Fig. 1D). Transcriptome sequencing, de novo assembly and functional annotation of contigs. The complementary DNA (cDNA) libraries prepared from the diverse CT leaf tissues of winged bean were sequenced using Illumina Nextseq500 platform. Paired-end sequencing-by-synthesis (SBS) yielded raw data 13901780 (13.9 million) for VS1 and 30155788 (30.15 million) for VS2 in HCTW line and 9090662 (9.09 million) for VS3 and 10394022 (10.3 million) reads for VS4 in LCTW line of P. tetragonolobus. VS1 and VS2 are the replicated leaf transcriptome data for high condensed tannin-containing (HCTW) line of winged bean and VS3 and VS4 represented the duplicate low condensed tannin (LCTW) lines of winged bean. After trimming, number of high quality reads in HCTW line were (VS1) 12604265 (84.01%) and (VS2) 27769919 (92.09%) and LCTW line were (VS3) 8098380 (89.08%) and (VS4) 9518980 (91.58%). The replicated data of HCTW (VS1 and VS2) and LCTW (VS3 and VS4) were clustered and de novo assembled together using Velvet assembler with a hash length of 45. This generated 102586 and 88433 contigs for HCTW and LCTW line of winged bean, respectively. Contig generation was carried out using Oases-0.2.08 with same hash length that resulted in 87925 and 69464 contigs for HCTW and LCTW lines of P. tetragonolobus, respectively. In both cases, average contig lengths were of 713.1 bp and 736.0 bp with N50 values of 1466 and 1567 for HCTW and LCTW lines, respectively ( Table 1). Distribution of assembled contig length ranged from 151 to > 87925 bases. Maximum number of contigs was of (150-300) bp size with 36646 contigs followed by 13732 contigs of (301-500) bp size in HCTW and LCTW lines of P. tetragonolobus. About (150-300) bp size contigs were highest in number (31631 contigs) followed by 11310 contigs of (301-500) bp size. In both cases, some of the contigs, 1120 in HCTW and 1096 in LCTW were more than 5000 bp size ( Fig. 2A). The contigs from both P. tetragonolobus samples were clustered using CD-HIT-v4.5.4 at 95% identity and query coverage resulting in a total of 44972 contigs. A total of 44972 contigs were generated after clustering and they were further annotated with Arabidopsis thaliana, Glycine max and Lycopersicum esculentum (Supplementary Table 2 contigs were annotated with Arabidopsis thaliana, Glycine max and Lycopersicum esculentum and 115, 1426 and 364 contigs were uniquely annotated (Fig. 2B). A total of 28833 contigs were annotated and 16139 contigs were unannotated. Functional classification and differential gene expression of P. tetragonolobus. In order to assign functions, contigs from HCTW and LCTW were compared against the NR protein database of Arabidopsis thaliana, some members of solanaceae and fabaceae family available at uniport database using blastx algorithm. To investigate the expression level of contigs in HCTW and LCTW, the numbers of clean reads were compared between libraries of each of 44972 assembled contigs through FPKM value. A total of 1235 contigs were found to be differentially expressed between the HCTW and LCTW, among them 183 contigs were up regulated and 1052 contigs were down-regulated (Supplementary Table 3).
Based on the sequence homology, 10492 sequences from HCTW and 674 sequences from LCTW were characterized into 45 functional groups under three main categories i.e. biological processes (BP), cellular component (CC) and molecular function (MF) (Fig. 3). The highest gene expression in other biological and metabolic processes in all the three GO categories followed by cellular component related genes especially 'cell organization and biogenesis' related gene (28.1% in HCTW and 20% in LCTW), 'kinase activity' is quite higher in HCTW compared to LCTW, 'protein binding' (19.1% in HCTW and 8.7% in LCTW), transport related pathway genes (25% in LCTW and 21% in HCTW) and response to abiotic and biotic stimulus (32.6% in LCTW and 31% HCTW) (Supplementary Table 4).

KEGG result.
To identify the biological pathways and their function in the leaf tissues of LCTW and HCTW, 44968 contigs were assembled from the contrasting lines of P. tetragonolobus and were mapped to the reference canonical pathways in KEGG. A total of 5210 contigs from both HCTW and LCTW lines were assigned to 229 KEGG pathways (Supplementary Table 5). 1861 contigs (35.7%) were involved in metabolic pathways, similarly, biosynthesis of secondary metabolites 995 contigs (19%), 271 contigs (5%) of plant hormone and signal transduction and 360 contigs (6.9%) of ribosome-related proteins and 154 contigs for phenylpropanoid pathway (Supplementary Table 6  PageMan analysis 21 also showed the different expression profile of anthocyanidin-biosynthesis related pathway genes in HCTW and LCTW lines of P. tetragonolobus (Fig. 4). The Pageman analysis categorized the genes as: up-or down-regulated in HCTW and LCTW lines. LCTW genotype of P. tetragonolobus exhibited repressed or regulated flavonoid biosynthesis. Anthocyanidin biosynthesis genes were up regulated in HCTW line whereas, in LCTW line certain genes were down regulated e.g. the dihydroflavonol-4-reductase (DFR) gene was down regulated in LCTW line of winged bean. It had also been observed that, some genes related to unknown secondary metabolite biosynthesis like: sulphur containing glucosinolate genes were down regulated in LCTW line as compared to HCTW line. Similarly, certain isoprenoids non-mavalonate pathway related genes were up regulated in HCTW as compared to LCTW line. From these observations, it can be inferred that, there is a cross-talk of different pathways in the biosynthesis of condensed tannin. But, the present study relates its study to phenylpropanoid pathway and the expression of the genes corresponding to this pathway.

Detection of transcription factors in winged bean.
Among all the de novo assembled contigs fifteen different types of transcription factor (TF) families were reported. Among these reported TFs, 33 contigs encode for HCTW and 5 contigs encode for LCTW line P. tetragonolobus. The maximum number of contigs were reported for C2H2 family and WRKY family TFs. The transcription factors were more frequently present in HCTW as compared to LCTW line (Fig. 5). However, bHLH group of transcription factors were only present in HCTW line of P. tetragonolobus.

Identification of simple sequence repeats (SSRs). P. tetragonolobus contigs of HCTW and LCTW were
analyzed and 2237 and 1618 SSRs were reported between LCTW and HCTW lines respectively. Trinucleotide repeats were maximum at 881 and 663 SSRs markers in HCTW and LCTW respectively. Whereas, pentanucleotide repeats were minimum i.e. 16 and 11 in HCTW and LCTW lines respectively ( Table 2).

Genes reported for condensed tannin biosynthesis in P. tetragonolobus. Precursor molecules for
condensed tannin biosynthesis were reported to be derived from the phenylpropanoid pathway. Uniport annotations against members of solanaceae family were used to identify genes encoding enzymes involved in different steps of phenylpropanoid and flavonoid backbone biosynthesis in P. tetragonolobus. KEGG analysis showed that 10 CT-biosynthesis genes especially, anthocyanidin synthase (ANS), 4-coumarate-CoA ligase (4-CCL), chalcone synthase (CHS), chalcone-flavonone isomerase (CHFI), chalcone isomerase (CHI), cinnamyl alcohol dehydrogenase (CAD), dihydroflavonol 4-reductase (DFR), cinnamoyl CoA reductase (CCR), phenylalanine ammonia-lyase (PAL) and anthocyanidin 3-O-glucosyltransferase (A3GT) had lower expression in case of LCTW in leaf tissues as compared to HCTW gene expression of P. tetragonolobus. CHS and ANS genes were more expressed in HCTW plant. A3GT gene which is involved in glycosylation of anthocyanidin compounds are responsible for biosynthesis of cyanidin compounds in plant 22 , this was also reported to have higher expression in HCTW plant. The gene analyses showed that, HCTW line have higher level of polymerization properties of condensed tannin. Among the, 26 contigs of CCL in which 14 contigs were highly expressed and 2 contigs were in lower expression  in HCTW and 10 contigs were highly expressed and 8 contigs in lower expression in LCTW line, 3 contigs of ANS and 5 contigs of A3GT genes were highly-expressed in HCTW line as compared to LCTW and showed that the genes responsible for condensed tannin biosynthesis were highly expressed. A3GT and ANS were the major genes for cyanide compounds and its glycosylation 23 (Fig. 6). 2, 6 and 1 contigs were present in CHS, CHFI and CHI gene respectively which is the initial enzyme for synthesis of condensed tannin and its higher expression were reported in HCTW line. DFR gene had 3 contigs and highly expressed in HCTW as like ANS contigs as DFR is the major responsible gene for ANS biosynthesis (Fig. 7). Similarly in case of PAL and CAD gene, there were 7 and 9 contigs respectively. The expression of all contigs in PAL gene were highly expressed in HCTW whereas, CAD contigs showed mixed type of expression but 6 contigs were highly expressed in HCTW line.  In the HCTW line, mostly the genes of flavonoid biosynthesis were down regulated. The qRT-PCR data revealed the expression profile of these contigs by FPKM value (Fig. 8).

Discussion
In this study, we explored the genes and TFs responsible for condensed tannin biosynthesis in the underutilized P. tetragonolobus. Each contrasting-line expressed and accumulated varying concentration of condensed tannin. Tannins are defined as a group of phenolic metabolites of varying molecular weight (500-30000 Da). Proanthocyanidins are major group of tannin which are deposited in seed and leaves of plants. Presence of high content of tannins in the presently studied wild legume seeds might be due to the metabolism of monomeric polyphenolic compounds (catechin) or polymerization of existing phenolic compounds during development and maturation 24 .  According to Serrano 25 the estimated amount of condensed tannin for daily intake for a person is 53.6 mg/person/day. There are a lot of epidemiological data suggesting, tannin intakes which may prevent the onset of many chronic diseases. But, in case of winged bean HCTW plant seeds contained 3.5 mg/g by dry weight. If people consume it as a source of protein it will cause indigestion. Optimum intake of condensed tannins or proanthocyanidins (PAs) are considered as functional ingredients in nutritional supplements and therefore, they are presently attracting more attention.

Type of winged bean HCTW(VS1_VS2) LCTW(VS3_VS4)
Genome sequences of legume species of L. japonicus 26 , soybean 27 and pigeonpea 28 are now available and the corresponding data may be exploited for interpretation of transcriptome resources from unknown or less revealed data such as winged bean, to support gene annotation and comparative genome analysis. RNA-Seq technique has a lots of application for analyzing a transcript, including exploration of different biological processes of different phenotype of plant, RNA-seq technology has previously been used to characterize the transcriptomes of a number of plant species, including pea and soybean 29,30 . Our experiment was designed in a manner to find the genes responsible for condensed tannin biosynthesis in leaf through Illumina sequencing. Illumina sequencing (de novo sequencing) platform has been mostly used for transcriptome analysis of plants which has no previous reference genome [31][32][33] . Result of winged bean data and the lengths of contigs generated using Illumina platform is similar to Vigna radiata and Pisum sativum transcriptomes. They have been reported with varied lengths of 874 bp and 809 bp respectively 34,35 and were recently published. The winged bean genotypes CPP34 and CPP37 also showed the average contig size is 837 and 823 bp respectively 20 .
TRINITY assembler generated high number of primary assembled contigs from recently duplicated genes which lead to the generation of similar contigs 36 . The small sized sequences (~100-200 bp) may be tough to BLASTX analysis, and so the short sequences remained as unpredicted whose function was unknown 37 . Sequence similarity of contigs to the genomes and transcriptomes of other legume and tomato plant were determined using BLASTn analysis, exploring the levels of conservation up to 63%. Comparable results were observed for the winged bean transcriptome in similarity searches against other legumes 38 . Moreover, up to 63% of the annotated contigs were similar to legume genomes. BLAST analysis showed a highest level of similarity to sequences of Glycine max, followed by Ricinus communis, Vitis vinifera and Populous trichocarpa. The HCTW and LCTW lines also showed similarity with Sorghum bicolor (Fig. 9). Winged bean, Vigna radiata and soybean all belongs to the tropical warm-season legume, and may be mutually more closely related, within the Papilionoideae 39 .
Gene ontology (GO) is a bioinformatics-based gene functional classification system offering an updated and a strictly defined concept to comprehensively describe the properties of genes and gene products of an organism 40   In case of KEGG analysis, well-represented pathways in the winged bean transcriptome included those involved in metabolic pathway, biosynthesis of secondary metabolites, flavonoid biosynthesis, carbohydrate metabolism and energy metabolism. All the major genes involved in the flavonoid biosynthesis pathway were identified and majority of them were present in HCTW lines of winged bean as compared to LCTW plant. Vigna radiata and Pisum sativum transcriptome data showed similar KEGG profile and had similar result with our data 34,35 . Transcription factor families were varied in both type of plant-lines. Transcription factor bHLH was predominantly present in HCTW line. Other studies showed that this TF family is most regulating factor for flavonoid biosynthesis 41,42 . Recently published winged bean transcriptome data of African variety showed ten transcription factors family in which WRKY family was near to similar in HCTW plant 20 . Simple sequence repeats (SSRs) marker was frequently used for diverse genetic analysis by plant breeders due to co-dominant and highly polymorphic nature 43 . Our result showed 2237 SSRs in HCTW and 1618 SSRs in LCTW, which was near to transcript data showed by other legumes 19 . Trinucleotide repeats were more predominantly present in our data like other legumes medicago trancatula and Arachis hypogaea 44,45 .
Accumulation of proanthocyanidin polymers within plant vacuoles is synthesized in the chloroplast and then transported to vacuoles of cell 46 . Tian (2008) reported the chalcone synthase gene expression and localization in chloroplast 47 and Wang (2010) also localized the anthocyanidin synthase in plastid and vacuoles 48 . The gold immunolabelling experiment confirmed the presence of three enzymes of phenylpropanoid and flavonoid pathways like: Cinnamate -4-hydroxylase (C4H) 49 , chalcone synthase (CHS) 47 and anthocyanidin reductase 48 in the chloroplast of developing grape berries. Saslowsky and Winkel-Shirley (2001) suggested that, flavonoids are synthesized from phenylpropanoids by a multienzyme complex which is closely bound to the cytosolic face of the endoplasmic reticulum 50 . Few ultrastructural data also supports the production of phenolic compounds by the chloroplast 46 . The flavonoid monomers generally accumulated in the vacuolar storage pool 51 through different possible routes 52 . However, the mechanism of polymerization of condensed tannins is yet to be elucidated 53 while the nature of precursor molecules remains hypothetical 54 .
Some reports coined that chalcone synthase and anthocyanidin synthase are the possible precursor of condensed tannin 52,53 . So, we selected leaf tissue of HCTW and LCTW lines of winged bean. The seed of the contrasting lines showed variable metabolite concentration. As we have validated certain genes from leaf-tissues of P. tetragonolobus so, it is proposed that, the condensed tannin biosynthesis takes place in leaf tissues and subsequently might be transported to seed (Fig. 10) 46 . As transcriptome PageMan and pathway analysis showed that the anthocyanidin synthase and chalcone synthase genes were more expressed in HCTW line. Similarly, these genes were less expressed in LCTW lines of P. tetragonolobus.
The genes and transcription factors as well as markers identified in HCTW and LCTW lines of winged bean transcriptome study may further be utilized in generating silencing line and may be utilized for MAS breeding programme for reducing the anti-nutrient condensed tannin and improvement of this underutilized legume for enhanced consumption and utilization.

Methods and Materials
Plant material. Plants of P. tetragonolobus were procured from ICAR-National Bureau of Plant Genetic Resources, Akola, Maharashtra, India and was maintained at the botanic garden of CSIR-National Botanical Research Institute, India for further screening and experiment. Quantitative and qualitative estimation of condensed tannin in the plant. Vanillin assay. The total condensed tannin was determined by colorimetric method as well as modified vanillin-hydrochloric acid assay developed by Hagerman 55 . The deduced value was confirmed thrice. The highest and lowest condensed tannin-containing lines of P. tetragonolobus were selected for further study and anlaysis.
HPLC analysis of leaf extracts. The leaves of high and low-CT containing-lines (HCTW i.e. High condensed tannin winged bean and LCTW i.e low condensed tannin winged bean) were quantified by standard HPLC method 56 . The standards cyanidin, pelargonidin and delphinidin were used for further compositional quantification.
Histochemical staining with dimethylamminocinnamaldehyde (DMACA). The histochemical staining of condensed tannin (CT) in leaves was carried out by staining tissues in a solution of ethanol: 6 M HCl (1:1) containing 0.1% (w/v) dimethylamminocinnamaldehyde (DMACA) (Sigma-Aldrich) for 3 to 6 min 57 .  RNA isolation, library preparation and sequencing. Leaf tissues of diverse CT-containing lines (VS1, VS2 biological replicate of HCTW and VS3, VS4 biological replicates of LCTW) of P. tetragonolobus were collected from two months old plants grown in the garden of CSIR-National Botanical Research Institute. RNA was isolated from leaf tissues through RNA isolation kit (Sigma). The isolated RNA was treated with DNase enzyme for removal of contaminant DNA. The quality and quantity of total RNA was calculated through Bioanalyzer (Agilent). Good quality RNA (RIN value > 7) was further subjected to analysis and preparation of cDNA. The cDNA were amplified according to Illumina True-Seq protocol and sequenced on Illumina Nextseq500 platform. The samples were run in duplicates. The transcriptome library for sequencing was carried out according to Illumina Trueseq RNA sample preparation guide (part # 1004898). The enriched poly A RNA (1000 ng) was prepared using RNA purification beads which were fragmented for 4 min at elevated temperature (94 °C) in presence of divalent cations and reverse transcribed with superscript III reverse transcriptase by priming with random hexamers. Second strand cDNA was synthesized in presence of DNA polymerase I and RNaseH. The cDNA was cleaned up using Agencourt Ampure XP SPRI beads (Beckman Coulter, USA) followed by ligation of "Illumina adapters" to the cDNA molecules after end repair and addition of "A"-base. Following SPRI cleanup after ligation, the library was amplified using 11 cycles of PCR for enrichment of adapter ligated fragments. This method of library-preparation had been followed as per the manufacturer's protocol. The prepared library was further quantified using nanodrop and validated for quality by running an aliquot on high sensitivity bioanalyzer chip (Agilent). There were total four libraries prepared for further processing and analysis. Two libraries for high condensed tannin-containing winged bean HCTW (VS1, VS2) and the other two libraries for low condensed tannin-containing winged bean LCTW (VS3, VS4).
De novo assembly and annotation. Raw reads which were generated after sequencing were in fastq file format. The fastq files were trimmed to remove adapter sequences using fastq-mcf tool. The fastq file were filtered to remove reads having average quality score less than 20 in any of the paired end data. The trimmed reads were then de novo assembled using TRINITY 58 with default options. The raw reads data is available at the NCBI public domain with accession number SRA293442 for HCTW variety (VS1 and VS2) and SRA293454 for LCTW variety (VS3-VS4). The assembled contigs were annotated with the NCBI-NR and TAIR10 protein database using BLASTX with e-value < 10 −5 . The data was also annotated with Lycopersicum esculentum and Glycine max genomic database with same parameters. Non-redundant sequence dataset was generated using CD-HIT (Cluster Database at High Identity with Tolerance) by removing homologous sequences with identity threshold of (0.95 or 95%).
Sequence annotation and functional characterization. Differential gene expression analysis was performed using DEGseq program and FDR value was retrieved from q-value program in R-Bioc. The contigs with p-value < 0.05, FDR < 0.05 and Fold change < − 1 or > 1 were selected. The GO annotation was performed using the TAIR10 IDs for the annotated contigs using agriGO (http://bioinfo.cau.edu.cn/agriGO/analysis.php). The GO annotation was retrieved using singular enrichment analysis (SEA) statistical test at p-value < 0.05.
Biological pathway annotation. The PageMan analysis was performed using the differential genes which were annotated with the TAIR10 protein database. The analysis was performed with Benjamini Hochberg multiple testing correction and average statistics method. All the TAIR10 IDs of single assemblies were used to extract the KO IDs from the KAAS database (http://www.genome.jp/tools/kaas/). These KO IDs were submitted in KEGG (http://www.genome.jp/kegg/) to extract all the metabolic pathways 59 .
Detection of SSR molecular markers. Total Contig of HCTW and LCTW were analyzed with using Micro-satellite identification tool (MISA) for identification of simple sequence repeats (SSRs) having mononucleotides to hexanucleotides repeats.
Validation of gene expression by RT-PCR. Total RNA from both plants was extracted and treated with RNase-free DNase I (Ambion). First-strand complementary DNA was synthesized using 5 μ g of total RNA with oligo (dT) primer (Fermentas). The qRT-PCR was done in the Step OnePlus Real-Time PCR System (Applied Biosystems 7500). Primers used for qRT-PCR are listed in Supplementary Table 7. Contig abundances were calculated using Cufflinks 60 ver.0.93 and the output normalized expression values in FPKM (Fragments per kilo-base of exon per million fragments mapped) analogous to single-read FPKM 61 were used for further comparative analysis. To confirm the expression level using FPKM, 12 genes were chosen as cases for PCR validation. Each reaction contained cDNA (1 μ l) and Fast SYBR Green master Mix (Applied Biosystem, USA) 10 μ l in a final volume of 20 μ l. The reactions were performed using the following cycle conditions, an initial 94 °C for 2 min, followed by 30 cycles of 94 °C for 30 s, 60 °C for 30 s, and 72 °C for 30 s, and the final 5 min extension at 72 °C. The mRNA levels were normalized with 18 s rRNA gene 62 . All experiments were used in triplicates for each sample and relative gene expression levels were calculated using 2 −ΔΔCT method.