Phenotypic and Transcriptomic Analysis of Two Pinellia ternata Varieties T2 line and T2Plus line

Pinellia (Pinellia ternata (Thunb.) Breit.), as important medicinal plant, has been used to treat various ailments for a long time. The sixteen ploid plant (2n = 16 * 13 = 208) Pinellia T2Plus line was obtained from an octoploid (2n = 8 * 13 = 104) T2 line by chromosome-doubling technique. Compared with T2 line, the content of various medicinal components (polysaccharide, guanosine, adenosine and ephedrine) was increased in T2Plus line. In this study, the transcriptome of T2 line and T2Plus line were characterized by RNA sequencing (RNA-seq) technology. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways enrichment analysis on differential expressed unigenes (DEGs) revealed that multiple metabolic pathway were enriched significantly, such as ‘Starch and sucrose metabolism’, ‘Purine metabolism’, ‘Photosynthesis’ and six transcription factors (MYB, WRKY, bHLH, lateral organ boundaries domain (LBD), homeodomain-zipper (HD-ZIP) and Ethylene-responsive factor (ERF)) play a key role in difference of transcriptome between T2 line and T2Plus line. These metabolic pathways and transcription factors may play an important role in the difference of medicinal components and epigenetic features between these two Pinellia cultivars. This conclusion provides a robust theoretical basis for the mechanism of the formation of medicinal ingredients in Pinellia cultivars.


Results
Comparison of biological characteristics. To solve the problem of uneven quality of Pinellia plants, a superior individual T2 line was obtained by screening cultivar from different regions. Tissue culturing, a technique that enables plants to rapidly reproduce in a sterile environment to achieve detoxification, was performed to generate a high-yield individual cultivar line T2 without contamination viruses or bacteria. Based on the high-quality T2 seedlings, the sixteen ploid Pinellia T2Plus line was obtained by using a chromosome-doubling technique (colchicine-induced). The T2 line, as a single plant Shandong Heze peach leaf Banxia 2 (T2), is an octoploid line existing in nature (2n = 8 * 13 = 104), However, the T2Plus line, as a sixteen times body (2n = 16 * 13 = 208) that does not exist in nature, has high research value. To identify the differences between T2 line and T2Plus line, we analyzed the biological characteristics of these two cultivars. It was found that as the number of leaves increased, the tuber also increased rapidly, but the phenotype in the T2Plus line were significantly better than those in T2 line (Fig. 1a). At the leaf proliferative phase, compared with T2 line, the length of the petiole was shorter, the diameter of the petiole was larger and the ratio of leaf length to width was smaller in T2Plus line (Figs. 1b, S1). At the seedling stage, compared with T2 line, T2Plus line had a low proliferation coefficient and less biomass accumulation (Fig. 1c) showed that T2Plus line has a later germination period and slow growth.
Pinellia is an important Chinese herbal medicine, therefore it is necessary to investigate any changes of the medicinal ingredients between T2 line and T2Plus line. There are many substances as the main components of the medicinal ingredients. For example, polysaccharides are the main components of the anti-tumor effect in the tuber of P. ternata 19 . Lectin, as a kind of protein, plays an important role in anti-tumor, anti-growth and anti-aphid effects 20 . Alkaloids are an essential ingredient in anti-tumor and lipid-lowering applications, as well as for treatment of cough, vomiting and other aspects 21 . The nucleoside substances of P. ternata such as guanine nucleoside, adenine nucleoside and hypoxanthine nucleoside, as water-soluble components, are involved in DNA metabolism, energy metabolism and protein synthesis. They can also improve liver function, contribute to the recovery of damaged liver cell function, stimulate the generation of antibodies in the body, and improve the Quantification and quality assessment of transcriptome data of P. ternata. To investigate the transcriptome differences between T2 line and T2Plus line, we performed RNA-seq experiments using total RNA isolated from these two lines. These two cultivars were analyzed in three independent biological replicates (6 samples in total). Total RNA was extracted from the Pinellia seedlings grown for 30 days, and cDNA libraries were prepared and sequenced with Illumina HiSeq sequencing instrument. After removing adaptor sequences, ambiguous reads and low-quality reads from 293,790,298 raw reads, a total of 255,299,598 clean reads (87.01% of raw reads) were generated, of which the mean length was 150 bp (Table 1). A total of 116,434 transcripts and 107,777 unigenes were generated by De novo assembly using Trinity Software. In these unigenes, 58,312 unigenes (54.10%) were between 200 and 500 bp in length, 26,595 unigenes (24.68%) were between 500 and 1000 bp and 22,870 unigenes (21.22%) were longer than 1000 nt (Fig. 2a). The functional information of these unigenes was obtained from five public protein databases (Swiss-Prot, Protein family (Pfam), evolutionary genealogy of genes:Non-supervised Orthologous Groups (EggNOG), KEGG, GO) (E value < 10 −5 ) by performing basic local alignment search tool nucleic acid (BLASTN) (Table S1). A total of 27,616 (25.62%) unigenes were annotated in at least one database and 11,392 (10.57%) unigenes were annotated in all five the databases (Fig. 2b).
The uniquely mapped reads for each sample were processed by Expectation-Maximization (RSEM) to determine the normalized expression level of each unigene ( Fig. S2; Table S2). The expression level of unigenes in T2Plus line was higher than that of T2 line (Fig. 2c). The Pearson correlation coefficient between the biological replicates of different samples varied from 0.94 to 0.98, indicating the high quality of the replicates (Fig. 2d).
Identification and annotation of differentially expressed genes between T2 line and T2Plus line. We identified gene sets showing significant differential expression between these two cultivars cultivar. In total, 3,239 differential expression genes (DEGs; false discovery rate (FDR) ≤ 0.05, |log2FC | ≥ 1) were generated, including 1,656 up-regulated DEGs and 1,583 down-regulated DEGs in T2Plus line (Fig. 3a,b; Table S3). The GO enrichment analysis of DEGs in T2Plus line was performed and the top 20 GO terms (ranked by p-value) of biological processes were listed ( Fig. 3c; Table S4). Various cell-related terms, such as cell cycle, cell wall organization or bio-genesis, cell division and mitotic cell cycle, were significantly enriched in DEGs. In addition, GO terms, photosynthesis, photosynthetic electron transport chain and chlorophyll catabolic processes were also highly enriched in DEGs ( Fig. 3c; Table S4).
To investigate the changes of metabolic pathways between T2 line and T2Pus line, all DEGs were subjected to Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways analysis and the significant pathways (P ≤ 0.05) were shown ( Fig. 3d; Table S4). A total of 18 metabolic pathways were enriched significantly, including 'Carbon fixation in photosynthetic organisms' , 'Purine metabolism' , 'Photosynthesis' and 'Starch and sucrose metabolism' , etc (Fig. 3d), suggesting that these metabolic pathways may associate with development and the synthesis of medicinal ingredients in Pinellia.
Photosynthetic efficiency analysis. Photosynthesis is closely related to plant growth and development.
The chlorophyll catabolic process and photosynthetic electron transport chain were enriched significantly in GO term enrichment analysis (Fig. 3c). In addition, a total of six DEGs associated with 'Photosynthesis' in KEGG pathway enrichment analysis were down regulated in T2Plus line ( Fig. 5a,b). The results indicate that photosynthetic efficiency in T2Plus line were lower than that in T2 line.
In order to confirm this hypothesis, an analysis of photosynthetic efficiency was performed. A useful measure index of the photosynthetic efficiency of plants is 'net assimilation rate' (A), which is the rate of increase of dry weight per unit of leaf area 23 . The transpiration rate (E) related to photosynthesis reflects the ability of plants to absorb and transport, which was regulated by stomata size. Carbon dioxide (CO 2 ) and water vapor are essential substances for photosynthesis, and the rate at which they enter stomata is measured by 'Stomatal conductance' (Gs) 24 . The 'Concentration of CO 2 intercellular' (Ci) value was maintained at a relatively stable level through both Gs and the efficiency of the mesophyll cell to utilize the substrate CO 2 and a lower 'Ci' value was indicated the better photosynthetic efficiency 25 . In high 'Vapor Pressure Deficit' (VPD) environment, reduction of 'E' would increase 'Water Use Efficiency' (WUE) 26 . Therefore, these six parameters (' A' , 'E' , 'Gs' , 'Ci' , 'VPD' and 'WUE') were measured. Compared with T2 line, the values of ' A' , 'E' and 'Gs' were lower and 'Ci' was significantly higher in T2Plus line than that in T2line, (Fig. 5b; Table 2), suggesting that the photosynthetic efficiency of the T2Plus line is significantly lower than that of the T2 line.

Discussion
Pinellia, as a traditional Chinese medicine, has a variety of pharmacological activities and a wide range of clinical applications. Due to the fact that the yield and quality of the Pinellia in different regions are different, the superior individual T2 line was obtained by screening cultivar from different regions. Next, we used tissue culture technology to detoxify T2 line, which increased the yield and quality of Pinellia. The chromosome-doubling technique, with colchicine-induced T2 suspension culture cells, was performed to generate polyploid Pinellia (sixteen ploid) T2Plus line (2n = 16 * 13 = 208) from octoploid T2 line (2n = 8 * 13 = 104). In T2Pus line, the phenotypes were significantly better, the length of the petiole was shorter, the diameter of the petiole was larger, the proliferation coefficient was lower and there was less biomass accumulation than with the T2 line. In addition, compared with T2 line, the contents of various medicinal components in the T2Plus line were increased, including polysaccharides, guanosine, adenosine, and ephedrine, which increased by 18.69%, 72.88%, 82.01% and 15.86%, respectively. These results suggested that doubling the chromosome of the Pinellia T2 line can noticeably increase its medicinal value.
The molecular mechanisms underlying the bio-synthesis of medicinal ingredients are poorly understood in Pinellia. The doubling chromosome of the T2 line inevitably leads to differences in the transcriptome. To investigate the relationship between the differential transcriptome and the bio-synthesis of medicinal components, we performed RNA-seq experiments using total RNA isolated from these two lines. A total of 1,656 up-regulated DEGs and 1,583 down-regulated DEGs (FDR ≤ 0.05, |log2FC | ≥ 1) were subjected to GO terms and KEGG pathways enrichment analysis.
In GO terms enrichment analysis, the DEGs in T2Plus line were predominantly involved in the cell-related terms, such as cell cycle, cell wall organization or bio-synthesis, cell division and mitotic cell cycle, which plays an important role in plant proliferation, growth and development. The phenomenon that the phenotype of T2Plus line were better than those of T2 line, indicating the cause due to the genes associated with these cell-related terms. In KEGG pathways enrichment analysis of DEGs, a total of 18 metabolic pathways were significantly enriched (P ≤ 0.05). In these pathways, 'Cysteine and methionine metabolism' , 'Glyoxylate and dicarboxylate metabolism' , ' Alanine, aspartate and glutamate metabolism' belong to the basic metabolic pathway, which can provide precursors for the development and the synthesis of medicinal ingredients in Pinellia. There were 9 DEGs involved in ' Ascorbate and aldarate metabolism' . Ascorbate is an antioxidant substance widely present in plant tissues and acts on plant cell resistance to oxidative stress, cell division and elongation 28 . Ascorbate is functioned an important role in plant metabolism and stress tolerance 29 . Interestingly, GO terms, photosynthesis, chlorophyll catabolic process and photosynthetic electron transport chain were associated with DEGs and the expression level of six DEGs related to 'Photosynthesis' pathway was significantly lower in T2Plus line than that in T2 line. In addition, compared with T2 line, photosynthetic efficiency was declined significantly in T2Plus line, suggesting that chromosome doubling reduces photosynthetic efficiency in T2Plus line.
In a cell or organism, many of the biological processes are influenced or controlled by the expression levels of genes regulated by transcription factors (TFs) 30 . In the present study, 539 potential TFs (21 up-regulated TFs and 28 down-regulated TFs) were detected in all of the transcriptome data. Among these TF families, MYB, bHLH, WRKY, LBD, HD-ZIP and ERF, accounted for a relatively large proportion, suggesting that these six TFs may play an important role in the transcriptome differences between T2 line and T2Plus line. MYB transcription factor family as one of the most abundant and diverse TF families in plant is widely involved in the regulation of different life processes, including cell cycle control 31,32 , secondary metabolism 33,34 , stress response [35][36][37] , etc. In current studies, bHLH transcription factors are involved in the regulation of various signal transduction and bio-synthesis and metabolism pathways, such as light signal transduction, hormone synthesis 38,39 , glandular hairs and root hair development and stress response 40,41 . WRKY transcription factors can regulate multiple biological processes by various networks regulation in plant 42 . For example, OsWRKY62 functions is both a negative regulator of innate immunity and a critical mediator of defense responses in Oryza sativa L 43 . Brassinosteroids (BRs) not only promotes the growth but also implicated in plant responses to drought stress. BR-mediated gene expression was promoted and drought responsive genes were inhibited by WRKY transcription factor families (WRKY46, WRKY54, and WRKY70) in Arabidopsis thaliana 44 . LBD transcription factor (lateral organ boundaries domain), a specific transcription factor 45 , play a key role in multiple processes of plant growth, development and metabolism in Oryza sativa L 46 , Arabidopsis thaliana 47,48 , Vitis vinifera 49 , Citrus 10 , Sugar beet 50,51 . HD-Zip transcription factor is composed of homeodomain (HD) and leucine zipper (LZ) and plays an important role in multiple aspects in Arabidopsis thaliana, such as development of leaf 52 , response to abiotic stress 53 and regulation of seed development 54 . As a transcription factor widely present in animals and plants, ERF (Ethylene-responsive factor) has the following functions: regulation of plant growth and development 55 , participation in abiotic stress response [56][57][58] , and participation in biotic stress response 59,60 . These results suggest that these transcription factors may play an important role in the observed differences of phenotype and medicinal ingredients between T2 line and T2Plus line, and may increase the ability of resistance to biotic and abiotic stresses in T2Plus line.

Materials and Methods
Assessment of morphological characteristics. The plant materials used in this study were selected and used to breed a single plant with excellent and even quality, including octoploid P. ternata no.2 (T2 line) of peach leaves, from Heze Shandong province, China, and a new sixteen ploid P. ternata (T2Plus line) was obtained by artificial chromosome doubling using colchicine.
For analyzing phenotypic differences between T2 line and T2Plus line, the growth situation of plantlets growing in the field were recorded during a three-month period. The lengths and widths of leaves, petiole and plant height of 30 plantlets were measured. The fresh weights and dry weights and biomass of 6 plantlets were measured, and the length/width ratios and proliferation coefficients and dry matter rates were determined.

Determination of medicinal ingredients.
To investigate whether there are differences in the medicinal components of P. ternata in these two lines, the contents of proteins, polysaccharides, alkaloids and nucleosides in tubers of P. ternata were determined. The content of polysaccharide was measured by the optimized phenolic   RNA isolation, library construction, and sequencing. For RNA analysis, the tissue samples were collected from cluster plantlets of P. ternata. These two groups included the T2 line and T2Plus line, with three biological replicates, and T2 line was the control group. At 30 days after planting, seedlings are collected and frozen in liquid nitrogen for more than 3 min immediately and stored at −80 °C until use. Five seedlings collected from www.nature.com/scientificreports www.nature.com/scientificreports/ each group were mixed and thoroughly ground to powder with liquid nitrogen, and approximately 50 mg per pooled sample was prepared to use. Total RNA was isolated from each replicated sample according to the instructions of OmniPlant RNA Kit (CWBIO, Beijing, China). Purified RNA samples were quantified and qualified by the Nanodrop 2000 c Spectrophotometer (Thermo Fisher Scientific, USA) and agarose gel electrophoresis analyses, respectively. For each sample, an approximately 2 μg total RNA was used for the quality evaluation. Quantified using Qubit 2.0 Q32866 (Life Technologies, USA) was used to prepare the cDNA library.
We constructed the cDNA library according to the instructions of NEBNext Ultra RNA Library Prep Kit for Illumina (NEB, USA); brief steps are described below. Firstly, the mRNA was supplemented through oligo (dT) bind magnetic beads along with split into small parts., These were reserved as an template for the initial strand and subsequent strand cDNA synthesis. Then, the residual overhangs were transformed into the blunt ends through ligase and polymerase. The resulting peices were end revamped by introducing an "A" base to the 3′ ends of the cDNA. And the NEBNext connecters with a hairpin circle assembly were ligated to the fragments. To select suitable cDNA fragments, Agencourt AMPure XP 60 ml Kit (Beckman Coulter, USA) was applied to purify the library fragments. Thereafter, the products were augmented by polymerase chain reaction (PCR) for the preparation of sequence libraries. The 350-450 bp fragments were recovered using QIAquick Gel Extraction Kit (QIAGEN, Germany). It's crucial of quality testing for the library to ensure that the library meets the requirements of sequencing. The built libraries were then sequenced via an Illumina HiSeqTM2500 platform.
De novo assembly and functional annotation. We removed adaptor sequences, poly-N reads and low quality reads using FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and Trimmomatic (V 0.36) software 61 to obtain clean reads from raw data (containing more than 50% bases with Q ≤ 30). We used Trinity software to perform de novo assembly with a k-mer length of 25 and other default parameters 62 . The transcripts were clustered with a TGI clustering tool, and the longest transcripts were defined as unigenes.
We used BLASTX program with E-value of 10 −5 to search functional annotation in the multiple protein databases, such as Swiss-Prot database 63 , Protein family (Pfam) database 64 , EggNOG database 65 , KEGG database 66 and GO database 67 . mRNA abundance distributions. To obtain the average Fragments Per Kilobase of transcript per Million fragments mapped (FPKM), the abundances of all detected mRNAs of all non-zero samples were averaged. The average FPKM was the transformed by log2 and the R function density() with default settings was performed to estimate a kernel density of this log2 FPKM distribution. The R function (plot(), lines() and legend()) was performed to plot distributions of these kernel densities.

Examination of differentially expressed unigenes.
To calculate the abundance of unigenes, the clean reads sequenced from each sample were mapped back to the unigene library. To quantify the level of gene expression, RSEM was used to calculate FPKM of individual sample. Then, the DESeq R software package 66 was used for comparing differential expression analysis between T2 line and T2Plus line. In multiple tests and analysis, we used FDR to identify the P-value threshold. Only genes with FDR ≤ 0.05 and more than two-fold change in expression between samples were considered as DEGs. On the basis of hyper-geometric test, GO enrichment analysis of DEGs was carried out by the topGO R package 68 . In addition, KOBAS software 69 was applied to test the enriched pathway of DEGs. GO terms and KEGG pathways with corrected P-value ≤ 0.05 were recognized as obvious over-expression.
Identification of transcription factors. Blastx was performed to search all of transcription factors from transcriptome using a plant transcription factor database (PlnTFDB 70 ; http://plntfdb.bio.uni-potsdam.de/v3. 0/ downloads.php) with a cut-off E-value of 1.0 e −10 to identify TFs 71 .
The all datas have been uploaded to the GEO database and the accession number was GSE140998.