Identification of candidate UDP-glycosyltransferases involved in protopanaxadiol-type ginsenoside biosynthesis in Panax ginseng

Ginsenosides are dammarane-type or triterpenoidal saponins that contribute to the various pharmacological activities of the medicinal herb Panax ginseng. The putative biosynthetic pathway for ginsenoside biosynthesis is known in P. ginseng, as are some of the transcripts and enzyme-encoding genes. However, few genes related to the UDP-glycosyltransferases (UGTs), enzymes that mediate glycosylation processes in final saponin biosynthesis, have been identified. Here, we generated three replicated Illumina RNA-Seq datasets from the adventitious roots of P. ginseng cultivar Cheongsun (CS) after 0, 12, 24, and 48 h of treatment with methyl jasmonate (MeJA). Using the same CS cultivar, metabolomic data were also generated at 0 h and every 12–24 h thereafter until 120 h of MeJA treatment. Differential gene expression, phylogenetic analysis, and metabolic profiling were used to identify candidate UGTs. Eleven candidate UGTs likely to be involved in ginsenoside glycosylation were identified. Eight of these were considered novel UGTs, newly identified in this study, and three were matched to previously characterized UGTs in P. ginseng. Phylogenetic analysis further asserted their association with ginsenoside biosynthesis. Additionally, metabolomic analysis revealed that the newly identified UGTs might be involved in the elongation of glycosyl chains of ginsenosides, especially of protopanaxadiol (PPD)-type ginsenosides.

Methyl jasmonate (MeJA) is an important cellular regulator in plants, which is involved in seed germination, root growth, fertility, fruit ripening, senescence, and plant defence mechanisms 17 . MeJA is an effective elicitor of ginsenoside biosynthesis in cultured cells and adventitious roots of P. ginseng 18,19 . In our previous study, we applied integrated transcriptomics and metabolomics to investigate metabolism in MeJA-treated adventitious roots of various P. ginseng cultivars 7 . Increased ginsenoside content corresponded with expression levels of candidate genes in the downstream ginsenoside biosynthetic pathway, including SQE and DDS 7 ; however, the dynamic regulatory mechanism for the final accumulation of ginsenoside, catalysed by UGTs, remains unknown.
Our previous study revealed that, of three ginseng cultivars, adventitious roots of the CS cultivar produced the lowest level of ginsenosides and related gene expression. However, treating CS with MeJA dramatically increased the ginsenoside content and corresponding transcription levels of candidate genes 7 . Therefore, we deduced that CS would be an appropriate cultivar in which to identify putative UGTs associated with ginsenoside biosynthesis in response to MeJA treatment. In this study, we used complementary transcriptomic and metabolomic approaches to investigate the UGT-catalysed biosynthesis of ginsenosides. Expression levels of UGT-related genes, and ginsenoside contents, were analysed in a time-dependent manner using MeJA-treated adventitious roots of P. ginseng, cultivar CS.
Phylogenetic analysis was then performed with the candidate UGTs and known UGT proteins from other plant species (Fig. 2). Intriguingly, the candidate UGTs could be placed into the same groups as previously characterized UGT proteins from other plant species. UGTs belonging to the UGT74 family in Arabidopsis thaliana 23 and P. ginseng 24 recognize substrates of glucosinolates and ginsenoside Rh 2 , respectively. Evidence that members of the UGT74 family are involved in terpenoid biosynthesis has been reported in A. thaliana 25 and P. ginseng (PgUGT71A27) 15 . We identified three new UGT genes (Pg_S4174.7, Pg_S4174.10 and Pg_S2390.5) belonging to the UGT74 family, along with a previously known P. ginseng UGT (PgUGT74A1).
Members of the UGT73 family are involved in saponin glycosylation in Siberian ginseng (Eleuthreococcus senticosus, Araliaceae) 26 , and other plants such as Barbarea vulgaris, Medicago truncatula, and Glycine max 27 . Our analysis revealed that the UGT Pg_S2351.7 is related to the UGT73 family, suggesting a role for that UGT in glycosylation in P. ginseng. Similarly, the involvement of UGT94 and UGT91 family members in triterpene biosynthesis was previously found in P. ginseng (PgUGT94B1) and G. max (GmUGT91H4) 28 . Here, two novel UGT genes (Pg_S6708.3 and Pg_S0804.16) were grouped into the UGT94 family, and one (Pg_S5521.1) was grouped into the UGT91 family. To validate our RNA-Seq data, we randomly selected six genes for reverse-transcriptase quantitative PCR (qPCR) assays (Fig. 3). The resulting expression patterns were consistent with RNA-Seq data, except for one gene, Pg_S4174. 10. We assume that this one discrepancy might have occurred because RNA-Seq read mapping is limited by the presence of highly identical paralog (>95% identity) copies of Pg_S4174.10 in P. ginseng.
Ginsenoside profiles in Panax ginseng adventitious roots. Adventitious root tissues from P. ginseng were prepared after 0, 12, 24, 48, 72, 96, and 120 h of MeJA treatment, and ginsenosides in extracts of those tissues were analysed using ultra-performance liquid chromatography-quad time of flight-mass spectrometry (UPLC-QTOF-MS) in triplicate (Fig. 4). Twenty-one major ginsenoside peaks were observed in LC-MS chromatograms. By comparing retention times and characteristic collision-induced dissociation (CID) fragment ions with reference compounds, 12 major ginsenosides (Rg 1 , Re, Rf, Rg 2 , Rb 1 , Rc, Ro, Rb 2 , Rb 3 , Rd, Rg 3 , and F 2 ) were unambiguously identified. Other peaks were tentatively assigned by comparing their empirical molecular formulae, CID fragmentation patterns, and relative retention times, with those of previously published data (Supplementary Table 1). The analytical method used in this study was similar to that we used previously 7 , with some modifications to characterize a greater number of ginsenosides in a metabolomic profile. In particular, the peaks of two major ginsenosides, Re and Rg 1 , could be separated within a chromatogram.
Base peak ion (BPI) chromatograms showed that the levels of most ginsenosides were dramatically increased following MeJA treatment, but some -Rg 1 (1), Re (2), and Ro (8) -showed relatively constant amounts. Intensities of chromatographic peaks at t R 3-5 min were also significantly increased; however, these metabolites were not expected to be ginsenosides (expected molecular formula were C 18 H 30 O 8 and C 19 H 32 O 10 ) although they could not be identified precisely. For more detailed analysis of ginsenoside profiles, our LC-MS dataset was processed into a peak table including 156 ion markers using Mzmine 2 software 29 . Principle component analysis (PCA) was performed on the processed ginsenoside profile to visualize time-dependent metabolic differences. In the PCA score plot, it is easy to recognize the ginsenoside accumulation pattern by increasing principle component 1 (PC1)   (7), Rb 2 (10), and Rd (14), and malonyl ginsenosides Rb 1 (6) and Rd (15) showed especially high positive correlations with PC1, suggesting that biosynthesis of these ginsenosides is significantly induced by MeJA (Fig. 5B).
Time-dependent accumulation patterns of the major ginsenosides were compared based on the ion intensity of their LC-MS profiles. Fold-changed relative abundance according to time after MeJA treatment was calculated using the Mzmine peak ion table (Fig. 6). As shown in Fig. 6, and as also found by Oh et al., the amounts of PPT-type and oleanane-type ginsenosides were not affected by MeJA treatment 30 , while the levels of PPD-type ginsenosides significantly increased with MeJA treatment and most PPD-type ginsenosides started to accumulate after 48 h. Our previous study suggested that PPD-type ginsenosides are involved in defence mechanisms against biotic stresses, while PPT-types are involved in mechanisms against abiotic stresses 30 . The ginsenosides Rg 3 , Rd, and Rb 3 showed the most dramatic accumulation, with 106.3, 43.0, and 32.2-fold increases, respectively, after 120 h. Contents of the ginsenosides Rb 1 , Rb 2 , and Rc also significantly increased, with 15.2, 14.2, and 17.9-fold changes, respectively. The accumulation patterns of the ginsenoside F 2 , compounds Mc 1 and O, and gypenoside XVII suggested that they became saturated after 96 h. Malonylated ginsenosides, a group of derivatives that are compartmentalized in the vacuole 31 , also increased in content, but these changes were relatively small compared with non-malonylated ginsenosides.
A putative biosynthetic pathway and biological roles of ginsenosides. Figure 7 shows the possible biosynthetic pathways of the PPD-type ginsenosides that significantly accumulated after MeJA treatment. These pathways are proposed based on their sugar moiety sequences. Of the compounds identified, several have two glucose residues at the C3-OH position, and there are two possible pathways for the biosynthesis of ginsenosides Rb 1 , Rb 2 , Rb 3 and Rc (see red and blue lines in Fig. 7). The red pathway follows the route from ginsenoside Rg 3 to ginsenosides Rb 1 , Rb 2 , Rb 3 , and Rc through ginsenoside Rd. Several different UGTs might be involved in this step by adding different glucose moieties, such as Glc, Xyl, Ara(p) and Ara(f). The blue pathway uses different substrates, but the reaction is the same: adding a glucose residue to 3-O-glucoside. In MeJA-treated adventitious roots of P. ginseng, the fold-changes of ginsenosides Rb 1 , Rb 2 , Rb 3 , and Rc show similar patterns as in Fig. 6, implying that these compounds are produced by a single UGT, or are co-controlled by very similar UGTs. Therefore, we suggest that the biosynthesis of ginsenosides Rb 1 , Rb 2 , Rb 3 , and Rc follows the blue pathway shown in Fig. 7.
Malonylation is a common form of plant metabolism used to store toxic materials separately in vacuoles 32 . The fact that ginsenosides undergo malonylation therefore provides evidence for the role of ginsenosides in defence. Malonylated ginsenosides significantly elicited by MeJA treatment in our study were Rb 1 , Rb 2 , Rb 3 , Rc, Rd, and   gypenoside XVII. We therefore suggest that these ginsenosides are the metabolites related to plant defence in P. ginseng, and propose that the 11 UGTs -including eight novel UGTs -identified in this study may contribute to the glycosylation of these ginsenosides.

Conclusion
We investigated time-dependent changes in UGT-related gene expression and ginsenoside content of MeJA-treated adventitious roots of the P. ginseng cultivar CS, which was previously shown to markedly increase ginsenoside biosynthesis upon treatment with MeJA. Transcriptomic analysis identified 11 candidate UGTs, including eight novel and three previously characterized UGTs. Metabolic profiling and phylogenetic analysis revealed that these candidate UGTs are strongly associated with ginsenoside biosynthesis. Furthermore, these newly identified UGTs might be related to steps involving the elongation of ginsenoside glycosyl chains, especially in PPD-type ginsenosides. Further functional characterization studies will elucidate the specific substrates of these UGTs in P. ginseng.

Materials and Methods
Plant materials. Adventitious roots of P. ginseng (CS cultivar) were cultured as previously described 33 and amplified in bioreactors containing 1 L Schenk and Hildebrandt liquid medium supplemented with 3 mg/L indole-3-butyric acid and 5% sucrose. Three grams of these fresh proliferated roots was transferred into 250-mL flasks containing the same medium supplemented with 200 µM MeJA and cultured at 23 ± 1 °C on a rotary shaker at 100 rpm under constant dark conditions. We selected 21 flasks containing healthy growing adventitious roots and harvested these with three replications for each condition: 0, 12, 24, 48, 72, 96, and 120 h after initiation of MeJA treatment. RNA isolation and transcriptome sequencing using the Illumina platform. Total RNA was isolated from MeJA-treated adventitious roots after 12, 24 and 48 h using the RNeasy Plant Kit (QIAGEN, Germany) according to the manufacturer's instructions. After examining its quality and quantity using a bioanalyzer (Agilent Technologies, USA), about 2 μg total RNA was used to independently construct RNA-Seq libraries with an insert size of 300 bp using the Illumina TruSeq RNA Sample Preparation Kit according to the manufacturer's instructions. Pooled libraries were sequenced with a paired-end read length of 150 bp using the Illumina NextSeq 500 platform at LabGenomics Co. (Seongnam, Korea). Raw reads were deposited into the National Center for Biotechnology Information Sequencing Read Archive. Accession numbers are provided in Table 1. Prior to differential gene expression analysis, reads containing bacterial contaminants were removed by mapping against available bacterial genomes using BWA 34 . Adaptor contamination was then removed using the NGS QC Toolkit (v2.3.3) 35 .

Chemicals and reagents.
Identification of UGTs in P. ginseng and differential gene expression analysis. The current version of the ginseng gene set (IPGA_v1.1) was retrieved from the Ginseng Genome Database 20 (https://ginsengdb.snu.ac.kr). Protein domains and motifs of UGT genes in P. ginseng were identified using InterProScan 36  per million (FPKM) using RSEM 37 . The bioconductor package edgeR 38 was used to identify differentially expressed genes in MeJA-treated samples. Genes exhibiting changes greater than two-fold with a significant false discovery rate (FDR) of 0.001 were considered differentially expressed.
Quantitative RT-PCR assay. Total RNA was isolated from the adventitious roots of P. ginseng cultivar CS treated with MeJA using the RNeasy Plant Kit (QIAGEN, Germany) according to the manufacturer's protocol. RNA quality and quantity was examined using formaldehyde agarose gel electrophoresis. cDNA was synthesized using 1 µg of RNA (Invitrogen, USA), and the cDNA was diluted to 1/10 for use in quantitative polymerase chain reaction (qPCR Ginsenoside analysis using LC-MS. Twenty milligrams of freeze-dried sample powder was sonicated with 1 mL of 70% ethanol for 90 min at room temperature. Extracts were centrifuged at 17,000 × g for 3 min, then 500 μL of each supernatant was dried using nitrogen gas and resuspended in 1 mL 50% methanol. Prepared samples were analyzed using a Waters ACQUITY ultra-performance liquid chromatography ( ESI conditions were set as follows: capillary voltage 3.5 kV, con voltage 45 V, source temperature 120 °C, desolvation temperature 300 °C, cone gas flow 50 L/h, and desolvation gas flow 800 L/h. High-purity nitrogen was used as the nebulizer and auxiliary gas, and argon was used as the collision gas. The mass spectrometer was calibrated using sodium formate over a range of 100-1500 Da; leucine enkephalin (m/z 554.2615 [M -H ] − ) was used as the lockmass to ensure mass accuracy and reproducibility. To identify ginsenosides, collision-induced dissociation (CID) data were recorded using MS E methodology 39 . Low collision energy to detect precursor ions was set to 4 eV, and the high collision energy for fragmentation was set to 40-45 eV.

LC-MS Data processing and multivariate analysis.
Mass spectrometry (MS) ion markers were extracted from liquid chromatography-mass spectrometry (LC-MS) raw data using Mzmine2 2.30 software 40 . Mass ion detection was performed with the noise level set to 2000, and based on the detected mass list, peak lists were built with criteria as follows: minimum time span of 0.02 min, minimum height of 3000, and m/z tolerance of 0.006 Da (or 10.0 ppm). Chromatographic deconvolution was achieved using the baseline cut-off algorithm (minimum peak height of 3000, peak duration range of 0.02-0.30 min, and baseline level of 1000). Chromatograms were de-isotoped using the isotopic peaks grouper algorithm with an m/z tolerance of 0.006 Da (or 10.0 ppm) and a t R tolerance of 0.1 min. Peak lists for samples were aligned together using the join aligner module (m/z tolerance at 0.006 Da or 20 ppm, absolute t R tolerance at 0.3 min, weight for m/z of 70, and weight for t R of 30), and the aligned peak list was eventually gap-filled with the peak finder module (intensity tolerance at 50%, m/z tolerance at 0.006 Da or 10.0 ppm, and absolute t R tolerance of 0.2 min). Peaks from MS contaminants were identified by blank injection, and duplicate peaks were manually removed from the aligned and gap-filled peak list. Principal component analysis (PCA) was performed using SIMCA 13.0 software (Umetrics, Umeå, Sweden), with Pareto scaling of data.