Comparative analysis of tuberous root metabolites between cultivated and wild varieties of Rehmannia glutinosa by widely targeted metabolomics

Differential metabolites between tuberous roots from cultivated variety (ZP) and wild variety (YS) of Rehmannia glutinosa were analyzed by widely targeted metabolomics, and annotated to KEGG pathways. 228 secondary metabolites (SM) in ZP and YS were detected, of which 58 were differential metabolites (DM), including 41 flavonoids, 10 phenolic acids, 3 terpenoids, 2 alkaloids and 2 others, and 170 were unchanged; Among 58 DMs, 44 (75.9%) were up-regulated in YS, of which 30 were unique to YS, while 14 (24.1%) were down-regulated in YS, of which 10 were unique to ZP; Among flavonoids, 33 (80.5%) were more highly expressed in YS than in ZP; Among phenolic acids, 7 (70%) were more highly expressed in YS than in ZP; 12 of 58 DMs were annotated into 17 types of KEGG pathways. Among them, benzoic acid and p-Coumaryl alcohol were up-regulated in YS, and annotated into 10 pathways (58.8%) and 4 pathways (23.5%), respectively. In addition, much of DMs possess various pharmacological effects. These results indicated better quality of YS than ZP and the necessity of YS domestication. Taken together, this study will provide a reference for the scientific introduction, comprehensive development and utilization of wild Rehmannia glutinosa.

OPLS-DA. OPLS-DA is a multivariate statistical analysis method with supervised pattern recognition, and can solve the problem that PCA is not sensitive to the variables with little correlation. According to the differential variables, the score plot of every component (Fig. 2a) was formed to further show the differences between the components 28 . It was seen from Fig. 2a that R 2 Y and Q 2 Y(Q 2 ) were 1, while R 2 X equals 0.976, suggesting that OPLS-DA should be stable and reliable. Because Q 2 equals 1, more than 0.9, OPLS-DA is excellent. The OPLS-DA model is verified using 200 alignment experiments. The horizontal line corresponds to the R 2 Y and Q 2 of the original model, while the red and blue dots represent R 2 Y and Q 2 after replacement, respectively. R 2 Y (0.81) and Q 2 (0.52) in Fig. 2b were smaller than R 2 Y (1) and Q 2 (1) of the original model, suggesting that the corresponding points should not exceed the corresponding lines. Therefore, OPLS-DA is meaningful, from which variable infuence in projection (VIP) values are obtained. VIP value is used to screen differential metabolites.
Identification of differential metabolites. A total of 228 SMs between YS and ZP were detected by HMDB, MWDB and METLIN databases (see Supplementary Table S1 online). Using both FC ≥ 2 or ≤ 0.5 and VIP ≥ 1 as the screening standards, 58 DMs were screened and identified ( Table 2). Display of the difference data between groups by Volcano plot. Based on log 2 FC value and p-value, DMs were displayed in Volcano plot (Fig. 3a). In Fig. 3a, there were 44 up-regulated metabolites indicating that their expression contents in YS were higher than that in ZP, 14 down-regulated metabolites indicat- www.nature.com/scientificreports/   www.nature.com/scientificreports/ ing that their expression contents in ZP were higher than that in YS, and 170 unchanged metabolites indicating that their expression contents did not vary between YS and ZP. Meanwhile, Top 20 FC change metabolites were presented in Fig. 3b. This result was consistent with that based on the VIP values and FCs (Table 2).
Heatmap clustering. In order to show the varying law of 58 DMs with significant differences, their heat map was drawn (Fig. 4). The results showed that three YS repeats were grouped into one category and three ZP repeats were grouped into the other category. Because different metabolites had different accumulation trends in different samples, and the closer the accumulation trends, the closer the distances, 58 DMs had obvious expression differences between ZP and YS. 58 DMs were grouped into 5 clusters: (1) flavonoids including 41 ones, which were dominant in the SM of R. glutinosa tuberous roots. Among them, 33 (80.5%) metabolites were expressed higher in YS than in ZP. (2) 10 phenolic acids, of which 7 (70%) metabolites were expressed higher in YS than in ZP. (3) 3 terpenoids, 1 (33.33%) of which was expressed higher in YS than in ZP. (4) 2 alkaloids, half of which was higher expressed in YS than in ZP. (5) 2 metabolites that were higher expressed in YS than in ZP. In total, 44 differential metabolites (75.9%) were higher expressed in YS than in ZP, which was consistent with the screening of DMs based on VIP and FC as well as by Volcano plot ( Table 2, Fig. 3b).
Pharmaceutical activities of DMs suggesting quality changes. It was seen from Fig. 4 that DMs were grouped into 5 types such as flavonoids, phenolic acids, terpenoids, alkaloids and and others. Among them, flavonoids including luteolin, protocatechuic aldehyde and tricin, were a large family of plant secondary metabolites and a subdivision of polyphenols, a versatile class of natural compounds that represented secondary metabolites from higher plants. Flavonoids were an effective ingredient of many Chinese herbal medicines and had various medicinal effect, including antibacterial, anti-inflammatory, anti-oxidant 29,30 . Phenolic acids including benzoic acid and p-Coumaryl alcohol, had been proved to have a variety of pharmacological activities, such as cardiovascular and cerebrovascular effects, anti-tumor, anti-oxidation, anti-inflammation, anti-fibrosis, etc 31 . Terpenoids including phytolaccagenin, geniposidic acid, were also a kind of important compounds in Chinese herbal medicine and played an important role in plant growth and development, resistance and defense, etc 32 . Alkaloids including ehretioside were a kind of nitrogen-containing basic organic compounds existing in nature (mainly plants). Alkaloids had many pharmacological activities, such as analgesia, spasmolysis, anti-inflammation, anti-tumor, etc. In addition, some compounds in alkaloids could interact with chemical components such  Table S2 online). Take Phenylpropanoid biosynthesis of KEGG pathway as an example (Fig. 5). In Fig. 5, p-Coumaric acid, p-coumaryl alcohol, caffeic acid, ferulic acid, coniferyl aldehyde, coniferyl alcohol, p-coumaroyl quinic acid and coniferin were annotated in Phenylpropanoid biosynthesis, of which p-coumaryl alcohol was a differential metabolite upregulated in YS, compared with ZP, but p-Coumaric acid, caffeic acid, ferulic acid, coniferyl aldehyde, coniferyl alcohol, p-coumaroyl quinic acid and coniferin were unchanged. Among 58 DMs, 12 were annotated to KEGG database, 7 of which were annotated to KEGG pathways 23 times (see Supplementary Table S3 online). After some duplicate KEGG pathways were removed, there were still 17 KEGG pathways (Fig. 5). The seven metabolites were pelargonin chloride, peonidin 3-O-glucoside, p-Coumaryl alcohol, luteolin, benzoic acid, protocatechuic aldehyde and pratensein, respectively. These 17 KEGG pathways were categories anthocyanin biosynthesis, phenylpropanoid biosynthesis, biosynthesis of phenylpropanoids, metabolic pathways, biosynthesis of secondary metabolites, flavonoid biosynthesis, flavone and flavonol biosynthesis, phenylalanine metabolism, benzoate degradation, dioxin degradation, toluene degradation, aminobenzoate degradation, biosynthesis of alkaloids derived from shikimate pathway, microbial metabolism in www.nature.com/scientificreports/ diverse environments, degradation of aromatic compounds, isoquinoline alkaloid biosynthesis and isoflavonoid biosynthesis. Among them, phenylpropanoid biosynthesis and biosynthesis of phenylpropanoids were different metabolic pathway names in KEGG, which had similar names, but corresponded to two different pathway IDs (Table 3).

Discussion
Rehmannia glutinosa is an important perennial herb, and its tuberous roots is clinically used to treat fever, nervous conditions, diabetes, and hypertension, and to increase liver function, hematopoietic function and immune defense and so on in different ways 1 . The great medicinal market demand for R. glutinosa leads to its overexploitation and germplasm resource scarcity, and long-term vegetative propagation makes the variety degenerate, quality poor, yield low and genetic basis narrow, so it is urgent to enrich and improve R. glutinosa by using candidate germplasm resources. Wild R. glutinosa is a kind of important candidate resources for this objective. Before we make good use of its wild resources, we used widely targeted metabolomics to compare the SM between its cultivated and wild varieties in that its SM are vital for its important pharmacological activities.
On the one hand, we compared the phenotypic characteristics of ZP and YS. It was found that both differences were very obvious, especially in both sizes and weights (Table 1). This was consistent with that of Zuo et al. 35 . On the other hand, 228 SMs were detected in both YS and ZP using widely targeted metabolomics technology coupled with MRM and public databases (see Supplementary Table S1 online). They were divided into 170 unchanged metabolites and 58 DMs (44 upregulated, 14 downregulated) ( Table 2, Fig. 3b). 58 DMs were divided into 5 types such as flavonoids (41), phenolic acids (10), terpenoids (3), alkaloids (2) and others (2) ( Table 2).
It is known that plant metabolomes are composed of over 200,000 metabolites that control plant development, and even Arabidopsis contains some 5000 metabolites 1 , so R. glutinosa should contain more than these 228 SMs in YS and ZP of R. glutinosa Libosch. Moreover, 1049 metabolites were ever identified in the developing tuberous roots of R. glutinosa variety Jinjiu 36 . Therefore, the high throughout identification of widely targeted metabolomics used in this study was limited, compared with untargeted metabolomics 36 and others 37,38 . Its limitation should be attributed to the lack of large herbal medicine public metabolite databases.
The SMs of medicinal plants are very important substances in their life activities, closely related to their defense against diseases and insect pests and environmental stress, and important pharmacological activities. Based on these, our DMs may reflect the differences between wild and cultivated varieties of R. glutinosa. In this study, (1) the number of upregulated metabolites (44) were bigger than that of downregulated metabolites (14) in YS, compared to ZP (Table 2, Fig. 3b), Among 41 flavonoids, 33 (80.5%) were more highly expressed in Figure 5. Phenylpropanoid biosynthesis. The red dots: the differentially expressed metabolites that are increased, the blue dots: the detected metabolites, but there is no significant difference. www.nature.com/scientificreports/ YS than ZP, while among 10 phenolic acids, 7 (70%) were more highly expressed in YS than ZP. The number of unique metabolites to YS (30) was bigger than that of unique metabolites to ZP (10). Because these DMs possess important effects, these results suggested that the quality of YS is better than that of ZP. For example, flavonol had antioxidation; luteolin had a variety of pharmacological effects such as anti-tumor, antibacterial, anti-inflammatory, antiviral and analgesic effects 39 . Anthocyanin, a water-soluble flavonoid compound, had a variety of biological functions such as protecting plants from UV damage, scavenging reactive oxygen species, resisting adversity and changing the color of plants, as well as health efficacy such as anti-aging, anti-obesity and prevention of cardiovascular diseases 40 . According to investigation and research, flavonoids were synthesized by using intermediate products from phenylalanine converted via phenylpropane route as synthetic precursors, which were formed by different synthetic routes. Due to their various pharmacological effects such as cardiovascular protection, anti-cancer, anti-oxidation, anti-inflammation, liver protection and anti-tumor etc. flavonoids had become hot spots in the development and research of natural medicine at home and abroad 27,41,42 . According to our results, the expression of flavonoids and phenolic acids in YS was much higher; (2) Among quality control standards of R. glutinosa such as catalpol and rehmannioside D (now) or verbascoside (ever) in the Pharmacopoeia of the People's Republic of China 43 , rehmannioside D, catalpol and verbascoside were all contained in unchanged. These result showed that YS could be used to modify ZP, of which catalpol′ result is consistent with a previous report 44 ; (3) In this study, as the autotoxic metabolites such as ferulic acid, benzoic acid, protocatechuic aldehyde, 4-Hydroxybenzoic acid and so on, their the contents in YS were also much higher than that in ZP. Among them, benzoic acid had antibacterial and antiseptic effects 45 , ferulic acid could inhibit obesity and improve the steady state of blood lipid and blood sugar 46,47 , and protocatechuic aldehyde had cardiovascular and cerebrovascular protective effects 48 . Moreover, 30 kinds of DMs such as protocatechuic aldehyde, luteolin, tricin, diosmetin, homovanillic alcohol, jaceosidin, pratensein, hispidulin, malonyglygenistin, bartsioside, scutellarin and other compounds were unique to YS and reported for the first time in R. glutinosa.
These results indicate ZP and YS contained unique DMs, unique ones to YS were much more than that to ZP, and that artificial breeding increased the contents of main active metabolites in cultivated variety of R. glutinosa and selected out many metabolite in its wild variety. In addition, KEGG database helps researchers to study genes, expression information and metabolite content as a whole network, and provides integrated metabolic pathways involved in such as the pathways of carbohydrate, nucleoside and amino acids as well as biodegradation of organic compounds with enzymes. Therefore, it is a powerful tool for metabolism analysis and metabolic network research in vivo 1,49 . In the present study, 12 of 58 DMs between ZP and YS were annotated to 17 non-repetitive KEGG pathways (Fig. 6, Table 3). The main differential metabolic pathways between ZP and YS included metabolic pathways and biosynthesis of secondary metabolites. Among them, benzoic acid and p-Coumaryl alcohol up-regulated in YS were annotated into 10 (58.8%) and 4 (23.5%), respectively (Table 3). These results provided a clue for analyzing the metabolism of these metabolites and DMs, and their metabolic networks in R. glutinosa. In conclusion, based on the phenotypic differences between YS and ZP, we detected 228 SMs of YS and ZP by using widely targeted metabolomics, and www.nature.com/scientificreports/ then identified 58 DMs between ZP and YS via multivariate analysis. It was found that the metabolites of YS were more unique, and some of them were quality control metabolites in YS instead of ZP. Our results indicated better quality of YS than that of ZP and the necessity of YS domestication, and will provide a reference for the scientific introduction, comprehensive development and utilization of wild R. glutinosa. In addition, the related metabolic pathways will provide a theoretical basis for the subsequent exploration of biosynthesis of related metabolites in R. glutinosa.  www.nature.com/scientificreports/ rate, 0.40 mL/min 50 ; temperature, 40 °C; injection volume: 5 μL. The effluent was alternatively connected to an ESI-triple quadrupole-linear ion trap (Q TRAP)-MS. LIT and triple quadrupole (QQQ) scans were acquired on a triple quadrupole-linear ion trap mass spectrometer (Q TRAP), API 4500 Q TRAP LC/MS/MS System, equipped with an ESI Turbo Ion-Spray interface, operating in a positive ion mode and controlled by Analyst 1.6.3 software (AB Sciex). The ESI source operation parameters were as follows: ion source, turbo spray; source temperature 550 °C; ion spray voltage (IS) 5500 V; ion source gas I (GSI), gas II(GSII), curtain gas (CUR) were set at 55, 60, and 25.0 psi, respectively; the collision gas(CAD) was high. Instrument tuning and mass calibration were performed with 10 and 100 μmol/L polypropylene glycol solutions in QQQ and LIT modes, respectively. QQQ scans were acquired as MRM experiments with collision gas (nitrogen) set to 5 psi. DP and CE for individual MRM transitions were done with further DP and CE optimization. A specific set of MRM transitions were monitored for each period according to the metabolites eluted within this period. Similar but inconsistent experimental procedures had been successfully applied and implemented by Zhang et al. 51 before.

Materials and methods
Qualitative and quantitative analyses of metabolites. Metabolite structure analysis referred to existing mass spectrum public databases such as MassBank (http:// www. massb ank. jp/), KNAPSAcK (http:// kanaya. naist. jp/ KNApS AcK/), HMDB (http:// www. hmdb. ca/) 52 , MoTo DB (http:// www. ab. wur. nl/ moto/) and METLIN (http:// metlin. scrip ps. edu/ index. php) 53 and others. The primary and secondary spectra detected by mass spectrometry were analyzed qualitatively, and isotopic signals were removed during the analysis of some substances, including repeated signals of K + ions, Na + ions, NH 4+ ions, and fragment ions that were themselves other larger molecular weight substances repeating signal; Metabolites were quantified using MRM mode for mass spectrum peaks of metabolites, a peak per metabolite (mass spectrum file). Mass spectrum file was processed by MultiaQuant software for integration and correction of chromatographic peaks, a chromatographic peak per metabolite 54 .
Statistical data analysis. Metabolite data were log2-transformed for statistical analysis to improve normality and normalized. In order to explore the metabolites of cultivated and wild R. glutinosa, the 228 SMs had been used for cluster analysis by R v3.5.0 (http:// www.r-proje ct. org/). The cluster heat map were obtained using the agglomeration method of 'complete linkage' based on the Euclidean distances of 228 SMs between accessions. The color scale indicates the intensity of the metabolites (log2-transformed). Peak areas were integrated using the IntelliQuan algorithm 50 . Differences in the metabolites of root tissue between ZP and YS were determined using Welch's t-test (P < 0.01). The significantly changed (P < 0.01) metabolites were used for subsequent PCA. In parallel, unsupervised PCA was carried out by R v3.5.0 (https:// www.r-proje ct. org/). The supervised OPLS-DA was carried out by R v1.0.1, MetaboAnalystR (https:// www.r-proje ct. org/) 51 . Data were processed using Analyst 1.6.3 software.

KEGG function annotation.
DMs were annotated to KEGG pathways by KEGG database 34 . These KEGG pathways were classified according to their types.
Compliance with ethical standards. The conducted experiment complies with the laws of China.