Metabolomic and transcriptomic analyses reveal the regulation of pigmentation in the purple variety of Dendrobium officinale

We performed an integrated analysis of the transcriptome and metabolome from purple (Pr) and normal cultivated varieties (CK) of Dendrobium officinale to gain insights into the regulatory networks associated with phenylpropanoid metabolism and to identify the key regulatory genes of pigmentation. Metabolite and transcript profiling were conducted by ultra-performance liquid chromatography electrospray tandem mass spectrometry (UPLC-ESI-MS/MS) and RNA sequencing. Pr had more flavonoids in the stem than did CK. Metabolome analyses showed that 148 differential metabolites are involved in the biosynthesis of phenylpropanoids, amino acids, purines, and organic acids. Among them, the delphinidin and quercetin derivatives were significantly higher in Pr. A total of 4927 differentially expressed genes (DEGs) were significantly enriched (p ≤ 0.01) in 50 Gene Ontology (GO) terms. Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses revealed significantly enriched phenylpropanoid biosynthesis and phytohormone signal transduction in Pr versus CK. The expression levels of flavanone 3-hydroxylase (F3H) and leucoanthocyanidin dioxygenase (LDOX) affected the flux of dihydroflavonol, which led to a color change in Pr. Moreover, DEG enrichment and metabolite analyses reflected flavonoid accumulation in Pr related to brassinosteroid (BR) and auxin metabolism. The results of this study elucidate phenylpropanoid biosynthesis in D. officinale.


Results
Pigment accumulation in the D. officinale purple variety (Pr) stem. Pigment variation in the purple variety (Pr) stem differed from that in the normal cultivated variety (CK) stem (Fig. 1a-c). Red pigments were observed in the Pr methanol HCl (hydrochloric acid in methanol) extracts of the stem (Fig. 1d). We also measured the pigment content of methanol extracts and found that the flavonoid and anthocyanin contents were 28.8% and 203.1% higher, respectively, in Pr than in CK (Fig. 1e,f). The extracts were analyzed by thin-layer chromatography (TLC) and exhibited chromatographic bands corresponding to the flavonoids rutin, quercetin, and myricetin ( Figure S1). These results indicate that Pr accumulated more flavonoids in the stem than did CK.
Nontargeted metabolomic analysis and overall metabolite identification. To further explore differences in the metabolite content of CK and Pr, the total extracts were subjected to ultrahigh-pressure liquid chromatography triple quadrupole mass spectrometry (UPLC-TQMS) for nontargeted metabolomics. The repeatability of CK and Pr extracts was judged by an overlapping analysis of the total ion current (TIC) in the quality control (QC) samples in positive and negative modes ( Figure S2). A principal component analysis (PCA) was used to analyze the contribution rate of the first two components, which were 75.75% and 80.37% in positive and negative modes, respectively. The three period materials were clearly separated, and each formed a cluster in positive and negative modes (Fig. 2a).
Identification of the differentially accumulated metabolites. Differentially accumulated metabolites (DAMs) were defined by a fold change ≥ 2 or ≤ 0.5, and a variable importance in project (VIP) ≥ 1 between Pr and CK (p < 0.05). A total of 148 DAMs were identified (Table S3), and volcano plots were generated to show that 44 of 148 (29.7%) were up-regulated and 104 of 148 (70.3%) were down-regulated (Fig. 3a). A hierarchical cluster analysis (HCA) was also performed to assess the DAMs and QC sample accumulation patterns (Figure S4). Partial least-squares discriminant analysis (PLS-DA) was used to observe differences in the metabolic compositions of the two materials, and each formed a cluster (Fig. 3b). Based on the KEGG classifications, we created a HCA heat map to investigate the metabolites involved in phenylpropanoid, amino acid, purine, and organic acid metabolism in Pr and CK. The majority of the flavonoids were accumulated at significantly higher  (Table S1). PCA revealed that Pr and CK presented different gene expression patterns. Pr clustered away from CK in PCA1, which explains 64.23% of the variation ( Figure S5a). Moreover, the HAC showed that the Pr and CK gene expression profiles of the three independent biological replicates clustered together ( Figure S5b).
DEGs expression profiles associated with flavonoid biosynthesis. Flavonoids are synthesized through the phenylpropanoid metabolism pathway, and many enzymes participate in the catalyzing steps. In this study, the transcript levels of 56 flavonoid biosynthesis-related structural genes were analyzed (Fig. 6a). Two putative PAL (Dca018706 and Dca009106) and two C4H (Dca007625 and Dca004773) genes were identified, and all were down-regulated in Pr compared to CK. Five 4CL (Dca001588, Dca010921 ,Dca000325, Dca021161, and Dca012359) genes were identified, among which the expression levels of Dca021161 and Dca012359 increased were also identified, and Dca020665 was up-regulated 4.7-fold in Pr vs. CK. Usually, flavonoids are glycosylated in plant cells, thereby increasing their solubility and facilitating their transport 12 . Fifteen UGT genes were identified, and three genes showed more than tenfold up-regulation in Pr vs. CK, including 40-fold Dca014179, 14.8-fold Dca028143, and 13.4-fold Dca006464. Two 5GT (Dca024061 and Dca009740) and one 3MAT (Dca017013) gene showed more than 11-fold upregulation in Pr compared to CK, except for Dca009740 (Fig. 6a). Anthocyanins are synthesized on the cytosolic surface of the endoplasmic reticulum (ER) and transported to the vacuole 19 . Eleven GST genes and six MATE genes were identified, and most of these genes were upregulated in Pr vs. CK, such as 15.8-fold Dca026334, 14.7-fold Dca028214, 8.3-fold Dca019469, and 4.6-fold Dca024549 (Fig. 6a). Moreover, 15 anthocyanin biosynthesis-related genes were chosen for expression validation by qRT-PCR (Fig. 6b, Figure S7). All of these 15 selected genes showed similar expression patterns in the RNA-seq data, which indicated that the RNA-seq data were reliable.

Integrated metabolomic and transcriptomic analyses in phenylpropanoid metabolism.
We integrated metabolomic and transcriptomic data to analyze the phenylpropanoid metabolism pathway in D. officinale (Fig. 7a). The phenylpropanoid pathway initiates from the aromatic amino acid phenylalanine, which decreased 4.3-fold in Pr vs. CK. Cinnamic acid decreased 3.1-fold in Pr vs. CK, while two PAL were down-regulated by 18.7-fold and 17.3-fold, respectively. 4-Coumarate decreased 4.2-fold in Pr vs. CK, while two C4H were down-regulated by 14.7-fold and 6.5-fold, respectively. Two CHS were up-regulated by 16.2-fold and 1.1-fold, respectively; nevertheless, chalcone decreased 1.2-fold in Pr vs. CK. Naringenin decreased 5.8-fold in Pr vs. CK, which was sustained by a 1.1-fold decrease in its mRNA. The 4CL family plays key roles in lignin metabolism and participates in sinapoyl-CoA, feruloyl-CoA, and caffeoyl-CoA biosynthesis 12 . We found that sinapic acid, www.nature.com/scientificreports/ conifer alcohol, and sinapoyl alcohol increased 3.53-fold, 9.65-fold, and 6.56-fold, respectively. One COMT gene (caffeic acid 3-O-methyltransferase) increased 3.80-fold in Pr compared to CK. Lignan is also an important active compound in Chinese herbs, which shares a common upstream pathway with lignins and branches after the synthesis of coniferyl alcohol 30 . These results suggest that the medicinal value of D. officinale requires further exploration.
As with lignin metabolism, increased levels of flavone and flavonol (i.e., dihydrokaempferol, dihydroquercetin, eriodictyol, quercetin, quercetin glucoside, and rutin) were associated with significant increases in mRNA (Fig. 7a). For example, dihydroquercetin was estimated to increase 19-fold in Pr compared to that in CK, which was underpinned by a fivefold increase in its mRNA. Substantial changes in Pr were also observed for compounds connected to anthocyanin metabolism (Fig. 7a). Anthocyanin metabolism plays an important role in plant coloration; cyanidin and delphinidin contribute to red to magenta and magenta to purple colors, respectively 12 . In Pr the total delphinidin content increased 45 times compared to CK, while one LDOX (Dca020665) was upregulated 4.7-fold and the other (Dca026251) showed no obvious change, suggesting that the pigmentation of Pr was due to delphinidin (Fig. 7a,b).
For a systematic overview of the associations between the DAMs and DEGs, we compared the log 2 fold changes between Pr and CK in mRNA and metabolites in Fig. 7c (the data details are shown in Table S8, Table S9). The metabolites and their correlated mRNA shown in red dots and green dots were both more abundant in Pr. For instance, three up-regulated genes (Dca026362, Dca020695, and Dca023004) of F3H were correlated with accumulation of dihydrokaempferol, dihydroquercetin, quercetin and eriodictyol. Down-regulated PAL (Dca018706) was correlated with the reduction of phenylalanine and cinnamic acid. The mRNA shown in orange    www.nature.com/scientificreports/ which are involved in the auxin, cytokinin, ethylene, JA, ABA, GA, SA, and BR signaling pathways ( Figure S8, Table S6). To further explore plant hormone metabolism associated with flavonoid biosynthesis, 16 related GO terms were identified, and two pathways were significantly enriched (p < 0.05) in Pr compared to in CK (Fig. 8a), including "response to auxin" (GO:0009733), "positive regulation of auxin-mediated signaling pathway" (GO:0010929), and "response to brassinosteroid" (GO:0009741). For the GO term "response to auxin", we analyzed the expression of 21 auxin metabolism-and signaling pathway-related genes and found that most of the genes were down-regulated in Pr (Fig. 8b). In "response to brassinosteroid", genes encoding AP2/ERF and B3 domain-containing protein and ornithine aminotransferase (Dca002334 and Dca002616) were significantly up-regulated in Pr, while two other genes (Dca017429 and Dca019769) were down-regulated in Pr (Fig. 7b). The endogenous brassinolide content was higher in Pr than in CK (Fig. 8c), and the auxin precursor (tryptophan) content was reduced in Pr (Fig. 8d). Together, these results suggest that BR and auxin metabolism affect flavonoid accumulation in Pr.

Identifications of TFs families in Pr vs. CK. Several TFs have been reported to play important roles in
flavonoid biosynthesis. In our study, 477 putative TF encoding genes belonging to 16 major TF families were analyzed in D. officinale (Table S10). A large number of TFs were included in the MYB family (90 genes), bHLH family (85 genes), AP2/ERF family (78 genes) and WRKY family (54 genes) (Table S11). Furthermore, DEGs analysis revealed that most of the TFs were significantly down-regulated in Pr (Fig. 9). For example, the AP2/ERF family, Dca024190 and Dca007398, decreased 118-fold and 141-fold in Pr vs. CK, respectively. The MYB family (Dca004957) was greatly decreased by 434-fold in Pr vs. CK. Three bHLH family (Dca024215, Dca011715, and Dca021380) were down-regulated by more than 100-fold in Pr vs. CK. However, some up-regulated TFs showed especially strong responses in Pr (Fig. 9). For example, the AP2/ERF family, Dca016529 and Dca016530 were

Discussion
Polysaccharides and alkaloids are common active constituents of Dendrobium species [31][32][33] . The biosynthesis of polysaccharides and alkaloids is strongly associated with plant primary and secondary metabolism, and plants contain three major groups of secondary metabolites: phenolics, terpenoids, and alkaloids. Phenylpropanoids are the first class of phenolics that can be divided into several groups, such as flavonoids, anthocyanins, and lignin 34 . However, few studies have qualitatively or quantitatively studied phenylpropanoids in Dendrobium. Eight flavonoid glycosides were identified in D. catenatum from three locations. Total flavonoid content in Guangxi samples were the highest, with a content of 3.87 μg/mg, while the total flavonoid content of the samples from the Guangdong and Zhejiang provinces were lower, at 2.40 and 2.85 μg/mg, respectively 8 . In this study, we identified a Pr of D. officinale with higher flavonoid and anthocyanin content in stems compared to that in CK (Fig. 1). www.nature.com/scientificreports/ Nontargeted metabolomics revealed 148 different metabolites involved in the biosynthesis of phenylpropanoids, amino acids, purines, and organic acids (Fig. 2). Phenylpropanoid biosynthesis is a complex process, and the initial three steps of the pathway, catalyzed by PAL, C4H, and 4CL, are necessary and provide the basis for all subsequent branches 12 . This suggests that the PAL, C4H, and 4CL genes play important roles in flavonoid enrichment. The pal1 pal2 double mutants were deficient of tannin pigments in the seed coat 35 . However, two isoforms of PAL and C4H were identified in our transcript data, and both were down-regulated in Pr compared to CK www.nature.com/scientificreports/ (Fig. 6). It could be that the late biosynthetic genes play a more important role than the early biosynthetic genes in phenylpropanoid metabolism in mature plants. For instance, the overexpression of SQD2.2, which encodes a glycosyltransferase in rice, conferred plants with enhanced flavonoid levels at the mature stage 36 . Flavonol synthase (FLS) and dihydroflavonol 4-reductase (DFR) have strong competition for common dihydromyricetin substrates that affect flower color in grape hyacinth 37 .
Integrated transcriptomic and metabolomic analyses demonstrated the gene-to-metabolite networks and enabled the mechanisms involved in Pr pigmentation to be deciphered (Fig. 7a). The flow of individual metabolites with mRNA, including F3H, DFR, and LDOX, suggests that delphinidin is the predominant anthocyanin in the purple variety. The biosynthesis of anthocyanin via the phenylpropanoid pathway is controlled by two types of genes: structural genes and regulatory genes. Structural genes can be divided into early biosynthetic genes (such as PAL and C4H) and late biosynthetic genes (such as LDOX and UGT ) 14 . In our study, the expression pattern of late biosynthetic genes was more complex than that of late biosynthetic genes. For example, PAL and C4H correlated with their metabolites and were down-regulated in Pr vs. CK. In contrast, one LDOX (Dca020665) was up-regulated 4.7-fold and the other (Dca026251) showed no obvious change in Pr vs. CK, and their related products, delphinidin and cyanidin, increased by 45.5-fold and 1.1.6-fold, respectively, in Pr vs. CK (Fig. 7a). Regulatory genes including MYB, bHLH and WD-repeat proteins, constitute a ternary complex predominantly responsible for orchestrating regulation of anthocyanin synthesis 38 . Our transcriptomic data identified 24 MYB genes and 32 bHLH genes as DEGs in Pr vs. CK, and most of them were significantly down-regulated (Fig. 9). This could be due to metabolic feedback or the preference of regulatory genes. In Arabidopsis, the MYB-bHLH-WD40 complex predominantly regulates late biosynthetic genes over early biosynthetic genes 39 . The expression of other late biosynthetic genes, such as DFR and LDOX, is nearly off or is undetectable in ttg1 and bHLH anthocyanin mutants 39 . Moreover, anthocyanin accumulation is closely related to subcellular transport, including vesicle trafficking, membrane transporters and GST 20 . GSTs are also associated with ER and vacuole membranes that produce high levels of anthocyanins 40,41 . In Arabidopsis, the Golgi-localized membrane protein can transport proanthocyanidin to the central vacuole via vesicle trafficking 42 . In our study, a large number of DEGs were classified into eight GO terms related to "cellular component", such as "plasma membrane" and "trans-Golgi network" (Fig. 4b). This finding suggests that membrane transport is likely to be an important regulator of anthocyanin accumulation in Pr.
The final flower color is determined by three major anthocyanins: pelargonidin, cyanidin, and delphinidin. Cyanidin contributes to red to magenta colors, and delphinidin contributes to magenta to purple colors 12 . Anthocyanins are widely present in plants and are responsible for the purple coloration of plant stems and leaves. Several studies have reported that sugars (e.g., glucose, sucrose, maltose, and turanose) induce anthocyanin biosynthesis in Arabidopsis hypocotyls and leaves 43,44 . The stems of cultivated D. officinale are rich in mannose and glucose in the mature stage 45 , suggesting that the purple variety of D. officinale is a good model for studying the regulatory mechanisms of secondary metabolite accumulation. Dihydrokaempferol and dihydroquercetin provided substrates for quercetin derivative biosynthesis that were significantly up-regulated in Pr, especially rutin (Fig. 7a). Rutin is a potent antioxidant that inhibits sorbitol, reactive oxygen species, advanced glycation end-product precursors, and inflammatory cytokines 46 . Therefore, the active medicinal components of D. officinale require further exploration for their pharmacological potential.
Auxin is an essential hormone that regulates almost every aspect of plant growth and development. Normally, auxin is transported from synthesis sites in the apices to the distal parts of the plant. Indole-3-acetic acid (IAA) is the primary natural auxin that plays a role in gravity-induced flavonoid synthesis in the root tips 21 . IAA is synthesized via tryptophan (Trp)-dependent and Trp-independent pathways 47 . Our data showed that the Trp content was reduced in Pr (Fig. 8d) and that most of the auxin-related genes were down-regulated in Pr (Fig. 8b). In Arabidopsis tt mutants, the flavonol-overproducing mutants exhibited reduced auxin transport, whereas the no flavonoids-accumulating mutant increased auxin transport [48][49][50] . These data suggest that high flavonoid accumulation is negatively correlated with auxin metabolism in Pr. Brassinolide, the most potent BR, is produced by several cytochrome P450 monooxygenases from campesterol 51 . Similar to animal hormones, acetyl-CoA enters the mevalonic acid pathway to make mevalonate 26 . Some studies have indicated that exogenous BR could stimulate the biosynthesis of flavonoids in leaves 52 . Moreover, brassinolide signaling has been implicated in PIN2 sorting and intracellular distribution to delimit root gravitropism 53 . Our transcriptomic data identified the "response to brassinosteroid" and "lipid and wax metabolism" GO terms, and the endogenous brassinolide contents were increased in Pr compared to CK (Fig. 8c, Figure S6). Therefore, brassinolide stimulated increased flavonoid concentrations in Pr. These results suggest that brassinolide signaling and auxin signaling crosstalk play important roles in the regulatory network of phenylpropanoid metabolism in D. officinale.

Materials and methods
Plant materials and culture conditions. The stem samples were harvested from two-year-old cultivated D. officinale, including the purple variety and normal variety, in August 2019 from a growth chamber of Zhejiang University, Hangzhou, China. The growth conditions were set at 25 ± 2 °C with a light/dark cycle of 12/12 h and a 65-75% relative humidity. Stem tissues were collected from three independent plants for transcriptomic analysis (three biological replicates). The stem tissues were collected from six independent plants for metabolome analysis (six biological replicates). All samples were immediately frozen in liquid nitrogen and stored at − 80 °C.
A high-resolution tandem mass spectrometer TripleTOF 5600plus (SCIEX, UK) was used to detect metabolites eluted from the column 54 . The Q-TOF was operated in both positive and negative ion modes. The curtain gas was set 30 PSI, ion source gas1 was set 60 PSI, ion source gas2 was set 60 PSI, and ion source temperature was 650 °C. For the positive ion mode, the ions pray voltage floating was set at 5000 V. For the negative ion mode, the ions pray voltage floating was set at 4500 V. The mass spectrometry data were acquired in IDA mode. The TOF mass ranged from 60 to 1200 Da. The survey scans were acquired in 150 ms and as many as 12 product ion scans were collected if a threshold of 100 counts per second (counts/s) was exceeded with a 1 + charge state. The total cycle time was fixed to 0.56 s. Four time bins were summed for each scan at a pulser frequency value of 11 kHz by monitoring the 40 GHz multichannel TDC detector with four-anode/channel detection. Dynamic exclusion was performed for 4 s. During the acquisition, the mass accuracy was calibrated for every sample. The online KEGG database was used to annotate the metabolites by matching the exact molecular mass data (m/z) of samples with those from the database. If the mass difference between the observed and the database value was less than 10 ppm, the metabolite would be annotated and the molecular formula of metabolites would further be identified and validated by the isotopic distribution measurements. We also used an in-house fragment spectrum library of metabolites to validate metabolite identification. Student's t-tests were conducted to detect differences in metabolite concentrations between the two phenotypes. The p-value was adjusted for multiple tests using the FDR (Benjamini-Hochberg). Supervised PLS-DA was conducted through metaX to discriminate the different variables between groups 55 . Metabolites with significant differences in content were defined as having a variable importance in the project (VIP) ≥ 1 and a fold change of ≥ 2 or ≤ 0.5.
Flavonoid and anthocyanin measurement. Flavonoids were washed twice with 70% ethanol to remove chlorophyll, and the remaining sediment was suspended in methanol (1:10, w/v) and incubated overnight in the dark at 4 °C to extract the flavonoids. The methanol extract was used to measure the concentration of total flavonoids, which was determined using the AlCl 3 method as described previously 52 . Absorbance at 510 nm for flavonoids was determined, and rutin was used as the standard.
Anthocyanin content of the stem was determined using the protocol described by Mita et al. 56 . The stems (100 mg) were extracted for 1 day at 4 °C in 1 mL of 1% (v/v) hydrochloric acid in methanol. The mixture was centrifuged at 13,000 rpm for 15 min and the absorbance of the supernatant was measured at 530 nm and 650 nm.
Thin layer chromatography of all extracts was performed on TLC precoated silica gel 60 GF254 plate using hexane ethyl acetate formic acid (7:3:0.1) as solvent system. TLC plates were detected under UV light at 254 nm and 366 nm. TLC fingerprints were determined using the protocol outlined by Sithisarn et al. 57 .
Transcriptomic analysis. Stem samples from three independent biological replicates for both CK and Pr were harvested and frozen in liquid nitrogen. Total RNA was extracted using Trizol reagent (Invitrogen, CA, USA) following the manufacturer's procedure 32 . Total RNA quantity and purity were analyzed with Bioanalyzer 2100 and RNA 6000 Nano LabChip Kit (Agilent, CA, USA) with RIN number > 7.0. Approximately 10 µg of total RNA representing a specific adipose type was subjected to isolation of (A) mRNA with poly-T oligo attached magnetic beads (Invitrogen). Following purification, the mRNA was fragmented into small pieces using divalent cations at elevated temperatures. Then, the cleaved RNA fragments were reverse-transcribed to create the final cDNA library in accordance with the protocol for the mRNASeq sample preparation kit (Illumina, San Diego, USA).The average insert size for the paired-end libraries was 300 bp (± 50 bp). Paired-end sequencing was performed on an IlluminaHiseq4000 (Illumina Inc., San Diego, CA, USA) by the Lianchuan Biotechnology Company (Hangzhou, China). The RNA-seq data were been submitted to the BIG Data Center of the Chinese Academy of Sciences (https ://bigd.big.ac.cn) with accession number CRA002691.
Differentially expressed genes analysis. The raw data were filtered using Trimmomatic (https ://www. usade llab.org/cms/?page=trimm omati c) and then were assessed using FastQC (https ://www.bioin forma tics. babra ham.ac.uk/proje cts/fastq c/) to obtain clean reads. Clean reads were aligned to the reference genome of D. catenatum 2 using the HISAT program 58 . Subsequently, the expression levels of unigene were calculated as FPKM (fragments per kilo bases of exons for per million mapped reads) using the software package StringTie 59 . The DEGs were further characterized and estimated using the edgeR package according to the results from StringTie 60 . FDRs ≤ 0.05, log 2 fold-change (FC) > 1 or log 2 FC < − 1 and with statistical significance (p value < 0.05) were used as threshold for determining the DEGs between Pr and CK. The DEGs were also subjected to Gene Ontology (GO) enrichment analysis using GOseq software 61 and KEGG pathway enrichment analysis by KOBAS 2.0 62 .
Real-time quantitative PCR. Total RNA was isolated from different tissues using TransZol reagent (TransGen Biotech, Beijing). RNA extracts were treated with DNaseI (NEB, UK) to eliminate DNA contamination. First-strand cDNA was produced from the RNA template by reverse transcription using the TIANscrip-tRTKit according to the manufacturer's instructions (TransGen Biotech, Beijing). Quantitative real-time PCR