Biosyntheses characterization of alkaloids and flavonoids in Sophora flavescens by combining metabolome and transcriptome

Sophora flavescens is widely used for its pharmacological effects. Its main pharmacologically active components, alkaloids and flavonoids, are distributed in the root tissues, but the molecular mechanisms underlying this distribution remain elusive. In this study, metabolite profiles are analyzed using metabolomics to obtain biomarkers detected in different root tissues. These biomarkers include alkaloids, phenylpropanoids, and flavonoids. High-performance liquid chromatography analysis indicates differences in the contents of the principal components. Oxymatrine, sophoridine, and matrine contents are the highest in the phloem, whereas trifolirhizin, maackiain, and kushenol I contents are the highest in the xylem. The transcript expression profiles also show tissue specificity in the roots. A total of 52 and 39 transcripts involved in alkaloid and flavonoid syntheses are found, respectively. Among them, the expression levels of LYSA1, LYSA2, AO2, AO6, PMT1, PMT17, PMT34, and PMT35 transcripts are highly and positively correlated with alkaloid contents. The expression levels of 4CL1, 4CL3, 4CL12, CHI5, CHI7, and CHI9 transcripts are markedly and positively correlated with flavonoid contents. Moreover, the quantitative profiles of alkaloids and flavonoids are provided, and the pivotal genes regulating their distribution in S. flavescens are determined. These results contribute to the existing data for the genetic improvement and targeted breeding of S. flavescens.


Results
Metabolomic profiles in the root tissues of S. flavescens. The chemical components in three root tissues of S. flavescens were determined using ultra-high-performance liquid chromatography-mass spectrometry (UPLC-MS) analysis. In positive ion mode, 13,184 components were detected, and 589 components were identified (Supplemental file 2: Dataset S1). The principal component analysis (PCA) results revealed a clear separation between the three root tissues of S. flavescens in positive ion mode (Fig. 1a). A total of 387 potential biomarkers were detected in positive ion mode through one-way analysis of variance (ANOVA; false discovery rate [FDR] ≤ 0.05; Fig. 1b and Supplemental file 2: Dataset S1). M725T169 (Fig. 1c and Supplemental file 2: Dataset S1).
In negative ion mode, 11,101 components were detected, among which 297 components were identified (Supplemental file 2: Dataset S2). The PCA results also revealed a clear separation between the three root tissues of S. flavescens (Fig. 1d). A total of 257 potential biomarkers were detected in negative ion mode through ANOVA (FDR ≤ 0.05; Fig. 1e    www.nature.com/scientificreports/ kurarinone, and sophoraflavanone G) were detected through HPLC (Fig. 2). The contents of the three alkaloids were the highest in the phloem (23.93, 1.88, and 1.83 mg/g, respectively), followed by the xylem (17.73, 0.56, and 1.40 mg/g, respectively) and periderm (11.97, 1.04, and 0.21 mg/g, respectively; Fig. 2a). The contents of trifolirhizin, maackiain, and kushenol I were the highest in the xylem (0.21, 0.49, and 1.31 mg/g, respectively). Finally, kurarinone and sophoraflavanone G contents were the highest in the periderm (0.01 and 0.01 mg/g, respectively; Fig. 2b). These results showed that alkaloids and flavonoids were distributed unevenly among the root tissues of S. flavescens.
Transcriptome profiles in the root tissues of S. flavescens. Illumina HiSeq paired-end sequencing technology was used to analyze the transcriptome in the root tissues of S. flavescens (Table 1). The average raw read yield was 7.61 G, and the average clean read yield was 6.81 G after filtering with fqtrim software. The percentages of Q20, Q30, and GC were 98.35%, 94.89%, and 45.83%, respectively. All 296,523,264 high-quality 150 bp clean reads were used for de novo assembly, and in total 35,012 contigs > 500 bp in length were obtained (Table 2). Then, a total of 58,327 genes were assembled, with an N50 contig size of 1,237. Annotation was also performed on the basis of sequence similarity searches with a cutoff E-value of 10⁻⁵ against public databases, including the GO, Kyoto Encyclopedia of Genes and Genomes (KEGG), Pfam, SwissProt, eggNOG, and Nr databases, to investigate the function of assembled unigenes (     (Fig. 3). The PCA results showed that the three root tissues differed slightly (Fig. 3a). The Venn results showed that 5129 unigenes were shared and expressed among the three tissues. A total of 3454, 3666, and 4117 unigenes were uniquely expressed in the periderm, phloem, and xylem, respectively (Fig. 3b). Comparing the periderm with the phloem and with the xylem yielded 82 (up:down, 63:19) and 654 (up:down, 454:200) differentially expressed genes (DEGs), respectively (Supplemental file 1: Figure S2). Comparing the phloem with the xylem yielded 186 (up:down, 45:141) DEGs. These results showed that the unigenes in the root tissues of S. flavescens had different expression levels.
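For readers unfamiliar with the N50 statistic reported for the assembly, the following is a minimal sketch of how it is conventionally computed: the sequence length at which the cumulative length of the longest sequences first reaches half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the study's data or pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    if not ordered:
        raise ValueError("lengths must not be empty")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running total reaches half
        # of the assembly size is the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths, for illustration only.
toy_lengths = [2000, 1500, 1200, 800, 500, 300]
toy_n50 = n50(toy_lengths)
```

With these toy lengths the total is 6300 bp, and the two longest contigs together (3500 bp) are the first to cover half of it, so the N50 is the length of the second contig.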

Co-expression analysis of the transcripts and active components in S. flavescens. In S. flavescens, all transcripts were grouped into 23 unique modules, and two modules, namely, MEmagenta and MElightyellow, were positively correlated with active components (P < 0.05; Fig. 4a). MEmagenta was significantly and positively correlated with the contents of trifolirhizin (R = 0.73) and maackiain (R = 0.71; Fig. 4b). MElightyellow was positively correlated with the contents of kurarinone (R = 0.88).
In the flavonoid synthesis pathway, a total of 3, 3, 3, 4, and 4 transcripts were highly and positively correlated with the contents of trifolirhizin, maackiain, kushenol I, kurarinone, and sophoraflavanone G, respectively (R > 0.8, P < 0.05; Fig. 6b). The expression levels of two 4CL (4CL1 and 4CL12) and one 2′OH (2′OH2) transcripts were markedly and positively correlated with trifolirhizin and maackiain contents. The expression levels of two 4CL (4CL1 and 4CL13) and one 2′OH (2′OH1) transcripts were highly correlated with kushenol I contents. The expression levels of CHI5 and CHR1 transcripts were highly correlated with kurarinone and sophoraflavanone G contents.

Discussion
In this study, the distribution of alkaloids and flavonoids demonstrated tissue specificity in S. flavescens roots. Transcript expression profiles also showed tissue specificity in the roots. The weighted gene co-expression network analysis (WGCNA) results confirmed that the pivotal transcripts regulated the distribution of alkaloids and flavonoids in the root tissues. This study will provide useful information for investigating the genetic and biochemical mechanisms of alkaloid and flavonoid syntheses.  18 , hypoglycemic and hypolipidemic effects (oxymatrine) 19 , prevention of human colorectal cancer (sophoridine) 20 , anti-proliferative effects (trifolirhizin) 21 , and inflammasome-activating effects (maackiain) 22 . Thus, the contents of alkaloids and flavonoids in the roots of S. flavescens were further quantitatively analyzed. Quantitative analysis revealed that the contents of total alkaloids and of three individual alkaloids (oxymatrine, sophoridine, and matrine) were higher in the phloem than in the periderm and xylem. This finding was consistent with previous work 23 . The contents of trifolirhizin, maackiain, and kushenol I were the highest in the xylem, and the contents of kurarinone and sophoraflavanone G were the highest in the periderm. Four phenolic acid compounds (benzoic acid, caffeic acid, ferulic acid, and chlorogenic acid) and four flavonol compounds (kaempferol, catechin hydrate, epicatechin, and rutin) have been reported at higher levels in the aerial parts than in the roots 13 . This uneven accumulation pattern of secondary metabolites may affect the rational use of medicinal plants. Understanding the molecular mechanisms underlying the accumulation of the active components is therefore of great significance. The transcript expression in S. flavescens was tissue-specific.
A total of 52 upstream transcripts and 137 downstream CYP transcripts involved in alkaloid synthesis were identified (FPKM ≥ 5), among which 59.62% and 51.82%, respectively, were expressed at the highest levels in the xylem. In a previous study, the gene for a putative lysine/ornithine decarboxylase involved in the initial step of matrine biosynthesis was preferentially expressed in the leaf and stem 24 . These findings indicated that the differential expression of these genes resulted in the uneven distribution of alkaloids. In the flavonoid biosynthesis pathway, 26 (66.67%) transcripts were highly expressed in the xylem. In a previous study, 41 transcripts were investigated and showed distinct expression profiles in different parts of S. flavescens 13 . The transcripts related to alkaloid and flavonoid biosynthesis in S. flavescens demonstrated organ-specific expression patterns, implying that the biosynthetic processes might differ physiologically depending on the organ.
The correlation analysis results further showed that 28 and 12 transcripts were positively correlated with the contents of alkaloids and flavonoids, respectively (R > 0.8, P < 0.05). In the alkaloid biosynthetic pathway, the expression levels of LYSA1, AO6, and PMT transcripts were highly and positively correlated with the contents of alkaloids. In a previous study, seven enzyme genes involved in alkaloid biosynthesis in S. flavescens were identified 25 . In the current study, three 4CL, three CHI, two 2′OH, and four CHR transcripts were highly and positively correlated with flavonoid contents in the flavonoid biosynthetic pathway. Phenylalanine ammonia-lyase, C4H, and 4CL are the three enzymes that form p-coumaroyl-CoA, the substrate for flavonoid biosynthesis 26 . CHS then catalyzes the formation of chalcone, and CHI catalyzes the conversion of chalcone to naringenin, a major metabolite in the synthesis of various flavonoids 27,28 . A previous study identified 13 enzyme genes involved in flavonoid biosynthesis in S. flavescens 29 . The current study provides useful molecular and chemical data for investigating the distribution of alkaloids and flavonoids in S. flavescens.

Materials and methods
All experimental research and field studies on plants, including the collection of plant material in this study, complied with relevant institutional, national, and international guidelines and legislation.

Plant materials.
Three-year-old roots of S. flavescens were collected from Wenshan in Yunnan Province at the flowering stage. S. flavescens was cultivated following the standard operating procedures established by the Good Agricultural Practices 30 . All roots were carefully washed and separated into three different parts: the periderm, phloem, and xylem (Supplemental file 1: Figure S1). The samples were divided into two parts for metabolite and transcriptome analyses.
Metabolite analysis. All of the samples were dried and crushed, and 0.1 g of the powdered sample was weighed, mixed with 1.0 mL of pure methanol under vortex for 1 min, and incubated at room temperature for 10 min 11 . The mixture was stored overnight at −20 °C and centrifuged at 4000 × g for 20 min. The upper layer was collected, filtered through a 0.22 µm filter, and transferred to a sample vial. The sample was then injected into the column for UPLC-QTOF-MS analysis.
A high-resolution tandem mass spectrometer TripleTOF5600plus (SCIEX, UK) was used to detect metabolites. The Q-TOF was operated in the positive and negative ion modes. The curtain gas was set to 30 psi, ion source gas 1 to 60 psi, ion source gas 2 to 60 psi, and the interface heater temperature to 650 °C. Multivariate data analysis was performed using MetaboAnalyst 4.0 software (http://www.metaboanalyst.ca/) 11 . PCA was performed to analyze the distribution of samples. One-way ANOVA was used to detect differences among tissues, and components with FDR ≤ 0.05 were deemed potential biomarkers. The variable importance in projection (VIP) was used to evaluate the contribution of each variable.
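The FDR threshold above can be illustrated with a short sketch. The text does not state which FDR procedure was applied, so the Benjamini-Hochberg adjustment used here is an assumption, and the function name `benjamini_hochberg` and the toy p-values are hypothetical. The sketch takes raw ANOVA p-values, one per component, and returns adjusted values that can be compared against 0.05.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: BH is used only as a common choice of FDR procedure;
    the source specifies the threshold (FDR <= 0.05), not the method.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw ANOVA p-values for four components.
toy_p = [0.001, 0.04, 0.03, 0.20]
toy_q = benjamini_hochberg(toy_p)
# Components whose adjusted value meets the FDR <= 0.05 criterion.
selected = [i for i, q in enumerate(toy_q) if q <= 0.05]
```

In this toy example two components have raw p-values below 0.05 but adjusted values slightly above it, which shows why the FDR criterion is stricter than an unadjusted threshold.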
The sample extracts were also used for alkaloid and flavonoid quantitative analyses. An Agilent HPLC-UV 1260 series system (Agilent, USA) equipped with a quaternary pump, automatic sampler, column compartment, and variable wavelength detector (VWD) was employed. A 4.6 mm × 250 mm C18 reversed-phase column (5 µm particle size; Eclipse XDB, Agilent, USA) was used for separation, and the sample injection volume was set as 10 µL. The conditions for alkaloids were set as follows: column temperature of 30 °C, a flow rate of 1.0 mL/min, and a wavelength of 220 nm 31 . The mobile phase was composed of 80% acetonitrile (A), 10% ethanol (B), and 10% water (C). The conditions for flavonoids were set as follows: column temperature of 35 °C, a flow rate of 1.0 mL/min, and a wavelength of 295 nm 32 . The mobile phase was composed of acetonitrile (A) and water (B), and the linear gradient was set as follows: 0-25 min for 19%-50% A, 25-30 min for 50%-70% A, 30-40 min for 70% A, and 40-50 min for 70%-40% A.
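The flavonoid gradient program can be read as a piecewise-linear function of time. The sketch below encodes the breakpoints listed above and interpolates the acetonitrile percentage at any time point; the names `GRADIENT` and `percent_a` are illustrative, and the code is not an instrument method file.

```python
# (time in min, % acetonitrile A) breakpoints taken from the gradient above.
GRADIENT = [(0, 19), (25, 50), (30, 70), (40, 70), (50, 40)]


def percent_a(t_min, program=GRADIENT):
    """Linearly interpolate % A at time t_min within the gradient program."""
    if not program[0][0] <= t_min <= program[-1][0]:
        raise ValueError("time is outside the gradient program")
    for (t0, a0), (t1, a1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return a0 + (a1 - a0) * (t_min - t0) / (t1 - t0)


# Water (B) makes up the remainder of the binary mobile phase.
a_at_45 = percent_a(45)
b_at_45 = 100 - a_at_45
```

For example, halfway through the final segment (45 min) the composition is 55% A and 45% B.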
RNA extraction and Illumina sequencing. The total RNA was isolated from different tissues in accordance with the instructions of a plant RNA isolation kit (BioTeke, Beijing, China). The quality of RNA was evaluated on 1% agarose gel, and RNA concentrations were determined with a Nanodrop 2000 spectrophotometer (Thermo Technologies). cDNA library construction and sequencing were performed following standard procedures. First, mRNA was enriched from the total RNA with oligo(dT) magnetic beads and broken into short fragments 33 . Then, random hexamers and the RNA fragments were used to prime cDNA synthesis.
After purification and ligation of adapters, the cDNA library was constructed through PCR amplification. The length of the insert sequence was verified with an Agilent 2100 bioanalyzer system (Agilent Technologies, Santa Clara, CA, USA), and the library was quantified with an ABI StepOnePlus real-time PCR system (Applied Biosystems, USA). Finally, the qualified cDNA library was sequenced with an Illumina HiSeq™ 2000 system (Illumina Technologies).
Transcriptome analysis. All raw reads were processed with the cutadapt (v1.9) and fqtrim (v0.94) software for quality control to produce clean reads: (1) raw reads including adapter sequences and empty adapters were discarded; (2) reads in which unknown N bases comprised more than 5% of the total length were filtered; (3) reads in which low-quality bases comprised more than 20% of the total length were discarded 33 . Then, the indicators Q20% (sequencing error rate less than 0.01), Q30% (sequencing error rate less than 0.001), and GC% were calculated to evaluate the quality of clean reads. All the 150 bp paired-end RNA-Seq reads were submitted to NCBI (Accession number: PRJNA661972). De novo assembly was performed in Trinity (v2.4.0) software using 150 bp paired-end reads with default parameters 34 . One assembly was performed using the nine sets of sequencing reads, and 58,327 transcripts were obtained. These resultant transcripts were searched against the NCBI nonredundant nucleotide (Nt) database, NCBI nonredundant protein (Nr) database, and SwissProt protein database for functional annotation by using the BLAST algorithm with an E-value cutoff of 1e−5 35 . The functional categories of these unique sequences were further analyzed using the above databases and the KEGG database in BLAST and Blast2GO programs as previously reported in the literature 36–39 .
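Filtering criteria (2) and (3) and the Q20/Q30/GC indicators can be sketched in a few lines. This is an illustration of the stated thresholds rather than a reimplementation of cutadapt or fqtrim: adapter removal (criterion 1) is omitted, the Phred cutoff that defines a "low-quality" base (`low_q=10`) is an assumption because the text does not give one, and all function names and toy reads are hypothetical.

```python
def phred_to_error(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def keep_read(seq, quals, max_n_frac=0.05, low_q=10, max_low_frac=0.20):
    """Apply criteria (2) and (3): N fraction and low-quality base fraction.

    low_q is an assumed cutoff for a low-quality base (not from the source).
    """
    n_frac = seq.upper().count("N") / len(seq)
    low_frac = sum(q <= low_q for q in quals) / len(quals)
    return n_frac <= max_n_frac and low_frac <= max_low_frac


def summarize(reads):
    """Return Q20%, Q30%, and GC% over all bases of the given reads."""
    total = sum(len(seq) for seq, _ in reads)
    q20 = sum(q >= 20 for _, quals in reads for q in quals)
    q30 = sum(q >= 30 for _, quals in reads for q in quals)
    gc = sum(seq.upper().count("G") + seq.upper().count("C") for seq, _ in reads)
    return {"Q20%": 100 * q20 / total, "Q30%": 100 * q30 / total, "GC%": 100 * gc / total}


# Hypothetical reads as (sequence, per-base Phred scores).
raw_reads = [
    ("ACGTACGTAC", [35] * 10),             # passes both criteria
    ("ACNNACGTAC", [35] * 10),             # 20% N bases, filtered
    ("ACGTACGTAC", [5, 5, 5] + [35] * 7),  # 30% low-quality bases, filtered
    ("GGCCAATTGG", [25] * 5 + [35] * 5),   # passes both criteria
]
clean_reads = [(s, q) for s, q in raw_reads if keep_read(s, q)]
stats = summarize(clean_reads)
```

The `phred_to_error` helper makes the link in the text explicit: a score of 20 corresponds to an error probability of 0.01, and a score of 30 to 0.001.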
The clean reads were mapped to the assembled reference by using Bowtie 2 (v2.2.6) to estimate the expression profiles of the transcripts 40 . The expression levels were calculated as FPKM by using RSEM software (v1.3.1), and the bowtie mismatch parameter was set at 2 41 . DEGs were identified using the following criteria: fold change (FC) ≥ 2 and FDR ≤ 0.05. Candidate transcripts involved in the alkaloid and flavonoid biosyntheses were selected in accordance with previous reports and databases (FPKM ≥ 5), and their FPKM values were converted to log10 values. They were visualized in a heatmap with the pheatmap package (v1.0.12) in R to identify the different expression profiles among the three tissues 42 .
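As a brief illustration of the quantities used here, the sketch below gives the textbook FPKM definition and a filter for the DEG criteria. It does not reproduce RSEM, which estimates fragment counts with an expectation-maximization model. Treating "FC ≥ 2" as a change of at least twofold in either direction is an assumption, suggested by the up- and down-regulated DEGs reported in the Results; the function names and toy values are hypothetical.

```python
import math


def fpkm(fragments, length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def is_deg(fold_change, fdr, min_fc=2.0, max_fdr=0.05):
    """Check the DEG criteria; FC is treated symmetrically (assumption)."""
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    return max(fold_change, 1 / fold_change) >= min_fc and fdr <= max_fdr


# Hypothetical transcript: 500 fragments, 1000 bp long, 20 million mapped fragments.
toy_fpkm = fpkm(500, 1000, 20_000_000)
passes_cutoff = toy_fpkm >= 5       # expression cutoff used for candidate transcripts
toy_log10 = math.log10(toy_fpkm)    # value that would enter the heatmap
```

A transcript with a fold change of 0.4 and an FDR of 0.01 would count as a down-regulated DEG under this reading, because 1/0.4 = 2.5 exceeds the twofold threshold.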
Co-expression analysis. WGCNA was used to analyze the relationships between transcript expression and component contents with the R package (v3.2.5) 43,44 . The R package along with its source code and additional material are freely available at https://horvath.genetics.ucla.edu/html/CoexpressionNetwork/Rpackages/WGCNA/. The network construction and module detection method with default settings were used, including an unsigned topological overlap matrix. All parameters were set as follows: "soft_power = 22, TOMType = 'unsigned', minModuleSize = 30, reassignThreshold = 0, and mergeCutHeight = 0.25". A P-value of 0.05 was set as the threshold for a significant correlation.
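For orientation, the standard WGCNA definitions behind these settings can be written out as follows. The adjacency is shown in its unsigned form, which is an assumption here: the text specifies an unsigned topological overlap matrix but does not state the network type. The symbols $x_i$ (expression profile of transcript $i$), $a_{ij}$, $k_i$, and $\beta$ are notation introduced for this illustration.

```latex
% Soft-thresholded (unsigned) adjacency with power \beta = 22
a_{ij} = \left| \operatorname{cor}(x_i, x_j) \right|^{\beta}, \qquad \beta = 22

% Unsigned topological overlap between transcripts i and j (i \neq j)
\mathrm{TOM}_{ij} = \frac{\sum_{u \neq i,j} a_{iu}\, a_{uj} + a_{ij}}
                         {\min(k_i, k_j) + 1 - a_{ij}},
\qquad k_i = \sum_{u \neq i} a_{iu}
```

Modules are then detected by hierarchical clustering on the dissimilarity $1 - \mathrm{TOM}_{ij}$, with the minimum module size and merge height given above.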
The candidate transcripts involved in the alkaloid and flavonoid biosyntheses were further selected in accordance with the annotation information to analyze the relationship of transcript expression with alkaloid and flavonoid contents. The FPKM values of the transcripts were normalized, and the Pearson correlation coefficients between these values and the alkaloid and flavonoid contents were then calculated using SPSS (v17.0) software. A Pearson correlation bubble chart was constructed with the R package (v3.5.0) to identify pivotal transcripts related to the contents of alkaloids and flavonoids 42 .
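The correlation step can be illustrated with a minimal Pearson coefficient computed from first principles. This is a sketch rather than the SPSS procedure used in the study: it omits the normalization step and the significance test (P < 0.05), and the function name `pearson_r` and the toy values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical expression values of one transcript and contents of one
# compound across the same five samples.
toy_expression = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_content = [2.0, 4.0, 6.0, 8.0, 10.0]
r = pearson_r(toy_expression, toy_content)
# Correlation cutoff reported in the Results (R > 0.8).
highly_correlated = r > 0.8
```

Because the toy contents are an exact multiple of the expression values, the coefficient is 1 up to floating-point rounding; real data would also need the P-value check before a transcript is called pivotal.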

Conclusion
To sum up, the alkaloids and flavonoids showed tissue specificity in S. flavescens roots. Gene expression profiles also showed tissue specificity. The metabolomes and transcriptomes systematically confirmed the pivotal transcripts regulating the distribution of alkaloids and flavonoids. This study elucidated the mechanism of alkaloid and flavonoid synthesis, accumulation, and transport, which provides the basis for improving the production of alkaloids and flavonoids through genetic engineering. In addition, these genetic resources could provide comprehensive information on gene discovery, transcriptional regulation, and variety selection for S. flavescens.

Code availability
All of the transcriptome sequences were submitted to NCBI (Accession number: PRJNA661972).