Toward understanding of the methoxylated flavonoid biosynthesis pathway in Dracocephalum kotschyi Boiss

Nowadays, with the development and advancement of next-generation sequencing technologies, a new path has been provided for transcriptomic studies. In this study, the transcriptome of Dracocephalum kotschyi Boiss., as an endemic and endangered plant which is contained a large amount of valuable secondary metabolites with antioxidant and anticancer properties, was sequenced. Then functional annotation and gene ontology analysis for 165,597 assembled transcripts were performed, most were associated with the metabolic pathways. This might be because there are various active biochemical pathways in this plant. Furthermore, after comprehensive transcript annotation, the putative genes involved in the main metabolic pathways of D. kotschyi were identified. Then, the biosynthetic pathway of its valuable methoxylated flavones was proposed. Finally, the accumulations of important methoxylated-flavone metabolites in three different tissues were quantified by HPLC. The relative expression of the genes involved in the proposed pathway was investigated by qRT-PCR, which indicated high expression levels in the bud tissue. The present results may lead to the design strategies to preserve the genetic diversity of endangered D. kotschyi plants and apply the new methods for engineering its valuable methoxylated-flavones pathway.

Functional annotation. In this investigation, 165,597 de novo assembled transcripts of the D. kotschyi were compared with TrEMBL, Swiss-Prot, and Pfam (Protein family) databases to assign putative functions and Gene Ontology browser, KEGG, as well as EggNOG databases for further gene evolutionary histories and functional annotations. Comparison of the D. kotschyi transcripts with TrEMBL database as an extensive protein database using the BLASTx search with the stringency of E-value 1e −10 revealed that 104,606 sequence transcripts had high levels of similarity to the ones from related plant sequence ( Supplementary Fig. 1b). Among the 104,606 annotated transcripts, 46.20% (46,993 transcripts) showed high similarity to Handroanthus impetiginosus, followed by 25.79% (26,236 transcripts) to Erythranthe guttata, 2.16% (2195 transcripts) to Salvia miltiorrhiza. Abundance estimation of each gene and isoform were calculated based on FPKM (fragments per kilobase of transcript per million mapped reads) and TPM (transcripts per million) values (Supplementary excel file 1). The distribution of TPM values for genes and transcripts is shown in Supplementary Fig. 2.
Moreover, all assembled unigenes were aligned with other public sequence databases, including Swiss-Prot, Pfam, KEGG, Gene Ontology browser, and EggNOG. The numbers of annotated assembled transcripts mapping to each database are summarized in Supplementary Table 3. A total of 47,776 transcripts were annotated in the six databases. A Venn diagram displayed shared and unique unigenes of D. kotschyi according to Swissprot, Pfam, EggNOG, GO, and KEGG public databases ( Supplementary Fig. 3a). GO assignment was used to reduce complexity and classify the functions associated with the D. kotschyi unigenes. Of 165,597 unigenes, 78,188 biological process (47.2% of all transcripts), 80,356 molecular function (48.5%) and 79,769 cellular components (48.1%) terms were identified ( Supplementary Fig. 3b). Based on gene ontology analysis, 91,017 transcripts could be assigned to one or more GO terms. Within the cellular component category, the three most enriched terms were "Membrane (GO:0016020)" with 28,763, "Nucleus (GO:0005634)" with 21,922, and "Cytoplasm (GO:0005737)" with 14,320 transcripts. In the biological process domain, the three most common categories were "Cellular process (GO:0009987)" with 62,195, "Metabolic process (GO:0008152)" with 55,070, and "Nucleobase-containing compound metabolic process (GO:0006139)" with 22,967 transcripts. In the molecular function domain, the three most abundant groups were "Binding (GO:0005488)" with 59,971, "Catalytic activity (GO:0003824)" with 48,681, and "Nucleic acid-binding (GO:0003676)" with 20,957 transcripts (Supplementary Fig. 4 and Supplementary excel file 2).
To further investigate the D. kotschyi transcriptome data, we searched the annotated sequences for COG/ NOG classifications according to the popular EggNOG database. This analysis showed that 46,951 (28.35%) transcripts were assigned to 22 COG categories. The cluster for "Posttranslational modification, protein turnover, chaperones" (6886) represented the largest group, followed by "Signal transduction mechanisms" (6142) and "Transcription" (5994). The category of "Nuclear structure" (5) was the smallest group ( Supplementary Fig. 5 and Supplementary Table 4). Also, 32,263 (19.48%) transcripts were poorly characterized in the COGs categories and grouped in the "Function unknown" classification. Complete the summary of this annotation is in Supplementary excel file 3. To find the active biological pathways of the unigenes, we also conducted a search of all transcripts against the KEGG database with the BlastKOALA tool (KEGG Orthology And Links Annotation) (https:// www. kegg. jp/ blast koala/). Of 106,731 sequences, 42,715 had significant KO term matches with known enzymes in the KEGG database and were assigned to 324 KEGG pathways (Supplementary Fig. 6 and Supplementary excel file 4). The most strongly represented biological pathways were "metabolic pathways" (958), "Signal transduction" (754), and "Biosynthesis of secondary metabolites" (519).
Moreover, our analysis of the D. kotschyi transcriptome data to search the whole set of transcription factors (TFs) in PlnTFDB revealed that 6,626 assembled transcripts (6.20%) encode putative TFs. The predicted TFs can be classified into 50 TF families based on their DNA-binding domains (DBDs) according to the criteria of Plant-TFdb (Supplementary excel file 5). Among predicted TF families, the C2H2-type zinc finger family showed the most abundant predicted TF (1385, 20.90%) followed by WD40-like (985, 14.87%), MYB-HB-like (626, 9.45%), bHLH (349, 5.27%) ( Supplementary Fig. 7). Identifying this large set of TFs provides a rich resource for future characterization of specific TFs in various biochemical pathways in D. kotschyi.
Simple sequence repeats (SSRs) analysis. SSR markers provide many advantages over the other marker systems, and their moderate density still serves as the best co-dominant marker system for constructing framework linkage maps 16  www.nature.com/scientificreports/ are abundant, easy to develop and detect, and highly polymorphic 17 . Transcriptome SSRs also exhibit high interspecific transferability [18][19][20] . We counted the frequency of SSRs with different numbers of tandem repeats, and among the 165,597 examined sequences (171,215,531 bp total length), 68,044 SSRs in 46,695 sequences (28.1% of total sequences) were identified. Additionally, 14,862 sequences contained more than 1 SSR locus. Herein, the SSR motifs such as mono-, di-, tri-, tetra-, penta-, hexanucleotides were identified. The mono-37.71% (25,661) and dinucleotide 45.96% (31,273) repeat motifs showed the highest frequency, respectively ( Supplementary  Fig. 8a). The dinucleotide repeat motif AG/CT was the most abundant, followed by A/T, AC/GT, AT/AT, AAG/ CTT, and C/G ( Supplementary Fig. 8b). The most abundant type was SSRs with ten tandem repeats, followed by six tandem repeats and five tandem repeats (Supplementary Fig. 8c and Supplementary excel file 6).
Analysis of candidate genes involved in the biosynthetic pathways of (poly) methoxylated flavones. Until now, no research aiming at the isolation of the genes involved in the methoxylated flavones biosynthesis pathway has been carried out. Therefore, for the first time, we identified the full-length genes and then combined the literature review [13][14][15]21,22 , and our experimental analysis, subsequently re-constructed the whole methoxylated flavones biosynthesis pathway, from phenylalanine to all methoxylated flavones which were detected in this plant 5,6,23 . Based on available information about flavones biosynthesis 22,24 , after the apigenin formation, the various type of flavonoid hydroxylases (FH) and flavonoid O-methyltransferases (FOMTs), leading the biosynthesis pathway to the production of various compounds which may be species-specific (or family). Identification of candidate genes, including FOMTs and FHs, was carried out based on structural similarity and strong protein sequence similarities of putative transcripts against related sequences from other species. In 2019, Mohammadi and co-workers considering transcriptome and metabolic data as well as literature sources could identify the genes involved in the diosgenin biosynthesis pathway, therefore, proposed the most possible pathway of diosgenin biosynthesis 25 . In this study, the same method and procedure were used to identify the genes involved in the biosynthesis pathway of methoxylated flavones. Finally, we investigated and identified different pathways in D. kotschyi and proposed an entire pathway for the biosynthesis of valuable anticancer and antioxidant compounds in this plant ( Fig. 1). In this study, using transcriptome sequencing data of D. kotschyi, sequences of all genes involved in the target biosynthetic pathways, including methoxylated flavones, terpenoids (MEP; methylerythritol phosphate pathway), and rosmarinic acid, were identified based on the high similarity of the putative genes with the related sequences from closely related species (Supplementary Table 5). Herein, fourteen genes including early biosynthetic genes PAL, C4H, 4CL, CHS, CHI and FNS (blue box), five genes encoding FOMTs, i.e., F3OMT, F4ʹOMT, F6OMT, F7OMT, and F8OMT (yellow box), as well as three FHs, i.e., F3H, F3ʹH and F6H (yellow box), in the biosynthesis of methoxylated flavones were identified (Fig. 1). On the other hand, TAT, RAS, and HPPR genes involved in the rosmarinic acid biosynthesis pathway were identified (purple boxes) ( Fig. 1 and Supplementary Table 5). We also identified ten genes functioning in the biosynthesis of plastidic MEP pathway and main monoterpene compound in the essential oil of D. kotschyi, that is, ocimene, including DXS, DXR, MCT, CMK, MCS, HDS, HDR, IDI, GPPS, and OCS, which shown in the green box ( Fig. 1). In the sequel to this study, all FOMTs and FHs, which involved in the production of valuable flavones in this plant, as well as the RAS gene in the rosmarinic acid metabolic pathway, were selected for further analysis ( Supplementary Fig. 9). The in silico analysis revealed that the studied genes possess functional motifs, which are required to activity and identity of these enzymes (Fig. 2). All FOMTs have five conserved motifs in their structure that are highly similar to each homologous gene from related species 26 . These motifs were shown in Fig. 2, and their multiple sequence alignment is presented in Supplementary Fig. 10. Based on PredictNLS and DeepLoc results, the FOMT enzymes were predicted to be located in the cytoplasm 21 , and their methyltransferase activity was approved by GO:0,008,168 term. Also, the phylogenetic tree showed that the FOMTs genes of Zarrin-Giah were closely linked with those in the Lamiaceae family ( Supplementary Fig. 9a). F3H contains five conserved motifs with two functional domains, including ferrous iron-binding site with HxDx n H motif (in His216, Asp218, and His274) and 2-oxoglutarate binding site with RxS motif (in Arg284 and Ser286) ( Fig. 2 and Supplementary Fig. 11). This enzyme is predicted to be localized to cytoplasm 27 with known oxidoreductase activity (GO:0,016,491) ( Supplementary Fig. 9b). F3ʹH belongs to the CYP450 Family and possesses six motifs 28 which contains the Proline reach domain (Hing) in P35PGPRPWP42, Heme binding site in F443GAGRRICAG452, and Oxygen binding site in A307GTDTT312 ( Fig. 2 and Supplementary Fig. 12). This enzyme was predicted to be located in the endoplasmic reticulum 29 , and its monooxygenase activity was confirmed with GO:0,004,497 term ( Supplementary Fig. 9c).
F6H is a member of A-type CYP450s 30,31 and has four common conserved family motifs ( Fig. 2 and Supplementary Fig. 13), GO:0004497 represents the monooxygenase activity of this enzyme, and subcellular localization prediction indicated that this enzyme is associated with the endoplasmic reticulum membrane ( Supplementary  Fig. 9d). The RAS enzyme belongs to the BAHD acyltransferases family and contains two conserved regions that possess the HXXXD motif, which contains the catalytically active histidine and the DFGWG motif that is thought to be responsible for the steric position of the active site 32 ( Fig. 2 and Supplementary Fig. 14). This enzyme is predicted to be located in the cytoplasm, and its acyltransferase activity was approved by GO:0,016,746 term 32,33 . Phylogenetic analysis of the RAS enzyme also revealed that it has close relation with those in the Lamiaceae family ( Supplementary Fig. 9e).

Expression analysis by qRT-PCR.
In this study, we used isoform-specific primers of nine candidate unigenes (F3OMT, F4ʹOMT, F6OMT, F7OMT, F8OMT, F3H, F3ʹH, F6H, and RAS genes) associated with methoxylated flavones and rosmarinic acid biosynthetic pathway in three different tissues of D. kotschyi at flowering stage (leaves, flowers, and buds). The flower tissue showed low expression for all candidate genes, so relative expression for each gene in other tissues was compared with this tissue. Figure 3a and  www.nature.com/scientificreports/ of DkF3OMT, DkRAS, and DkCRS genes in the leaf (10.5, 6.8, and 6.6 respectively) and bud tissues (12.8, 9.8, and 10.4 respectively) was higher than those in the flower tissue. In addition, the expression of DkF3H and DkF3OMT genes in the leaf (1.9 for each gene) and bud tissues (4.7 and 5.1 respectively) was lower than those in the flower tissue. Relative expression analysis of candidate genes in the bud tissue to the leaf tissue showed that DkF7OMT had the higher expression and DkF3OMT had the lower expression in bud tissue (Fig. 3C).
Flavonoids and methoxylated flavones contents. Initially, the correlation coefficients (R 2 ) were calculated for all standards of compounds, which were about 0.99 ( Supplementary Fig. 15). The chromatograms of HPLC peaks relevant to the standard of each metabolite in different tissues are shown in Fig. S16 and Fig. 4A. These results demonstrate the accumulation pattern and RT for each compound at 280 nm. As shown in Fig. 4B, the flavonoids contents varied among the different organs and tissues. In general, all of the measured metabolites had a higher amount in the mixture of flowers and buds and flower tissue. Calycopterin had a significant amount in leaf tissue (15.98 mg/g DW). After that, a high quantity of penduletin (9.30 mg/g DW) was achieved

Discussion
Up to now, only 11 nucleotide sequences can be found by searching NCBI databases (the search was performed on 07.07.2019) from the endemic endangered medicinal plant, D. kotschyi. According to previous studies, this plant is rich in anticancer, antioxidant, and antibacterial components, such as methoxyflavones and rosmarinic acid 5,6 . We found that 105,210 transcripts (63.53%) from D. kotschyi were annotated in at least one database. It should be noted that the transcripts lacking a functional annotation (36.47% of the assembled transcripts), which has occurred among previous studies 34 , may represent orphan genes or long non-coding RNAs (lncRNAs) in D. kotschyi or could be explained by their location in untranslated regions of the genes (UTRs) 35 . According to the top blast hit analysis, the highest similarity to H. impetiginosus, E. guttata, and S. miltiorrhiza was observed. This similarity might be related to the closed genetic relationship of D. kotschyi to this species, considering that all belong to the Lamiaceae family. GO analysis of annotated transcripts revealed high catalytic activity among the GO molecular function terms of the annotated unigenes.
On the other hand, the transferase enzyme family was one of the main groups in this molecular function term. It should be noted that this enzyme family (such as FOMTs) plays an essential role in methoxyflavones biosynthetic pathway 11 . Many TF families are known to regulate secondary metabolite biosynthesis in plants 36 . For example, MBW complex (contains MYB, bHLH, and WD40 families) plays a critical role in the activation of the late biosynthetic genes in the flavonoids biosynthesis 37 according to the results (Supplementary Fig. 7 and Supplementary excel file 5), these three families were the most abundant groups of predicted TFs. The SSR analysis showed that roughly 28% of transcripts contain SSRs and this distribution is similar to other Lamiaceae family members such as D. tanguticum 11 and sweet basil 12 . The availability of this large number of SSRs not only can have a tangible effect in large-scale genotyping studies for various applications in D. kotschyi, such as genetic diversity assessment, evolution, association mapping for important traits, and comparative genomics 38 , but also will assist in molecular breeding of D. kotschyi for developing distinct chemotypes 39 .
The first report of the biosynthetic pathway of penduletin and calycopterin has already been proposed by Fattahi et al. 13 . Herein, a complete methoxylated-flavonoid biosynthesis pathway, along with genetic evidence and functional studies on the genes involved in the pathway, has been presented. However, the methoxylated flavone biosynthesis pathway in D. kotschyi remains poorly understood, and even more, the genes involved in this pathway have not been characterized yet. Therefore, one of the main aims of this research is to study metabolic pathways and biosynthetic genes functioning in flavone formation. In the present work, by combining the literature review [13][14][15][21][22][23] , our experimental data, and bioinformatics data analysis, we isolated the full-length CDS of genes encoding enzymes and constructed a pathway by which methoxylated flavone compounds can be produced (Fig. 1). For each studied gene, functional motifs were detected and then, the phylogenic tree, showing its relatedness with similar genes from other species, was constructed ( Supplementary Figs. 9, 10, 11, 12, 13 and  14). The results could be used to engineer the metabolic flux toward valuable methoxylated flavone biosynthesis to improve the production of them in either the whole plant or the hairy root cultures of Zarin-Giah, which have been considered as a promising alternative platform. To elucidate, methoxylated flavone compounds are synthesized through the phenylpropanoid pathway, which chiefly uses phenylalanine as a common precursor. Early biosynthetic genes PAL, C4H, 4CL, CHS, and CHI convert phenylalanine to naringenin, the central precursor www.nature.com/scientificreports/ of most flavonoids, in a series of steps. Afterward, FNS convert naringin to apgenin herein; it was detected in all studied tissues. Apgenin can promote the biosynthetic pathway to the production of various compounds like luteolin, one-step reaction that is catalyzed by F3ʹH. Our transcriptome sequencing and qRT-PCR data confirmed the presence of putative F3'H gene. Luteolin metabolite was also observed in D. kotschyi metabolic profile 5 (Fig. 1). In the second route, apigenin converts into isokaempferide (Kaempferol 3-methyl ether), which is catalyzed by F3H and F3OMT enzymes. Our findings indicated the sequence of putative F3H and F3OMT genes and the existence of the metabolite in all tissues. In the third path, 7C-OH moiety of ring A of apigenin is methylated with F7OMT and leads to the production of genkwanin as a central intermediate of methoxylated flavones in D. kotschyi. According to previous research, this route is proposed as a favorable route for apigenin derived metabolites 15 . In the next step, the scutellarein-7-methyl ether (SCU7Me) compound is formed by F6H enzyme. However, this metabolite has not been identifying from D. kotschyi; whereas our study confirmed the presence www.nature.com/scientificreports/ of F6H gene by transcriptome and qRT-PCR evidence. The amounts of some intermediates are present at low concentration in metabolic pathways. It may be due to the fact that they are unstable or quickly convert to other stable metabolite 40 . On the other hand, the role of this enzyme as a CYP450 belong to CYP82D family has been evaluated 14 . After the aforementioned step, the biosynthetic pathway reaches a branch point. The cirsimaritin syntase (CRS, F6OMT) converts SCU7Me, which has a free hydroxyl residue at C-6 position, into cirsimaritin (Fig. 1). This enzyme was considered to be a member of F4ʹOMT family due to its high similarity, which differs only in three nucleotides. The analysis of its kinetic properties in sweet basil showed that it could also use ladanein as a substrate; however this compound has not been identified in D. kotschyi 15 . The CRS in sweet basil has more catalytic efficiency (k cat /K m ) for ladanin (2.55) than SCU7Me (1.28). Our transcriptome and HPLC evidence showed the expression of putative CRS gene and the production of cirsimaritin in all studied tissues, respectively. www.nature.com/scientificreports/ After the cirsimaritin formation, which is a committed precursor for main methoxylated flavonoids of Zarin-Giah, there is a branch-point by acting two enzymes; F3OMT, leading to penduletin, and F8OMT, leading to biosynthesis of xanthomicrol and calycopterin. After the hydroxylation of cirsimaritin by F3H, O-methylation can be provided using the F3OMT enzyme and leads to the production of penduletin which this enzyme also was reported in tomato 41 . Penduletin has well-known antiviral properties as could inhibit enterovirus replication 42 . Based on genetic and enzymatic evidence, this reaction occurs by F3H and F3OMT enzymes ( Fig. 1 and Supplementary Table 5). As mentioned above, herein, both F3H and F3OMT genes were identified and qRT-PCR result showed that they had high gene expression in the leaf and bud tissues. Metabolite content assay revealed that penduletin had accumulated in all tissues. Penduletin could be probably converted into another compound called penduletin 4'-methyl ether by F4΄OMT enzyme. However, according to present studies, qRT-PCR result showed high relative expression of this gene (Fig. 3 and Supplementary Table 5), whereas, this compound has not been identified in D. kotschyi, But the production of this substance does not seem unlikely because it was noted that in previous study due to the low amounts of standards and compound quantities, the peaks of penduletin, was quantified as xanthomicrol with a xanthomicrol linear calibration curve 5 . Cirsimaritin could take another route with F8OMT enzyme which leads to formation of xanthomicrol, with known anticancer and antioxidant properties 43 . Although, xanthomicrol formation is occurred by methoxlyation of carbon atomic number 8 of cirsimaritin 19 , no transcripts belong to the F8H enzyme were found in D. kotschyi transcriptome dataset. This may be due to the loss of this transcript during assembly or the presence of a similar and unknown enzyme for this function; and finally, it may be catalyzed by one of the known hydroxylase enzymes which could have several substrates also reported in flavone biosynthesis pathways 44 . With the formation of xanthomicrol, the substrate required for the biosynthesis of calycopterin would be provided. F3H and F3OMT enzymes convert xanthomicrol into calycopterin through methoxylation of C-3 position in xanthomicrol. A high calycopterin level was detected in D. kotschyi leaves 5 . Additionally, in our study, relative expression analysis showed that F3OMT were expressed in all studied tissues, especially highly in leaf tissue (Fig. 3). Moreover, it has been proposed that calycopterin could be converted into the end product called 3'-methoxycalycopterin with five methoxy groups by F3'H and F3'OMT enzymes. Although transcriptome sequencing and qRT-PCR data confirmed the presence of two putative F3'H and F3'OMT genes, but 3'-methoxycalycopterin has not been detected in D. kotschyi, which may be due to the lack of a standard for detection and measurement. It is worth to note that we could consider the conversion of SCU7Me into cirsimaritin as a first-committed step in producing and accumulating methoxylated flavonoids in Zarin-Giah. In addition, a very small amount of cirsimaritin compared with calycopterin and penduletin as the final products of methoxylated flavonoids pathway, indicates that this compound is more likely to be an intermediate compound for the final product biosynthesis of that pathway (Figs. 1  and 4B). Regarding the structures of penduletin and xanthomicrol, if methoxy group is eliminated from C-8 position in xanthomicrol or C-3 position in penduletin, it could be expected that these two metabolites would be directly converted into each other (Fig. 1). However, O-demethylase enzyme which could remove methyl (CH3-) groups from methoxylated flavones, has not been characterized, yet (Fig. 2). Anyway, this part of the pathway requires further research in the future. High expression values of studied genes in D. kotschyi at flowering stage, is consistent with expression analysis of genes involved in the biosynthesis of flavonoids in Chrysanthemum morifolium 45 and Epimedium sagittatum 44 , which can indicate the essential role of those genes at the beginning of the flowering stage and flower development in D. kotschyi.

Conclusion
De novo assembly of transcriptome data for the organisms without reference genomes has provided a new path in studying and recognizing these species. D. kotschyi, as a rare and endangered plant, has a rich set of secondary metabolites with antioxidant and anticancer properties. Herein, the biosynthesis pathway of valuable methoxylated flavones in this plant was suggested using our experimental analysis and literature review. Furthermore, the relative expression of putative genes involved in methoxylated flavones biosynthesis in three different tissues indicated higher expression of related genes in the bud tissue. Finally, based on HPLC results, significant amounts of antioxidant and anticancer compounds were found in the flower and leaf tissues. De novo assembly. Quality assessment of the reads was performed using FASTQC (Version 0.11.5) (https:// www. bioin forma tics. babra ham. ac. uk/ proje cts/ fastqc/), and to obtain high-quality clean reads for de novo assembly, raw reads were pre-processed using Trimmomatic (version 0.35) (http:// www. usade llab. org/ cms/? page= trimm omatic) to remove the adapter sequences, and low quality reads with ambiguous base (N). All the downstream analyses were based on the resulting clean reads. The read abundance was normalized to 50X coverage using the Perl script insilico_read_normalization from the Trinity package to improve assembly time and minimize memory requirements 46 . We evaluated four de Bruijn graph-based assemblers including Trinity (v2.4.0) (https:// github. com/ trini tyrna seq/ trini tyrna se), SOAPdenovo-Trans (v1.03) (https:// github. com/ aquas kyline/ SOAPd enovo-Trans), Oases-Velvet (v0.2.08) (https:// github. com/ dzerb ino/ oases) and Trans-ABySS (v1.5.5) (https:// github. com/ bcgsc/ trans abyss) with two different k-mer consisted of 25 and 32 on the transcriptome data. The resultS of the de novo assemblers were compared with each other according to N50, assembled transcript number, mapped reads, and alignment rate criteria. After carefully considering these points, Trinity software was used to perform de novo assembly. The workflow of bioinformatics analysis is schematically represented in Supplementary Fig. 18. Identification of the genes involved in the main metabolic pathways in D. kotschyi, including methoxy-flavone biosynthesis pathway, rosmarinic acid, and 2C-methyl erythritol 4-phosphate (MEP), was carried out using all candidate transcripts which well annotated by the databases mentioned above and literature review. The annotated assembled sequence was queried in protein BLAST to reach proteins sharing similarities in sequences. Afterward, multiple alignment of similar genes obtained from other species was performed by the online program Clustal Omega (https:// www. ebi. ac. uk/ Tools/ msa/ clust alo/), and conserved motifs and/or functional domains were characterized, then displayed by WebLogo tool (https:// weblo go. berke ley. edu/ logo. cgi). Bootstrap analysis was applied based on 1000 replicates to evaluate the statistical significance of the phylogenetic tree nodes. Finally, the phylogenetic tree was constructed by a Neighbor-joining algorithm using MEGA-X (https:// www. megas oftwa   48 , and the graphs were generated using GraphPad Prism Soſtware (v8.0; GraphPad, SD, USA).

Functional annotation analysis.
HPLC analysis. 500 mg of freeze-dried plant tissues were homogenized in 15 ml methanol 80%. In the next step, the samples were subjected to ultrasonicator (Euronda Company -Italy) for around 90 min at 50 °C. After, samples were incubated at temperature of 50 °C and stirred with a magnetic stirrer hot plate for 2 h. The mixture was filtered through fluted filter paper to remove solid particles and then evaporated to dryness in vacuo at 30 °C. The residues were dissolved in 5 ml of 80% methanol with HPLC grade. Finally, high-performance liquid chromatography (HPLC) was applied to extract evaluation following the filtration (Millex-GP syringe filter unit, 0.22 μm) 8,50 . www.nature.com/scientificreports/ In this study, important metabolites including apigenin, isokaempferid, cirsimaritin, penduletin, calycopterin, and rosmarinic acid were measured. An LC 1150 series HPLC system (Agilent technologies, GBC, USA) controlled by Chemstation software, equipped with a 20 μl injection loop, was used for separation and determination of biochemical compounds in D. kotschyi tissues, including leaf, flower, bud, and a mixture of flowers and buds. A Shim-pack VP-ODS 25 cm × 4.6 mm I.D. (Shimadzu Kyoto, Japan) packed with five μm particles as the analytical column for reverse-phase liquid chromatographic analysis was used. For the separation of phenolic acids, the Nourozi et al. 8 method was used to achieve more optimal results and higher resolution peaks with slight modifications. Thus, an isocratic mixture consisting of 40% acetonitrile (solvent A) and 60% acetic acid (2.5% V/V in water) (solvent B) was utilized during 40 min 8,49 . HPLC Chromatograms were detected using a photodiode array detector at 280 nm wavelength. Each compound was identified by its retention time and by spiking with standards under the same conditions. The quantification for the samples was conducted by measuring the integrated peak area and the content, which was calculated using the calibration curve by plotting the peak area against the different concentrations of the respective standard compounds (Supplementary Fig. 15). All standard compounds were supplied from Sigma-Aldrich-Germany company, including Rosmarinic acid; CAS Number: 20283-92-5, Apigenin; CAS Number: 520-36-5, Isokaempferid; CAS Number: 1592-70-7, Cirsimaritin, CAS Number: 6601-62-3, Penduletin; CAS Number: 569-80-2.

Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.