Comparative transcriptome analysis to identify candidate genes involved in 2-methoxy-1,4-naphthoquinone (MNQ) biosynthesis in Impatiens balsamina L.

Impatiens balsamina L. is a tropical ornamental and traditional medicinal herb rich in natural compounds, especially 2-methoxy-1,4-naphthoquinone (MNQ), a bioactive compound with tested anticancer activities. Characterization of the key genes of the shikimate and 1,4-dihydroxy-2-naphthoate (DHNA) pathways responsible for MNQ biosynthesis, and of their expression profiles in I. balsamina, will facilitate the adoption of genetic/metabolic engineering or synthetic biology approaches to further increase production for pre-commercialization. In this study, HPLC analysis showed that MNQ was present in significantly higher quantities in the capsule pericarps throughout three developmental stages (early-, mature- and postbreaker stages), whilst its immediate precursor, 2-hydroxy-1,4-naphthoquinone (lawsone), was mainly detected in mature leaves. Transcriptomes of I. balsamina derived from leaf, flower, and three capsule developmental stages were generated, totalling 59.643 Gb of raw reads that were assembled into 94,659 unigenes (595,828 transcripts). A total of 73.96% of unigenes were functionally annotated against seven public databases, and 50,786 differentially expressed genes (DEGs) were identified. Expression profiles of 20 selected genes from four major secondary metabolism pathways were studied and validated using qRT-PCR. The majority of the DHNA pathway genes were significantly upregulated in early stage capsule compared to flower and leaf, suggesting tissue-specific synthesis of MNQ. Correlation analysis identified 11 candidate unigenes related to three enzymes (NADH-quinone oxidoreductase, UDP-glycosyltransferases and S-adenosylmethionine-dependent O-methyltransferase) important in the final steps of MNQ biosynthesis, based on gene expression profiles consistent with MNQ content. This study provides the first molecular insight into the dynamics of MNQ biosynthesis and accumulation across different tissues of I. balsamina and serves as a valuable resource to facilitate further manipulation to increase production of MNQ.

Previous genetic and biochemical studies had found that the 1,4-naphthalenoid ring was derived from shikimate 38,39 and O-succinylbenzoate (OSB) 40 , thus implicating the shikimate-and OSB pathways, also known as the 1,4-dihydroxy-2-naphthoate (DHNA) pathway in the production of MNQ 32 . The shikimate pathway consists of six core enzymatic reactions resulting in the synthesis of chorismate, which is the starting compound for the subsequent seven reactions in the DHNA pathway. The catalytic activity of a trifunctional enzyme, PHYLLO converts chorismate to OSB, where it is sequentially catalysed to form OSB-CoA, then 1,4-dihydroxy-2-naphthoate-CoA (DHNA-CoA) and finally hydrolysed into DHNA by the enzymes acyl-activating enzyme 14 (AAE14), naphthoate synthase and DHNA thioesterase (DHNAT), respectively 41 . DHNA is a key precursor used in the biosynthesis of phylloquinone (2-methyl-3-phytyl-1,4-naphtho-quinone or vitamin K 1 ) in plants, in addition to other specialized 1,4-naphthoquinones such as lawsone 39,42 , juglone 36 , anthraquinones 43 , and lapachol 34 . In I. balsamina, only phylloquinone and lawsone are directly derived from DHNA ( Fig. 1), with lawsone being the precursor of MNQ 32 . Three enzymatic reactions are required to convert DHNA into phylloquinone and this pathway has been fully characterized due to the latter's importance as an electron carrier in photosystem I (PSI) during photosynthesis 44 . However, the enzymes for specialized 1,4-naphthoquinones biosynthesis downstream of DHNA have not been identified, including that for MNQ biosynthesis. What is currently reported is that lawsone is formed via oxidative decarboxylation of DHNA by an unknown enzyme, and an enzyme with S-adenosylmethionine-dependent O-methyltransferase activity was proposed to convert lawsone to MNQ 32,45 . 
In terms of transport and storage stability, the functions of an oxidoreductase to reduce lawsone, followed by a glycosyltransferase to produce a glucosylated form of reduced lawsone (1,2,4-trihydroxynaphthalene-1-O-glucoside, THNG), were also postulated, as THNG had been isolated in I. glandulifera, and most probably in I. parviflora and I. balsamina 46. Currently, none of the genes involved in the MNQ biosynthesis pathways has been characterized for I. balsamina, although many studies exist on its bioactivities, total content, and different extraction and purification methods.
In this study, quantification of MNQ and lawsone in different tissues of I. balsamina was performed using High-Performance Liquid Chromatography (HPLC) analysis, and the transcriptomes of leaf, flower, and capsules at three stages of development (early-, mature- and postbreaker stages) of I. balsamina were generated using Illumina HiSeq4000 paired-end sequencing technology and analysed. HPLC results ascertained that comparatively higher amounts of MNQ are distributed in the pericarps of I. balsamina, in contrast to lawsone, which was mainly present in leaves. Key findings from the comparative analysis of the transcriptomes include the successful characterization of all the genes of the shikimate and DHNA pathways for I. balsamina. In addition, correlation analysis of differential gene expression patterns and the spatial distribution of MNQ suggests de novo synthesis of MNQ in the capsules of I. balsamina, and allowed identification of 11 candidate unigenes encoding three enzyme classes proposed to be involved in the final steps of MNQ biosynthesis. Overall, the transcriptomes and results obtained from this study provide a basis for the further analysis of the biosynthetic pathways and serve as a resource for further research towards increased production of natural MNQ.

Material and methods
Plant material. Cultivated plants of the pink, multi-petal form of I. balsamina were obtained from a local nursery (Kajang, Selangor, Malaysia) and then continually seed-propagated in a home garden setting (outdoors, in open conditions). For HPLC quantification, I. balsamina plants were grown from 1st of July to 8th of September 2017 in 1-L growth bags using a mix of black garden soil and clay (2:1). For this period, the average high and low temperatures recorded were 32 °C and 23 °C respectively, with a photoperiod of 12:12 light:dark cycle, […]
For total RNA extraction, five tissue types were harvested from 10-week-old plants according to the criteria stated above. Mature leaf, flower, and capsules at three developmental stages (early-, mature- and postbreaker stages) were collected on 9th September 2017 and immediately placed into RNAlater solution (Ambion, Austin, TX, USA) prior to total RNA extraction.

HPLC quantification of MNQ and lawsone content.
Freshly collected plant tissues were dried separately using silica gel at room temperature, ground to a fine powder, and stored in Falcon tubes in the dark at 25 °C prior to extraction. Solvent extraction was performed for each sample using ethyl acetate (1:100 ratio; 1 g 100 mL−1) for 7 days (solvent was replaced every 3 days) at 25 °C under continuous shaking at 120 rpm. The extracts were then filtered, the solvent was evaporated using a rotary evaporator (30 °C, 90 hPa, 120 rpm), and the residue was re-dissolved in 6 mL ethyl acetate and left in the dark until the solvent had evaporated to dryness. Dried residues were reconstituted with methanol, adjusted to concentrations of 1000-2500 ppm, and filtered through a 0.45 µm membrane filter. HPLC analysis was carried out on a Shimadzu 20A series HPLC system (Shimadzu, Kyoto, Japan) with a Brownlee Analytical C18 column at 25 °C. Each run was set at 20 min with gradient elution as follows: 15 […]
Standards of lawsone and MNQ (100 ppm) were also used to spike samples during HPLC analysis for peak validation. Standard curves were constructed by analysing the reference standards (five different concentrations, minimum of three replicates, two separate HPLC runs) and plotting peak area against the concentration of each reference standard. The regression equation and coefficient of determination (R2) were calculated, and linearity was expressed in terms of the correlation coefficient (r). Compounds in the different samples were quantified by comparing sample peak areas against the standard curves. All statistical analyses were performed using SPSS version 23 (SPSS Incorporation, Chicago, IL, USA), and the data were subjected to one-way analysis of variance (ANOVA) to determine differences between groups. Tukey's post hoc test, or the Games-Howell test where equal variances could not be assumed, was performed for inter-group comparison, and a p-value ≤ 0.05 was considered significant.
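The standard-curve quantification described above can be illustrated with a minimal sketch. The calibration points and the sample peak area below are hypothetical values chosen only for illustration, and the function names are not taken from the study; the authors' actual regression was carried out with their own HPLC data and software.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept; also returns R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


def concentration_from_area(area, slope, intercept):
    """Invert the standard curve: peak area -> concentration."""
    return (area - intercept) / slope


# Hypothetical calibration data: five standard concentrations (ppm)
# and their mean peak areas (arbitrary units). Not values from the study.
std_conc = [10.0, 25.0, 50.0, 75.0, 100.0]
std_area = [2050.0, 5010.0, 10100.0, 14950.0, 20040.0]

slope, intercept, r2 = fit_line(std_conc, std_area)

# Hypothetical sample peak area read off a chromatogram.
sample_ppm = concentration_from_area(8000.0, slope, intercept)
print(f"R^2 = {r2:.4f}, sample concentration = {sample_ppm:.1f} ppm")
```

A sample whose peak area falls outside the range spanned by the standards would normally be diluted and re-run rather than extrapolated.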
Total RNA extraction, cDNA library construction, and transcriptome sequencing. Total RNAs were extracted following an optimized protocol described by 47 […]
Transcriptome data processing and de novo assembly. Raw data were processed to eliminate low-quality reads, Illumina adapter sequences, and reads with a high content of unknown bases (N). The clean reads remaining after filtering were de novo assembled using the Trinity program 48, and TGICL 49 was used to cluster transcripts, eliminate redundancy and obtain unigenes. TransDecoder software 50 was used to predict the coding regions (open reading frames, ORFs) of the unigenes (default parameters, minimum length of 100 amino acids). The longest ORFs were then subjected to BLAST analysis against SwissProt and to Hmmscan analysis to obtain Pfam protein homology sequences for the prediction of coding DNA sequences (CDS). […]
[…] primary ontology classes of molecular function, cellular component and biological process. The DEGs identified were also subjected to KEGG pathway enrichment analyses. The GO and KEGG pathway terms were considered significantly enriched with a corrected p-value ≤ 0.05.

1,4-Dihydroxy-2-naphthoate (DHNA) biosynthesis pathway gene expression analysis.
Hierarchical clustering analysis was performed on the log-transformed FPKM values using the hclust function in RStudio 58 to analyse DEGs identified between different tissue groups. The analysis covered annotated genes involved in the 1,4-dihydroxy-2-naphthoate (DHNA) biosynthesis pathway in I. balsamina as well as candidate genes postulated to function in the last steps of MNQ biosynthesis (downstream of the DHNA intermediate).
Correlation analysis of the candidate genes was conducted using the nonparametric Spearman rank correlation method with the default two-tailed p-value and 95% confidence interval.
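As an illustration of the Spearman approach, the sketch below computes the rank correlation coefficient between a metabolite profile and a gene expression profile across five tissues. The MNQ values are the pericarp means reported in this study, with leaf and flower set to 0 on the assumption that "not detected" can be treated as zero; the FPKM values are hypothetical. The sketch returns only the coefficient and does not compute the p-value or confidence interval.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman rho = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Tissue order: leaf, flower, early, mature, postbreaker capsule.
mnq_mg_per_g = [0.0, 0.0, 1.864, 1.508, 0.887]   # leaf/flower: assumed 0
candidate_fpkm = [1.2, 0.8, 45.0, 30.5, 12.1]    # hypothetical expression

rho = spearman_rho(mnq_mg_per_g, candidate_fpkm)
print(f"Spearman rho = {rho:.3f}")
```

Because the method works on ranks, it captures any monotonic relationship between expression and metabolite content and is insensitive to the scale of the FPKM values.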

Validation of DEGs with quantitative real-time PCR (qRT-PCR) analysis.
To verify the expression data shown by the transcriptomes, qRT-PCR was performed on 20 selected genes from the terpenoid backbone (mevalonate, MVA, and 2-C-methyl-D-erythritol 4-phosphate, MEP), shikimate and DHNA pathways. The I. balsamina total RNA samples used in the qRT-PCR assays were from the same batch as those used for transcriptome sequencing. Primers for each gene were designed using the Primer3 tool (https://primer3.ut.ee/) following the criteria of a GC content of 45-55% and a melting temperature of 55-60 °C (Supplementary Table S1). First-strand cDNA synthesis was performed using the Tetro cDNA Synthesis Kit (Bioline, London, UK) with oligo(dT)18 primer according to the manufacturer's instructions. Sample cDNAs were diluted to a final concentration of 100 ng/µL. qRT-PCR was performed in Eppendorf RealTime PCR Cap Strips (Eppendorf, Hamburg, Germany) using the SensiFAST SYBR No-Rox Kit (Bioline, London, UK). Reactions were performed in triplicate for each gene and tissue, in a total volume of 20 µL containing 300 ng template cDNA, 1X SensiFAST No-Rox mix, 400 nM forward and reverse primers, and adequate nuclease- and RNase-free water, using a MasterCycler EP Gradient Thermal Cycler (Eppendorf, Hamburg, Germany). Cycling conditions involved an initial denaturation at 95 °C for 2 min, followed by 40 cycles of 95 °C for 5 s, primer-specific annealing at 60 °C for 10 s and extension at 72 °C for 10 s. A melting curve for each amplicon was generated from 60 °C to 95 °C to verify primer specificity. Aldolase, elongation factor 1-alpha (EF1a) and ubiquitin-conjugating enzyme (UCE) genes served as internal reference genes and were used to normalise the gene expression data.
Relative expression level of target genes was calculated using the 2^−ΔΔCT method 59,60, with the following formula: […]
A linear regression model was used to correlate the log-transformed relative quantification values of the genes from the qRT-PCR results with the respective log-transformed relative gene expression values in the transcriptome data.
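For readers unfamiliar with the 2^−ΔΔCT method, a minimal sketch of the standard calculation is given here. It assumes 100% amplification efficiency and averages the CT values of the reference genes before normalisation; how the three reference genes were combined in this study is not stated, so that step is an assumption, and all CT values below are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def relative_expression(ct_target_sample, ct_refs_sample,
                        ct_target_calibrator, ct_refs_calibrator):
    """Standard 2^-ddCT relative quantification.

    dCT  = CT(target) - mean CT(reference genes), per tissue
    ddCT = dCT(sample tissue) - dCT(calibrator tissue)
    Averaging reference CT values (an assumption here) is equivalent to
    taking the geometric mean of the reference-gene quantities.
    """
    d_ct_sample = ct_target_sample - mean(ct_refs_sample)
    d_ct_calibrator = ct_target_calibrator - mean(ct_refs_calibrator)
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


# Hypothetical CT values: one target gene, three reference genes
# (e.g. aldolase, EF1a, UCE) in a sample tissue and a calibrator tissue.
fold = relative_expression(
    ct_target_sample=22.0, ct_refs_sample=[18.0, 19.0, 20.0],
    ct_target_calibrator=25.0, ct_refs_calibrator=[18.0, 19.0, 20.0],
)
print(f"Relative expression (sample vs. calibrator) = {fold:.1f}")
```

In this hypothetical case the target gene crosses the threshold three cycles earlier in the sample tissue than in the calibrator after normalisation, which corresponds to an eight-fold higher relative expression.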

Results
Lawsone and MNQ contents in Impatiens balsamina. To investigate the relationship between MNQ content and gene expression, the contents of lawsone and MNQ in different tissues of I. balsamina were determined by HPLC. As seen in Fig. 3, the results confirmed that lawsone and MNQ accumulated in significantly different quantities in distinct tissues of I. balsamina (p ≤ 0.05). Quantification of lawsone based on the standard curve generated showed that the total average lawsone content was 8.662 mg g−1 dry weight. Mature leaves, at an average of 7.431 ± 1.915 mg g−1 dry weight, yielded the highest amount of lawsone, a difference of ~9-fold (p ≤ 0.05) compared to young leaves (0.650 ± 0.039 mg g−1), with significantly lower amounts of lawsone recorded in roots (0.195 ± 0.034 mg g−1), flowers (0.191 ± 0.067 mg g−1), stems (0.033 ± 0.013 mg g−1), and early- (0.051 ± 0.035 mg g−1), mature- (0.060 ± 0.017 mg g−1) and postbreaker stage pericarps (0.051 ± 0.020 mg g−1). Lawsone was not detected in the seed samples. Two to three retention peaks were observed in mature leaves and standards during the lawsone HPLC analysis; these were validated as lawsone from similar retention times (RT), spectral profiles, and spiking with standard solution in multiple HPLC runs (Supplementary Figs. S1A, S2A; Supplementary Tables S2, S3). These peaks correspond to the three known tautomeric forms of lawsone (1,4-naphthoquinone, 1,2-naphthoquinone and 1,2,4-naphthotrione) 61,62, and suggest that concentration may influence tautomer formation. Different concentrations and temperatures have been reported to favour either the enol or the keto form of 7-hydroxyquinoline tautomers in equilibrium 63; further research is therefore needed to determine the factors affecting lawsone tautomers. As shown in Fig. 3, MNQ was only detected in the pericarps (all three stages) of I. balsamina.
Based on the standard curve generated, a total average MNQ content of 4.259 mg g−1 dry weight was calculated; the average MNQ contents quantified were 1.864 ± 0.697 mg g−1, 1.508 ± 0.189 mg g−1 and 0.887 ± 0.244 mg g−1 dry weight for the early-, mature- and postbreaker stage pericarps, respectively (Supplementary Figs. S1B, S2B; Supplementary Tables S2, S3).
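The totals quoted above are the sums of the per-tissue mean contents, which the short sketch below reproduces from the reported means. The dictionary and variable names are illustrative only; the conversion to percent by weight uses the identity 1 mg g−1 = 0.1% w/w.

```python
# Mean contents (mg per g dry weight) as reported in the text.
lawsone_mg_per_g = {
    "mature leaf": 7.431,
    "young leaf": 0.650,
    "root": 0.195,
    "flower": 0.191,
    "stem": 0.033,
    "early pericarp": 0.051,
    "mature pericarp": 0.060,
    "postbreaker pericarp": 0.051,
}
mnq_mg_per_g = {
    "early pericarp": 1.864,
    "mature pericarp": 1.508,
    "postbreaker pericarp": 0.887,
}

total_lawsone = sum(lawsone_mg_per_g.values())
total_mnq = sum(mnq_mg_per_g.values())

# 1 mg/g equals 0.1 g per 100 g, i.e. 0.1% w/w.
lawsone_percent_w_w = total_lawsone / 10.0
lawsone_to_mnq_ratio = total_lawsone / total_mnq

print(f"Total lawsone: {total_lawsone:.3f} mg/g ({lawsone_percent_w_w:.3f}% w/w)")
print(f"Total MNQ:     {total_mnq:.3f} mg/g")
print(f"Lawsone:MNQ ratio: {lawsone_to_mnq_ratio:.2f}")
```

Note that these totals are sums of tissue means rather than whole-plant contents, since each tissue contributes a different share of the plant's dry mass.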
Transcriptome sequencing of Impatiens balsamina and de novo assembly. Paired-end transcriptome sequencing generated 59.643 Gb of total raw reads for the five sets of transcriptomes comprising leaf, flower, and three capsule developmental stages (early-, mature- and postbreaker stages) of I. balsamina, with two biological replicates for each tissue. The raw transcriptome data have been deposited in NCBI GenBank with Sequence Read Archive (SRA) accession PRJNA526137. A summary of the sequencing output for the ten transcriptomes from I. balsamina is shown in Table 1. After filtering out low-quality reads, adaptor sequences and reads with unknown (N) bases, 55.194 Gb of clean reads were de novo assembled into 595,828 transcripts with […] Fig. S5). KEGG annotation resulted in the assignment of I. balsamina unigenes into a total of 138 pathways (Supplementary Fig. S6). […] Table S8). Overall, the DEGs were mostly classified in 'Metabolism' (57.08% of total KEGG enriched DEGs), followed by 'Human diseases' (52.01%), 'Organismal Systems' (31.60%), 'Environmental Information Processing' (19.70%), 'Genetic Information Processing' (13.72%), and 'Cellular Processes' (12.01%). DEGs which were assigned a KEGG ID but not mapped to any pathway accounted for 1086 unigenes (2.14%). With regard to the pathway terms relevant to secondary metabolism, 'Biosynthesis of other secondary metabolites' contained 1.86% of DEGs, of which 0.78% and 0.30% were classified under the subcategories 'Phenylpropanoid biosynthesis' and 'Flavonoid biosynthesis', respectively. Other pathways included 'Metabolism of terpenoids and polyketides' […] Table S9) and DHNA pathways.
Table 1. Sequencing output for the ten transcriptome libraries from leaf (L), flower (F), and three developmental stages of capsules (early (E), mature (M) and postbreaker (P) stages) of Impatiens balsamina. a Each tissue has two biological sample sequencing outputs.
Expression data of 27 I. balsamina unigenes involved in the DHNA pathway were mapped onto the pathway. The end-product of this pathway, DHNA, is a key precursor for producing MNQ. HPLC quantification showed that MNQ contents were distinctly higher in capsules than in leaves and flowers of I. balsamina, thus allowing exploration of MNQ biosynthesis by comparing expression data of the DHNA pathway genes in the different tissues. As seen in Table 2A, most of the unigenes corresponding to each DHNA pathway gene were expressed one- to five-fold higher in early stage capsule compared to flower: menF (log2FC = 3.477), the majority of the unigenes encoding PHYLLO (log2FC range −2.029 to 4.750), AAE14 (log2FC range 3.363-3.693), menB (log2FC range 1.542-5.478) and DHNAT (log2FC range 1.191-4.094), supporting the biosynthesis of MNQ in the capsule of I. balsamina. In addition, lawsone and MNQ are both synthesized from the same core DHNA pathway, with lawsone (detected mainly in leaves) being the immediate precursor of MNQ (abundant in capsules). The observation that relative expression levels of most DHNA pathway genes were significantly up-regulated in early stage capsule even when compared to leaf (which produces lawsone) suggests that MNQ is synthesized de novo in early stage capsule. In early stage capsule, four of the five DHNA pathway genes were expressed at significantly higher levels than in leaf: menF (log2FC = 4.220), most unigenes encoding PHYLLO (log2FC range 2.932-3.726), AAE14 (log2FC range 3.009-4.319) and menB (log2FC range 3.668-5.068). Only DHNAT, which converts DHNA-CoA to DHNA, did not show a significant difference, i.e. DHNAT was highly expressed in both leaf and early stage capsule (Table 2A).
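Because the expression differences above are reported as log2 fold changes (log2FC), it can help to convert them to linear fold changes. The sketch below does this for the menF and menB values quoted in the text and applies the significance criteria given in the table captions of this article (fold change ≥ 2.00 and adjusted p-value ≤ 0.05). The function names and the adjusted p-values used in the example calls are hypothetical.

```python
def fold_change_from_log2(log2_fc):
    """Convert a log2 fold change to a linear fold change."""
    return 2.0 ** log2_fc


def is_significant_deg(log2_fc, adjusted_p, min_fold=2.0, max_adj_p=0.05):
    """Apply the DEG criteria: |fold change| >= min_fold and adj. p <= max_adj_p.

    A fold change of 2 in either direction corresponds to |log2FC| >= 1.
    """
    fold = fold_change_from_log2(abs(log2_fc))
    return fold >= min_fold and adjusted_p <= max_adj_p


# log2FC values quoted in the text for early stage capsule vs. flower.
menF_fold = fold_change_from_log2(3.477)
menB_max_fold = fold_change_from_log2(5.478)

print(f"menF: log2FC 3.477 -> {menF_fold:.1f}-fold")
print(f"menB (upper bound): log2FC 5.478 -> {menB_max_fold:.1f}-fold")

# Hypothetical adjusted p-values, for illustration only.
print(is_significant_deg(3.477, adjusted_p=0.001))
print(is_significant_deg(0.5, adjusted_p=0.001))
```

On the linear scale, each unit of log2FC doubles the expression ratio, so a log2FC of about 3.5 corresponds to an expression difference of roughly an order of magnitude.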

Biosynthesis of MNQ in
DHNA is a branch-point intermediate (key precursor) for the biosynthesis of phylloquinone, and early stage capsule may possess photosynthetic activity because it is green (Fig. 2). Phylloquinone, due to its PSI function, is synthesized and accumulated in green, photosynthetic parts (e.g. leaves), in contrast to the non-photosynthetic parts of the plant 41. It is formed in three steps: conversion of DHNA to demethylnaphthoquinone via phytylation by DHNA phytyl transferase (ABC4) 64,65; reduction to demethylphylloquinol by demethylnaphthoquinone oxidoreductase [or NAD(P)H dehydrogenase C1, NDC1]; and final formation of phylloquinone by demethylphylloquinone methyltransferase 66,67. As shown in Table 2B, differential expression analysis revealed that ABC4 was significantly up-regulated in early stage capsule compared to flower (significant log2FC of E vs. F ranged from 1.501 to 2.274), suggesting that early stage capsule is likely to possess photosynthetic activity attributable to active expression of phylloquinone-related genes. However, no significant difference was detected in the expression of ABC4 and 2-phytyl-1,4-beta-naphthoquinone methyltransferase (menG) between leaf and early stage capsule in I. balsamina. In fact, NAD(P)H dehydrogenase C1 (NDC1) was significantly down-regulated in early stage capsule compared to leaf (significant log2FC of E vs. L ranged from −4.675 to −1.381) (Table 2B). This provides more convincing evidence that early stage capsule […] Fig. S8). For identification of SAM-dependent […] (Table 3).
In addition, it is also known that a reduced, glycosylated form of lawsone (THNG) exists, generated by an oxidoreductase that uses NADH or NADPH as electron donor together with a glycosyltransferase 32. The I. balsamina transcriptomes contained 82 unigenes with the description 'oxidoreductase activity, acting on NAD(P)H', and 122 unigenes with 'UDP glycosyltransferases'.
The correlation analysis identified a total of three and five candidate unigenes encoding 'NADH-quinone oxidoreductase' and 'UDP glycosyltransferases', respectively, both based on expression patterns showing significant positive relationships to lawsone content (p ≤ 0.01) (Table 4). In this case, it was assumed that the lawsone produced would be transformed into THNG for stability, solubility, transport and sequestration, as well as to physiologically inactivate the compound in the plant 68. Nucleotide sequences of these candidate unigenes are provided in Supplementary Table S11.

Quantitative real-time PCR validation.
To validate the transcriptomes, qRT-PCR was performed for 20 selected genes from the MVA, MEP, shikimate and DHNA pathways on combinations of three tissue types (Fig. 6; Supplementary Tables S12 and S13). Melting curve analysis performed after 40 cycles of amplification detected single peaks, indicating that the expected amplicons were amplified for each gene. Results of the linear regression analysis indicated a relatively high correlation (R2 = 0.7962) of log-transformed gene expression (fold changes) between the normalized qRT-PCR and transcriptome datasets (Supplementary […]
Table 3. Expression profile of shortlisted putative S-adenosylmethionine-dependent O-methyltransferase genes correlated to MNQ content in distinct tissues of Impatiens balsamina. Correlation analysis was performed using the Spearman correlation method. Significant values are marked with asterisks: ** refers to p-value ≤ 0.01. Significant up- or down-regulation is determined based on normalized DEG analysis, with Fold Change ≥ 2.00 and Adjusted P-value ≤ 0.05.
Table 4. Expression profile of shortlisted putative oxidoreductase and UDP-glycosyltransferase genes correlated to lawsone content in distinct tissues of Impatiens balsamina. Correlation analysis was performed using the Spearman correlation method. Significant values are marked with asterisks: * refers to p-value ≤ 0.05, ** refers to p-value ≤ 0.01, and **** refers to p-value ≤ 0.0001. Significant up- or down-regulation is determined based on normalized DEG analysis, with Fold Change ≥ 2.00 and Adjusted P-value ≤ 0.05. NA = not annotated to SwissProt database.

Discussion
Both MNQ and lawsone quantified using HPLC analysis were shown to be present in significantly different concentrations in distinct tissues of I. balsamina. The overall amount of lawsone isolated from I. balsamina (average of 0.866% w w−1 dry weight) was found to be comparable to that of the henna plant, Lawsonia inermis (1-1.4% w w−1) 69. MNQ was detected only in the three pericarp stages of I. balsamina at a total average content of 4.259 mg g−1 dry weight, similar to a previous finding 10 which also reported the highest amount of MNQ being isolated from the pericarps of I. balsamina. In our study, the lawsone content extracted was ~two-fold higher than that of MNQ, concurring with a previous study 70 on I. capensis, I. noli-tangere and I. parviflora. Unlike MNQ, which was only detected in the capsules, lawsone was found in various parts of I. balsamina, including leaf, flower, and root, consistent with a recent study 71 on I. glandulifera, in which MNQ and lawsone were detected in the leaves, fresh seed pod capsules, roots, and whole flowers. Natural variation in the distribution of lawsone and MNQ in different parts of I. balsamina may be explained by the physiological roles of these compounds, such as plant defense (antimicrobial, natural insecticides), allelopathy and UV absorption 10,71-74, contributing to the ecological success of I. balsamina. In this study, the transcriptomes of leaf, flower, and early-, mature- and postbreaker stage capsules of I. balsamina were generated and analysed. The transcriptome sequencing outputs and functional annotation results obtained (a total of 94,659 unigenes, of which 73.96% were successfully annotated) are comparable to the recently reported transcriptome results for I. walleriana and I. hawkeri 75. In the I. balsamina transcriptome datasets, 26.04% of the unigenes did not possess significant similarity to sequences of other species, which is close to the percentage of 'orphans' or 'taxonomically restricted genes' (TRGs) in a given species 76. TRGs are genes in a given species that do not have homologs in other species and are postulated to account for 10-20% of genes in eukaryotic genomes 76,77. These genes are likely to be related to the evolution of novelty and adaptive species-specific processes 78 […]
The biosynthesis of MNQ involves two major pathways, namely the shikimate and DHNA pathways. All the genes (enzymes) of both pathways for I. balsamina were successfully identified from the transcriptomes. Differential expression of the DHNA pathway genes in five different tissues provided significant insights into the biosynthesis of MNQ in I. balsamina. DEG analysis revealed that the majority of the genes involved in the DHNA pathway (up to the synthesis of DHNA) were highly and significantly expressed in early stage capsule compared to flower and leaf, supporting MNQ biosynthesis and further suggesting de novo formation of DHNA in the capsule of I. balsamina, leading to the final production of MNQ. It was also observed that the highest expression of DHNA pathway genes occurred in early stage capsule, and that these genes were then down-regulated in mature- and postbreaker stage capsules. This suggests that the biosynthesis of DHNA is highly active in the early stage of the capsule and gradually declines in the later stages of capsule development, correlating well with the MNQ content in the pericarps of the three capsule stages.
DHNA is a compound potentially diverted into two different pathways, for the biosynthesis of phylloquinone and of MNQ. Phylloquinone is a primary metabolite important for its function as an electron carrier in photosystem I (PSI) during photosynthesis. Expression of ABC4 was higher in early stage capsule than in flower but not significantly different from leaf, whereas NDC1 was down-regulated in early stage capsule compared to leaf. These results indicate the presence of functionally active phylloquinone pathway genes in early stage capsule, which could be explained by the fact that developing fruits can be photosynthetically active 79. Pericarps have some ability to perform photosynthesis, which has been proposed to play a notable role in seed growth and development in tomato 80,81, wheat and barley 82,83, Mercurialis annua and other Euphorbiaceae 84, and certain species of Brassicaceae 85. The lower expression of NDC1 unigenes, encoding the second enzyme of the phylloquinone pathway, combined with the higher expression of DHNA pathway genes in early stage capsule compared to leaf, suggests that the DHNA produced in early stage capsule is sufficient to support MNQ production in situ, branching off from the other downstream DHNA pathway, i.e. phylloquinone biosynthesis.
From the correlation analyses of gene expression data and MNQ (and lawsone) content in different tissues of I. balsamina, a total of 11 unigenes were shortlisted from the transcriptomes that corresponded to the three enzyme classes proposed to catalyse the synthesis of MNQ (via lawsone) in capsules. According to SwissProt annotation, the three shortlisted SAM-dependent O-MT candidate unigenes mainly encode phosphoethanolamine N-methyltransferase, an enzyme that plays a key role in the synthesis of the metabolite phosphatidylcholine via a phospho-base methylation pathway in plants 86,87. The additional candidate unigenes related to lawsone biosynthesis and encoding NADH-quinone oxidoreductase are either respiratory burst oxidase homologs or cytochrome P450s. Respiratory burst oxidase homologs are plant NADPH oxidases that play key roles in the cellular signalling network of reactive oxygen species and in various processes such as plant development and responses to hormonal and environmental stresses 88-92. Among the cytochrome P450s, 71A1 is involved in the metabolism of compounds associated with the development of flavour during fruit ripening 93,94, and 77A2 was found to be involved in flower bud development 95,96. Candidate unigenes identified for the second putative enzyme class, 'UDP glycosyltransferases (UGT)', corresponded to several glycosyltransferases (GTs), particularly UDP-glucose flavonoid 3-O-glucosyltransferase 6 (GT6), UGT73C6 and UGT87A2. UGTs catalyse glycosylation, which is one of the final steps in producing secondary metabolites 68. UGTs belong to a subfamily of GTs that plays an important role in plant secondary metabolism 97,98 and are known to participate in the regulation of hormones and the biosynthesis of secondary metabolites such as indolyl-3-butyric acid, cytokinin 99-101, flavonoids, phenylpropanoids, terpenoids, steroids 102, and flavanol glycoside 103, although the functions of most UGTs are still unknown 97,98,104.
Using an approach combining quantitative HPLC and comparative transcriptome analysis, putative candidate genes involved in the MNQ downstream pathway have been identified. These candidates, especially the S-adenosyl-L-methionine-dependent methyltransferases, warrant further studies to functionally validate their respective roles in the biosynthesis of MNQ.

Conclusions
In this study, de novo transcriptome sequencing and analyses of the leaf, flower and early-, mature- and postbreaker stage capsules allowed identification of all the annotated genes involved in the shikimate and DHNA pathways responsible for the production of MNQ in I. balsamina. The correlation between the expression of shikimate and DHNA pathway genes and MNQ pools, combined with knowledge from previous labeling experiments 32,45,46, suggests that MNQ biosynthesis branches off the phylloquinone pathway. Significant upregulation of most genes of the DHNA pathway in early stage capsule compared to flower and leaf suggests that MNQ is synthesized de novo in a tissue-specific manner in the capsule of I. balsamina. A total of 11 candidate unigenes corresponding to the enzyme families of S-adenosylmethionine O-methyltransferases, oxidoreductases, and UDP glycosyltransferases, postulated to catalyse the final reaction of MNQ production as well as to contribute to lawsone stability, were identified based on their expression levels being significantly and positively correlated with MNQ and lawsone content in different tissues of I. balsamina. Knowledge and better understanding of the genes involved in these biosynthesis pathways (and their expression patterns) now provide the required genomics resource for targeted manipulation of these pathways, either via genetic engineering or synthetic biology.