Metabolomics analysis of milk thistle lipids to identify drought-tolerant genes

Milk thistle is an oil and medicinal crop known as an alternative oil crop with a high level of unsaturated fatty acids, which makes it a favorable edible oil for use in food production. To evaluate the importance of Milk thistle lipids in drought tolerance, an experiment was performed in field conditions under three different water deficit levels (Field capacity (FC), 70% FC and 40% FC). After harvesting seeds of the plant, their oily and methanolic extracts were isolated, and subsequently, types and amounts of lipids were measured using GC–MS. Genes and enzymes engaged in biosynthesizing of these lipids were identified and their expression in Arabidopsis was investigated under similar conditions. The results showed that content of almost all measured lipids of milk thistle decreased under severe drought stress, but genes (belonged to Arabidopsis), which were involved in their biosynthetic pathway showed different expression patterns. Genes biosynthesizing lipids, which had significant amounts were selected and their gene and metabolic network were established. Two networks were correlated, and for each pathway, their lipids and respective biosynthesizing genes were grouped together. Four up-regulated genes including PXG3, LOX2, CYP710A1, PAL and 4 down-regulated genes including FATA2, CYP86A1, LACS3, PLA2-ALPHA were selected. The expression of these eight genes in milk thistle was similar to Arabidopsis under drought stress. Thus, PXG3, PAL, LOX2 and CYP86A1 genes that increased expression were selected for protein analysis. Due to the lack of protein structure of these genes in the milk thistle, modeling homology was performed for them. The results of molecular docking showed that the four proteins CYP86A1, LOX2, PAL and PXG3 bind to ligands HEM, 11O, ACT and LIG, respectively. HEM ligand was involved in production of secondary metabolites and dehydration tolerance, and HEM binding site remained conserved in various plants. CA ligands were involved in synthesis of cuticles and waxes. Overall, this study confirmed the importance of lipids in drought stress tolerance in milk thistle.


Material and methods
Plant materials. Milk thistle seeds were obtained from the Pakanbazr, Isfahan, Iran. Plant study comply with relevant institutional, national, and international guidelines and legislation.
Field experimental design and sampling. The experiment was established on Shahid Beheshti University's experimental field in Tehran, Iran (51.23°N and 35.48°E) in a moderate and mountainous climate. The experiment was designed in a randomized complete block design (RCBD) with four replicates in three different drought stress levels [Field capacity (FC), 70% of FC and 40% of FC]. The soil texture consisted of 1/3 sand, 1/3 clay and 1/3 leaf composts. Average annual rainfall and temperature were 145.2 mm and of 22 °C, respectively. Average of air temperature, precipitation, relative humidity and wind speed were recorded monthly in meteorological site (Table 1). Field plot was 150.0 m 2 (10 m wide and 15 m long). Seeds were sown every 0.5 m in the rows while 1 m distance was applied between the rows. Milk thistle seeds were purchased from the Pakanbazr Company, Isfahan, Iran. The seeds were sown on 16 March 2017, and irrigated every 2 days.
Water deficit was applied to the plants at flowering stage on 24 June 2017. Weighing method was used to measure soil moisture. For this purpose, soil samples, three samples each day, were randomly taken from different areas of the field. Soil was irrigated just in first day and then did not irrigate until day six, when soil dried. Wet soils were weighted, and then oven-dried at 70 °C for 2 days. Dry weight of samples were measured. The difference between wet and dry weight showed the soil moisture. The soil moisture content was considered as 100% for field capacity, the moisture content for treatments with 70% and 40% FC were calculated accordingly. Therefore, irrigation was done every two days for FC, every 4 days for 70% FC, and every 6 days for 40% FC irrigation was Table 1. Atmospheric information of experimental field (Iran, Tehran, Shemiranat) from meteorological site.

Isolation of oil and methanolic extract and GC-MS analysis.
Dried seed were completely milled into powders. To isolate oil extract, 10 g of seed powder was used in a Soxhlet extractor in presence of n-hexane solvent. The extraction was done at 70 °C for 6 h and the final extract was stored in a dark glass. Oil-free powder was incubated at 37 °C for a week to dry, and then used for methanolic extraction 21 .
To isolate methanol extract, 2 g of oil-free powder was mixed with 200 ml of 80% methanol on a shaker for two days. The mixture was then passed through a filter paper and then stored at 4 °C. The same procedure was repeated for remaining powder on the filter paper and resulted extract was added into the first collected portion. Finally, the extracts were exposed to room temperature to be concentrated for 2 weeks 22 .
GC-MS analysis were performed using a Trace Gas Chromatograph 2000 Series supplied with a Finnigan Trace mass spectrometer, using helium as carrier gas (36.445 cm/s), supplied with a DB-1 J&W capillary column (30 m × 0.25 mm i.d.,0.25 mm film thickness). The chromatographic conditions were followed by an initial temperature at 70 °C for 5 min, a temperature rate: 5 °C/min, and a final temperature at 290 °C for 10 min. Injector temperature was 300 °C, transfer-line temperature was 300 °C with a split ratio of 20:1. The extracts were dissolved in n-hexane solvent separately and 1-2 µl of each extracts was injected into apparatus. GC-MS data and peaks were analyzed, components of each extract were identified and their amounts were measured.
Bioinformatics analysis. Kyoto Encyclopedia of Genes and Genomes (KEGG) (https:// www. genome. jp/ kegg/) 23 and Metabolic Pathways From all Domains of Life (MetaCyc) (https:// metac yc. org/) 24 databases were used to identify biosynthesis pathways as well as enzymes and genes involved in synthesis of extracts components. For this purpose, the desired lipids name was searched through KEGG and MetaCyc databases and in the "Compounds" section, the desired lipid was selected and the required information was extracted. The genes involved in the synthesis of these lipids were selected from the Arabidopsis model plant because most of its genes are known. GENEVESTIGATOR-Visualizing the world's expression data software (https:// genev estig ator. com/) 25 was applied to investigate expression of genes under drought stress. Then Biclustering tool was used to compare the expression level of selected genes in different Arabidopsis datasets. Functional protein association networks (STRING) (https:// string-db. org/) 26 and ShinyGO v0.61 programs (http:// bioin forma tics. sdsta te. edu/ go/) 27 were used to determine proteins interaction and biology processes interactions, respectively. ANOVA analysis for all measured lipids were performed using R version 3.5.3 (https:// www.r-proje ct. org/) 28 and RStudio version 1.1.463 (https:// www. rstud io. com/) 29 . Data were analyzed in a completely randomized blocks design (CRBD) with treatments as fixed effects and replications as random effect. Mean comparison was calculated by Duncan test presented in the agricolae package at 5% significance level of probability 30 . Box plot to represent relative gene expression was also drawn by R software. Molecular analysis. RNA extraction and cDNA synthesis. Total RNA was extracted from 0.2 g mature seeds of milk thistle in the flowering stage using a total RNA kit (RB1001, RNA, Iran) and subsequently treated with DNase I (RB125A, RNA, Iran) to eliminate probable genomic DNA contamination. The quality and quantity of the extracted RNA were assessed with 1% agarose gel and NanoDrop 1000 spectrophotometer (Thermo Scientific, USA), respectively. Purified RNA (Concentration 5 µg) was used to synthesize first-strand cDNA through cDNA Synthesis kit (RB125A, RNA, Iran) following the manufacturer's protocola, and stored at − 20 °C until use.
Primer design, sequence submit and qRT-PCR analysis. To design primers of the genes of interest, nucleotide sequences of same family plants (Helianthus annuus and Carthamus tinctorius) were obtained from the NCBI database. These sequences were then blasted on the milk thistle assembly data (from our previous experiment) using the BlastStation tool and the most identity sequences were selected. To primer design for qPCR, areas near to end of poly adenine, with a length of 150-250 bp were chosen. Homodimer, heterodimer, stem-loop, GC percent, and TM temperature were evaluated using Oligo 7 software (https:// www. oligo. net/) 31 and Vector NTI ® Express Designer Software (https:// vector-nti. softw are. infor mer. com/ 11.0/) 16 . The 18SrRNA gene was used as the reference gene. Finally, primers were synthesized by Bioneer Company (South Korea). The sequence and other information of primers are listed in Table 2.
For sequence submit of selected genes on NCBI site, the CDS format of selected sequences in milk thistle was obtained using Vector NTI software. Finally, to confirm the results, these sequences were blasted at the NCBI site, resulting in sequences of these genes in other plants.
RT-qPCR amplification was carried out by Rotor-Gene 2000 (Corbett Life Science, Sydney, Australia) using SYBR ® Green Real-Time PCR Master Mix (RB120, RNA, Iran). 20 µl reactions included 10 μL 2 × SYBR, 1 μL of synthesized cDNA, 1 μL of each primer (20 nmol), and 7 μL RNase-free water. The thermal conditions consisted of an initial step for 5 min at 95 °C, followed by 35 cycles amplification (1 min at 95 °C, 1 min at 50-60 °C depending on primers tm, and 15 s at 72 °C). To investigate the specificity of each amplicon, post-amplification meltingcurve ranging from 60 to 95 °C were assessed in every reaction. Cycle threshold values and PCR efficiency were computed by LinRegPCR program (https:// www. gear-genom ics. com/ rdml-tools/) 32 , and relative expression levels were calculated by Relative Expression Software Tool (REST) (https:// www. gene-quant ifica tion. de/ rest. html) 33 with the following formula. The E in the equation refers to the primer efficiency. All the qRT-PCR analysis were done in three biological and three technical replicates.  34 . Homology modeling was implemented to obtain the 3D (three-dimensional) structures of these proteins using trRosetta (https:// yangl ab. nankai. edu. cn/ trRos etta/) 35 . HHsearch against the PDB70 database was used to detect template in trRosetta server. TM-score of the predicted models was estimated based on the probability of top predicted distance and convergence of top models. Next, 3D structures of proteins were used to create Ramachandran plot by ProFunc server (http:// www. ebi. ac. uk/ thorn ton-srv/ datab ases/ ProFu nc/) 36 . Proteins functionality were predicted by Protein Structure Classification Database at UCL (CATH) (https:// www. cathdb. info/) 37 . Molecular docking was performed by protein-ligand binding site prediction (COACH-D) (https:// yangl ab. nankai. edu. cn/ COACH-D/) 38 using 3D models as inputs. Atomistic interactions between proteins and predicted ligands were studied within 5A of ligand residues using VMD-Visual Molecular Dynamics software (https:// www. ks. uiuc. edu/ Resea rch/ vmd/) 39 . Hydrogen bonds between proteins and ligands were identified using PyMOL software (https:// pymol. org/2/) 40 . All modeled structures also were visualized though PyMOL.

Results
Identification of oil and methanolic extracts compounds. 12 compounds in oil extract and 13 compounds in methanolic extract of milk thistle seeds were identified by analysis of GC-MS peaks (Figs. 1 and 2). Name, chemical formula, category, biosynthetic pathways and reactions, enzymes and genes involved in the synthesis of this components were listed in Table 3. In spite of the fact that many genes in milk thistle have not yet been known, Arabidopsis thaliana has been used as a model plant to determine genes, pathways and reactions. Oil extract contained hydrocarbons, fatty acids and esters. Hydrocarbons included Tetradecane, Hexadecane, Octadecane, 3-methyl-Heptane, Octane, Decane and Dodecane, which were active in alkane biosynthesis pathway. The enzymes involved in these pathways were fatty acid photodecarboxylase, aldehyde decarbonylase and alkane 1-monooxygenase.
The fatty acids included Stearic acid, Palmitic acid, Linoleic acid and cis-13-Octadecenoic acid. Stearic acid was produced in biosynthesis pathways of fatty acid and secondary metabolites. Enzymes involved in mentioned biosynthesis pathways were palmitoyl-CoA hydrolase, oleoyl hydrolase and long-chain fatty acid adenylase/ transferase. Palmitic acid was involved in fatty acids and secondary metabolites biosynthesis pathways, elongation, degradation and fatty acids metabolism as well as cutin, suberine and wax biosynthesis pathways. Fatty acid aldehyde dehydrogenase, peroxidase, omega-monooxygenase, fatty acid synthase, oleoyl hydrolase, CoA ligase, adenylase/transferase FadD23, 11-cis-retinyl-palmitate hydrolase and palmitoyl-CoA hydrolase were engaged in biosynthesis pathways. Linoleic acid was also engaged in fatty acids and secondary metabolites biosynthesis as well as linoleic acid metabolism pathways. Linoleate lipoxygenase family enzymes were predominant, which catalyzes reaction between linoleate and oxygen to produce hydroperoxyoctadeca dienoic acid (HPODE). Cis-13-octadecenoic acid fatty acid was involved in cutin, oleate, sporopollenin precursors, suberin monomers www.nature.com/scientificreports/ www.nature.com/scientificreports/ biosynthesis and catalyzed by oleoyl-CoA thioesterase. The only identified ester was 1-oleoyl-glycerol, which was involved in triacylglycerol degradation by sn1-specific diacylglycerol lipase activity. Methanolic extract contained other fatty acids, esters, steroids and lipids. Fatty acids included Methyl linoleate, Methyl stearate, Methyl 9-octadecenoate, Methyl palmitate, Oleic acid and β-Monolinolein. The enzymes involved in these pathways were peroxygenase, lipoxygenase, monooxygenase, hydrolase and hydratase. Methylated fatty acids were synthesized by thioesterase acyl-CoA enzyme in acyl-CoA hydrolysis pathway. Oleic acid, the major fatty acid in methanolic extract, was active in biosynthesis pathways of fatty acids, cutin, suberine, wax, and plant secondary metabolites. β-Monolinolein fatty acid was responsible in linoleate biosynthesis pathway through acyl-lipid ω-6 desaturase (ferredoxin) activity.
Esters comprised Methyl behenate, Linoleic acid ethyl ester, Glycerol β-stearate, Phthalic acid dioctyl ester and Hexadecanoic acid 2-(octadecyloxy) ethyl ester, which the engaged enzymes were carboxylesterase 3 and triacylglycerol lipase, which were active in triacylglycerol degradation pathway. Cholesterol belonged to steroids, which were active in steroid hormone biosynthesis, steroid biosynthesis and degradation, biosynthesis of alkaloids derived from terpenoid and polyketide, fat and vitamin digestion and absorption, cholesterol metabolism and  www.nature.com/scientificreports/ metabolic pathways by involving 3beta-hydroxy-Delta5-steroid dehydrogenase, cholesterol oxidase, monooxygenase, hydroxylase and desaturase, 7-dehydrocholesterol and Delta24-sterol reductase, sterol acyltransferase and esterase, alcohol and bile-salt sulfotransferase, steryl sulfatase and glucosidase, steroid Delta-isomerase. The identified lipid was β-Sitosterol, which was synthesized by engaging sterol 22-desaturase enzymes in steroid, plant secondary metabolites and terpenoids biosynthesis pathways.

Relationship between gene expression and metabolite variation. Lipids variations and respective
involved genes at three different levels of irrigation (Field Capacity (FC), 70% FC, 40% FC) was studied (Table 4). In this table, gene expression was extracted from Fig. 3, and lipid content was measured by GC-MS, mentioned above. ANOVA analysis of milk thistle lipids (oily and methanolic) was performed at three different levels of drought stress with three replications. According to the results of Table 4, there was a significant difference between different levels of drought stress in oleic acid, linoleic acid, cholesterol, β-Sitosterol, β-Monolinolein and Linoleic acid ethyl ester at the level of 0.01, but in stearic acid, palmitic acid and 1-Oleoyl-glycerol significant difference at the level of 0.05 was observed.
The stearic acid content decreased with increasing stress intensity. Among genes involved in biosynthesis pathway of stearic acid, a significant decrease and a slight reduction was observed in expression levels of FATA2 and FATA , respectively.
The oleic acid content in treatments FC and 40% FC was extremely high (75.8 and 73.7, respectively), but in treatment 70% FC (52% Area) was lower than two mentioned treatments. In oleic acid biosynthesis pathway, expression of PXG3 (RD20) significantly increased in all pathways, whereas CYP86A1 and FATA2 expression significantly reduced.
The amount of palmitic acid has decreased over time with increasing water deficiency. In palmitic acid, expression of AT5G47330 significantly raised when a drastic decrease of CYP86A1, LACS3, LACS1 expression were observed.
The linoleic acid content also decreased under drought stress. Up-regulated genes in linoleic acid biosynthesis pathway were LOX3, LOX4, LOX2 and LOX1 and down-regulated genes were PLA2-ALPHA and LOX5.
The content of cholesterol and β-Sitosterol in treatment 70% FC had a relatively high value (4.8 and 16.4% Area, respectively), but in two treatments 40% FC and FC was no significant. In cholesterol biosynthesis pathway, expression of DWF5 and DIM decreased. Expression of CYP710A1 and CYP710A2 significantly increased and declined, respectively, in β-Sitosterol biosynthesis pathway.
The content of β-Monolinolein was relatively high (10.7% Area) in treatment 70% FC but in the other two treatments the amount was halved. The genes biosynthesizing this lipid did not show significant expression.
The content of Linoleic acid ethyl ester and 1-Oleoyl-glycerol in treatment 70% FC had higher values (3.8 and 2.5% Area, respectively) than the two treatments 40% FC and FC. Expression of linoleic acid ethyl ester and 1-oleoyl-glycerol biosynthesis pathway genes reduced (Table4).

Relationship between genes and lipids networks.
For genes and lipids, a network was drawn separately to examine the relationship between genes and lipids. Gene network (Fig. 4) was divided into four groups  www.nature.com/scientificreports/ according to lipid network classification (Fig. 5). There was a correlation between genes and the lipids network, and so, lipid pathways interacted with their biosynthesis genes. The biggest group with maximum number of genes, group 1 (green), included LOX, LACS, AAE, FAT, CYP, DWF and CLO families, as well as FAD6, SDP1, ATPX62, RD20, PLA-ALPHA, AT1G05790, AT4G00520 and AT1G01710 genes, which led to the biosynthesis of stearic acid, oleic acid, palmitic acid, linoleic acid, cholesterol, β-Sitosterol, β-Monolinolein, linoleic acid ethyl ester and 1-oleyl glycerol. This group of genes is involved in the biosynthetic and metabolic processes of fatty acids, lipids, cellular lipids, small molecules, oxoacid, oxylipin, organic acids, carboxylic acid, monocarboxylic acid, as well as reductions in oxidation, oxidation and modification of lipid, finally, it is involved in the metabolic process of long-chain fatty acid and lateral root formation. Group 2 with red colors comprised CYP710A1, CYP710A2, CYP710A3, CYP710A4, DWF1, DWF5, SDP1 and RD20, which were involved in biosynthesis of cholesterol and β-Sitosterol and organic compounds. Depalmitoylation and deacylation of macromolecules is performed by genes in group3 (yellow), which include LACS1, LACS2, AT4G17470, AT5G47340, AT5G47330, AT4G17483 and AT3G60340 were categorized into group 3, yellow colors, which were active in palmitic acid and linoleic acid ethyl ester biosynthesis pathway. Group 4 with blue colors contained LACS6, LACS7 and LOX4, which engaged in biosynthesis pathway of palmitic acid and linoleic acid play role in the response to ozone. After comparing protein and metabolite networks, 4 up-and 4 down-regulated genes with high interaction was chosen (Fig. 4). Given that CYP86A1, CYP710A1, FATA2, LACS3, LOX2, Palmitoyl-protein thioesterase (PAL), PLA2-ALPHA and PXG3 genes were not previously identified in the plant of our interest, so, the nucleotide sequence of these genes (based on our RNA-Seq data in previous research) was provided to submit in NCBI database. Accession numbers of selected genes are MW151571, MW151572, MW151573, MW151574, MW151575, MW151576, MW151577 and MW151578, respectively. PXG3 gene expression increased 65.119 and 59.302 times in 40% FC and 70% FC treatments compared to FC, respectively. The expression of this gene was relatively similar in both treatments and had a significant increase compared to the control. Palmitoyl-protein thioesterase (PAL) expression significantly increased compared to control in treatment 70% FC (154.879) but no significant increase of expression was observed in treatment 40% FC (1.693). Expression of LOX2 was significantly increased in both 40% FC and 70% FC treatments while expression level of this gene in treatment 70% FC (1917.49) was significantly higher than treatment 40% FC (60.129) compared to control. A reduction of CYP86A1 gene expression was observed in both 40% FC and 70% FC treatments compared to the control (0.419 and 0.015, respectively). The expression of PLA2-ALPHA gene in two treatments 40% FC and 70% FC increased by 4.332 and 3.972, respectively, compared to the control although this increase was not significant. Expression of LACS3 was reduced in the 40% FC and 70% FC treatments at ratio of 0.044 and 0.068, respectively, compared to the control. CYP86A1 expression in both treatments 40% FC and 70% FC showed a significant increase compared to the control. Increased expression of this gene in treatment 70% FC (519.147) was significantly higher than treatment 40% FC (9.952) compared to control. FATA2 gene in treatment 40% FC showed an increase compared to FC (2.585), which was not significant while the expression of this gene decreased in treatment 70% FC (0.343) (Fig. 6).
Protein homology modeling and reliability of modeled structures. Four selected proteins (CYP86, LOX2, PAL and PXG3) with high expression level were selected and their structure were modeled. The model with the highest Confidence and TM-score was chosen for each protein among suggested five models by TrRosetta server (Fig. 7). Homology modeling information was acquired for each protein using trRosetta server and presented in Table 5. CYP86 and PAL had a high Confidence and a TM-score (0.544 and 0.518, respectively). PXG3 showed a medium confidence and a TM-score of 0.467, while LOX2 had low confidence and TM-score (0.325). www.nature.com/scientificreports/ Application of Ramachandran plot with ProFunc server showed that 91.2, 87.7, 85.3 and 88.1% of amino acids in CYP86, LOX2, PAL and PXG3, in an order, were in the most favored regions, where the maximum points are observed (Fig. 8). Percentage of amino acid dispersion in additional allowed regions, generously allowed regions and disallowed regions are listed in Table 5.
Function analysis of selected proteins. Protein analysis using CATH database exhibited cholesterol 24-hydroxylase isoform X2 function for CYP86, Lipoxygenase function for LOX2, palmitoyl-protein thioesterase function for PAL, and, probable calcium-binding peroxygenase function for PXG3. Molecular docking. The most plausible predicted ligands for CYP86, LOX2, PAL and PXG3 proteins were HEM, 11O, ACT and LIG, respectively (Fig. 9). The binding energies and C-scores are listed in Table 6.
HEM ligand interacted with 17 amino acids of CYP86 protein, listed in Table 6 and Fig. 10.

Discussion
Lipids as vital and major cellular constituents provide a structural basis for cell membranes and an energy resource for metabolism. In addition, lipids as signal mediators are involved in initiation of defense reactions 42 , as well as mitigation processes in response to stress in plant cells 43 . Therefore, lipid contents including fatty acids, hydrocarbons, esters, steroids, and etc. are affected by different stress conditions. For example, dehydration Figure 6. Relative expression analysis of 8 hub genes (4 up-regulated and 4 down-regulated genes) at 70%FC (orange color) and 40% FC (red color) versus FC treatment (field capacity). The relative expression of genes is shown based on Log10. Because the differences in expression between the different genes were so large, logarithms were used to make them easier to display and compare better. www.nature.com/scientificreports/ drastically reduced and altered lipid levels and compositions 44 . Thus, finding genes biosynthesizing these lipids helps us understand the mechanism of stress tolerance. Among genes involved in lipids synthesis, differentially expressed genes under drought stress were chosen. These genes included four up-regulated CYP710A1, LOX2, PXG3, and Palmitoyl-protein thioesterase (PAL), and four down-regulated, FATA2, CYP86A1, LACS3 and PLA2-ALPHA genes. CYP710A1 and CYP86A1, as members of cytochrome P450 gene family, protect plants against multiple biotic and abiotic stresses through involving in plenty of detoxification activities and biosynthetic pathways 45 .
LOXs act in signaling processes due to stressor effects and lipoxygenase activity characteristics, and may serve as molecular markers for plant stress tolerance studies 46 . Probable calcium-binding peroxygenases (PXG3) take part in storage lipid degradation of oil bodies in the abiotic stress signaling pathways as well as in drought tolerance by controlling stomata under water deficiency 47 . Palmitoyl-protein thioesterase (PAL) is responsible  Table 5. Features of the predicted protein models on two servers, trRosetta and ProFunc. trRosetta server includes confidence and TM-score, and ProFunc server includes most favoured regions (%), additional allowed regions (%), generously allowed regions (%) and disallowed regions (%). TM-score is between 0 and 1 and a TM-score higher than 0.5 usually indicates a model with correctly predicted topology 41  www.nature.com/scientificreports/ for removing palmitate group from its substrate proteins, which might contain cysteine string protein (CSP), presynaptic proteins like SNAP-25, dynamin, and synaptotagmin 48 .
FATA2 plays an important function in chain termination within de novo synthesis of fatty acid and is also essential for plant viability 49 . Long-chain acyl-CoA synthetases (LACSs) synthesize long-chain acyl-CoAs from free fatty acids in plant cells. LACS2 is primarily involved in polyunsaturated linolenoyl-CoA production, which is vital to activate ethylene response transcription factors-mediated hypoxia signaling 50 . PLA2, as one of phospholipase A2, manages plenty of cellular processes, such as development, growth, defense, and stress responses 51 .
The results of gene expression in milk thistle were similar to the results of the Genevestigator program in Arabidopsis under different drought experiments. Under drought stress conditions, expression of four genes CYP86A1, LOX2, Palmitoyl-protein thioesterase (PAL) and PXG3 increased while the other four genes did not change significantly.
To further investigate function mechanism of these genes, protein structure of the respected genes was examined. Due to lack of protein structure sequenced in milk thistle, both quality of homology modeling and accuracy to dock substrate to the substrate-binding site 52 were taken into account. Confidence of CYP86A1 and PAL was high, while for PXG3 and LOX2 were medium and low, respectively. The constructed models examined by Ramachandran analysis exhibited that these models are relatively close to reality. Protein interaction with the respective ligands was investigated to analyze their atomic interactions and functional preferences.
Although approximately 1% of plant protein-coding genes are predicted to encode P450s 53 , just seven P450 protein crystals of plants were found in Protein Data Bank (PDB) database because it is localized in the membrane www.nature.com/scientificreports/   www.nature.com/scientificreports/ and the structure is simply degraded and broken within purification and crystal growth 54 . Heme-containing P450s bonded to oxidoreductases catalyze stereo-and regio-selective oxidations, and also intensively challenging reactions, including decarboxylation, deamination, C-C cleavage, ring opening, coupling, expansion, migration, and dehydration 55 .
For understanding the architecture of HEM binding site, one of biologically active motifs present in CYP, is crucial to establish the mechanism assembling basic metabolic machinery and facilitating secondary metabolites formation. It was previously reported that Phenylalanine, Glycine, Alanine, Arginine, Isoleucine, Cysteine and Proline amino acids were frequently repeated in HEM binding site of nine investigated plants by a proteomewide identification of CYP enzymes analysis using statistical weight matrix approach 56 . Our results confirmed that the mentioned amino acids interacted with the HEM ligand. Residues interacting with HEM are mostly non-polar, particulalry aromatic amino acids, preparing a hydrophobic condition for HEM ring structure 57 . Our results demonstrated that nine amino acids interacting with HEM ligand were non-polar, which one of them had a hydrogen bond. Hydrogen bonds play important roles in protein folding 58 , protein-ligand interactions 59 , and catalysis 60 and affect molecule physicochemical properties, such as distribution, partitioning, solubility, and permeability, which are essential to drug development 60 . A mutation in conserved glycine of heme-binding motif led to an inactive and unstable apoprotein, which supports a main function of glycine to bind the heme and probably regulating P450s activity 61 . Therefore, glycine mutant instability extremely restricts examination of its exact catalytic role 62 .
LOX2, a member of a large monomeric protein family with non-sulphur, non-heme, and iron cofactor containing dioxygenases, interacted with ligand 11O. Free fatty acids are considered to be the main substrates of LOX although phospholipids and glycerolipids were also identified as oxygenation substrates 63 . Lipoxygenases www.nature.com/scientificreports/ (LOXs) catalyze polyunsaturated fatty acid (PUFA), as a substrate, to synthesize hydroperoxides 64 . Therefore, iso-enzymes of LOX would be identified according to substrate's peroxidation site 65 . LOXs generate HPOs, which act as reactions hubs because of having cytotoxic potential to cell membrane, therefore, are rapidly metabolized in chemical compounds engaged in plant defense, signaling and apoptosis 66 . Since the majority of studies have focused on compounds with lipoxygenases inhibitory activity, investigation of ligand 11O and its binding site has not undertaken yet and should it be taken into account for further studies. PXG3, a member of caleosin family, interacted with CA ligand. PXG3 protein binding site includes two hydrogen bonds between two ASP and ILE amino acids with calcium ligand. Caleosins have a specific calcium-binding domain in N-terminal region 67 , a hydrophobic domain and a proline knot motif probably to target protein to oil bodies (OBs) 68 . In Arabidopsis, Peroxygenase activity of AtCLO1 and AtCLO2, two members of caleosin family, required the presence of calcium and two conserved histidines, which subsequently proved by Aubert et al. 69 that these two histidines were in the PXG3 sequence. Despite acting PXG3 as the putative OB-associated peroxygenase, it may participate in lipid modifications or signaling in stress responses of plants 69 . Biosynthesis of cuticular waxes and cuticle was increased under water and ABA deficiency and in Arabidopsis 70 , as a result of peroxygenase activity in wax and cuticle synthesis pathway 71 . It was exhibited that PXG3 takes part in drought tolerance mechanisms by regulating plant growth, stomatal aperture and water use efficiency 69 .
Palmitoyl-protein thioesterase interacted with ACT ligand. This protein was known only in Arabidopsis and no attributed function has been reported yet.
Lipid composition indicated a main impact on cellular membrane integrity and activities of intrinsic-membrane proteins under stress conditions 72 . Therefore, preservation of cellular membrane integrity, as a key factor to retain metabolic homeostasis 73 , is a prerequisite to survive during severe stress conditions 74 . In addition to identifying important lipids and genes involved in their synthesis, the method of identifying lipids is also of particular importance. Characterization and identification of main compounds in biological processes, such as metabolites, lipids and proteins have been facilitated by implementing high sensitivity and resolution in mass spectrometry (MS) 75 , highlighting modern lipidomic tools can be applied to study of plant lipid 76 by relying on tailored and optimized MS-based strategies 77 .

Conclusion
Our results highlighted the importance of lipids as main components of plant cells, which play a major role in stress conditions. Regulating gene expression is one of mechanisms in plants to reduce the stress influences. Lipid components and respected biosynthesizing genes of milk thistle were identified. Determined the genes of these lipids and examined their expression in various plants under drought stress. Next, we selected 8 genes (PXG3, LOX2, CYP710A1, PAL, FATA2, CYP86A1, LACS3, and PLA2-ALPHA) that had differential expression and examined their expression in milk thistle. Out of 8 selected genes, 4 that had high expression (PXG3, PAL, LOX2, and CYP86A1) were selected, and protein structures were obtained by homology modeling. Next, the ligands attached to these proteins were examined. Examination of ligands, binding site amino acids, and atomistic interactions confirmed that the interactions between proteins and ligands are powerful and useful. The product of these proteins, which contains lipids, also plays the final role in confirming the role of these genes in stress tolerance.