The natural variance of the Arabidopsis floral secondary metabolites

Application of mass spectrometry-based metabolomics enables the detection of genotype-related natural variance in metabolism. Differences in the secondary metabolite composition of flowers of 64 Arabidopsis thaliana (Arabidopsis) natural accessions, representing a considerable portion of the natural variation in this species, are presented. The raw metabolomic data of the accessions and of reference extracts derived from flavonoid knockout mutants have been deposited in the MetaboLights database. Additionally, summary tables of floral secondary metabolite data are presented in this article to enable efficient re-use of the dataset, either in metabolomics cross-study comparisons or in correlation-based integrative analysis with other metabolomic and phenotypic features such as transcripts, proteins, and growth- and flowering-related phenotypes.


Background and Summary
Plant secondary metabolites (also called specialized metabolites) show high natural diversity in their chemical structures and abundances, and this diversity can be identified through metabolic screening of populations, even in comparisons between ecotypes and cultivars belonging to the same species [1][2][3]. Such diversity may represent relatively recent adaptations or phylogenetic restrictions in the evolution of these metabolic pathways [3][4][5]. Metabolomic screening of such populations has revealed metabolic polymorphism in aliphatic glucosinolates [6], flavonol-glycosides [7] and phenylacylated-flavonols [3] in Arabidopsis. Additionally, a key gene for the production of phenylacylated-flavonols, which confer protection against UV irradiation [3], was characterized by an integrative functional genomic approach. Because several physiological studies have reported phenotypic analyses of Arabidopsis accessions under stress conditions such as UV-B irradiation [8], drought and salinity stress [9,10] and biotic stressors [11], understanding how plant secondary metabolites confer protection against stress conditions is highly important. To capture the variance of secondary metabolites across populations, liquid chromatography-mass spectrometry (LC-MS) has often been preferred to other analytical methods because it offers the technical advantage of capturing the most extensive variety of plant metabolites.
Here, data on floral secondary metabolite abundance measured in a population of 64 Arabidopsis thaliana (Arabidopsis) natural accessions are presented (Data Citation 1)(Data Citation 2). Sixty-eight secondary metabolites were measured via LC-MS, with ions acquired in positive and negative ion detection modes, and compounds were annotated through a combination of chemical confirmation with analytical standards and comparative analysis with flavonoid knockout and over-expresser Arabidopsis lines [12,13]. The list of the Arabidopsis accessions used in this study is provided, together with the raw (Data Citation 1) and normalized (Data Citation 2) metabolomics data. This dataset can be used for cross-study comparisons of plant metabolites, investigations of the reproducibility of metabolomics data, and in-depth analysis of plant metabolism. Importantly, transcriptomics data obtained from 10 samples in this experimental set are available in the Gene Expression Omnibus (GEO) database (Data Citation 3). Correlation studies combining metabolomic, transcriptomic, proteomic and phenomic data on flower-related traits are also anticipated. In addition, the presence in this dataset of standard reference files and complex biological data files, which were acquired on the same LC-MS system, makes it useful for practical exercises in data analysis and interpretation. Finally, as several secondary compounds initially identified in model plants bring nutritional and health benefits to humans [14,15], these data will be helpful in the design of future plant metabolic engineering for translational genomics applications from model species to crops.

Plant material and sample preparation
Seeds of Arabidopsis natural accessions (Table 1 (available online only)) were germinated on 1/2 MS salts solidified with 1% agar in a growth chamber (16 h light, 140-160 μmol m⁻² s⁻¹, 20°C; 8 h dark, 16°C) after vernalization (two days in the dark at 8-10°C). Fourteen days after planting, the seedlings were transferred onto soil (GS-90 Einheitserde; Gebrueder Patzer) and grown in a greenhouse (16 h light, an average irradiance of 120 μmol m⁻² s⁻¹, 20°C; 8 h dark, 16°C) until flowering. Positioning of the plants was randomized during plant growth. Fully open mature flowers (first flowers) were harvested at around noon (after approximately 5 h of light) and immediately frozen in liquid nitrogen for further analysis. Flowers from three plants were individually harvested to prepare one biological replicate. Sample preparation and extraction were performed as previously described [3].

LC-MS analysis and flavonoid mutant-based peak annotation
Profiling of secondary metabolites was performed as previously described [3,16]. Briefly, flower tissues were ground with liquid nitrogen and homogenized in a mixer mill for 3 min at 25 Hz with a zirconia bead and 20 μL of extraction buffer (80% methanol, prepared with 5 μg mL⁻¹ isovitexin as an internal standard) per mg of ground tissue (e.g., 204.0 μL of extraction buffer for a 10.2 mg fresh weight sample). Thereafter, the supernatant was separated from the cellular debris by centrifugation at 12,000 × g, and 3 μL of the clarified supernatant was injected directly into a Surveyor HPLC system (Thermo Finnigan, USA) coupled to an LTQ-XP system (Thermo Finnigan, USA) for metabolite profiling as described below. All samples, including flower extracts obtained from the Arabidopsis mutants described in 'Data processing and metabolite data analysis', were analyzed together. The sample run order was set consecutively by replicate.
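The buffer-to-tissue ratio given above is a simple proportionality. As a minimal sketch (the function name and its default argument are illustrative and not part of the published protocol), the volume for a weighed sample could be computed as follows:

```python
def extraction_buffer_volume_ul(fresh_weight_mg, ul_per_mg=20.0):
    """Return the extraction buffer volume in microlitres for one sample.

    Assumes the fixed ratio stated in the text: 20 uL of buffer per mg
    of ground tissue. The function name is hypothetical.
    """
    if fresh_weight_mg <= 0:
        raise ValueError("fresh weight must be positive")
    return fresh_weight_mg * ul_per_mg


# Worked example from the text: a 10.2 mg fresh weight sample
print(round(extraction_buffer_volume_ul(10.2), 1))  # 204.0
```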

Transcriptomic data
Transcriptomic analysis was performed using ATH1 microarrays as described previously [3] with ten accessions (Col-0, C24, Cvi, Da, Rsch, Ler-0, Ws, Sap, Stw and RLD). Duplicate hybridizations were carried out for Col-0 and C24, and a single hybridization was performed for each of the other accessions. The data are deposited in the Gene Expression Omnibus database (Data Citation 3).

Data Records
Raw data obtained from the analysis of natural Arabidopsis accessions and mutant reference lines have been deposited in the MetaboLights database (Data Citation 1). The raw data contain two negative ion detections (collision energy: 0 and 30 meV) and one positive ion detection. The Cdf files contain the negative and positive ion detections without the in-source fragmentation data acquired with collision energy. This dataset contains a total of 216 raw files resulting from 72 lines (64 accessions and 8 Arabidopsis mutant lines) with three biological replicates each. A dataset of floral secondary metabolites (68 compounds; 16 glucosinolates, 3 hydroxycinnamate derivatives, 42 flavonol derivatives and 7 putative polyamines) and general statistics relative to the natural accessions used in the study is provided (Data Citation 2). Metabolite data were obtained from a previously published dataset [3] and reformatted for correlation-based analysis by average-scaling and log-transformation (log2[mean(replicates)/mean(mean of all accessions)]) (Data Citation 2). The geographic coordinates of the Arabidopsis accessions provided in Table 1 (available online only) were updated according to the Arabidopsis 1001 Genomes database (http://1001genomes.org/) [28].
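The average-scaling and log-transformation described above can be sketched in a few lines of Python. This is an illustration of the stated formula only, not the authors' processing script: the function names and the example peak areas are hypothetical, and the handling of zero or undetected values, which the text does not describe, is left out.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


def average_scale_log2(replicates_by_accession):
    """Average-scale and log2-transform one metabolite across accessions.

    replicates_by_accession maps an accession name to its replicate peak areas.
    Returns accession name -> log2(accession mean / mean of all accession means).
    Assumption: all accession means are positive; a zero mean would make
    the logarithm undefined and is not handled here.
    """
    accession_means = {
        acc: mean(reps) for acc, reps in replicates_by_accession.items()
    }
    grand_mean = mean(list(accession_means.values()))
    return {acc: math.log2(m / grand_mean) for acc, m in accession_means.items()}


# Hypothetical peak areas for one compound (not taken from the dataset)
example = {
    "Col-0": [1.0, 1.0, 1.0],
    "C24": [2.0, 2.0, 2.0],
    "Cvi": [3.0, 3.0, 3.0],
}
scaled = average_scale_log2(example)
for accession, value in scaled.items():
    print(accession, round(value, 3))
```

With this scaling, an accession whose mean equals the population mean gets a value of 0, and each unit step corresponds to a two-fold difference in relative abundance.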

Technical Validation
To validate the metabolite data qualitatively and quantitatively, the standard deviation across the three biological replicates was estimated (Data Citation 2).
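A per-compound replicate summary of this kind could be computed as sketched below. The sketch assumes the sample standard deviation (n - 1 denominator); the text does not state which estimator was used, and the function name and example values are hypothetical.

```python
import statistics


def replicate_summary(replicates):
    """Return (mean, sample SD) for the replicates of one compound in one accession.

    Assumption: sample standard deviation (n - 1 denominator); the source
    does not specify the estimator.
    """
    return statistics.mean(replicates), statistics.stdev(replicates)


# Hypothetical relative peak areas from three biological replicates
mean_area, sd_area = replicate_summary([0.9, 1.0, 1.1])
print(round(mean_area, 3), round(sd_area, 3))
```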

Usage notes
Data on floral secondary metabolites are presented in Excel files (Data Citation 2). For each compound, the method used for peak identification/annotation, which includes retention time, ion detection mode and relative peak area, is specified. The value of the relative peak area was obtained from the average of three measurements (n = 3) normalized by the standard deviation (SD)(Data Citation 2). The compound family name and reference literature are also provided. The abundance of floral metabolites, normalized by average-scaling (mean/average) and log-transformation (log2), is reported (Data Citation 2). The dataset presented here can be used for cross-correlation studies to integrate metabolomics with transcriptomics, proteomics, and floral phenotypic data. Figure 1 shows an example of metabolite-metabolite correlation network analysis (r² > 0.6, Pearson correlation estimated with the R statistical package (https://www.r-project.org/)) performed with the data reported (Data Citation 2). Visualization of network connections based on the coefficient value was performed with Cytoscape (http://www.cytoscape.org/) using an organic layout style (Data Citation 2). As previously discussed [3], accession-specific floral phenylacyl-flavonol glycosides (saiginols, indicated with the number 1 in Fig. 1) show a strong correlation within the saiginol clade. The following ten additional clades of compounds were also identified and these are indicated in Fig. 1  aliphatic sulfinyl-glucosinolates, 10) long-chain aliphatic thio-glucosinolates, and 11) other glucosinolates such as indolic glucosinolates. No subclades of hydroxycinnamates were found. Network analysis suggests that metabolites that belong to the same clade are produced in Arabidopsis natural accessions that share a common genetic polymorphism, are transcriptionally co-regulated, or are the result of a similar metabolic pattern maintained by a combination of different metabolic flux changes.
The data presented in this article are useful in biodiversity studies, e.g., to investigate relationships between natural metabolic diversity and accession distribution, physiological diversity and genomic polymorphism.
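The edge-selection step of such a metabolite-metabolite correlation network can be sketched as follows. The published analysis used R and Cytoscape; this Python version is a stand-in that only illustrates thresholding on the squared Pearson coefficient. The function names, compound labels and values are hypothetical, and the resulting edge list would still need to be exported to a network tool for layout.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Assumption: neither sequence is constant (zero variance would divide by zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_edges(profiles, r2_threshold=0.6):
    """Return (compound_a, compound_b, r) for every pair with r squared above the threshold.

    profiles maps a compound name to its normalized abundance across accessions,
    with accessions in the same order for every compound.
    """
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson_r(profiles[a], profiles[b])
        if r * r > r2_threshold:
            edges.append((a, b, r))
    return edges


# Hypothetical log2-scaled abundances across four accessions
profiles = {
    "compound_A": [-1.0, 0.0, 1.0, 2.0],
    "compound_B": [-0.9, 0.1, 1.1, 1.8],
    "compound_C": [0.5, -0.5, 0.4, -0.4],
}
edges = correlation_edges(profiles)
for a, b, r in edges:
    print(a, b, round(r, 3))
```

Note that a threshold on the squared coefficient admits strong negative correlations as well as positive ones; whether negative edges were kept in Fig. 1 is not stated in the text.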