Introduction

Perennial ryegrass (Lolium perenne L. Family: Poaceae)1 supports most of the milk and meat production worldwide2. Geno-phenotypic characteristics of ryegrass are therefore critical in determining feed quality for the animal3, degradation in the rumen4 and livestock production responses5,6,7. Consequently, breeding techniques8,9,10,11 are employed to produce cultivars with desirable traits such as high water soluble carbohydrate content, neutral detergent fibre, crude protein content and digestibility, in addition to forage yield, seed yield, pest and disease resistance12,13. Of special interest and relevance to this study are the high-sugar cultivars, which have elevated levels of fructans. Fructans are the major storage carbohydrate in ryegrass, and are made up of varying degrees and complexities of linear or branched fructose polymers14, denoted by the degree of polymerisation (DP). DP directs fructan accumulation and thereby the total sugar content of these high-sugar cultivars15. High-sugar cultivars are proposed to increase milk and meat production through enhanced protein utilisation by ruminants16. In addition, lipid composition of ryegrass affects quality of animal products12, and secondary metabolites possess anti-parasitic activity in ruminants17. We hypothesised that chemical diversity of ryegrass, especially fructan content, offers opportunities to harness variation in these traits into cultivars for improved ruminant performance and novel product characteristics and, exploring underlying biochemical mechanisms of high-sugar grasses, in addition to a better mechanistic understanding of fructan accumulation, will also help decipher major changes in primary and secondary metabolism.

Metabolomics18,19 provides a snapshot of chemical diversity, enabling metabotypic classification of genotypes20,21,22 and in conjunction with other –omics sciences23, a better understanding of biological mechanisms18. The advent of advanced bio/cheminformatic tools and techniques24 have since propelled metabolomics towards system-level evaluations via data-fusion25 and pathway mapping26. We conducted a mass spectrometry based metabolomics study of 5 clonal replicates of 715 ryegrass genotypes (3575 plants), representing 118 populations from 21 countries (Supplementary Figure 1). Fructans, fatty acid methyl esters (FAMEs), lipids, polar and semi-polar compounds were analysed using ultra-high-performance liquid chromatography (U)HPLC and gas chromatography–mass spectrometry (GC–MS) systems. The objectives of the current study were to verify and demonstrate quality control measures undertaken for big metabolomics data, evaluate diversity of ryegrass genotypes for fructan/sugar content, and thereby identify high- and low-sugar plant genotypes under New Zealand climatic conditions, determine the role of DP of fructans in directing total sugar content, and finally elucidate potential metabolic variation between high- and low-sugar grasses in the context of data from other analytical streams (lipids, FAMEs, polar and semi-polar compounds).

Constant monitoring and post-run evaluation of quality control parameters accounted for technical variation in samples, and where these parameters were not met, batches were re-run following instrument calibration. The quality control procedures adopted here were therefore appropriate for large-scale metabolomics studies, rendering reliable data for downstream processing. The sum of low- (3–5), mid- (10–12) and high- (18–20) DP fructans was used as a measure of total sugar content, and of the 715 genotypes surveyed, 39 high- and 31 low-sugar genotypes were identified. High-DP fructans contributed significantly more to the total sugar content, measured as hexose units, in high-sugar grasses. A negative correlation between high- and low-DP fructans in the high-sugar group, further identified 11 genotypes which had greater high-DP content than a reference high-sugar genotype (Aberdart). These results, in addition to immediate inclusion in breeding exercises, offer a better understanding of fructan accumulation in high-sugar grasses, and subsequently ample scope for genetic improvement. Between high- and low-sugar grasses, major differences in primary metabolism were observed, with most lipid classes and fatty acids significantly higher in the low-sugar group. Differences in secondary metabolism were also noticed, where high-sugar grasses recorded lower concentrations of flavonoids and lignins. Identification of compounds and mapping them to metabolic pathways, successfully led to visualisation of a biochemical snapshot of high-sugar grasses.

Results

Quality control monitoring and evaluation

In large-scale metabolomics studies, demonstration of quality control monitoring and verification of quality control parameters is a prerequisite which accounts for technical variation, and affects data quality and thereby subsequent interpretation. Drifts in mass accuracy and retention time of the internal standard in quality control samples indicates technical variation between batches of samples. Here, drifts in mass accuracy and retention times of the internal standard 2′,7′-Dichlorofluorescein in quality control samples of the semi-polar stream (positive ionisation mode), across all 36 batches was demonstrated (Supplementary Figure 2). Mass accuracy was within the ±5 ppm threshold (Supplementary Figure 2A), and retention time drifts were within ±0.2 min from the median (Supplementary Figure 2B). Quality control monitoring for the lipid and polar streams also generated identical results.

A post-run evaluation of run-order effects was also conducted immediately after each batch was completed. Supplementary Figure 3 shows an exemplar principal component analysis (PCA) of a single batch of samples classified based on run-order, where Supplementary Figure 3A shows no significant run-order effect, while Supplementary Figure 3B shows a notable run-order effect. In this case, the batch representing Supplementary Figure 3A was proceeded to the super batch, whereas that representing Supplementary Figure 3B was re-run. Since quality control samples were not representative of the sample set, they clustered separately from the samples.

Chemical diversity

Fructan content: A typical total ion chromatogram of a sample run for fructan measurement, depicting low (3–5), mid (10–12) and high (18–20) DP ranges, is shown in Supplementary Figure 4. These ranges were used to measure the total sugar content of the sample. Samples in a single batch for fructan/sugar estimates before (Supplementary Figure 5A) and after (Supplementary Figure 5B) normalisation by a linear trend are shown in Supplementary Figure 5. Likewise, estimates between all 36 batches of samples, before (Supplementary Figure 6A) and after (Supplementary Figure 6B) normalisation for batch-effects, are shown in Supplementary Figure 6. The resultant data matrix, obtained after these normalisation procedures was used for classification of genotypes based on total sugar content.

Normal distribution of ryegrass genotypic diversity based on total sugar content is shown in Fig. 1. Of the top 10%, only genotypes that fulfilled our two-tier criterion of having a minimum of three replicates in the top 10% of the whole sample set (3575), were classified as the high-sugar group (n = 133 samples; p = 0.0001; Tukey’s HSD). Based on these conditions, the high-sugar group comprised 39 genotypes, of which 19 had genetic lineage to New Zealand (Supplementary Table 1). As anticipated, genotypes of the high-sugar grasses Aberdart and Aurora, bred in the UK27, were present in this group. The remainder of this pool was made up of genotypes from Netherlands, Denmark, Australia, Slovakia, Tunisia and Germany (Supplementary Table 1). Likewise, the bottom 10%, low-sugar group (n = 106), comprised 31 genotypes, with 14 being of New Zealand origin and the remainder of genotypes from Tunisia (Supplementary Table 1).

Fig. 1
figure 1

Diversity of ryegrass genotypes (open circle, black up pointing triangle, black down pointing triangle) (a) based on total sugar content ±SE (n = 5), with black up pointing triangle denoting the top 10% (high-sugar group) and, black down pointing triangle denoting the bottom 10% (low-sugar group) that fulfilled the two-tier criterion, and b showing the genotypic diversity based on total sugar content, relative to a normal distribution curve

Fructan accumulation in high- vs. low-sugar grasses: Fructan accumulation was significantly higher (p = 0.0001; Tukey’s HSD) in high-sugar grasses across all DPs (Fig. 2a). However, within the high-sugar group, the contribution of high- and mid-DPs to the total sugar content was more prominent than low-DP (Fig. 2a). A negative correlation between high- and low-DP was evident within high-sugar genotypes (Fig. 2b), which when extended to the 39 genotypes, revealed the ones with higher high- to low-DP ratios (Fig. 2c). Provided genotypes with high-DP levels are preferred within the high-sugar group, 11 genotypes with greater high-DP content than the current standard for high-sugar genotypes (Aberdart) were identified (Fig. 2c).

Fig. 2
figure 2

a Boxplots of total sugar content between high- and low-sugar groups, distributed across average peak intensities of low (DP3–5), mid (DP10–12) and high (DP18–20) degree of polymerisation (DP) of fructans. Patterned and shaded boxes denote high- and low-sugar groups, respectively. Different upper case letter codes presented between the two groups, indicate significantly different (p < 0.05) values by Tukey’s HSD. b Correlation analysis (Pearson method) between low-, mid- and high-DP fructans within the 39 high-sugar genotypes. Positive correlations are displayed in blue and negative correlations in red colour. Colour intensity and the size of the circle are proportional to the correlation coefficients. Legend colour at the bottom shows the correlation coefficients and the corresponding colours. c Comparison of average peak intensities ± SE of high- (Patterned bars) and low-DP (shaded bars) fructans across the 39 high-sugar genotypes, along with respective linear regression trends. The high-sugar genotype Aberdart is marked as the current standard. Data underlying the plots in c are available in Supplementary Data 3 

Polar and semi-polar compounds: Following data processing, the final data matrices from the HILIC streams had 222 (positive) and 198 (negative), and those from the C18 streams had 175 (positive) and 152 (negative) metabolic features, respectively. Data for high- and low-sugar groups were compared, and of the total 747 features, 293 were significantly different between the two groups, based on t tests with a false-discovery rate cut-off of p < 0.05 (Fig. 3; Supplementary Data 1). Multivariate analysis with PCA for each analytical stream, failed to discriminate the two groups (Supplementary Figure 7).

Fig. 3
figure 3

Cloudplot of t stat and −log10 p values of metabolic features from the HILIC and C18 streams (positive- and negative-ionisation modes), significantly different between high- and low-sugar groups, based on t tests with a false-discovery rate cut-off of p < 0.05. A positive t stat value indicates high- > low-sugar group, whereas a negative value indicates high- < low-sugar group. Cloud size represents the magnitude of –log10 p value. HP (Purple), HN (Green), CP (Brown) and CN (Blue) represent different analytical streams corresponding to HILIC positive, negative and C18 positive and negative, respectively

An overview of the discriminating features (Fig. 3) showed that polar compounds from HILIC positive and negative-ionisation streams, indicative of primary metabolism, demonstrated maximum variation between the high- and low-sugar groups, compared to semi-polar compounds from C18 positive- and negative-ionisation streams, largely representative of secondary metabolism. The HILIC positive stream accounted for 104 significantly different features, 35 of which were higher in the high-sugar group (Fig. 3). The HILIC negative stream revealed 64 significantly different features, of which 29 were higher in the high-sugar group (Fig. 3). C18 positive and negative streams had 80 and 45 discriminating features, of which 29 and 16 respectively, were significantly higher in the high-sugar group (Fig. 3).

Lipids: Major lipid classes identified by the non-targeted lipidomics method (Fig. 4a) and the targeted FAMEs method (Fig. 4b) are shown in Fig. 4. Taken together, concentrations of phosphatidylserine (p = 0.005), phosphatidylglycerol (p = 0.0001), phosphatidylcholine (p = 0.001), monogalactosyldiacylglycerol (p = 0.004), sulfoquinovosyldiacylglycerol (p = 0.0001), digalactosyldiacylglycerol (p = 0.009), diglycerides (p = 0.0001) and fatty acids C16:0 (p = 0.0001), C16:1 (p = 0.0001), C18:1 (p = 0.0001), C18:2 (p = 0.0001) and C18:3 (p = 0.0001) were higher in the low-sugar group (Fig. 4; Tukey’s HSD). Lysophosphatidylethanolamine, lysophosphatidylglycerol, lysophosphatidylcholine, phosphatidic acid, phosphatidylmethanol, phosphatidylethanolamine, phosphatidylinositol, monogalactosylmonoacylglycerol, digalactosylmonoacylglycerol, monoglycerides and fatty acid C18:0, were not significantly different between the two groups (p > 0.05). Triglycerides alone were in higher concentrations in the high-sugar group (p = 0.002; Tukey’s HSD; Fig. 4). Overall, 100 lipid species belonging to 18 lipid classes were identified by the non-targeted lipidomics stream (Supplementary data 2).

Fig. 4
figure 4

Boxplots of peak intensities of lipid classes identified by a the non-targeted lipidomics method and b normalised peak intensities of fatty acid methyl esters (FAMEs) identified by the targeted method, between the high- and low-sugar groups. Different upper case letter codes, where presented between the two groups, indicate significantly different (p < 0.05) values by Tukey’s HSD. LPE lysophosphatidylethanolamine, LPG lysophosphatidylglycerol, LPC lysophosphatidylcholine, PA phosphatidic acid, Pme phosphatidylmethanol, PE phosphatidylethanolamine, PI phosphatidylinositol, PS phosphatidylserine, PG phosphatidylglycerol, PC phosphatidylcholine, TG triglyceride, MGMG monogalactosylmonoacylglycerol, DGMG digalactosylmonoacylglycerol, MGDG monogalactosyldiacylglycerol, SQDG sulfoquinovosyldiacylglycerol, DGDG digalactosyldiacylglycerol, MG monoglyceride, DG diglyceride. C16:0, C16:1, C18:0, C18:1, C18:2 and C18:3 refer to fatty acids with their respective number of carbon atoms and double bonds

Compound identification: Compounds identified in the current study based on matching with a local library of authentic standards, de novo matching with public domain mass spectral databases, and/or the Mummichog programme, are presented in Table 1. A matching with the local library was given maximum confidence (Level 1), followed by matching with spectral databases (Level 2). Supplementary Figure 8 shows one such match of quinic acid/quinate (C7H12O6; KEGG ID—C00296), where the extracted ion chromatogram for the parent mass [M–H] m/z 191.0554 from a sample in HILIC-negative ionisation stream co-elutes with those for diagnostic fragments m/z 173.0449 (C7H9O5), 127.0392 (C6H7O3), 111.0443 (C5H3O3) and 93.0336 (C6H5O) (Supplementary Figure 8A). Also, mass spectra of diagnostic fragments m/z 93.0336, 111.0078, and 127.0392 (Supplementary Figure 8B) matched with corresponding spectra in the public domain MS database METLIN28 (Supplementary Figure 8C).

Table 1 Summary of compounds identified by matching with a local library of authentic standards, public domain mass spectral databases and/or Mummichog, with their respective KEGG IDs, analytical stream, univariate statistics and level of confidence in identification

Metabolic features (m/z) and their tentative matches used by Mummichog for identification are presented in Table 1. As is evident, a single ion mass may relate to many compounds or a group of masses may relate to one compound. Nevertheless, the compound classes identified by Mummichog provides sufficient information to interrogate the data further towards a higher level of confidence (Level 2). In the case of coniferyl aldehyde (Table 1), which was initially identified by Mummichog using m/z 193.0501 and 223.0608 corresponding to M–H + O[−] and M + HCOO[−], respectively, the parent mass of coniferyl aldehyde (C10H10O3), [M–H] m/z 177.0550, and its diagnostic fragment m/z 162.0321 (C9H6O3) were subsequently queried in the sample and spectral databases. As explained in Supplementary Figure 8, co-elution of the extracted ion chromatograms of these features, and matching of these spectra in the sample with corresponding spectra in MassBank29, with a mass error of <5 ppm, led to tentative identification of coniferyl aldehyde with Level 2 confidence. Even so, redundancies in identifications by Mummichog, for example, d-Ribose 5-Phosphate, alpha-d-Galactose, beta-d-Glucose, beta-d-Galactose etc., all identified for m/z 145.0496, only conform to the identification of monosaccharides or monosaccharide phosphates in the high-sugar group. Therefore, Mummichog results helped identify potential leads for compound identification, albeit with a lower level of confidence.

Modules and pathway analysis: Of all identified compounds input into KEGG Mapper (Table 1) with rice pathways (osa) as a reference, 29 mapped to metabolic pathways (osa01100), 23 to the biosynthesis of secondary metabolites (osa01110), 8 to the biosynthesis of amino acids (osa01230), 6 to carbon metabolism (osa01200) and the rest to miscellaneous pathways related to primary and secondary metabolism (Table 2). Each pathway is characterised by several modules, and each module comprises several compounds and corresponding reactions. Redundancies in a single compound represented in multiple modules, and a single module accommodating several identified compounds, was observed (Table 2). A pictorial representation of compounds mapped to respective pathways/modules in the context of high-sugar grasses is depicted in Fig. 5.

Table 2 KEGG pathway modules and hierarchical bin structure for MapMan style representation of compounds identified in Table 1, matched with rice reference pathways (osa) using KEGG Mapper
Fig. 5
figure 5

Pictorial representation of biochemical activity of high-sugar grasses in MapMan77 (Ver 3.6.0RC1; copyright of the Max–Planck-Institute for Molecular Plant Physiology, Golm, Germany), depicting KEGG pathways and modules involving the compounds identified in Table 1, with rice pathways (osa) as a reference (Table 2). Dots represent compounds identified in respective modules, where red dots denote compounds with positive t stat value (high- > low-sugar group), and blue dots denote compounds with negative t stat value (high- < low-sugar group). CHO: carbohydrates, Bra-AA: branched chain amino acids, CoF and Vit: cofactor and vitamins, ABC: ATP-binding cassette

All major pathways, i.e., metabolic, biosynthesis of secondary metabolites, biosynthesis of amino acids and carbon metabolism, were broadly classified into nucleotide and amino acid, carbohydrate and lipid and energy metabolism modules (Fig. 5). Other modules related to primary metabolism comprised galactose metabolism, glycolysis, amino sugar metabolism, carbon fixation, oxocarboxylic acid metabolism, pentose phosphate pathway, ascorbate metabolism, pentose and glucuronate conversions, fructose and mannose metabolism, starch and sucrose metabolism, inositol phosphate metabolism and glycerophospholipid metabolism, while modules related to secondary metabolism comprised flavone, flavonol, anthocyanin, flavonoid and phenylpropanoid biosynthesis (Table 2; Fig. 5).

In all major pathways, compounds related to metabolism of the amino acids cysteine, methionine, serine, threonine, arginine and proline were found in low concentrations in the high-sugar group. However, their intermediate products (S)-2-Aceto-2-Hydroxybutanoate, O-Acetyl-l-Homoserine, (2S)-2-Isopropyl-3-Oxosuccinate and 2-Dehydropantoate were at higher concentrations (Fig. 5). These intermediate products were in turn involved in the biosynthesis of branched-chain amino acids leucine, isoleucine and valine (M00019, M00570 and M00432; Fig. 5; Table 2). On the other hand, compounds related to carbohydrate, lipid and energy metabolism were at high concentrations, while those related to the biosynthesis of secondary metabolites were low (Fig. 5). L-Serine and S-Adenosyl-l-Homocysteine were primarily involved in cysteine, methionine, serine and threonine metabolism (Modules M00035, M00609, M00021, M00338 and M00020; Table 2). l-Citrulline was involved in arginine and proline metabolism via the urea cycle (M00029), and (S)-2-Aceto-2-Hydroxybutanoate was involved in branched chain amino acid metabolism (M00019 and M00570). Glycerone phosphate, a breakdown product of fructose 1, 6-Biphosphate and an isomer of 3-Phosphoglyceraldehyde (3-PGA), d-Ribose 5-Phosphate, d-Xylulose 5-Phosphate and alpha-d-Glucose, all found in higher concentrations in the high-sugar group, were involved in the central carbohydrate metabolism via glycolysis/gluconeogenesis (M00001, M00002 and M00003) and the pentose phosphate pathway (M00007, M00580 and M00004; Fig. 5; Table 2). Other compounds such as myo-Inositol were involved in lipid metabolism via the inositol phosphate metabolism module (M00131: Table 2), and d-Arabinose 5-Phosphate was involved in lipopolysaccharide biosynthesis (M00063: Table 2), both reportedly at increased concentrations in the high-sugar group (Fig. 5).

l-Serine (low concentration), d-Galactose, d-Xylulose 5-Phosphate, alpha-d-Glucose and alpha,alpha-Trehalose (all high concentrations) were some of the metabolites involved in other carbohydrate metabolism modules which comprised photorespiration (M00532), galactose degradation (M00632), glucuronate pathway (M00014) and nucleotide sugar biosynthesis (M00549). Ryegrass being a C-3 plant30, the energy metabolism pathway comprising carbon fixation and methane metabolism modules involved d-Ribose 5-Phosphate (high concentration) in the Calvin cycle (M00165) and glycerone phosphate (high concentration) in formaldehyde assimilation (M00344 and M00345), respectively. Biosynthesis of secondary metabolites involved coniferyl aldehyde, sinapoyl aldehyde, 1-O-Sinapoyl-beta-d-Glucose and 1-O-(4-Coumaroyl)-beta-d-Glucose, all in low concentrations, and related to the biosynthesis of monolignols (M00039). Compounds classified as others’ represented metabolites that did not feature in any modules, but were indirectly involved in the corresponding pathways. In that light, compounds classified as others’ in metabolic pathways mainly comprised amino acids, sugars and sugar phosphates (predominantly in higher concentrations), and those classified under the biosynthesis of secondary metabolites comprised umbelliferone, vitexin, isovitexin, orientin, isoorientin, pelargonidin-3-O-beta-d-Glucoside, cyanidin-3-O-beta-d-Glucoside and cyanidin 5-O-beta-d-Glucoside, (predominantly in lower concentrations; Fig. 5; Table 2).

Compounds belonging to minor pathways were classified as those involved in primary or secondary metabolism (Fig. 5). Clearly, compounds related to the primary metabolism of sugars via galactose metabolism, glycolysis, amino sugar metabolism, carbon fixation, oxocarboxylic acid metabolism, pentose phosphate pathway, ascorbate metabolism, pentose and glucuronate conversions, fructose and mannose metabolism, starch and sucrose metabolism, inositol phosphate metabolism and lipids via glycerophospholipid metabolism, were all found at higher concentrations in high-sugar grasses. Likewise, compounds related to secondary metabolism via flavone, flavonol, anthocyanin, flavonoid, and phenylpropanoid biosynthesis, were all found in lower concentrations (Fig. 5; Table 2). Amongst the ABC transporters, l-Serine, a phosphate and amino acid transporter was found in low concentration, whereas myo-Inositol, maltose and alpha, alpha-Trehalose, mono- and oligosaccharide transporters, were found at higher concentrations (Fig. 5; Table 2). Quinate/Quinic acid, involved in the biosynthesis of phenylalanine, tyrosine and tryptophan via the shikimate pathway, was found at higher concentration in high-sugar grasses, and so were compounds involved in the biosynthesis of the branched chain amino acids valine, leucine and isoleucine (Fig. 5; Table 2).

Overall, a snapshot of metabolic pathways in high-sugar grasses revealed: an increase in concentrations of compounds involved in primary metabolism, mainly comprising sugars; decrease in concentrations of compounds involved in secondary metabolism, mainly comprising flavonoids and lignins; decrease in concentrations of compounds involved in cysteine, methionine, serine, arginine, proline and threonine metabolism, and; a concomitant increase in concentrations of compounds involved in the metabolism of branched chain amino acids valine, isoleucine and leucine, and aromatic amino acids phenylalanine, tyrosine and tryptophan.

Discussion

The distribution of 715 ryegrass genotypes based on their total sugar content (Fig. 1), provides an overview of their performance under natural NZ autumn conditions. High-sugar content in leaves has a direct relationship with the metabolisable energy of ryegrass, which in turn enhances protein capture and supply to the ruminant16,31. While, seasonal variation of fructan content has also been hypothesised32, we have only measured fructan content at a single seasonal time point. A time-resolved analysis of fructan diversity is therefore expected to shed more light on the genotype and/or genotype × environment interactions33 in the high-sugar group identified here (Supplementary Table 1).

We also hypothesised that the DP contributes to differences in total sugar content, both between and within the high- and low-sugar groups. Between the high- and low-sugar groups, all three DP classes (low, mid or high) were significantly higher in the high-sugar group (Fig. 2a). However, as reported earlier15, within the high-sugar group, the contribution of high-DP fructans to the total fructan content was greater than the low-DP fructans (Fig. 2a). For the standard high-sugar genotypes (Aber cultivars), high-DP fructans are considered critical in retaining the total sugar content under seasonal variation compared to normal genotypes34. Therefore, considering high-DP fructans are sought for breeding reliable high-sugar genotypes, 11 genotypes which have greater high-DP content than Aberdart, have been identified (Fig. 2c). Given that fructan accumulation and degradation in ryegrass is still poorly understood14, the negative correlation between high- and low-DPs within the high-sugar group (Fig. 2b), remains unexplained and deems further investigation. The metabolomics approach employed here for fructan analysis has therefore led to a better understanding of fructan accumulation in high-sugar genotypes prior to genotype selection. Moreover, the contrast provided here with the low-sugar group, in the context of lipids, polar and semi-polar compounds (Figs. 3 and 4), has established a platform to scrutinise underlying biological mechanisms (Fig. 5; Table 2).

Another route to high-metabolisable energy has been via increased lipid content in leaves35. Elevated levels of triglycerides/triacylglycerols and poly-unsaturated fatty acids at the cost of proteins, would lead to improved nitrogen utilisation and feed conversion efficiency of the ruminants35, resulting in enhanced nutritional value of the end-products36. Exclusive efforts aimed at high-lipid content, primarily based on metabolic engineering, are therefore in practice35,37,38. While fatty acids have been the focus of this endeavour36, other lipid classes or the lipidome have largely been ignored. The lipid profile presented here (Fig. 4a), therefore potentially marks the first comprehensive report of the ryegrass lipidome (Supplementary Data 2). A negative relationship between sugar and fatty acid content (Fig. 4b) has also been reported by Morgan36. When the sugar content is low due to diurnal, environmental or genotypic factors, plants have the ability to redirect cellular activity towards lipid catabolism, resulting in an increase in fatty acids39. Manipulating this carbon flux towards lipid biosynthesis has also been suggested to achieve elevated levels of lipids in plants37.

A few classes of phospholipids, i.e., phosphatidylserine, phosphatidylglycerol and phosphatiylcholine, diglycerides and galactolipids were found in higher concentrations in the low-sugar group and triglycerides were higher in the high-sugar group (Fig. 4a). Galactolipids are found in thylakoid membranes of chloroplasts and are abundant in photosynthetically active leaves40. A hypothesis that an increase in FA content correlates with chlorophyll content in leaves, and thereby galactolipids, was tested by Morgan36. In general, polar lipids mainly contributed by galactolipids had a positive correlation with FA content, whereas neutral lipids mainly comprising triglycerides had a negative correlation36. These results are in accordance with the present study (Fig. 4). Triglycerides are storage lipids and, until recently, were thought to be absent in leaves41. They are believed to be intermediate products in the catabolism of galactolipids to sucrose, mostly accumulating in leaves during the day41. While their role in lipid metabolism is largely unknown, their elevated levels in high-sugar grasses provides sufficient context to pursue this result further.

Few studies have investigated the impact of changes in primary metabolism on the fluxes into secondary metabolism42,43. Non-targeted metabolomics, through a wide coverage of primary and secondary metabolites as demonstrated in this study, is well positioned to elucidate global metabolic changes, and thereby the cross-linkages between primary and secondary metabolism. Combined with other –omics data, metabolomics provides a robust platform for biological interpretation. In spite of these inherent advantages, non-targeted studies are limited by the bottleneck of compound identification, resulting in compounds predominantly identified with low to medium-level confidence44. Taking cues from non-targeted studies and proceeding towards unequivocal identification using targeted approaches is therefore recommended. Here, a snapshot of metabolic activity in the high-sugar group (Fig. 5) revealed an increase in concentrations of compounds involved in primary metabolism, mainly comprising sugars, a decrease in concentrations of compounds related to metabolism of the amino acids cysteine, methionine, serine, threonine, arginine and proline, increase in concentrations of compounds involved in the biosynthesis of aromatic and branched chain amino acids, and a concomitant decrease in compounds involved in secondary metabolism, mainly comprising flavonoids and lignins.

Plant secondary metabolites are of interest due to their bewildering diversity, and their roles in defence, stress response, UV protection, allelopathy and signalling45. The link between carbohydrate and secondary metabolism is primarily mediated by the shikimate pathway, through synthesis of aromatic amino acids, leading into the phenylpropanoid pathway46,47. Quinic acid, a constituent of the phenylpropanoid pathway, is a precursor of chlorogenic acid47, and also involved in the biosynthesis of lignins and flavonols. Quinic acid was found at higher concentrations in the high-sugar group, and on the other hand, coniferyl aldehyde and sinapoyl aldehyde (lignin subunits48), were found in lower concentrations (Table 1). Chlorogenic acid concentrations were not significantly different between the high- and low-sugar groups. In sorghum (Sorghum bicolor) plants with high biomass, quinic acid and lignin contents were high, whereas in the high-sugar ryegrass cultivar ‘Aberdove’, the concentration of chlorogenic acid was high49. A partitioning of quinic acid towards biosynthesis of either lignin or chlorogenic acid was therefore hypothesised47. However, the preferential flux towards either of these compounds is unknown. Grass-fibre composition is another key target of breeding which affects feed intake and digestibility50. This is largely determined by cell wall components, and lignins, complex polyphenolic polymers and end products of the phenylpropanoid pathway are amongst the major cell wall constituents51. The shikimate pathway described here in the context of total sugar content, should persuade further studies on secondary metabolism in ryegrass. In addition to the shikimate pathway, the interface between primary and secondary metabolism in plants is far more complex, with the production of secondary metabolites associated with glycolysis, TCA cycle, aliphatic amino acids, pentose phosphate pathway52 and nitrogen status53.

Towards the primary objective of screening for high-sugar cultivars, we set-off with a survey of 715 ryegrass genotypes based on their fructan content. However, breeding for select traits requires a thorough understanding of the population wide diversity of these traits, and their genetic control mechanisms21. Non-targeted metabolomics studies, as employed here, delivered the broadest coverage of metabolites, thereby enabling a better understanding of the underlying biochemical mechanisms. A snapshot of metabolites and the corresponding modules/pathways they represent in the high-sugar group (Fig. 5), established key insights on primary and secondary metabolism, that merit further investigation. This study therefore signifies one of many avenues that can be explored with these data. For example, the low-sugar group indirectly led to genotypes with greater lipid content, thereby creating opportunities to tap into lipid content through breeding applications. Likewise, the flux from primary to secondary metabolism, as reported here, may facilitate breeding strategies specifically targeting select secondary metabolites. Studies exploring these avenues in secondary metabolites are already underway20,54. In conclusion, we have established levers that help explain biochemical activity in ryegrass, which when operated towards specific objectives can deliver desired traits. The raw files from this study, maintained at MetaboLights database, we envisage will cater to further explorations towards these objectives.

Methods

Experimental

Plant material: Ryegrass seeds were obtained from the Margot Forde Germplasm Centre at AgResearch Limited, Grasslands Research Centre, Palmerston North, New Zealand. Seeds of 724 genotypes (denoted by a specific code, e.g., PG0698) which included NZ cultivars (82), natural population/ecotypes (49), NZ elite breeding populations (336), overseas cultivars (181), enhanced germplasm (71) and unknown cultivars (5) were selected. Five clonal replicates (n = 5; denoted as PG0698_1 for the first replicate) per genotype were cultivated and used for metabolomics studies. Plant trial conditions are described in Supplementary Methods.

Perennial ryegrass forms a natural symbiotic association with epichloae fungi (Epichloë spp.)50, which produce metabolites that protect the host from biotic and abiotic stresses55. It has been shown that infection with these endophytes affects the metabolic profiles of ryegrass49,56. To ensure that the ryegrass metabolome alone was analysed and reported, endophyte-free plants were used. This was achieved by heat and fungicide treatment of seed to kill the fungus, prior to germinating and growing the plants. After growing out and prior to clonal replication, the endophyte-free status of plants was assessed by an immunoblot analysis57 of 10 tillers per plant. Subsequently, the presence of peramine, a pyrrolopyrazine alkaloid58 specific to most endophytes, was also monitored. Nine genotypes that showed presence of peramine were identified and excluded (Supplementary Figure 9). This resulted in 715 genotypes from 118 populations for subsequent analyses. Here, a genotype refers to a group of plants with the same genetic makeup, while a population consists of one or more genotypes grouped together based on a common trait (genotypic/phenotypic/geographical origin). For example, PG0698 refers to the genotype in a population named ‘Hillary’, a NZ based cultivar (Supplementary Table 1).

Extractions: At 60 days after transplanting, leaves were harvested during late autumn (May 2012), snap-frozen in liquid nitrogen, freeze-dried, ground to a coarse powder, transferred to glass vials and stored at −80 °C until further use. Five aliquots of 50 mg each (4 for analyses and 1 standby) of each sample were weighed into microcentrifuge tubes. A surrogate QC sample59 comprising random amounts of ground ryegrass leaf material, irrespective of the presence or absence of endophyte was used. Since this QC sample was not representative of the sample set, it was solely used for monitoring sample degradation and for tracking run-order effects within a batch and not used for batch normalisations or ensuing data processing.

Polar and semi-polar compounds. A single aliquot (50 mg) was used for analysing both polar and semi-polar compounds. One millilitre (1 ml) of extraction solvent comprising acetonitrile:water containing 0.1% formic acid (50:50, v/v), and 2′,7′-Dichlorofluorescein (1 µg/ml; CAS No. 76-54-0; MW = 401.2) and l-Tyrosine-3, 3-d2 (10 µg/ml; CAS No. 72963-72-0; MW = 183.20) as internal standards for semi-polar and polar compounds respectively, were added to the aliquot. A ceramic bead was added and samples were mixed in a bead-mill homogeniser (TissueLyser II; Qiagen, Valencia, CA, USA) for 3 min, after which they were centrifuged for 6 min (18188g). Approximately 500 µl of the aqueous extract was transferred to an autosampler vial and stored at −20 °C until analysis.

Lipids. For lipids, all procedures were similar to that of polar and semi-polar compounds, except that the extraction solvent was made of damp chloroform:methanol (67:33, v/v), containing 2′,7′-Dichlorofluorescein (20 µg/ml) as an internal standard.

Fructans. Boiling water (1.5 ml) was added to another aliquot (50 mg). After homogenisation, samples were placed in a hot water bath (90 °C) for 30 min. Samples were cooled to room temperature, centrifuged, and approximately 500 µl of the aqueous extract was transferred to an autosampler vial for analysis. Samples were stored (−20 °C) for a minimal period to avoid fructan degradation.

Fatty acid methyl esters (FAMEs). After extraction of fructans, the remaining supernatant was discarded and the pellet was freeze-dried overnight. The dried aliquot was transferred to a 15 ml Falcon tube containing 50 µl of pentadecanoic acid (C15:0; CAS No. 1002-84-2; MW = 242.4) in heptane (4 mg/ml) as an internal standard. After the addition of 1 ml of 1 M methanolic HCl reagent60, the tubes were purged with nitrogen, sealed and heated in a hot water bath (80 °C) for 1 h. Tubes were then cooled to room temperature, and 50 µl of heptadecanoic acid methyl ester (Premethyl ester C17:0; CAS No. 1731-92-6; MW = 284.48) in heptane (4 mg/ml) was added as another internal standard. A total of 600 µl of heptane and 1 ml of 0.9% sodium chloride (w/v; NaCl) were added and the tubes were manually shaken to extract FAMEs into the heptane phase. Totally, 150 µl of the heptane layer was transferred to a 250 µl glass insert fitted in an autosampler vial for GC–MS analysis.

Chromatography and tandem mass-spectrometry: (U)HPLC-MS. (U)HPLC-MS conditions for analysis of polar and semi-polar compounds using ZIC-pHILIC61 and C1862 columns, were as described in respective references. An Exactive Orbitrap (Thermo Fisher Scientific, Waltham, MA, USA) mass spectrometer with electrospray ionisation was used for analyses in both positive and negative modes. Fructans were analysed as described by Harrison, et al.63 using a porous graphitised carbon column and LTQ linear ion trap mass spectrometer (Thermo Fisher Scientific, Waltham, MA, USA) with electrospray ionisation in negative mode.

The Thermo LC–MS system (Thermo Fisher Scientific, Waltham, MA, USA) for analysis of lipids consisted of an Accela 1250 quaternary UHPLC pump, a PAL auto-sampler fitted with a 15,000 psi injection valve (CTC Analytics AG., Zwingen, Switzerland) a 20 μl injection loop, and a Q-Exactive OrbitrapTM mass spectrometer with electrospray ionisation. A C1 column (50 × 2.1 mm, 5 µm; Thermo Fisher Scientific, Waltham, MA, USA), maintained at 25 °C with a gradient elution programme and a flow rate of 500 µl/min, was used for chromatographic separation. The mobile phase comprised water containing 0.1% formic acid (Solvent A) and isopropanol:acetonitrile containing 0.1% formic acid (50:50, v/v; Solvent B). The gradient was set to hold solvent A at 80% from 0 to 1 mins, gradually decline to 0% at 18.1 mins, maintained at the same level up to 20 mins, increased to 80% at 20.1 mins and finally allowed to equilibrate as such for the rest of the programme, i.e., 25 min. The samples were cooled in the auto-sampler at 4 °C and the injection volume of each sample was 2 μl. The first 1.5 min and the last 5 min of the chromatogram were diverted to waste. Both full and data dependent MS2 scans were collected in profile data acquisition mode. For full scan mode, a mass resolution setting of 35,000 was set to record a mass range of m/z 200–2000 with a maximum trap fill time of 250 ms. For MS2 scan mode, the same mass resolution setting was maintained with a maximum trap fill time of 120 ms. The isolation window of selected MS1 scans was ± 1.5 m/z with a collision energy of 30 eV. Samples were run in both positive and negative ionisation modes separately. Positive-ion mode parameters were as follows: spray voltage, 4.0 kV; capillary temperature, 275 °C; capillary voltage, 90 V, tube lens 120 V. Negative-ion mode parameters were as follows: spray voltage, −4.0 kV; capillary temperature, 275 °C; capillary voltage, −90 V, tube lens −100 V. The nitrogen source gas desolvation settings were the same for both modes (arbitrary units): sheath gas, 40; auxiliary gas, 10; sweep gas, 5. The Xcalibur software package provided by the manufacturer was used to create these settings.

GC–MS. Analysis of FAMEs was undertaken using a Thermo DSQ II Trace Ultra gas chromatograph (Thermo Fisher Scientific, Waltham, MA, USA) fitted with a DB5 GC capillary column. GC–MS conditions were as described by Browse et al.60.

Standards and reagents: All standards were purchased from Sigma–Aldrich Chemicals Co. (St. Louis, MO). Ultrapure water was obtained from a Milli-Q system (Millipore, Bedford, MA). All solvents used were of Optima LC–MS grade and were purchased from Thermo Fisher Scientific (Auckland, New Zealand).

Data analysis

Sample sequence and batches: For each stream of analysis, samples (3575) were systematically randomised across 36 batches, making sure that two clonal replicates of a genotype were not present in the same batch. A single batch comprised approximately 100 samples interspersed with a QC sample for every 10 samples (10–12 QC samples per batch). The sequence comprised blanks, QC and samples run in positive mode, followed by the same order in negative mode. Fructans and FAMEs were analysed in negative-63 and positive- ionisation modes alone, respectively.

QC monitoring and troubleshooting: For each batch, the quality of runs was determined by constantly monitoring the respective internal standard in QC samples for: (1) consistency in retention time (±0.5 min), (2) mass accuracy (±5 ppm) and (3) signal intensity. In the instance of constant drift in one of these parameters, the batch was immediately stopped and re-run after recalibrating the mass spectrometer. After completion of the run sequence, samples of each batch were retained at −20 °C until another QC check (based on PCA) was conducted. PCA of samples classified based on run-order within each batch was used to reveal any significant run-order effects. Where a significant run-order effect was apparent, the batches were re-run. Otherwise, batches were passed on to a super batch (a collection of batches that have passed the QC tests), for further processing. Score plots of PCA provide a simple and quick qualitative assessment of variability within a sample set64.

Fructans: Fructan identification and DP measurements were based on XCMS65 (Supplementary Table 2) and an in-house R script. Essentially, a target list from DP2 to DP20, denoting the parent mass, dimer and commonly formed adducts (formic acid and chlorine) for each DP, with corresponding retention times, was created based on respective extracted ion chromatograms. The script was used for (1) peak detection, (2) peak grouping, (3) retention time correction, (4) peak filling, (5) normalisation of run-order within each batch20, (6) normalisation of batch-effects using comBat66, (7) matching the resultant data matrix with the target list and (8) creating a table with peak intensities for each DP, including the molecular ion, dimer and adduct masses, across all samples.

Peak intensities of all ions representing each DP were summed, and the sum was multiplied by the number of hexose units, to establish a common baseline for comparisons. An exemplar is presented in the case of DP3 (Supplementary Table 3). As a measure of the total sugar content, fructans in the low (DP3, 4 and 5), mid (DP10, 11 and 12) and high (DP18, 19 and 20) DP ranges were added.

To delineate maximum separation between high- and low-sugar groups and maintain consistency of results, a two-tier criterion was established. First, the 715 genotypes (n = 5) were ranked as top 10% and bottom 10% based on total sugar content, and second, the full sample set (3575) was ranked as top 10% and bottom 10% based on total sugar content. Within the top 10% and bottom 10% of genotypes identified, only genotypes with three or more (out of five) clonal replicates were selected for comparisons. This two-tier criterion enabled identification of genotypes with minimum variation. Raw files corresponding to these high- (n = 133) and low-sugar (n = 106) samples from other analytical streams (polar, semi-polar, FAMEs and lipids) were collated for further comparative analysis.

Lipids, polar and semi-polar compounds: XCMS65 (Supplementary Table 2) and in-house R scripts with appropriate parameters for UHPLC (C18) and HPLC (Lipids and HILIC) settings were used for: (1) peak detection using centwave; (2) grouping; (3) retention time alignment using obiwarp; (4) peak re-grouping and (5) filling of missing peaks.

The resultant data matrices were cleaned and post-processed by using: (1) diffreport function of XCMS to generate extracted ion chromatograms of all identified peaks, and eliminating the ones that represented background noise, and; (2) CAMERA67 to identify and eliminate isotopes (HILIC and C18) and (3) normalisation for batch-effects using comBat66. The final data matrices of metabolic features were used for local library matching and statistical analysis. Metabolic features, defined here as molecular entities or ion types with a unique mass-to-charge ratio (m/z) and retention time.

LipidSearch and local library matching: Lipid identification was performed using LipidSearch software (Thermo Fisher Scientific, USA)68. Raw lipid files corresponding to high- and low-sugar groups were uploaded to LipidSearch separately for positive and negative ionisation modes. Product ion search on Q-ExactiveTM data was selected with a mass tolerance of 6 ppm for precursor ions and 10 ppm for product ions, along with a selection of lipid classes (Supplementary Table 4). Lipid species/classes identified for each file were then merged based on retention time alignments and a single file with the merged results, one each for each ionisation mode, was generated. This library of identified lipids was matched against the final data matrices based on parent mass and retention times. Peak intensities generated by XCMS settings for the identified lipids were ultimately used for further statistical analyses.

The lipidomics results shown here were obtained by a low level data fusion25 of the positive and negative ionisation modes by horizontal concatenation, and taking an average of the different lipid ions that were detected for a specific lipid class. For example, an average of the peak intensities of lysophosphatidylethanolamine LPE(16:0) − H, LPE(18:3) − H, LPE(18:2) − H, LPE(16:0) + H, LPE(18:3) + H and LPE(18:2) + H was taken to obtain the overall peak intensity of the LPE class.

Authentic standards, mostly plant based, from the AgResearch chemical inventory were run through the HILIC and C18 streams, under conditions identical to the current study. Parent masses in respective ionisation modes and retention times were listed in a table, and this library of authentic standards was matched against the corresponding final data matrices with tolerances of 5 ppm for parent mass and 3 s for retention time, to identify any hits. The libraries had 297, 170, 142 and 227 parent masses to match in C18 positive and negative, and HILIC positive- and negative-ionisation modes, and predominantly contained amino acids, secondary metabolites and their derivatives in C18, and amino acids, sugars, organic acids and their derivatives in HILIC streams, respectively.

Statistical analysis: Statistical analyses were performed using MetaboAnalyst ver 3.0, an online metabolomics analysis suite69. A data scaling procedure (auto-scaling) was carried out, where the data were normalised (mean-centred and divided by standard deviation of each variable) so that the features (peak intensities) are comparable.

Univariate and multivariate data analyses70 were conducted. For univariate analysis using t tests, features detected as significant with a false discovery rate cut-off of p < 0.05 were evaluated. For the multivariate approach, PCA was used to interrogate the data. Score plots which display the distribution of each sample along the composite variables of the score plot graph are presented.

Minitab 18 (Minitab Inc., USA) software was used to generate the interval plot (Fig. 1), cloud plot (Fig. 3), and boxplots (Fig. 2a, Fig.4); MS Excel (Microsoft Inc., USA) was used to generate Fig. 2c; and an online correlation analysis tool (www.sthda.com) was used to generate Fig. 2b.

Pathway analysis and compound identification: While eliminating redundant signals is vital for the identification of biomarkers, retaining these signals will facilitate network predictions25. Therefore, the raw data matrix obtained from the diffreport function (high- vs. low-sugar groups) of XCMS was used for pathway analysis. Network predictions/pathway analysis was performed using Mummichog71, a programme that combines metabolite prediction and network analysis in one step72,73. In contrast to the traditional approach, Mummichog first populates related spectral features to a network, with the hypothesis that if features reflect biological activity, then the metabolites they represent must exhibit enrichment in the local network. Metabolite identifications along with enriched pathway modules impart a broad understanding of tentative mechanisms, and provides scope for further probing. Input files for Mummichog from the HILIC and C18 streams comprised m/z values, retention time, p values and fold-change values. Due to its genetic synteny with ryegrass74, metabolic networks/pathways of barley (Hordeum vulgare) from the Plant Metabolic Network (PMN), www.plantcyc.org, December, 2016, were used as a reference. Appropriate parameters were selected for the ionisation mode, instrument (Orbitrap) and mandatory identification of parent masses, while other parameters were set as default.

Significant nodes of all four streams from respective activity networks were collated, and m/z features showing fold-change greater than 1.5 alone were selected. Features that matched authentic standards in the local library, and manually identified compounds that had fold-change greater than 1.5, were also added to this list. KEGG IDs for these shortlisted features were uploaded to KEGG Mapper75, http://www.kegg.jp/kegg/tool/map_pathway2.html, December, 2016. Modules directly involving the identified compounds with reference to rice (Oryza sativa) pathways, were displayed.

In addition to compounds identified with level 1 confidence44 by matching with a local library of authentic standards, de novo compound identification with level 2 confidence was performed as demonstrated by Subbaraj et al.76. Compounds identified by Mummichog were only given a level 3 status, a level that implies compound class.

Pathway visualisation using MapMan: MapMan uses a hierarchical ontology approach to visualise pathways and their corresponding modules in a functional context77. A customised map, based on the module/pathway data generated by KEGG Mapper, was built for the current study. The experimental data file comprised the compounds identified in Table 1, along with the respective t stat values. KEGG Mapper results were sorted into appropriate bins in accordance with MapMan syntax (Table 2) and added as the mapping file. Finally, a custom made picture (.bmp) that accommodates all the identified pathways and modules was added to the pathways folder. The ImageAnnotator module of MapMan uses mapping files as its data source, and then paints the input experimental data to the custom made images/maps according to the hierarchical structure of the mapping files77.

A summarised overview of data analysis is presented in Fig. 6.

Fig. 6
figure 6

Summarised overview of data analysis from raw files of fructan data through to total sugar content, classification of high- and low-sugar groups, and subsequent analysis of fatty acid methyl esters (FAMEs), lipids, polar and semi-polar compounds