Understanding how the information encoded in DNA is transcribed to RNA and translated to proteins and other downstream endophenotypes, such as metabolites, and how this information dictates the functional phenotype of the organism, is at the core of several biological disciplines. Although the revolutionary work that led to the discovery of this central flow of genetic information within biological systems was published more than 50 years ago (Crick 1970), we are still making progress in understanding the genotype–phenotype map. This progress is aided by new technologies within molecular and systems biology that allow researchers to obtain full genome sequences from individuals of any species, detailed information about the expression levels of all genes, the abundances of proteins and metabolites, etc. These omics tools have provided unprecedented knowledge about the genetic and environmental background of complex traits, which has revolutionized the field of genetics with strong impacts on multiple research disciplines, including medicine, animal and plant breeding and evolutionary biology (Dekkers 2012; Elmer 2016; Hasin et al. 2017; Pinu et al. 2019).

One common goal of studies on genotype–phenotype associations is to understand to what extent complex organismal phenotypes, such as behavioural traits, traits linked to reproduction, diseases, yield or the ability to cope with stressful environmental conditions, can be predicted from DNA sequence information or from endophenotypes. If information on endophenotypes, such as transcripts, proteins or metabolites, can accurately predict the phenotype, this has wide-ranging applications across the life sciences, and such prediction has proven useful in several cases (Buckler et al. 2009; Hayes and Goddard 2010; Desta and Ortiz 2014; Hickey et al. 2017; Grinberg et al. 2019). However, studies have also demonstrated that predicting complex phenotypes from genetic information is often difficult and that predictive power is typically low (Schrodi et al. 2014; Märtens et al. 2016; Sun et al. 2016). There are many reasons for this, including that (1) quantitative trait values result from a complex interplay between a large number of genes, each with a small contribution to the phenotype, and environmental factors; (2) despite progress, the genetic architectures underlying most traits of medical interest or of relevance in agriculture or evolutionary biology are still not well understood; (3) the effects of so-called candidate genes often depend on the genetic background (epistasis), and these genes explain only a small proportion of the heritability; and (4) genes and environments interact in their effects on the phenotype. These factors create substantial challenges for constructing and implementing genetic risk prediction models across biological disciplines.

The challenges of using DNA sequence variation to predict variation at the organismal phenotypic level have sparked interest in using endophenotypes as predictors of complex functional phenotypes (Scoriels et al. 2015; Hayes et al. 2017; te Pas et al. 2017; Van Der Ende et al. 2018). Endophenotypes influence and regulate the functional phenotype, and in contrast to the genotype, which is fixed during an individual’s lifetime, they are governed by interactions between an individual’s genome and internal and external influences, ranging from the cellular level to the wealth of external biotic and abiotic factors the individual is exposed to in its environment. Endophenotypes have therefore been proposed to constitute proximal links between variation at the genome level and the organismal phenotype, and they may be more accurate predictors of the functional phenotype than the genotype (te Pas et al. 2017; Zhou et al. 2020).

The predictive value of endophenotypes for organismal functional phenotypes is likely linked to how proximal the endophenotype is to the organismal phenotype (Fiehn 2002; Civelek and Lusis 2014; Hasin et al. 2017; Zampieri and Sauer 2017; Zhou et al. 2020). Because metabolite abundances can be seen as the ultimate molecular response of biological systems to genetic or environmental changes, information at the metabolome level may provide more accurate predictions than, e.g., information at the gene expression or protein level (Xu et al. 2016; Bahado-Singh et al. 2017). This is an emerging research field, and results are still limited. However, some studies support the hypothesis that transcriptomic or proteomic data combined with genotype information improve the prediction of several traits in, e.g., Drosophila melanogaster and maize (Wang and Marcotte 2010; Harel et al. 2019; Li et al. 2019; Azodi et al. 2020). A recent study of 453 metabolites in 40 isogenic lines suggests that the metabolome may be a reliable predictor of organismal phenotypes and that it provides novel insights into the underpinnings of complex traits and their genetic basis (Zhou et al. 2020).

Here we build on the findings of Zhou et al. (2020) by using nuclear magnetic resonance (NMR) metabolomics to characterize D. melanogaster metabolomes in pooled samples of whole male flies from 170 inbred lines of the D. melanogaster Genetic Reference Panel (DGRP), a panel of fully inbred, sequenced lines of D. melanogaster (Mackay et al. 2012; Huang et al. 2014). NMR metabolomics is a highly reproducible technique that, in contrast to mass spectrometry, allows metabolic profiling of the total complement of metabolites in a sample (Emwas 2015). With this setup, we first investigated to what extent the metabolome varies across the DGRP lines and whether this variation is heritable. Second, we performed a genome-wide scan to detect DNA sequence variants associated with variation in NMR data point intensity. Finally, we investigated to what degree metabolomic data could increase the prediction accuracy (PA) of five complex behavioural and stress tolerance phenotypes compared with predictions based solely on DNA sequence variation.

Materials and methods

Drosophila melanogaster lines, husbandry and collection

We used 170 inbred lines of the DGRP (Mackay et al. 2012; Huang et al. 2014). The DGRP lines were established by 20 consecutive generations of full-sibling inbreeding from isofemale lines collected at the farmer’s market in Raleigh, NC. Complete genome sequences of the DGRP lines have been obtained using the Illumina platform and are publicly available (Mackay et al. 2012; Huang et al. 2014).

The DGRP lines were maintained on a standard Drosophila diet consisting of yeast, sucrose, oatmeal and agar mixed with tap water; nipagin and acetic acid were added after autoclaving (see Kristensen et al. 2016 for recipe details). Flies were maintained in a climate chamber at 23 °C and 50% relative humidity with a 12:12 h light:dark cycle. To generate experimental flies, 20 adult flies (3–4 days old) from each DGRP line were allowed to reproduce for ~12 h in each of ten replicate vials containing 7 mL of medium, after which the adults were discarded. Larval density was not controlled, but the number of emerging flies was below 80 in all vials and for all lines (data not shown), a density considered optimal for developing D. melanogaster cultures (Barker and Podger 1970; Lefranc and Bundgaard 2000). The first emerging male flies from these vials were collected when <24 h old. Flies from the ten replicate vials were pooled and distributed randomly to five new vials with fresh food, with 20 individuals per vial. Flies were kept in these vials for ~72 h, then transferred to Eppendorf tubes and immediately snap frozen in liquid nitrogen. Where we did not obtain four replicates of 20 males, we supplemented with flies from the extra vial, so that we ended up with four replicates of 20 males from each of the 170 DGRP lines.

Drosophila melanogaster sample preparation

We prepared four replicates of 20 pooled male flies for NMR spectroscopy using an established protocol (Malmendal et al. 2006; Schou et al. 2017). Males were snap frozen at 3–4 days of age and stored at −80 °C. Samples were mechanically homogenized with a Kinematica Pt 1200 (Buch & Holm A/S, Herlev, Denmark) in 1 mL of ice-cold 50% acetonitrile for 45 s. Samples were then centrifuged (10,000 × g) for 10 min at 4 °C, and the supernatant (900 μL) was transferred to new tubes, snap frozen and stored at −80 °C. The supernatant was subsequently lyophilized and stored at −80 °C. Immediately before NMR measurements, samples were rehydrated in 200 μL of 50 mM phosphate buffer (pH 7.4) in D2O, and 180 μL was transferred to a 3 mm NMR tube. The buffer contained 50 mg/L of the chemical shift reference 2,2-dimethyl-2-silapentane-5-sulfonate-d6 sodium salt (DSS) and 50 mg/L of sodium azide to prevent bacterial growth.

NMR experiments and spectral processing

NMR measurements were performed at 25 °C on a Bruker Avance III HD 800 spectrometer (Bruker Biospin, Rheinstetten, Germany), operating at a 1H frequency of 799.87 MHz, and equipped with a 3 mm TCI cold probe. 1H NMR spectra were acquired using a standard NOESYPR1D experiment with a 100 ms delay. A total of 128 transients of 32 K data points spanning a spectral width of 20 ppm were collected.

The spectra were processed using Topspin (Bruker Biospin, Rheinstetten, Germany). An exponential line broadening of 0.3 Hz was applied to the free induction decay prior to Fourier transformation. All spectra were referenced to the DSS signal at 0 ppm, manually phased and baseline corrected, and then aligned using icoshift (Savorani et al. 2010). The region around the residual water signal (4.70–4.87 ppm) was removed so that the water signal would not interfere with the analysis. The high- and low-field ends of the spectrum, where no signals other than the DSS reference signal appear, were also removed, leaving data between 0.7 and 9.7 ppm. The spectra were normalized by probabilistic quotient area normalization (Dieterle et al. 2006). To reduce the size of the NMR data, each pair of adjacent NMR data points was averaged, resulting in a final number of 14,440 NMR data points.
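The cropping, probabilistic quotient normalization and pairwise averaging steps described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the Topspin/icoshift workflow that was actually used: it assumes an already referenced, phased, baseline-corrected and aligned intensity matrix (samples × data points) with a matching ppm axis, applies total-area normalization before the quotient step as described by Dieterle et al. (2006), and uses the median spectrum as the reference. All function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating the preprocessing; not the authors' pipeline.


def crop_spectra(spectra, ppm, low=0.7, high=9.7, water=(4.70, 4.87)):
    """Keep points between `low` and `high` ppm, excluding the residual water region."""
    in_range = (ppm >= low) & (ppm <= high)
    in_water = (ppm >= water[0]) & (ppm <= water[1])
    keep = in_range & ~in_water
    return spectra[:, keep], ppm[keep]


def pqn_normalize(spectra):
    """Probabilistic quotient normalization (after Dieterle et al. 2006).

    spectra: samples x data points; intensities are assumed non-negative.
    """
    # 1) Total-area (integral) normalization of each spectrum.
    x = spectra / spectra.sum(axis=1, keepdims=True)
    # 2) Reference spectrum: median across samples at each data point.
    reference = np.median(x, axis=0)
    # 3) Quotients of each spectrum against the reference (skip zero reference points).
    mask = reference > 0
    quotients = x[:, mask] / reference[mask]
    # 4) The median quotient estimates each sample's dilution factor.
    dilution = np.median(quotients, axis=1, keepdims=True)
    return x / dilution


def average_adjacent_pairs(spectra):
    """Average each pair of adjacent data points; a trailing odd point is dropped."""
    n_even = spectra.shape[1] - spectra.shape[1] % 2
    return spectra[:, :n_even].reshape(spectra.shape[0], -1, 2).mean(axis=2)
```

In this sketch, a sample that is uniformly diluted but carries one extra strong peak is rescaled to match the other samples at all unaffected data points, which is the property that motivates quotient normalization over total-area normalization alone.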

Metabolite assignments were done based on chemical shifts, using earlier assignments and spectral databases previously described (Malmendal et al. 2006; Cui et al. 2008; Ulrich et al. 2008; Schou et al. 2017; Wishart et al. 2018) together with Chenomx NMR Suite (Chenomx Inc.).

Quantitative genetics of NMR intensities

Each NMR data point (14,440 in total) was treated as a quantitative trait. For each NMR data point, we fitted a linear mixed model to partition the total phenotypic variation (i.e., the variation in the intensity of one NMR data point) into genetic and environmental components. Using the R package qgg (Rohde et al. 2020), we fitted the model:

$${\boldsymbol{y}} = {\boldsymbol{Xb}} + {\boldsymbol{Zg}} + {\boldsymbol{e}} \tag{1}$$

where \({\boldsymbol{y}}\) is a vector containing the replicated intensity measurements for a particular NMR data point (three to four replicates per DGRP line); \({\boldsymbol{X}}\) and \({\boldsymbol{Z}}\) are design matrices linking the fixed and random effects to the phenotype; \({\boldsymbol{b}}\) is a vector of fixed effects (Wolbachia infection status and the major polymorphic inversions In2Lt, In2RNS, In2RY1, In2RY2, In3LP, In3LM, In3LY, In3RP, In3RK, In3RMo and In3RC; information available at …); \({\boldsymbol{g}}\) is a vector of random genetic effects, \({\boldsymbol{g}}\sim N(0,{\boldsymbol{G}}\sigma _g^2)\); and \({\boldsymbol{e}}\) is a vector of residual effects, \({\boldsymbol{e}}\sim N(0,{\boldsymbol{I}}\sigma _e^2)\). The residual effects are modelled as independent (by the identity matrix \({\boldsymbol{I}}\)), whereas the genetic effects are modelled by the additive genomic relationship matrix \({\boldsymbol{G}}\), constructed from all SNPs with minor allele frequency ≥ 0.05 as \({\boldsymbol{G}} = {\boldsymbol{WW}}^\prime /m\). Here m is the number of SNPs (i.e., 1,725,755) and \({\boldsymbol{W}}\) is the centred and scaled genotype matrix with column vectors \({\boldsymbol{w}}_i = \frac{{{\boldsymbol{a}}_i - 2p_i}}{{\sqrt {2p_i(1 - p_i)} }}\), where \(p_i\) is the allele frequency of the ith SNP and \({\boldsymbol{a}}_i\) is the ith column vector of the allele count matrix \({\boldsymbol{A}}\), which contains the genotypes coded as 0 or 2 according to the number of copies of the minor allele (genotypes are available at …). The G-matrix was expanded to a block matrix to match the replicated measurements of the NMR intensities.
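A minimal NumPy sketch of how \({\boldsymbol{G}}\) could be built from the allele-count matrix with the centring and scaling above, and how it could be expanded to match replicated records. The expansion is assumed here to be \({\boldsymbol{ZGZ}}^\prime\) with \({\boldsymbol{Z}}\) the record-to-line incidence matrix, which is what expanding to a block matrix implies for records grouped by line. The sketch assumes complete genotype data (no missing calls), uses hypothetical function names, and is not the qgg implementation used in the analysis.

```python
import numpy as np

# Hypothetical helpers; a sketch of G = W W' / m, not the qgg implementation.


def genomic_relationship(A, maf_min=0.05):
    """G = W W' / m from an allele-count matrix A (lines x SNPs, entries 0 or 2).

    Assumes no missing genotypes.
    """
    A = np.asarray(A, dtype=float)
    p = A.mean(axis=0) / 2.0                   # frequency of the counted allele
    keep = np.minimum(p, 1.0 - p) >= maf_min   # drop rare and monomorphic SNPs
    A, p = A[:, keep], p[keep]
    W = (A - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))  # centre and scale columns
    return W @ W.T / W.shape[1]


def expand_to_records(G, line_of_record):
    """Expand G (lines x lines) to records x records for replicated measurements.

    line_of_record[i] is the row of G for the line measured in record i;
    the result equals Z G Z' with Z the record-to-line incidence matrix.
    """
    Z = np.zeros((len(line_of_record), G.shape[0]))
    Z[np.arange(len(line_of_record)), line_of_record] = 1.0
    return Z @ G @ Z.T
```

Filtering before scaling also guards against division by zero for monomorphic SNPs, whose allele frequency is 0 or 1.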

For each NMR data point, we estimated the proportion of phenotypic variation explained by SNP variation as \({\hat h_{{\mathrm{SNP}}}^2} = \frac{{\hat \sigma _g^2}}{{\hat \sigma _g^2 + \hat \sigma _e^2}}\), where \({\hat \sigma} _g^2\) and \({\hat \sigma} _e^2\) are the variance components estimated from Eq. (1). An estimate \({\hat h}_{{\mathrm{SNP}}}^2\) was considered significant if \({\hat h}_{{\mathrm{SNP}}}^2 - {\mathrm{SE}}\left( {\hat h_{{\mathrm{SNP}}}^2} \right) \times Z > 0\), where Z is the upper-tail quantile of the standard normal distribution at P = 0.05/14,440 (i.e., its 1 − P quantile). The resulting set of heritability estimates therefore comprises those that differ significantly from zero after accounting for a total of 14,440 statistical tests.
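The heritability estimate and its Bonferroni-adjusted lower-bound test can be sketched with SciPy. This is a minimal illustration: the variance components and their standard errors are taken as inputs (in the analysis they came from the fitted mixed model), and the function names are hypothetical.

```python
from scipy.stats import norm

# Hypothetical helpers; variance components and SEs are assumed to be given.


def h2_snp(var_g, var_e):
    """Proportion of variance explained by the SNPs: var_g / (var_g + var_e)."""
    return var_g / (var_g + var_e)


def critical_z(alpha=0.05, n_tests=14_440):
    """Upper-tail standard normal quantile at P = alpha / n_tests."""
    return norm.isf(alpha / n_tests)


def h2_significant(h2, se, alpha=0.05, n_tests=14_440):
    """True if the lower bound h2 - SE * Z stays above zero."""
    return h2 - se * critical_z(alpha, n_tests) > 0
```

With P = 0.05/14,440, the critical value is roughly 4.5, so an estimate needs to be about 4.5 standard errors above zero to be retained.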

Mapping of metabolome QTL

Metabolome quantitative trait loci (mQTLs) for mean NMR intensity were identified by linear regression (Huang et al. 2015), using the single-marker association analysis function implemented in the qgg package (Rohde et al. 2020). The estimated genetic effects (\({\hat{\boldsymbol g}}\), from Eq. (1)) were used as line means, because they represent the within-line mean intensity of a single NMR data point adjusted for Wolbachia, chromosomal inversions and polygenicity; these line means were then regressed on the marker genotypes.

Significant mQTLs were defined as SNP–NMR associations for which (i) the heritability estimate for the NMR data point was significant and (ii) the P value was below \(\frac{{0.05/1,725,755}}{{14,440 \times 0.39}} = 5.14\, \times \,10^{ - 12}\), which accounts for both the number of SNP associations tested per NMR data point and the total number of NMR data points analyzed (i.e., 14,440 × 0.39). Significant mQTLs were annotated to the D. melanogaster genome using FlyBase annotation v.5.57 (…).
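The scan can be sketched as an ordinary least-squares regression of the line-level values on one SNP at a time, together with the threshold arithmetic above. This is a hedged illustration of the technique rather than the qgg implementation: the function names are hypothetical, no covariates are included, and every SNP is assumed to be polymorphic (as guaranteed by the minor allele frequency filter).

```python
import numpy as np
from scipy import stats

# Hypothetical helpers; a sketch of single-marker regression, not qgg's function.


def single_marker_pvalues(line_means, genotypes):
    """Regress line means on each SNP separately; return one P value per SNP.

    line_means: length-n vector (e.g., genomic BLUP values for one NMR data point).
    genotypes:  n x m allele counts; every SNP is assumed polymorphic.
    """
    y = np.asarray(line_means, dtype=float)
    X = np.asarray(genotypes, dtype=float)
    return np.array([stats.linregress(X[:, j], y).pvalue for j in range(X.shape[1])])


def mqtl_threshold(alpha=0.05, n_snps=1_725_755, n_points=14_440, frac_heritable=0.39):
    """Bonferroni-type threshold over SNPs and heritable NMR data points."""
    return (alpha / n_snps) / (n_points * frac_heritable)
```

Looping over roughly 1.7 million SNPs per data point in pure Python would be slow; a vectorized regression would be preferable in practice, but the loop keeps the logic explicit.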

Phenotypic predictions

Within the linear mixed model (BLUP; best linear unbiased prediction) framework, we fitted phenotypic prediction models based on either genomic best linear unbiased prediction (GBLUP) or metabolomic best linear unbiased prediction (MBLUP) to investigate whether the metabolome provides additional information that increases the accuracy of prediction compared with genomic prediction. The DGRP has been characterized for a wide range of physiological, morphological and behavioural phenotypes (Anholt and Mackay 2018; Mackay and Huang 2018). When the number of genotypes (here, the number of DGRP lines) is low, a large number of within-line replicates is required for accurate predictions (Edwards et al. 2016). We therefore restricted the set of quantitative traits to those for which we had access to all individual observations and the average number of observations per line was >25 (Table 1).

Table 1 Quantitative traits used in phenotypic predictions.

The five test traits were first adjusted for experimental factors (see references in Table 1), Wolbachia infection status and major polymorphic inversions (In2Lt, In2RNS, In3RP, In3RK, In3RMo). The adjusted phenotypic values were obtained as \({\tilde{\boldsymbol y}}_l = \hat L_l + {\hat{\boldsymbol e}}_l\), where \(\hat L_l\) is the estimated line effect for DGRP line l (i.e., the BLUP value) and \({\hat{\boldsymbol e}}_l\) is the vector of residuals for line l; thus, \({\boldsymbol{y}}_l\) and \({\tilde{\boldsymbol y}}_l\) have the same dimension. The line effects were assumed to follow \({\boldsymbol{L}}\sim N(0,\;{\boldsymbol{I}}\sigma _L^2)\). Because the metabolome contains aggregated information on both the genotypes and the environmental exposures, including genomic information in both steps of the GBLUP and MBLUP analyses would double count the genomic variation represented by the SNPs. To avoid this, we modelled the DGRP lines as independent in the first step, using the identity block matrix \({\boldsymbol{I}}\) instead of \({\boldsymbol{G}}\).
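Assuming the residuals are defined in the usual way for a mixed model, \({\hat{\boldsymbol e}}_l = {\boldsymbol y}_l - {\boldsymbol X}_l\hat{\boldsymbol b} - \hat L_l\) (our assumption; \({\boldsymbol X}_l\hat{\boldsymbol b}\) denotes the estimated fixed effects for the records of line l, and \(\hat L_l\) is added to every element), the line BLUP cancels and the adjusted values reduce to the observations corrected for the estimated fixed effects:

$${\tilde{\boldsymbol y}}_l = \hat L_l + {\hat{\boldsymbol e}}_l = \hat L_l + \left({\boldsymbol y}_l - {\boldsymbol X}_l\hat{\boldsymbol b} - \hat L_l\right) = {\boldsymbol y}_l - {\boldsymbol X}_l\hat{\boldsymbol b}$$

Under this assumption, \(\hat{\boldsymbol b}\) is estimated under independent line effects, so the first step removes experimental, Wolbachia and inversion effects without introducing genomic relationships.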

For each quantitative trait (Table 1), we fitted two models to each of 100 randomly selected training sets (t), each containing 90% of the DGRP lines; the same training sets were used for all prediction models:

$${\tilde{\boldsymbol y}}_t = {\boldsymbol{Z}}_t{\boldsymbol{g}}_t + {\boldsymbol{e}}_t \tag{2}$$
$${\tilde{\boldsymbol y}}_t = {\boldsymbol{Z}}_t{\boldsymbol{m}}_t + {\boldsymbol{e}}_t \tag{3}$$

where \({\tilde{\boldsymbol y}}_t\) is the vector of adjusted phenotypic values for the DGRP lines in training set t, \({\boldsymbol{e}}_t\) is a vector of random residuals, and \({\boldsymbol{Z}}_t\) is a design matrix linking the genomic (\({\boldsymbol{g}}_t\)) and metabolomic (\({\boldsymbol{m}}_t\)) effects to the phenotypes. The random genomic effects are \({\boldsymbol{g}}_t\sim N(0,{\boldsymbol{G}}_{[t,t]}\sigma _g^2)\) and the metabolomic effects are \({\boldsymbol{m}}_t\sim N(0,{\boldsymbol{M}}_{[t,t]}\sigma _m^2)\), where \({\boldsymbol{G}}\) is the additive genomic relationship matrix specified above and \({\boldsymbol{M}}\) is the metabolomic relationship matrix. The metabolomic relationship matrix was computed as \({\boldsymbol{M}} = {\boldsymbol{QQ}}^\prime /m_{{\mathrm{NMR}}}\), where \({\boldsymbol{Q}}\) is an \(n \times m_{\mathrm{NMR}}\) matrix of adjusted, centred and scaled NMR intensities (\(m_{\mathrm{NMR}}\) = 14,440). Each column vector of \({\boldsymbol{Q}}\) contains the BLUP values from a mixed model in which the phenotype was the corresponding NMR intensity, adjusted for Wolbachia infection status and major polymorphic inversions (In2Lt, In2RNS, In3RP, In3RK, In3RMo), with a block identity matrix as covariance structure linking the replicated NMR measurements to each DGRP line. This yielded a data structure similar to that of the genomic data, namely one value per DGRP line and NMR data point.
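A minimal sketch of \({\boldsymbol{M}} = {\boldsymbol{QQ}}^\prime /m_{\mathrm{NMR}}\), mirroring the construction of \({\boldsymbol{G}}\). It assumes the per-line BLUP values for each NMR data point have already been computed (the input matrix, lines × NMR data points) and that scaling uses the population standard deviation; the text does not state the scaling convention, so this is an assumption, and the function name is hypothetical.

```python
import numpy as np

# Hypothetical helper; a sketch of M = Q Q' / m_NMR.


def metabolomic_relationship(line_values):
    """M = Q Q' / m_NMR from line-level NMR values (lines x NMR data points).

    Each column is centred to mean 0 and scaled to unit (population) variance;
    columns with zero variance are assumed to have been removed beforehand.
    """
    X = np.asarray(line_values, dtype=float)
    Q = (X - X.mean(axis=0)) / X.std(axis=0)
    return Q @ Q.T / Q.shape[1]
```

With this scaling, the diagonal of \({\boldsymbol{M}}\) averages exactly 1, analogous to the expected diagonal of a standardized genomic relationship matrix.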

The predicted phenotypes in validation set v (\({\hat{\boldsymbol y}}_v\)) were obtained using Eqs. (4) and (5) for GBLUP and MBLUP, respectively, where \(\mu _t\) denotes the estimated mean of the adjusted phenotypes in training set t:

$${\hat{\boldsymbol y}}_v = ({\boldsymbol{G}}_{\left[ {v,t} \right]}\hat \sigma _g^2)\left[ {{\boldsymbol{G}}_{\left[ {t,t} \right]}\hat \sigma _g^2 + {\boldsymbol{I}}_{\left[ {t,t} \right]}\hat \sigma _e^2} \right]^{ - 1}({\tilde{\boldsymbol y}}_t - \mu _t) \tag{4}$$
$${\hat{\boldsymbol y}}_v = ({\boldsymbol{M}}_{\left[ {v,t} \right]}\hat \sigma _m^2)\left[ {{\boldsymbol{M}}_{\left[ {t,t} \right]}\hat \sigma _m^2 + {\boldsymbol{I}}_{\left[ {t,t} \right]}\hat \sigma _e^2} \right]^{ - 1}({\tilde{\boldsymbol y}}_t - \mu _t) \tag{5}$$
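The prediction equations for GBLUP and MBLUP share one form and can be sketched with NumPy; passing \({\boldsymbol{G}}\) gives GBLUP and passing \({\boldsymbol{M}}\) gives MBLUP. This sketch assumes one adjusted value per training line, takes the variance components as inputs (in the analysis they were estimated from the training data), and uses the training-set mean for \(\mu _t\) by default, which is our assumption. The function name is hypothetical.

```python
import numpy as np

# Hypothetical helper; a sketch of the single-kernel BLUP prediction equation.


def blup_predict(K, train, valid, y_train, var_k, var_e, mu=None):
    """Predict validation lines from training lines with one relationship matrix.

    Implements y_v = K[v,t] var_k (K[t,t] var_k + I var_e)^(-1) (y_t - mu),
    with one adjusted value per training line and given variance components.
    """
    y_train = np.asarray(y_train, dtype=float)
    if mu is None:
        mu = y_train.mean()  # assumption: training-set mean used for mu_t
    K_tt = K[np.ix_(train, train)]
    K_vt = K[np.ix_(valid, train)]
    V = K_tt * var_k + np.eye(len(train)) * var_e
    # Solve V a = (y_t - mu) rather than forming the inverse explicitly.
    a = np.linalg.solve(V, y_train - mu)
    return (K_vt * var_k) @ a
```

As written in the equations, the sketch returns deviations from \(\mu _t\); adding \(\mu _t\) back would not change the Pearson correlation used to measure PA.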

The PA was quantified as the Pearson’s correlation between observed (adjusted) and predicted phenotypes in the validation set, averaged across the 100 training sets: \({\rm{PA}} = \frac{1}{{100}}\mathop {\sum }\nolimits_{t = 1}^{100} {\rm{Cor}}({\tilde{\boldsymbol y}}_v,{\hat{\boldsymbol y}}_v)\). The accuracies of GBLUP and MBLUP were compared using a paired t-test.
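The repeated-split evaluation can be sketched as follows: the same random 90/10 splits are reused for every model, one correlation is computed per split, PA is the mean of these correlations, and two models are compared with a paired t-test over the shared splits (`scipy.stats.ttest_rel`). The seed and the function names are hypothetical choices for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical helpers; a sketch of repeated training/validation evaluation.


def random_splits(n_lines, n_splits=100, train_frac=0.9, seed=1):
    """Yield (train, valid) index arrays; reuse the same seed for every model."""
    rng = np.random.default_rng(seed)
    n_train = int(round(train_frac * n_lines))
    for _ in range(n_splits):
        perm = rng.permutation(n_lines)
        yield np.sort(perm[:n_train]), np.sort(perm[n_train:])


def split_correlations(observed, predicted):
    """Pearson correlation between observed and predicted values for each split."""
    return np.array([np.corrcoef(o, p)[0, 1] for o, p in zip(observed, predicted)])


def compare_models(cor_a, cor_b):
    """Mean PA of each model and the P value of a paired t-test over shared splits."""
    test = stats.ttest_rel(cor_a, cor_b)
    return cor_a.mean(), cor_b.mean(), test.pvalue
```

Pairing by split is what justifies the paired test: both models are evaluated on identical validation lines, so split-to-split variation in difficulty cancels in the differences.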

NMR cluster-guided phenotypic predictions

Studies of genomic prediction have shown that allowing marker effects to come from different distributions, e.g., by grouping genetic variants into functional pathways, can markedly increase PA (Speed and Balding 2014; Edwards et al. 2016; Rohde et al. 2017, 2018; Sørensen et al. 2017). We therefore investigated whether similar benefits could be achieved by partitioning the metabolome.

Using the Q-matrix (i.e., the n × mNMR matrix of adjusted, centred and scaled NMR intensities), we computed all pairwise Pearson’s correlation coefficients among NMR data points and performed hierarchical clustering on the correlation-based dissimilarities using unweighted pair group method with arithmetic mean (UPGMA) agglomeration (Fig. 1). For a range of total cluster numbers, Kcl = {25, 50, 75, 100, 125, 200}, we performed metabolomic data point best linear unbiased prediction (MFBLUP), an extension of the MBLUP model (Eq. (3)) that contains an additional metabolomic effect (Eq. (6), Fig. 1). For each Kcl level, we estimated model parameters separately for each cluster (e.g., 25 models for Kcl = 25 and 50 models for Kcl = 50) as follows:
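The clustering step can be sketched with SciPy’s hierarchical clustering routines, where `method="average"` is UPGMA and `criterion="maxclust"` cuts the tree into at most the requested number of clusters. We assume the dissimilarity is 1 − r; the text does not state the transformation (1 − |r| would be an alternative). For all 14,440 data points the full correlation matrix holds about 2.1 × 10^8 entries (roughly 1.7 GB in double precision). The function name is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical helper; assumes dissimilarity = 1 - Pearson r.


def cluster_nmr_points(Q, n_clusters):
    """UPGMA clustering of NMR data points (columns of Q) on 1 - Pearson r.

    Returns one label per column; at most n_clusters clusters are formed.
    """
    R = np.corrcoef(Q, rowvar=False)      # data point x data point correlations
    D = np.clip(1.0 - R, 0.0, None)       # dissimilarity; clip rounding below 0
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method="average")  # average = UPGMA
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```

Cutting the same tree at several values of `n_clusters` yields nested partitions, consistent with fixing the NMR data point order across cluster levels.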

$${\tilde{\boldsymbol y}}_t = {\boldsymbol{Z}}_t{\boldsymbol{m}}_t^{K_{cl\_k}} + {\boldsymbol{Z}}_t{\boldsymbol{m}}_t^r + {\boldsymbol{e}}_t, \tag{6}$$

where the superscript Kcl_k indicates the total number of clusters (Kcl) and the cluster number (k). The first metabolomic effect was defined as \({\boldsymbol{m}}_t^{K_{cl\_k}}\sim N(0,{\boldsymbol{M}}_{[t,t]}^{K_{cl\_k}}\sigma _{m^{K_{cl\_k}}}^2)\), where \({\boldsymbol{M}}_{[t,t]}^{K_{cl\_k}}\) is the metabolomic relationship matrix among the DGRP lines in training set t, computed from the NMR data points within cluster k among the Kcl clusters. The second metabolomic effect, \({\boldsymbol{m}}_t^r\sim N(0,{\boldsymbol{M}}_{[t,t]}^r\sigma _{m^r}^2)\), is a random effect based on all NMR data points except those within cluster Kcl_k.

Fig. 1: Conceptual illustration of the NMR cluster-guided phenotypic predictions.

All pairwise correlations among the NMR data points were computed and used for hierarchical clustering of the NMR data points. The dendrogram was then cut into K clusters (25, 50, 75, 100, 125, 150 and 200 clusters), and each individual cluster was used in the MFBLUP model: the NMR data points within one cluster were used to construct a metabolomic relationship matrix that served as a covariance matrix in the MFBLUP model. The MFBLUP model was fitted for all clusters within each of the seven levels of K clusters.

Similar to the MBLUP model, the predicted phenotypes in the validation set v (\({\hat{\boldsymbol y}}_v\)) were obtained as follows:

$${\hat{\boldsymbol y}}_v = ({\boldsymbol{M}}_{\left[ {v,t} \right]}^{K_{cl\_k}}\hat \sigma _{m^{K_{cl\_k}}}^2 + {\boldsymbol{M}}_{[v,t]}^r\hat \sigma _{m^r}^2)\left[ {{\boldsymbol{M}}_{\left[ {t,t} \right]}^{K_{cl\_k}}\hat \sigma _{m^{K_{cl\_k}}}^2 + {\boldsymbol{M}}_{[t,t]}^r\hat \sigma _{m^r}^2 + {\boldsymbol{I}}_{\left[ {t,t} \right]}\hat \sigma _e^2} \right]^{ - 1}({\tilde{\boldsymbol y}}_t - \mu _t)$$
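The two-component MFBLUP prediction can be sketched by summing weighted kernels, one built from the NMR data points in cluster k and one from the remaining data points. The split of Q columns by cluster label and the use of the training mean for \(\mu _t\) are assumptions for illustration; rescaling each kernel by its own number of data points follows the analogy with \({\boldsymbol{M}} = {\boldsymbol{QQ}}^\prime /m_{\mathrm{NMR}}\) and is not stated explicitly in the text. Function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers; a sketch of cluster-guided (two-kernel) prediction.


def cluster_kernels(Q, labels, k):
    """Relationship matrices from the NMR data points in cluster k and from the rest."""
    in_k = np.asarray(labels) == k
    Qk, Qr = Q[:, in_k], Q[:, ~in_k]
    return Qk @ Qk.T / Qk.shape[1], Qr @ Qr.T / Qr.shape[1]


def multi_kernel_predict(kernels, variances, train, valid, y_train, var_e, mu=None):
    """Predict validation lines with a variance-weighted sum of kernels."""
    y_train = np.asarray(y_train, dtype=float)
    if mu is None:
        mu = y_train.mean()  # assumption: training-set mean used for mu_t
    V = var_e * np.eye(len(train))
    C = np.zeros((len(valid), len(train)))
    for K, var in zip(kernels, variances):
        V += var * K[np.ix_(train, train)]   # training covariance
        C += var * K[np.ix_(valid, train)]   # validation-training covariance
    return C @ np.linalg.solve(V, y_train - mu)
```

With a single non-zero kernel this reduces to the MBLUP prediction, which is why MFBLUP can be read as an extension of MBLUP.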

The PA for each cluster Kcl_k was obtained as \({\rm{PA}} = \frac{1}{{100}}\mathop {\sum }\limits_{t = 1}^{100} {\rm{Cor}}({\tilde{\boldsymbol y}}_v,{\hat{\boldsymbol y}}_v)\). Prediction accuracies were compared within and across Kcl cluster levels using paired t-tests, correcting for the number of tests performed within each cluster level by controlling the false discovery rate (FDR), to identify the NMR data points yielding the largest PA. A cluster was considered significant only if the FDR was below 0.05 and the proportion of variance captured by the cluster was larger than 1%, i.e., \(\left( {\hat \sigma _{m^{K_{cl\_k}}}^2/(\hat \sigma _{m^{K_{cl\_k}}}^2 + \hat \sigma _{m^r}^2)} \right) > 1\%\).
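The cluster-selection rule can be sketched as below. The text specifies an FDR of 0.05 but not the procedure, so the Benjamini–Hochberg method is assumed here. The P values would come from the paired t-tests within one cluster level and the variance components from the fitted MFBLUP models; the function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers; Benjamini-Hochberg is an assumed FDR procedure.


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: running minimum from the largest P value downwards.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted


def significant_clusters(pvals, var_cluster, var_rest, fdr=0.05, min_share=0.01):
    """Clusters passing both the FDR cut-off and the >1% variance-share criterion."""
    var_cluster = np.asarray(var_cluster, dtype=float)
    share = var_cluster / (var_cluster + np.asarray(var_rest, dtype=float))
    return (benjamini_hochberg(pvals) < fdr) & (share > min_share)
```

The variance-share filter discards clusters that reach significance while contributing almost nothing to the metabolomic variance.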

Finally, to investigate whether predictive performance could be increased further, we took all clusters that increased the trait-specific predictive performance (including clusters capturing less than 1% of the variance), ranked them by predictive performance, and fitted a new series of prediction models in which the NMR data points of these clusters were added sequentially in order of decreasing predictive performance.
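A small sketch of the sequential step: clusters are ranked by PA, and their NMR data points are accumulated into nested sets, each of which would define the first kernel of a new two-component model (as in Eq. (6)). The cluster identifiers and PA values in the usage check are made up for illustration, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers; a sketch of forward, PA-ranked cluster inclusion.


def nested_cluster_sets(cluster_pa):
    """Order clusters by PA (high to low) and return the nested sets to fit in turn."""
    ranked = sorted(cluster_pa, key=cluster_pa.get, reverse=True)
    return [ranked[: i + 1] for i in range(len(ranked))]


def points_in_clusters(labels, clusters):
    """Boolean mask of the NMR data points belonging to any of the given clusters."""
    return np.isin(labels, clusters)
```

For example, with made-up PA values {3: 0.30, 8: 0.35, 11: 0.32}, the models would be fitted with clusters [8], then [8, 11], then [8, 11, 3].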

Results

The metabolome of D. melanogaster

1H NMR spectroscopy measures the intensity of signals from hydrogens in different chemical environments and can therefore be used to assess which molecules are present in a biological sample. Here we used 1H NMR to quantify the metabolome of male flies from 170 DGRP lines in four biological replicates. For each sample (i.e., one per DGRP line and replicate), we obtained one NMR spectrum consisting of 14,440 data points, together describing all NMR-visible hydrogens in the sample. For each NMR data point, we estimated the proportion of variation in 1H intensity explained by common genetic variants, i.e., the heritability (\(h^2\); Fig. 2A). In total, 39% of the NMR data points had a significant heritability estimate (Fig. 2B), with an average heritability of 0.26 among these data points (0.13 across all data points).

Fig. 2: Genetic variation for the D. melanogaster metabolome.

Panel (A) shows, as a solid blue line, the average NMR intensity across all DGRP lines (intensity axis not shown) as a function of chemical shift. For each NMR data point, we estimated the heritability (\(h^2\)); grey points represent non-significant estimates of \(h^2\), and green points represent significant estimates. Panel (B) is a histogram of the significant heritability estimates.

For the NMR data points with a significant heritability estimate, we identified genetic variants associated with 1H NMR intensity, i.e., mQTLs. We found a total of 152 genome-wide significant mQTLs (Supplementary Table 1), involving 98 NMR data points and 53 SNPs. The significant mQTLs were distributed across the D. melanogaster genome (Supplementary Fig. S1), and we annotated the significant variants to 56 genes. Of these 56 genes, 20 contained two or more significant mQTLs (Supplementary Table 1); however, in several cases the same SNP was annotated to different genes (five gene sets in total, Supplementary Table 1), because these genomic regions were complex and contained different annotations, so the exact annotation could not be resolved. The genes CCHamide-2 receptor (CCHa2r), sidestep (side), Glutamate receptor IB (GluRIB), Coronin (coro) and CG43373 had the largest number of genome-wide significant associations (between 5 and 74 associations; Supplementary Table 1).

Phenotypic predictions

To test the predictive performance of the metabolome, we obtained phenotypic data from five previously published studies (Table 1): two behavioural traits and three stress resistance traits. We constructed relationship matrices based on genomic and on metabolomic information and performed GBLUP and MBLUP. For each trait, we used 90% of the DGRP lines to estimate the model parameters with either the genomic or the metabolomic relationship matrix and used the estimated parameters to predict the remaining 10%. This was repeated for 100 random splits of the data.

The mean PA for the two behavioural traits, locomotor activity without and with Ritalin treatment, was below 0.1 when based on genomic information (Fig. 3A, B). Using metabolomic information, the PA increased to above 0.4 (Fig. 3A, B and Supplementary Table S2). Using the D. melanogaster metabolome, we could also increase the PA of the two environmental stress resistance traits, chill coma recovery and starvation resistance (Fig. 3D, E and Supplementary Table S2). However, for startle response (a behavioural response to a physical disturbance), prediction using genomic information was superior to prediction using the metabolome (Fig. 3C).

Fig. 3: Prediction accuracies using genomic and metabolomic data.

For each panel, the barplot shows the maximum mean prediction accuracies (+standard error) for the different models. GBLUP and MBLUP are single-component prediction models, whereas the two MFBLUP models have two components. The global maximum prediction accuracy obtained across all cluster levels (Kcl = {25, 50, 75, 100, 125, 200}) is shown in the MFBLUP bar (indicated with a white arrow). The prediction accuracy obtained when combining the significant clusters is shown in the MFBLUP2 bar (indicated with a white square and circle). Significantly improved predictive performance is indicated by asterisks above the bars; see Supplementary Table S3 for all comparisons. The heatmaps on the right side of the panels show all prediction accuracies for the NMR cluster-guided prediction model at each Kcl cluster level. The columns correspond to NMR data points (fixed across the Kcl cluster levels), and each cell is one cluster of NMR data points (the link between NMR data points and clusters is given in Supplementary Table S4). The predictive performance of each cluster within a Kcl cluster level is indicated by the colour scale. Within each cluster level, significant prediction accuracies are indicated with white squares, and the cluster with the highest significant prediction accuracy is indicated with an asterisk. Across all cluster levels, the highest prediction accuracy is indicated with a white arrow (corresponding to the orange bars in the left-side panel). The set of significant clusters giving the highest predictive performance is marked with white squares with a black circle (corresponding to the light green bars in the barplot).

We computed Pearson’s correlation coefficients among all NMR data points and performed hierarchical clustering (Supplementary Fig. S2). We then ran a two-component NMR cluster-guided prediction model (MFBLUP), in which the first component was based on the NMR data points within one cluster and the second component on the remaining NMR data points (Fig. 1). We tested all clusters from the hierarchical clustering using different total numbers of clusters, Kcl = {25, 50, 75, 100, 125, 200} (Fig. 1). By extending the MBLUP model in this way, we increased the predictive performance for all five quantitative traits by 17–185% (Fig. 3 and Supplementary Table S2). Interestingly, the largest improvement in PA was obtained at different cluster levels for the five traits (Fig. 3 and Supplementary Fig. S3).

For locomotor activity, the largest improvement in PA was obtained at cluster level Kcl = 100 (Fig. 3A), with cluster 1 as the only cluster that significantly increased predictive accuracy (Supplementary Table S3 and Fig. S3). For locomotor activity (Ritalin treatment), startle response and starvation resistance, cluster level Kcl = 200 contained the clusters that gave the highest predictive performance (Fig. 3B–D and Supplementary Fig. S3). For locomotor activity with Ritalin, clusters 121 and 74 (Supplementary Table S3) increased the predictive performance, and combining the two clusters increased the PA non-significantly by 1% (Figs. 3B and S6). For startle response, eight clusters increased the predictive performance and also captured >1% of the intensity variance (1, 2, 5, 14, 99, 112, 145 and 187; Supplementary Table S3 and Fig. S4), and combining clusters 1, 2, 5, 112 and 145 further increased the PA (Figs. 3C and S8 and Supplementary Table S2). For starvation resistance, five clusters (21, 65, 83, 95 and 140) increased the PA and captured >1% of the intensity variance (Supplementary Fig. S4), and the joint effect of clusters 21, 65, 83 and 95 increased the accuracy further, although not significantly (Figs. 3D and S9 and Supplementary Table S3). Finally, for chill coma recovery, the maximum PA was obtained at cluster level Kcl = 50 (Fig. 3E), where clusters 5, 27 and 35 were significant and captured >1% of the NMR intensity variance (Supplementary Table S3 and Fig. S4). Combining the two clusters with the highest accuracy led to a non-significant increase in accuracy (Figs. 3E and S9 and Supplementary Table S2).

The contributions to the predictive clusters at cluster level Kcl = 200 were mapped on to the NMR spectrum (Fig. 4). Interestingly, none of the clusters contained signals from the highest concentration metabolites. Rather they contain signals from metabolites at lower abundance or very broad signals, suggesting contributions from small metabolites bound to larger molecules or larger molecules themselves. Out of the 18 clusters (Table 2), 7 (1, 2, 5, 21, 32, 45, 65; Fig. 4A) contained signals in a region where aromatic compounds with quaternary nitrogens, such as NAD and nicotinamide ribotide, appear. These clusters showed high PA for locomotor activity, chill coma recovery and startle response. Four clusters (32, 74, 83, 95; Fig. 4A) contained signals in a region where other heterocycles, such as adenosine, appear. Furthermore, six clusters (112, 121, 129, 139, 145, 154; Fig. 4B) contained signals in a region where aromatic groups from amino acids like histidine and tyrosine appear. Most of these are important for locomotor activity with Ritalin and starvation resistance. There is also one cluster (171; Fig. 4C) that contained signals in a region where signals from sugars appear, and another (187; Fig. 4D) that contained signals in a region where amino acid CH2 groups appear. Out of these clusters 1 and 187 clearly contained signals that would usually be identified as baseline, while clusters 2, 5, 32, 112, 129, 139, 145 contained mostly well resolved, though often low intensity, signals. The remaining signals are somewhere in between. Only cluster 32 could be matched to a known metabolite and its signals are assigned as coming from the nicotinamide group of NADP. The other metabolites could not be found in currently available databases with NMR characteristics of metabolites (Cui et al. 2008; Ulrich et al. 2008; Wishart et al. 2018) or in metabolomics studies of D. melanogaster or other model organisms. 
The clusters at cluster level Kcl = 200 were compared with those giving the highest PA for locomotor activity at cluster level Kcl = 100 and for chill coma recovery at cluster level Kcl = 50 (Supplementary Fig. S10). For locomotor activity, the larger cluster with the higher predictive ability covers a larger stretch of the baseline in the nicotinamide region, indicating that it is optimal to include a larger number of higher-molecular-weight nicotinamide units (Figs. 3A and S10A, B). For chill coma recovery, the larger cluster covers the broad peak containing the signals from aromatic amino acids in a higher-molecular-weight context, and no smaller cluster retains significant predictive ability once the larger cluster is broken up (Figs. 3E and S10C, D).
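The cluster levels Kcl refer to partitions of the NMR data points into groups of highly correlated intensities. This section does not describe how the clusters were built, so the sketch below simply assumes average-linkage agglomerative clustering with 1 - |r| (absolute Pearson correlation across DGRP lines) as the distance. The helper names and toy data are hypothetical and only illustrate how a spectrum can be cut into Kcl clusters.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cluster_data_points(intensities, k):
    """Hypothetical helper: average-linkage clustering of NMR data points.

    intensities: one row per DGRP line, one column per NMR data point.
    k: number of clusters to keep (the cluster level Kcl).
    Distance between two data points is 1 - |r| across lines (assumed).
    Returns clusters as sorted lists of column indices.
    """
    p = len(intensities[0])
    cols = [[row[j] for row in intensities] for j in range(p)]
    dist = {(i, j): 1.0 - abs(pearson(cols[i], cols[j]))
            for i, j in combinations(range(p), 2)}

    def linkage(a, b):
        # Average pairwise distance between members of clusters a and b.
        d = [dist[min(i, j), max(i, j)] for i in a for j in b]
        return sum(d) / len(d)

    clusters = [[j] for j in range(p)]  # start with one cluster per data point
    while len(clusters) > k:
        # Merge the two closest clusters.
        ia, ib = min(combinations(range(len(clusters)), 2),
                     key=lambda t: linkage(clusters[t[0]], clusters[t[1]]))
        merged = sorted(clusters[ia] + clusters[ib])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (ia, ib)]
        clusters.append(merged)
    return sorted(clusters)


# Toy example: 4 lines x 4 data points; columns 0/1 and 2/3 are perfectly correlated.
rows = [[1, 2, 5, 10], [2, 4, 3, 6], [3, 6, 6, 12], [4, 8, 4, 8]]
print(cluster_data_points(rows, 2))  # [[0, 1], [2, 3]]
```

Because agglomerative merging is nested, cutting at a smaller k only joins existing clusters into larger ones, which mirrors the comparison between the Kcl = 200 clusters and the coarser levels described above.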

Fig. 4: Contributions to predictive clusters from the metabolite NMR spectra at the Kcl = 200 level.

Clusters are indicated with coloured dots on the average of all NMR spectra (black line). Panel (A) shows clusters 1, 2, 5, 21, 32, 45, 65, 74, 83 and 95; panel (B) clusters 112, 121, 129, 139, 145 and 154; panel (C) cluster 171; and panel (D) cluster 187. Selected major metabolites in these regions are identified. The location of the nicotinamide ribotide signals resonating at the highest ppm values is also indicated in panel (A).

Table 2 Predictive NMR data point clusters.


Prediction of phenotypic trait values using genetic markers has been a central element in plant and livestock breeding for decades (Van Arendonk et al. 1994; Meuwissen et al. 2001; Goddard et al. 2009), and more recently this strategy has been adopted in human genetics to predict, e.g., disease risk from DNA information (Hall et al. 2004; Wray et al. 2008, 2019; Schrodi et al. 2014). However, the predictive value of genotyped genetic variants is often low (Schrodi et al. 2014; Patron et al. 2019), which is problematic when aiming to predict complex phenotypes such as diseases, behaviours or production traits. There is therefore potential to further optimize the applicability of these methods. Here, we used 1H NMR spectroscopy to quantify the metabolomic profiles of groups of male D. melanogaster from 170 fully inbred, genome-sequenced lines to investigate whether an endophenotype, here the metabolome, has greater predictive power than genome information alone.

Because metabolome NMR spectra contain signals from every molecule carrying 1H atoms that do not exchange with water, they include contributions from many different small-molecule metabolites, such as amino acids and sugars, but also from larger molecules that remain in solution after the acetonitrile/water extraction. The former give sharp signals that are often easy to identify, while the latter typically cover very broad areas of the spectra and can only be identified in terms of the chemical groups behind the signal. For D. melanogaster, it may also be difficult to identify some of the small-molecule metabolites, since they do not appear in the available databases, which typically contain rodent or human metabolites. It is thus not feasible at this point to produce a table with names and concentrations of all metabolites that, e.g., show significant association with the genome. As the main focus of this study was the predictive ability of the metabolome quantified with 1H NMR, we focused on a few specific metabolites.

We found that the D. melanogaster metabolome was highly variable, with more than 39% of the NMR spectrum having significant heritability estimates (Fig. 2), a level of genetically determined variability similar to that of the D. melanogaster transcriptome (Huang et al. 2015). It has previously been shown that metabolome variation appears to have a genetic signature (Pedersen et al. 2008; Malmendal et al. 2013; Reed et al. 2014; Zhou et al. 2020), but also that the metabolome is highly variable between the sexes (Schou et al. 2017; Li et al. 2018; Zhou et al. 2020) and changes with age (Lawton et al. 2008; Yoshida et al. 2010; Sarup et al. 2012; Hoffman et al. 2014). Our findings confirm that variation in metabolite abundance is genetically controlled. The metabolome can therefore be influenced by evolutionary forces like any other phenotypic trait, and this variation can be utilized, e.g., in livestock and plant breeding, where specific metabolites may be of interest (Browne and Brindle 2007; Goldansaz et al. 2017; Gamboa-Becerra et al. 2019).
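Because the DGRP lines are fully inbred, among-line variance in an NMR intensity can be read as genetic variance. The text does not state how the heritability of each data point was estimated; as an illustration only, the sketch below assumes a balanced design with replicate samples per line and computes broad-sense heritability H² = σ²_L / (σ²_L + σ²_E) from a one-way ANOVA. The helper name and data layout are hypothetical, and the significance test behind "significant heritability" is not shown.

```python
def line_heritability(groups):
    """Hypothetical helper: broad-sense heritability of one NMR data point.

    groups: one inner list per inbred line, holding replicate intensities
    (balanced design with at least two lines and two replicates assumed).
    Returns H2 = var_line / (var_line + var_error) from a one-way ANOVA.
    """
    a = len(groups)     # number of lines
    n = len(groups[0])  # replicates per line
    if any(len(g) != n for g in groups):
        raise ValueError("balanced design assumed")
    means = [sum(g) / n for g in groups]
    grand = sum(means) / a
    ss_line = n * sum((m - grand) ** 2 for m in means)
    ss_err = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_line = ss_line / (a - 1)
    ms_err = ss_err / (a * (n - 1))
    var_line = max((ms_line - ms_err) / n, 0.0)  # among-line (genetic) component
    var_err = ms_err                             # within-line (environmental) component
    total = var_line + var_err
    return var_line / total if total > 0 else 0.0


# Toy example: three lines, two replicates each.
print(round(line_heritability([[1.0, 3.0], [5.0, 7.0], [9.0, 11.0]]), 3))  # 0.882
```

Negative among-line estimates are truncated at zero, a common convention for ANOVA variance components; a REML fit would be the usual choice for unbalanced data.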

Metabolome-wide association studies, i.e. the mapping of metabolite QTLs (mQTLs), share the aim of genome-wide and transcriptome-wide association studies, namely to identify genetic variants associated with variation in a functional character or an endophenotype (Holmes et al. 2008; Bictash et al. 2010). Here, we associated the intensities of individual NMR data points with SNP genotypes and identified numerous mQTLs (Supplementary Fig. S1 and Supplementary Table S1). CCHa2R, which encodes a neuropeptide receptor (Hansen et al. 2011), was the gene with the largest number of genome-wide significant associations (74 in total). The human orthologue of CCHa2R is bombesin receptor subtype 3, which is involved in regulating metabolic rate (Xiao et al. 2017) and glucose metabolism (Feng et al. 2011). CCHa2R was associated only with signals from tyrosine (Supplementary Fig. S11D). Recently, CCHa2R was associated with variation in the levels of five metabolites, and upregulation of this gene extended the life span of D. melanogaster (Jin et al. 2020). The second most associated gene was sidestep, which controls the migration of motor axons in the developing fly (Siebert et al. 2009). Across humans and several other model organisms, sidestep has no apparent orthologues (Hu et al. 2011). The glutamate receptor gene GluRIB contained six significant associations, and the variants within it were associated with a set of features close to a signal from the imidazole group of histidine. Coronin, also with six associations, is involved in muscle morphogenesis (Schnorrer et al. 2010). The human orthologue of Coronin is coronin 1C, which has previously been associated with lipoprotein and cholesterol levels (Wakil et al. 2016; Siewert and Voight 2018). Accordingly, the NMR data points associated with Coronin include signals corresponding to the choline methyl groups of sn-glycerophosphocholine and to methylene and methyl groups from larger molecules such as peptides or fatty acids (Supplementary Fig. S11A).
Another gene with many associated NMR data points (five) was CG43373, whose human orthologue, adenylate cyclase 5 (ADCY5), has in multiple studies been associated with type II diabetes mellitus (Mahajan et al. 2014; Qi et al. 2017; Bonàs-Guarch et al. 2018), body mass index (Locke et al. 2015), blood glucose (Manning et al. 2012) and cholesterol levels (Liu et al. 2017; Hoffmann et al. 2018). Both sidestep and CG43373 were associated with the same unidentified signal in a region with signals from hydrogen atoms in the vicinity of hydroxy or carboxy groups (Supplementary Fig. S11B, C). The observation that the genes with the most associations with NMR data points are indeed involved in metabolic processes indicates that the identified mQTLs are biologically relevant and not statistical artefacts. Further study of the regulatory effects of these genes on the metabolome would require functional validation, e.g., using RNA interference (Jin et al. 2020), which is beyond the scope of this study. Furthermore, in the present study, we limited our search for mQTLs to those identifiable using single-SNP marginal effects. Gene- and pathway-based enrichment analyses are known to increase statistical power and have the potential to improve biological interpretation (Holmans et al. 2009; de Leeuw et al. 2015; Rohde et al. 2016). We demonstrated that, despite using an underpowered statistical mapping approach (a univariate marker model), we were able to identify numerous mQTLs with apparently large effects. Applying a gene-based model would likely have increased the statistical power, enabling the identification of additional genes of minor effect. However, as the main focus of the present study was the predictive performance of the D. melanogaster metabolome, we did not pursue that route.
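The univariate marker model mentioned above tests one SNP at a time against one NMR data point. A minimal sketch follows, assuming 0/1 genotype codes for homozygous inbred lines and ordinary least squares without covariates. The study's actual model (covariates, significance threshold, multiple-testing correction) is not given in this section, and `marker_scan` is a hypothetical helper.

```python
import math


def marker_scan(intensity, genotypes):
    """Hypothetical helper: single-SNP regression for one NMR data point.

    intensity: one value per DGRP line (e.g. a line mean).
    genotypes: dict SNP id -> list of 0/1 codes, one per line (homozygous lines).
    Returns dict SNP id -> (slope, t statistic); monomorphic SNPs are skipped.
    No covariates or multiple-testing correction are applied here.
    """
    n = len(intensity)
    my = sum(intensity) / n
    results = {}
    for snp, g in genotypes.items():
        mg = sum(g) / n
        sgg = sum((x - mg) ** 2 for x in g)
        if sgg == 0:
            continue  # no variation at this marker
        sgy = sum((x - mg) * (y - my) for x, y in zip(g, intensity))
        slope = sgy / sgg  # with 0/1 coding: difference between genotype means
        intercept = my - slope * mg
        rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(g, intensity))
        se = math.sqrt(rss / (n - 2) / sgg)
        t = slope / se if se > 0 else math.copysign(math.inf, slope)
        results[snp] = (slope, t)
    return results


intensity = [1.0, 1.2, 0.9, 3.0, 3.1, 2.9]
snps = {"snpA": [0, 0, 0, 1, 1, 1],
        "snpB": [0, 1, 0, 1, 0, 1],
        "snpC": [0, 0, 0, 0, 0, 0]}
for snp, (slope, t) in marker_scan(intensity, snps).items():
    print(snp, round(slope, 3), round(t, 2))
```

Each SNP is tested in isolation, which is why such a scan detects mainly large marginal effects; gene- or pathway-based models would instead aggregate signal across markers.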

Accurate phenotypic prediction of any trait requires large sample sizes to reliably estimate model parameters, as well as some measure that describes the covariance structure among individuals, such as genetic variants or other molecular variation. Although the DGRP system appears to lack power for mapping and prediction studies because of the limited number of inbred lines, it gains statistical power because repeated measures can be obtained on a large number of individuals from lines that are highly homozygous genome-wide, resulting in very accurate estimates of line means (Mackay and Huang 2018).

The main aim of the current study was to investigate the predictive power of the D. melanogaster metabolome compared with prediction models using genomic data. Using genomic information generally resulted in low predictive abilities, and for chill coma recovery time it was even negative (Fig. 3). Despite similar heritability estimates for the five quantitative traits investigated (Table 2), it was not surprising that we observed poor predictive ability for chill coma recovery time, as this has been found multiple times (Ober et al. 2012, 2015; Edwards et al. 2016; Sørensen et al. 2017). It has been suggested that the lack of predictive ability for chill coma recovery might be due to more pronounced non-additive effects, such as epistasis, for this trait (Ober et al. 2015; Morgante et al. 2018). We showed that, for four out of five quantitative traits, using the D. melanogaster metabolome for phenotypic prediction significantly improved the accuracy of prediction compared with using the D. melanogaster genome (Fig. 3). The extent of the improvement varied across traits, but the trend is clear across all five traits: partitioning the metabolome by highly correlated NMR data points increased the predictive performance (Fig. 3). Even for chill coma recovery time, we observed a large improvement in predictive ability (Fig. 3), which further supports the idea that this trait may be under the influence of non-additive effects, as these are inherent in the metabolome. Partitioning the metabolomic variation using the cluster-guided approach further increased the predictive performance (Fig. 3). This is expected, because partitioning the genome by functional categories has also been shown to increase predictive performance (Speed and Balding 2014; Edwards et al. 2016; Fang et al. 2017; Rohde et al. 2018, 2019).
Here we did not partition the genome, as our main contrast of interest was between predictions based on the whole genome and on the whole metabolome. Comparing two partitioned prediction models would be challenging because the two partitionings would be defined differently, i.e., one based on functional genomic regions and the other based on metabolomic signatures.
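Predictive ability (PA) compares predicted with observed trait values. The sketch below illustrates a BLUP-type prediction for held-out lines from a relationship matrix K, which could be built either from SNPs or from NMR intensities, and computes PA as the Pearson correlation between predicted and observed values. It assumes a known variance ratio λ = σ²_e / σ²_g (in practice this would be estimated, e.g., by REML), uses the plain training mean as the intercept rather than a GLS estimate, and uses a single train/test split instead of repeated cross-validation. All names are hypothetical; this is not the study's implementation.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def blup_predict(K, y, train, test, lam):
    """Hypothetical helper: BLUP-type prediction of test lines.

    K: relationship matrix over all lines (genomic or metabolomic).
    y: phenotypes; only entries indexed by `train` are used.
    lam: assumed variance ratio sigma_e^2 / sigma_g^2 (not estimated here).
    Returns mu + K[test, train] (K[train, train] + lam I)^-1 (y[train] - mu).
    """
    mu = sum(y[i] for i in train) / len(train)  # simple mean, not a GLS estimate
    A = [[K[i][j] + (lam if i == j else 0.0) for j in train] for i in train]
    alpha = solve(A, [y[i] - mu for i in train])
    return [mu + sum(K[t][j] * a for j, a in zip(train, alpha)) for t in test]


def predictive_ability(pred, obs):
    """Pearson correlation between predicted and observed values."""
    n = len(pred)
    mp, mo = sum(pred) / n, sum(obs) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(pred, obs))
    vp = sum((p - mp) ** 2 for p in pred)
    vo = sum((o - mo) ** 2 for o in obs)
    return cov / math.sqrt(vp * vo)


# Toy example: lines 0-2 and 3-5 form two related groups.
K = [[1.0 if i == j else (0.9 if i // 3 == j // 3 else 0.0) for j in range(6)]
     for i in range(6)]
y = [10.0, 11.0, 10.5, 2.0, 1.0, 1.5]
pred = blup_predict(K, y, train=[0, 1, 3, 4], test=[2, 5], lam=0.5)
print(pred)
```

With only two test lines the correlation is trivially ±1; in a real evaluation PA would be averaged over many cross-validation folds, and the same code would be run once with a genomic K and once with a metabolomic K to compare them.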

Comparing the significant mQTLs with the NMR data points within the trait-specific predictive clusters, we found an overlap for startle response, starvation resistance and chill coma recovery time (Supplementary Table S5). However, none of the mQTL genes common across traits (Supplementary Table S6) had previously been associated with startle response, starvation resistance or chill coma recovery (Mackay et al. 2012). The most likely explanation for this result is that GWAS capture genetic variants with large effects, whereas prediction models can also exploit variants or metabolomic intensities with small effects. Similarly, we compared the NMR data points within the predictive clusters across the five functional phenotypes (Supplementary Table S5) and found overlaps between activity (control treatment) and startle response; activity (Ritalin treatment) and chill coma recovery; startle response and chill coma recovery; and starvation resistance and chill coma recovery. Of these overlapping regions, the first contained a very broad signal from what seems to be a nicotinamide group bound to, or part of, a large molecule. The others correspond to small unidentified metabolites in this region or in the region where the aromatic groups of amino acids appear.

Recently, Zhou et al. (2020) measured variation in 453 metabolites using 40 DGRP lines and concluded that the accuracy of metabolome-based predictions could be improved with a larger sample size. This is exactly what we have shown in this study, namely that the metabolome can increase the accuracy of phenotypic prediction. The increased power relative to Zhou et al. (2020) may originate not only from the larger number of DGRP lines but also from the higher reproducibility of 1H NMR metabolomics compared with mass spectrometric methods. It should be stressed, however, that while NMR has a much higher reproducibility, mass spectrometry is more sensitive and better at resolving which signals belong to which metabolite, although these features seem to be of less importance when it comes to predictive power.

Our results clearly demonstrate the added value of performing predictions of functional phenotypes using NMR metabolomics compared with SNP genotypes. These findings, together with others, open the door to applying metabolomics in different disciplines, for example, in the human health sector or in animal breeding. Metabolites can be easily quantified in biofluids from livestock, human blood donors or patients who have blood samples taken on a regular basis. This entails a clear advantage over other methods in terms of translatability (Fontanesi 2016). Recent studies have shown that the metabolomic signatures of blood from cattle (Novais et al. 2019) and pigs (Carmelo et al. 2020) can be used to accurately separate animals by their feed efficiency, which is one of the most economically important traits in livestock production. Metabolomics also has great potential for predicting traits that are difficult or expensive to measure (Riedelsheimer et al. 2012; Rohart et al. 2012; Xu et al. 2016; Hayes et al. 2017; Gemmer et al. 2020). Studies of the human metabolome and its relevance to human health have also increased markedly over the last 5 years (Rangel-Huerta et al. 2019; Zhang et al. 2020), and metabolomics is very likely to become one of the cornerstones in the implementation of personalized medicine. With the emergence of large human cohort projects, such as the UK Biobank (Bycroft et al. 2018), FinnGen (Mars et al. 2020) and BioBank Japan (Nagai et al. 2017), sample sizes are reaching a level that would enable ground-breaking research if full metabolomic profiles were obtained. For example, it has recently been shown that 5–10-year all-cause mortality was better predicted from NMR spectra of plasma and serum samples than from traditional clinical risk factors (Deelen et al. 2019).

The functional phenotypes investigated here were not obtained on the same flies that were used for metabolomics. Thus, the only link between the phenotypes and the metabotype is the common genotype of the DGRP lines. This suggests that the higher predictive power of the metabolome arises because the metabolome has a more linear relationship to the phenotype than the genome does, which makes sense in an organism with several layers of mechanisms striving to maintain homoeostasis (Chan et al. 2010). Furthermore, predictions based on the BLUP framework rely heavily on the estimated relationship among samples/individuals. The metabolomic relationship is a condensation of all the “causal genetic effects” that actually affect the phenotype, whereas the genomic relationship matrix mostly describes the relationship among samples. Thus, although the functional phenotypes and the metabolome were not measured on the same flies or in the same environments, we observe improved predictive performance, likely because the metabolomic relationship is closer to the true causal genetic effects than the estimated genomic relationship matrix, as also recently observed by Harrison et al. (2020). An interesting example of the close relationship between the metabolome and the phenotype in insects is the very high correlation (R² = 0.93) between the haemolymph metabolome and cold tolerance in different Drosophila species (Olsson et al. 2016).
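The contrast drawn above between genomic and metabolomic relationships can be made concrete by building both matrices in the same way. A common, simple choice, assumed here rather than taken from the study, standardizes each feature across lines and sets K = ZZᵀ/q, where Z is the matrix of standardized features and q the number of non-constant features. Applied to SNP codes this gives a genomic relationship matrix, and applied to NMR intensities a metabolomic one. The helper name and toy values are hypothetical.

```python
import math


def relationship_matrix(features):
    """Hypothetical helper: K = Z Z^T / q from a lines x features table.

    Each feature (SNP code or NMR data point) is centred and scaled to unit
    variance across lines; constant features are dropped, and q is the number
    of features kept.
    """
    n = len(features)
    p = len(features[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in features]
        m = sum(col) / n
        sd = math.sqrt(sum((x - m) ** 2 for x in col) / n)
        if sd > 0:
            cols.append([(x - m) / sd for x in col])
    q = len(cols)
    return [[sum(c[i] * c[k] for c in cols) / q for k in range(n)] for i in range(n)]


# Same construction for both data types (toy values, 4 lines).
snp_codes = [[0, 1, 1], [0, 0, 1], [1, 1, 0], [1, 0, 0]]
nmr = [[1.2, 0.3, 5.0], [1.1, 0.4, 5.0], [2.0, 0.1, 5.0], [2.2, 0.0, 5.0]]
G = relationship_matrix(snp_codes)  # genomic relationships
M = relationship_matrix(nmr)        # metabolomic relationships (constant column dropped)
```

VanRaden-style genomic matrices scale by allele frequencies rather than by per-marker variances; per-feature standardization is used here only so that G and M are constructed identically and can be swapped into the same BLUP model.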

The fact that many of the NMR signals with high predictivity come from unidentified and/or larger metabolites of lower concentration shows that, despite the inherently low sensitivity of NMR, the high reproducibility of the method allows a high accumulated sensitivity to be obtained for the combined data set. An interesting example is the cluster with the highest PA for chill coma recovery, which covers the entire aromatic amino acid region. This also highlights the need for a deeper investigation of the NMR-detectable D. melanogaster metabolome and stresses its complementarity to mass spectrometry. From a metabolic signalling perspective, it also makes sense that the less abundant metabolites with aromatic groups are the ones important in these processes. In this context, it should be made clear that NMR metabolomics captures the entire spectrum in one measurement, so there is no need to choose beforehand which region to use for predictive purposes. It should also be stressed that making predictions does not require identifying the molecules behind the NMR signals.

In conclusion, we have convincingly shown that metabolomic approaches have large potential for predicting functional phenotypes. Obviously, the generality and repeatability of these findings should be verified in different genetic backgrounds, in non-model species and in samples that are easy to generate from livestock, humans and plants. However, our main findings, namely, that metabolite profiles are highly heritable, that specific genes are associated with metabolome variation and that the metabolome predicts phenotypes more accurately than genomic data are likely to be robust.