The superior physical performance of elite athletes is a multifactorial trait, with contributions from both environmental (exercise and diet) and genetic factors1. There is ample evidence suggesting influence of multiple genetic variants with small effect size over several phenotypic traits related to physical performance2. The identification of these variants is crucial to understand the superior performance of elite athletes and has been a subject of study for many years3,4,5. However, research into the genetics of athletic performance has been hindered by small sample sizes and complexity of the phenotype6. Genome-wide association studies (GWAS) in athletes versus non-athletes have uncovered many new loci7,8. However, a meta-analysis of 1520 endurance athletes and 2760 controls has revealed no evidence of association of a common genetic variation with endurance status in world class athletes9.

The advancement in metabolomics tools including mass spectrometry (MS) technologies has offered a unique opportunity to complement genomics data with intermediate phenotypes. Identified metabolites exhibited direct functional associations with genetic variants and provided greater effect sizes10,11. In a pilot metabolomics study, we identified differences in metabolic profiles between moderate and high endurance elite athletes including metabolites involved in steroid biosynthesis, fatty acid metabolism, oxidative stress and energy-related molecular pathways12. The integration of genomics and metabolomics technologies has also allowed a more comprehensive coverage of the metabolic pathways involved in complex physiological and pathological processes13,14.

GWAS for metabolic traits (mGWAS)10,15,16,17,18,19,20,21,22,23,24 has revealed hundreds of metabolomics quantitative trait loci (mQTLs) in the general population22,23,24,25,26,27. The identification of novel mQTLs in athletes who experience unique environmental conditions including special diet and intensive exercise may provide invaluable tools for biomarker discovery in relation to exercise and performance. This unique approach could provide better informed selection of athletic candidates and crucial information needed for optimal balance between training and recovery for every athlete12. Identified mQTLs in elite athletes could also help in the development of non-conventional anti-doping analytical strategies by understanding the genetic predisposition of specific doping-related metabolites. Furthermore, elite-athletes-unique mQTLs could offer potential novel therapeutic targets in athletes and potentially general population.

The aims of this study are (1) to confirm previously published12 metabolites associated with endurance sports, (2) to discover novel genetic loci affecting metabolites in elite athletes by fine-mapping loci to putative functional variants at or near sentinel SNPs (a sentinel SNP or sentinel metabolite refers to a lead SNP or a lead metabolite) and (3) to discover novel variant loci associated with endurance metabolites underscoring the metabolic individuality of endurance athletes.


Genotyping of 275,016 SNPs that passed quality control measures (see methods) was performed in 490 elite athletes belonging to different sport disciplines (Table S1), followed by serum metabolomics of 751 metabolites to confirm previously published endurance metabolites12. Subsequent mGWAS analysis was performed to reveal novel SNP-metabolite associations by comparing mGWAS hits identified in elite athletes with reference studies that were previously performed in non-elite athletes25,26,27. Finally, novel mGWAS hits associated with endurance sports were determined. Figure 1 provides a schematic representation of the study design.

Figure 1
figure 1

Schematic view of the study design. HE: High Endurance, ME: Moderate Endurance, QC: Quality Control, PCs: Principle Components.

Confirmation of endurance-associated metabolites in elite athletes

In order to confirm previously reported associations12, a linear model was used to assess the significance of metabolite associations with the endurance level of athletes’ sports (moderate vs high endurance) after correcting for gender, hemolysis levels, PCA (PC1 and PC2 from metabolites) and ethnicity groups in a second cohort of 490 athletes. A meta-analysis confirmed 104 metabolites identified in both cohorts in association with endurance sports (Table S2), including elevation in pregnenolone, androgenic steroids and monohydroxy fatty acids and reduction in diacylglycerols, acyl carnitines, gamma glutamyl amino acids and glutathione in the high endurance sports.

Common variant loci influence metabolites (mQTLs) in elite athletes

By combining genotyping and metabolomics data, 145 significant SNP-metabolite associations (Bonferroni p ≤ 2.4 × 10−10) were identified (Table S3), with an average inflation factor for mGWAS metabolites of 1.07 (0.96–1.19). Genetic loci were investigated for known expression quantitative trait loci (eQTLs), mQTLs and functional associations using several databases including SNIPA, PhenoScanner V2 A database of human genotype-phenotype associations, GTEx portal (version 2.1, Build #201), OMIM, Overview of Bravo variant server resources and GnomAD By identifying the identities of their genes, these associations collapsed into 19 independent loci (Table 1, Fig. 2). The variance explained by these SNPs ranges from the highest value of 43.68% (N-methylpipecolate with rs7072216 in PYROXD2 locus) to the lowest value of 8.59% (Ceramide-d16:1/24:1-d18:1/22:1 in SGPP1 locus) with an average of 16.09% (Fig. 3).

Table 1 Nineteen unique locus-metabolite mGWAS pairs identified in 490 elite athletes, including two novel gene/metabolite associations and four known gene/metabolite associations but with novel SNPs.
Figure 2
figure 2

Manhattan plot for the discovered mGWAS loci. The red line indicates the Bonferroni threshold (2.4 × 10−10) and the blue line indicates the genome wide significance threshold (5 × 10−8). The novel gene/metabolite associations appear in red and the known gene/metabolite associations, but with novel SNPs, appear in blue. Previously reported associations are shown in grey.

Figure 3
figure 3

Percent of explained variance of metabolite by the corresponding SNP in the identified mGWAS loci in elite athletes. The height of a column bar reflects the percent of variance explained for each locus. Loci genes are indicated above the column bar and corresponding metabolite name on the X-axis. Novel mGWAS loci appear in red and previously reported associations are shown in in black. The known replicated loci, but with novel SNP or metabolite are typed in blue. Bars are colored according to Metabolon specified pathway for the metabolites associated with the locus. Biochemical Name* indicates compounds that have not been officially confirmed based on a standard, but Metabolon is confident in their identities.

The 19 independent loci replicated 15 previously reported loci26,27 (Table 1). The remaining four mGWAS loci represented novel associations between specific SNPs and metabolites. Two of these included novel gene/metabolite associations, namely rs55729124 in folate hydrolase 1 (FOLH1) in association with N-acetyl-aspartyl-glutamate (NAAG) (r2 = 9.35%, p = 2.17E-11) and rs3798793 in vascular non-inflammatory molecule 1 (VNN1) in association with linoleoyl ethanolamide (r2 = 10.8%, p = 3.15E-13) (Table 1, Figs. 2 and 3). Other novel mGWAS included known gene/metabolite associations, but with novel SNPs. These included the third novel mGWAS rs11568825 SNP within the cytochrome P450 family 3 subfamily A member 7 (CYP3A7) gene, exhibiting significant association with three different metabolites (androsterone sulfate (r2 = 14.11%, p = 3.91E-17), epiandrosterone sulfate (r2 = 9.95%, p = 2.82E-12) and 5 alpha-androstan-3alpha, 17 beta-diol monosulfate 1 (r2 = 9.2%, p = 3.31E-11)). The fourth novel mGWAS represented association between rs17101394 in sphingosine-1-phosphate phosphatase 1 (SGPP1) gene and Cermamide (r2 = 8.59%, p = 1.52E-10). For the 4 novel mGWAS loci, there were clear genotype-dependent effects on levels of associated metabolites as shown in boxplots in Fig. 4.

Figure 4
figure 4

Boxplots of levels of metabolites by genotype for novel loci. Boxplots for the loci CYP3A7, SGPP1, VNN1, and FOLH1 indicating the metabolite level and the number of samples for each genotype group.

Regional association plots for the novel loci VNN1 and FOLH1 are shown in Fig. 5. The intronic SNPs within VNN1 (rs3798793, Fig. 5a) and FOLH1 (rs55729124, Fig. 5b) loci show the strongest association (−log10 (p-value)) with linoleoyl ethanolamide and N-acetyl-aspartyl-glutamate (NAAG), respectively. The colors correspond to different linkage disequilibrium (LD) thresholds, where LD is computed between the sentinel SNP (lowest p-value, colored in blue) and all SNPs.

Figure 5
figure 5

Regional association plots for the two new loci (VNN1 and FOLH1).

Common variant loci influence metabolites (mQTLs) in elite endurance athletes

In order to investigate novel mQTLs between athletes’ genotyping data and the confirmed 104 endurance metabolites (Table S3), significant (p < 0.05/104*275016 = 1.7 × 10−9) mGWAS associations were identified from amongst the list of significant mGWAS hits at p value < 10−6 (Table S4). Four significant associations were found including one novel mGWAS association between rs10426201 in SULT2A1 gene and androstenediol (3alpha, 17alpha) monosulfate (2). Although the latter association was reported before, it did not reach statistical significance27 (Table 2). For the novel endurance mGWAS locus in SULT2A1 gene, there was a clear genotype-dependent effect on levels of associated metabolite as shown in Fig. 6.

Table 2 Unique locus-metabolite pairs associated with endurance sports in comparison with previous reports, including one novel association between a known locus (SULT2A1) and a new metabolite.
Figure 6
figure 6

Boxplot for metabolite-locus pair associated with endurance, indicating the metabolite level and the number of samples for each genotype group.

The regional association plot indicates that the intronic SNP (rs10426201) in SULT2A1 gene shows the strongest association (−log10 (p-value)) with androstenediol (3alpha, 17alpha) monosulfate (2) (r2 = 8.2%, p = 2.47E-10) (Fig. 7). The colors correspond to different LD thresholds, where LD is computed between the sentinel SNP (lowest p-value, colored in blue) and all SNPs.

Figure 7
figure 7

Regional association plots for the novel locus SULT2A1.


Historically, the superior performance of elite athletes has been considered an outcome of a special talent shaped by intensive training28. The talent is currently believed to be a product of additive genetic components predisposing elite athletes to higher endurance/power trainability under the control of strong environmental cues including exercise and nutrition29. Despite the identification of a number of genetic variants associated with athletic endurance, their small effect size made it difficult to replicate in various small cohorts of elite athletes. Therefore, this study aimed to identify intermediate phenotypes (metabolites) that could offer direct functional relationship with genetic variants in elite athletes in response to their unique environment, hence provide a greater effect size and a better chance to be identified. This could help in understanding the superior physical and mental performance of professional elite athletes and the identification of novel exercise-related biomarkers in athletic candidates.

To achieve our aims, genotyping was performed in elite athletes from different sport disciplines using a druggable genes-enriched SNP chip. This chip covers various metabolic pathways suited for the investigation of our intermediate phenotype of interest (metabolites) without enduring the penalty of multiple testing associated with more comprehensive SNP chips30. This was followed by serum metabolomics of the same samples to confirm our previously published endurance metabolites12 using Metabolon platforms that offer a very comprehensive untargeted metabolomics profiling31. Subsequent mGWAS analysis was performed to reveal novel SNP-metabolite associations by comparing mGWAS hits identified in elite athletes with the reference studies that were previously performed in non-elite athletes25,26,27. Finally, novel mGWAS hits associated with endurance sports were determined using the confirmed list of endurance metabolites from the meta-analysis.

Our genotyping data revealed a number of SNPs associated with endurance but none has reached the GWAS level of significance (data not shown). This expected outcome may have resulted from our small sample size, small effect size of genetic variants and the complex phenotype of physical performance. Therefore, a more precise phenotype (metabolites) was sought to obtain larger effect size and a better chance for detection. In our pilot study of 191 elite athletes we identified a number of metabolites associated with endurance12. In this study, we confirmed a number of these metabolites by carrying out metabolomics in a second cohort, followed by a meta-analysis of the two cohorts. Among confirmed hits, elevation of pregnenolone and androgenic steroids indicate active steroid biosynthesis pathway in high endurance athletes. Reduced diacylglycerols and acyl carnitines and increased monohydroxy fatty acids suggest active fatty acid oxidation for energy generation in the high endurance group. Reduction in gamma glutamyl amino acids and glutathione metabolism suggests active oxidative scavenging mechanisms in moderate endurance group. These metabolic changes seen in high performance elite athletes may reflect various cellular adaptations to prolonged exercise-induced oxidative stress. These may include modulation of energy utilization, muscle mass and deployment of stress-scavenging mechanisms as previously suggested12.

Following genotyping and metabolomics analyses, genetically-influenced metabolites were firstly sought between elite athletes cohort and published data from non-elite athletes25,26,27, and secondly within the elite athletes cohort between moderate and high endurance groups. Both analyzes revealed novel mGWAS associations with significant effect size (between 8–14%, Table 2 and Fig. 3), clear genotype-dependent effect (Figs. 4 and 6) and evidence of multiple SNP associations within the same genomic region (Figs. 5 and 7).

The mGWAS results between elite and non-elite athletes revealed 4 novel mQTLs. The first of which is a negative association between rs55729124 in Folate Hydrolase 1 (FOLH1) and NAAG levels. FOLH1 encodes a type II transmembrane glycoprotein termed glutamate carboxypeptidase II (GCPII) that hydrolyzes NAAG to NAA and glutamate32. The intronic SNP rs55729124 may therefore be associated with enhanced FOLH1 activity leading to the breakdown of NAAG and accumulation of NAA and glutamate. NAA is a nervous system specific metabolite found predominantly in cell bodies of neurons. Aerobic fitness was reported to increase NAA levels, leading to improved cognitive enhancement33. The identification of this novel mQTL in elite athletes may suggest augmentation of FOLH1 activity in elite athletes with exercise, resulting in higher NAA levels compared to non-elite athletes in other published studies (Table 3). Interestingly, NAAG serves as a reservoir to provide glutamate to cancer cells through GCPII34. The identification of this novel mQTL could potentially be utilized for the development of novel strategies for targeting GCPII for cancer treatment.

Table 3 Novel elite athletes-associated mQTLSs reflecting gene/environment (exercise) interaction.

The second mQTL identified in our athletic cohort is a positive association between rs3798793 in vascular non-inflammatory molecule 1 (VNN1) in association with linoleoyl ethanolamide. VNN1 protein possess pantetheinase activity that may play a role in oxidative-stress response. The endocannabinoid linoleoyl ethanolamide has a role as fatty acid amide hydrolase inhibitor as it inhibits arachidonoylethanolamide amidohydrolase. It has also been shown to have a neuroprotective role during ischemia reperfusion injury with potential therapeutic benefits when used as complementary treatment with other therapies to improve stroke outcome35. The link between VNN1 and linoleoyl ethanolamide is not clear yet. Exercise, however, has been shown to increase serum concentrations of endocannabinoids36, thus the identification of this novel mQTL may be due to exercise interaction (Table 3).

The third mQTL involved association of various SNPs in the Cytochrome P450 Family 3 Subfamily A Member 7 (CYP3A7) with lower serum sulfated steroids37. This is the first report of a negative association between rs11568825 in CYP3A7 with 5alpha-androstan-3alpha,17beta-diol monosulfate (1), although association of other SNPs within the same gene with same metabolites were previously reported27. Cytochrome P450 enzymes are important for the metabolism of many endogenous compounds including various steroids38. We have shown previously that 5alpha-androstan-3alpha,17beta-diol monosulfate is increased in endurance sports, potentially providing evidence of environmental interaction with endurance exercise12 (Table 3). Previous studies have reported that signaling activated by 5alpha-androstane-3alpha,17beta-diol may represent a novel pathway responsible for the progression to androgen-independent prostate cancer39. Therefore, the identification of this novel mQTL may potentially aid in designing novel therapeutic targets for androgen-independent prostate cancer.

The fourth novel mQTL was a positive association between rs17101394 in Sphingosine-1-Phosphate Phosphatase 1 (SGPP1) in association with Ceramide (d16:1/24:1, d18:1/22:1). Although the association of the same SNP with multiple different metabolites was previously reported. These metabolites included various spingolipids such as palmitoyl dihydrosphingomyelin (d18:0/16:0), sphingomyelin (d18:1/14:0, d16:1/16:0), sphingomyelin (d18:1/15:0, d16:1/17:0), sphingomyelin (d18:1/20:0, d16:1/22:0), and sphingomyelin (d18:1/21:0, d17:1/22:0, d16:1/23:0)27,40 as well as X-08402, and X-1051026 that are also related to sphingolipid pathway20. SGPP1 catalyzes the degradation of Sphingosine-1-phosphate (S1P), a bioactive sphingolipid metabolite that regulates diverse biologic processes, via salvage and recycling of sphingosine into long-chain ceramides41. Acute prolonged exercise was shown previously to influence ceramide metabolism in human skeletal muscle42, perhaps explaining identification of this mQTL in our elite athlete cohort (Table 3). Additionally, the identification of this mQTL could potentially be utilized for the development of novel therapeutic strategies against atherosclerosis since sphingolipids have been directly related to increased risk of atherosclerosis43.

In addition to novel mQTLs identified in elite athletes, we have confirmed 16 previously published mQTLS, two of which exhibited greater percent of variance in our elite athletes compared to the ones reported in non-elite athletes25,27, including rs7072216 in PYROXD2 in association with N-methylpipecolate (effect size 43% vs 31%) and rs1881245 in NAT8 in association with N-acetyl-1-methylhistidine (effect size 30% vs 26.6%). Among the 16 previously reported loci, 9 loci showed similar direction of association compared to previously published studies25,27. Among these, 4 loci showed more than 2 fold increase in their effect size in elite athletes including CYP3A7, AGMAT, NAT8 and SPTLC3 in association with androsterone sulfate, beta-guanidinopropanoate, N-acetyl-1-methylhistidine, and sphingomyelin (d18:1/25:0, d19:0/24:1, d20:1/23:0, d19:1/24:0). Whereas 6 loci showed opposite direction of association compared to previously reported study including CERS4, KLKB1, SLC6A13, SLCO1B1, TMPRSS11E and UGT1A1027. Among these, SLCO1B1 locus showed 2.8 fold decrease in its effect size in association with glycochenodeoxycholate glucuronide (1) in elite athletes. The functional relevance of these variable effect sizes remains to be investigated.

When focusing on confirmed endurance metabolites, four mGWAS were identified, of which a positive association between rs10426201 in Sulfotransferase Family 2 A Member 1 (SULT2A1) in association with androstenediol (3alpha, 17alpha) monosulfate (2) was novel. SULT2A1 catalyzes the sulfation of steroids, a process that is fundamental for their function. Following biosynthesis, hydrophobic steroids become sulfated to accelerate their circulatory shuttling to target tissues. The expression of anion transporting polypeptides on target cells enables their uptake. Subsequently, intracellular sulfatases activate them by hydrolyzing the steroid sulfate esters44. The genetic predisposition of steroid sulfation in elite high endurance athletes may therefore explain active steroid biosynthesis in this group, and could potentially contribute to their elite physical performance. Furthermore, the identification of the genetic predisposition to enhanced activity of SULT2A1 could potentially be utilized to determine the percentage of sulfated intact molecules with relevance to steroid profiling parameters for antidoping strategies45.

Study limitation: The use of Mitchell’s criteria based on sport disciplines to dichotomize participants into two endurance groups12,46 is a crude method of categorization. A better phenotype would be the actual measurement of VO2max in these athletes. However, due to the strict institutional research board’s instructions, the only available information about participants were their sport disciplines and gender. Additionally, the relatively small number of participants may have limited the power of the study, however this remains the largest cohort of elite athletes with mGWAS data to date as elite athletes samples are very difficult to obtain. Future studies are warranted to confirm these findings in larger cohorts using more accurate measures of endurance.

This study reveals for the first-time evidence of genetically-influenced metabolites associated with elite athletic status in general and endurance sports in particular. Uncovering these novel associations in elite athletes, but not in the general population, could reflect a gene-environment (intensive exercise) interaction that augments the effect size of these genetic variants. Among the novel identified mQTLs, SNPs associated with enhanced endogenous steroids activity may play an important role in elite athletic performance, especially among endurance athletes. The utilization of these mQLTs as biomarkers for selecting athletic candidates with a greater potential to becoming elite endurance athletes is warranted and should be further validated. Additionally, the newly identified mQTLs in elite athletes could provide crucial information about the interaction between exercise and genetic predisposition of some doping-related metabolites, potentially paving the way for development of non-traditional indirect analytical strategies for the detection of novel doping strategies. Finally, the identification of these novel mQTLs could provide vital clues for potential therapeutic targets as they provide direct functional relationships between genes and their products/byproducts with therapeutic values.

Subjects and Methods


Blood and serum samples were collected at anti-doping laboratories in Qatar (ADLQ) and Italy (FMSI) from 490 elite athletes who participated in national or international sports events and tested negative for doping abuse. Written informed consent was obtained from each participant. This study was performed in line with the World Medical Association Declaration of Helsinki – Ethical Principles for Medical Research Involving Human Subjects. All protocols were approved by the Institutional Research Board of ADLQ (F2014000009). Table S1 summarizes the distribution of all recruited athletes according to their sports disciplines into moderate and high endurance, groups following published criteria46. It was not possible to involve patients or the public in this work.


Profiling of serum metabolites in 490 elite athletes (Table S1) was performed using protocols established at Metabolon, Durham, NC, USA. The platform utilizes Waters ACQUITY ultra-performance liquid chromatography (UPLC) and a Thermo Scientific Q-Exactive high resolution/accurate mass spectrometer interfaced with a heated electrospray ionization (HESI-II) source and Orbitrap mass analyzer operated at 35,000 mass resolution. Detailed protocol and QC measures were previously published12,31.

Genotyping methods

Genotyping of 490 elite athletes was conducted using Illumina Drug core BeadChip arrays. The chip contains 476728 SNPs including 240,000 highly-informative genome-wide tag SNPs and a novel 200,000 custom marker set designed to support studies of drug target validation and treatment response. The latter SNP set was selected to include the following: 1- genes involved in drug absorption, distribution, metabolism and excretion (ADME), 2- exome content coverage of genes encoding proteins closely related to targets of approved small molecule and biotherapeutic drugs or binding drug-like compounds, and 3- other useful content, including all SNPs associated at GWAS significance with any human trait marking the X and Y chromosomes and mitochondrial DNA, and for sample fingerprinting (common SNPs represented on major genome-wide array products from both Illumina and Affymetrix). These SNPs are expected to represent genes involved in controlling the same essential metabolic pathways that regulate the magnitude of physical performance. Following genotyping using Illumina’s Drug Core SNP array, the following SNP exclusion QC filters were adopted: genotype call rate < 98% (130526 SNPs were excluded), MAF < 0.01 (70210 SNPs were excluded) and Hardy Weinberg p value < 10−6 (976 SNPs were excluded), resulting in 275016 SNPs (Bonferroni significance = (0.05/(275016 × 751) = 2.4E-10) used for the analysis. Genotype distribution was compared among athletes grouped according to the endurance group of their respective sports (data not shown).

Statistical analysis of metabolomics data

A linear regression model was run using R statistical package (version 2.14, to assess association between metabolites and endurance level (moderate versus high). The model also corrected for the following possible confounders: sport power, gender, hemolysis levels (determined visually by Metabolon) and metabolites PCs. Multiple testing was Bonferroni corrected. A meta-analysis was utilized to identify metabolites equally influenced by endurance level in both metabolomics datasets in the current study and previously published study12. Initially, functions from the R library ‘esc’ were used to convert the beta value from the regression analysis of individual datasets into effect size (in this case, difference in mean between low and high levels of endurance). The metafor R library was then used to run the metanalysis on the derived effects size from the individual datasets. The p-values from the meta-analysis were corrected for multiple testing based on FDR correction.

mGWAS analysis

Associations between SNPs and metabolite levels were computed using lm function in R (version 3.3.1) while correcting for gender, hemolysis and predicted ethnicity based on caparison with 1000 genome project that was calculated with plink version 1.9. An additive inheritance model was used (SNPs were coded as 0,1,2 according to their genotype group). Manhattan and box plots were generated using R (version 3.3.1). Regional association plots were produced using SNIPA ( Loci and sentinel SNPs association results were divided into gene loci, and in each of these the sentinel SNP and sentinel metabolite were defined with 500Kb according to the SNP-metabolite association with the highest significance, defining the metabolite quantitative trait loci (mQTLs). In the case where a locus had sentinel SNP that could not be found in regional association plotting release grch37-1kgpp3v5 (eur) and lying in location of known gene, it was mentioned in association with known SNP loci. For example, SNP rs3733402 in locus 5 had P-value of 6.80E-12, however in regional association rs4241816 was indicated as it had the 2nd highest p-value 2.02E-11 in locus 5. Both are associated with same gene KLKB1.

mQTLS associated with Endurance

To determine mGWAS associated with endurance sports, SNPs that were significantly associated with 104 endurance metabolites (Table S2) were identified within the list of mQTLs from the mGWAS analysis (Table S4). Bonferroni p-value of 1.7 × 10−9 [0.05/(104 × 275016)] was used to report a significant association.

Ethics approval and consent to participate

This study was performed in accordance with the World Medical Association Declaration of Helsinki. All protocols were approved by the Institutional Research Board of anti-doping lab Qatar (F2014000009) and participants have given consent to participate.