Genome-wide association study identifies novel genetic variants contributing to variation in blood metabolite levels

Draisma, Harmen H. M.; Pool, René; Kobl, Michael; Jansen, Rick; Petersen, Ann-Kristin; Vaarhorst, Anika A. M.; Yet, Idil; Haller, Toomas; Demirkan, Ayşe; Esko, Tõnu; Zhu, Gu; Böhringer, Stefan; Beekman, Marian; van Klinken, Jan Bert; Römisch-Margl, Werner; Prehn, Cornelia; Adamski, Jerzy; de Craen, Anton J. M.; van Leeuwen, Elisabeth M.; Amin, Najaf; Dharuri, Harish; Westra, Harm-Jan; Franke, Lude; de Geus, Eco J. C.; Hottenga, Jouke Jan; Willemsen, Gonneke; Henders, Anjali K.; Montgomery, Grant W.; Nyholt, Dale R.; Whitfield, John B.; Penninx, Brenda W.; Spector, Tim D.; Metspalu, Andres; Eline Slagboom, P.; van Dijk, Ko Willems; ‘t Hoen, Peter A. C.; Strauch, Konstantin; Martin, Nicholas G.; van Ommen, Gert-Jan B.; Illig, Thomas; Bell, Jordana T.; Mangino, Massimo; Suhre, Karsten; McCarthy, Mark I.; Gieger, Christian; Isaacs, Aaron; van Duijn, Cornelia M.; Boomsma, Dorret I.

doi:10.1038/ncomms8208

Article
Published: 12 June 2015

Genome-wide association study identifies novel genetic variants contributing to variation in blood metabolite levels

Harmen H. M. Draisma^1,2,3^na1,
René Pool^1,2,4^na1,
Michael Kobl⁵,
Rick Jansen⁶,
Ann-Kristin Petersen⁵,
Anika A. M. Vaarhorst^4,7,8,
Idil Yet⁹,
Toomas Haller¹⁰,
Ayşe Demirkan^11,12,
Tõnu Esko^10,13,14,15,
Gu Zhu¹⁶,
Stefan Böhringer¹⁷,
Marian Beekman⁷,
Jan Bert van Klinken¹¹,
Werner Römisch-Margl¹⁸,
Cornelia Prehn¹⁹,
Jerzy Adamski^19,20,21,
Anton J. M. de Craen²²,
Elisabeth M. van Leeuwen¹²,
Najaf Amin¹²,
Harish Dharuri¹¹,
Harm-Jan Westra²³,
Lude Franke ORCID: orcid.org/0000-0002-5159-8802²³,
Eco J. C. de Geus^1,2,
Jouke Jan Hottenga^1,2,
Gonneke Willemsen^1,2,
Anjali K. Henders¹⁶,
Grant W. Montgomery¹⁶,
Dale R. Nyholt^16,24,
John B. Whitfield ORCID: orcid.org/0000-0002-1103-0876¹⁶,
Brenda W. Penninx^2,25,
Tim D. Spector⁹,
Andres Metspalu¹⁰,
P. Eline Slagboom^4,7,
Ko Willems van Dijk ORCID: orcid.org/0000-0002-2172-7394^11,26,
Peter A. C. ‘t Hoen¹¹,
Konstantin Strauch^5,27,
Nicholas G. Martin¹⁶,
Gert-Jan B. van Ommen ORCID: orcid.org/0000-0002-8603-218X¹¹,
Thomas Illig^28,29,30,
Jordana T. Bell⁹,
Massimo Mangino⁹,
Karsten Suhre ORCID: orcid.org/0000-0001-9638-3912^18,31,32,
Mark I. McCarthy^33,34,35,
Christian Gieger^5,28,36,
Aaron Isaacs¹²,
Cornelia M. van Duijn^4,8,12^na2 &
…
Dorret I. Boomsma^1,2,4^na2

Nature Communications volume 6, Article number: 7208 (2015) Cite this article

8817 Accesses
125 Citations
14 Altmetric
Metrics details

Subjects

Abstract

Metabolites are small molecules involved in cellular metabolism, which can be detected in biological samples using metabolomic techniques. Here we present the results of genome-wide association and meta-analyses for variation in the blood serum levels of 129 metabolites as measured by the Biocrates metabolomic platform. In a discovery sample of 7,478 individuals of European descent, we find 4,068 genome- and metabolome-wide significant (Z-test, P<1.09 × 10⁻⁹) associations between single-nucleotide polymorphisms (SNPs) and metabolites, involving 59 independent SNPs and 85 metabolites. Five of the fifty-nine independent SNPs are new for serum metabolite levels, and were followed-up for replication in an independent sample (N=1,182). The novel SNPs are located in or near genes encoding metabolite transporter proteins or enzymes (SLC22A16, ARG1, AGPS and ACSL1) that have demonstrated biomedical or pharmaceutical importance. The further characterization of genetic influences on metabolic phenotypes is important for progress in biological and medical research.

You have full access to this article via your institution.

Download PDF

Whole-Genome Sequencing Analysis of Human Metabolome in Multi-Ethnic Populations

Article Open access 30 May 2023

Rare and common genetic determinants of metabolic individuality and their effects on human health

Article Open access 10 November 2022

Genomic atlas of the plasma metabolome prioritizes metabolites implicated in human diseases

Article 12 January 2023

Introduction

Metabolite levels in human blood reflect the physiological state of the body, and may differ between individuals because of variation in genetic makeup and environmental exposure¹. The study of the genetic contribution to variation in metabolite levels is an important basis for improved aetiological understanding, prevention, diagnosis and treatment of complex disorders^1,2. Modern high-throughput metabolomics enables the cost-effective measurement of large metabolite panels in blood samples obtained from many individuals. The data generated by such metabolomic experiments have been combined with genotypic data in several recent genome-wide association (GWA) studies^{2,3,4,5,6,7,8,9,10,11,12}. Indeed, the combined investigation of large numbers of genetic variants and large numbers of metabolic traits is beginning to draw a systems-wide overview of genetic influences on human metabolism¹¹. However, the heritability estimates from twin and family studies^9,10,11,13 suggest that additional genetic variants influencing variation in serum metabolite levels remain to be found in GWA studies.

In the current study, we set out to further characterize the genetic contribution to variation in human blood metabolite levels. We perform GWA and meta-analyses for the concentrations of 129 serum metabolites in seven independent cohorts, with replication analyses in one additional cohort. To functionally characterize the significant single-nucleotide polymorphism (SNP)-metabolite associations, we integrate the results of the GWA meta-analyses with those from gene expression analysis in whole blood and the liver. Finally, we compare the variance explained by significantly associated SNPs with heritability estimates for each metabolite.

We identify 4,068 significant SNP-metabolite associations, involving 59 independent SNPs and 85 different metabolites. Five of the fifty-nine independent SNPs are novel for serum metabolite levels. The newly found SNP-metabolite associations may lead to a better understanding of cardiovascular and metabolic disease, and may have implications for chemotherapy. Our findings contribute to the understanding of human metabolism.

Results

Discovery meta-analysis of GWA scans

Primary genetic association analyses were carried out in seven cohorts (TwinsUK, KORA, EGCUT, LLS, QIMR, ERF and NTR) with a combined sample size of 7,478 individuals. Characteristics of the study participants included in the analyses (all of European descent) are given in Supplementary Table 1. Within each cohort, SNP genotypes were imputed and analysed for association with the concentrations of each metabolite, assuming a linear model of association and correcting for population stratification (see Methods and Supplementary Table 2). Supplementary Tables 3–5 and Supplementary Data 1 list the characteristics of the 129 metabolites (18 acylcarnitines, 14 amino acids, 82 glycerophospholipids, 14 sphingolipids and hexose) that were measured in the serum samples from all study participants using the Biocrates platform. The cohort-level GWA results were pooled in inverse variance-weighted, fixed-effects meta-analysis. The values of the genomic control lambda (λ_GC, applied to the individual cohort-level results for each metabolite before meta-analysis) varied between 0.976 and 1.081 across all metabolites and cohorts (see Supplementary Table 6), suggesting little residual influence on the GWA results of population stratification and other potential confounders. A three-dimensional Manhattan plot providing an overview of the association P values in the discovery phase for all metabolites is given in Fig. 1; two-dimensional Manhattan plots and quantile-quantile plots for each metabolite separately are given in Supplementary Figs 1 and 2, respectively. Overall, 4,068 SNP-metabolite associations reached genome- and metabolome-wide significance (Z-test, P<1.09 × 10⁻⁹), which reduced to 123 associations involving 59 independent SNPs and 85 different metabolites. Of these 123 associations (listed in Supplementary Data 2), 4 represented secondary association signals according to approximate conditional analysis. Regional association plots, showing the association signals in the regions surrounding the lead metabolomic SNPs, are given for all 123 associations in Supplementary Fig. 3. SNPs representing independent association signals were aggregated into 31 genomic loci, which are listed in Supplementary Data 3. Figure 2 depicts all associations between loci and metabolites as detected in the discovery phase.

**Figure 1: Manhattan plots for all metabolites targeted by the Biocrates AbsoluteIDQ p150 kit (N=[1,497, 7,478]).**

**Figure 2: Associations between loci and metabolites detected in stage 1 meta-analysis in the current study (N=[1,588, 7,478]).**

Five independent SNPs had not been associated with variation in serum metabolite levels in previous GWA studies (see Table 1). To further interpret the association of the remaining 54 SNPs with serum metabolite concentrations, we compared our findings with those from 11 published GWA studies^{2,3,4,5,6,7,8,9,10,11,12} for which at least one of the included metabolites overlapped with the current study. The identified associations of known SNPs with metabolites that were significant in discovery stage meta-analysis in the current study and that had not been reported in those previous studies are highlighted in Fig. 2 and in Supplementary Data 2.

Table 1 Novel SNPs for serum metabolite levels identified in the current study.

Full size table

Replication analysis

Replication analyses were performed in an independent sample (N=1,182) from the KORA S4 cohort (hereafter KORA S4 replication sample) for the five new SNPs for serum metabolite levels that had been found in the discovery phase meta-analysis. The associations with their most strongly associated metabolite were replicated for four of these five novel SNPs; the only non-replicated association was that between SNP rs7582179 and metabolite PC ae C44:5. Although the effect sign was concordant between the discovery set and the KORA S4 replication sample (Table 1), this association was significant in the discovery phase for the NTR and KORA cohorts only (see Supplementary Fig. 4).

Integration with gene expression analysis results

We integrated the results of the metabolomics discovery stage GWA meta-analysis with the results of gene expression analyses in whole blood and the liver. In whole blood, both cis and trans expression quantitative trait locus (cis-eQTL and trans-eQTL, respectively) analyses were performed in two different samples originating from the United Kingdom, the Netherlands and Estonia: the Dutch NTR-NESDA sample (N=5,071) and the Fehrmann-EGCUT sample comprising data from three cohorts that were meta-analysed (total N=2,360; see Methods and Supplementary Methods). The results of cis-eQTL analysis for lead metabolomic SNPs showing overlap with cis-eQTL SNPs are given for the NTR-NESDA and Fehrmann-EGCUT samples in Supplementary Data 4 and 5, respectively. Significant (false discovery rate<0.05) trans-eQTL effects for lead metabolomic SNPs in the Fehrmann-EGCUT sample are listed in Supplementary Data 6. We did not detect trans-eQTL effects for the lead metabolomic SNPs in the NTR-NESDA sample. Thirty-five lead metabolomic SNPs identified cis-eQTLs in at least one of the searched tissues (i.e., whole blood and/or the liver) with a (t-, Z- or Kruskal-Wallis test) P<0.001, defining a total of 67 SNP-gene pairs and 28 different genes (see Supplementary Data 7). The cis-eQTL analysis results were used to support the annotation of likely causal genes to loci that displayed significant association with variation in serum metabolite concentrations in the discovery stage meta-analysis (see Supplementary Data 3). Of the 28 genes, 14 were predicted to be causal on the basis of our annotation and the other 14 were predicted to be non-causal.

Variance explained

It has been described previously that a relatively small number of genetic variants can explain a relatively large proportion of the variance observed for serum metabolite levels^4,5,9. Therefore, we compared the variance in serum metabolite levels explained by significantly associated SNPs, with the heritability as estimated in a monozygotic twin sample from the NTR cohort (N=181 pairs; see Fig. 3)¹³. Among all metabolites, the largest proportion of phenotypic variance was explained for C9 (13%; see Supplementary Data 2). In the current study, this metabolite associated significantly with SNPs in the THEM4 and CPS1_ACADL loci. The largest proportion of heritability was explained for lysoPC a C20:4 (19%; corresponding with 10% of the phenotypic variance), which was associated with a SNP in the FADS1–3 locus. The results of polygenic score (PGS) analyses (see Supplementary Fig. 5 and Supplementary Note 1) suggest different genetic background of variation in serum levels for different metabolites, ranging from close-to-monogenic to highly polygenic.

**Figure 3: Decomposition of variation in serum metabolite levels.**

Discussion

We set out to enhance the current understanding of the genetic underpinnings of variation in circulating metabolite levels in humans. To this end, we employed a well-established targeted metabolomics platform (Biocrates) in combination with genome-wide SNP genotyping and imputation in eight independent cohorts of European descent. By meta-analysis of GWA analyses carried out for each of 129 metabolites measured in the serum samples of all individual study participants, the current study identified 123 significant SNP-metabolite associations between 59 independent SNPs and 85 different metabolites. Five of the independent SNPs were new for variation in serum metabolite levels.

Consistent with previous reports, for the majority of all 59 independent SNPs we were able to annotate a likely causal gene, which in most cases encoded a metabolite transporter protein or enzyme. The five new SNPs for serum metabolite levels are also all located nearby such genes, and their associations with metabolites tend to match the known function of these genes. SNP rs7582179 in the AGPS gene is associated with the choline plasmalogen PC ae C44:5. Mutations in AGPS (encoding the enzyme alkylglycerone phosphate synthase) are known to cause rhizomelic chondrodysplasia punctata type 3 (RCDP3; OMIM:600121 (ref. 14)), a rare autosomal recessive disorder that is fatal, with death occurring often early in childhood. Clinically, RCDP3 is characterized by significantly delayed and abnormal physical and mental development, with shortness of the proximal limb bones (‘rhizomelia’) being one of the hallmarks. RCDP3 has been shown to result from reduced production of plasmalogens (a type of ether phospholipids) by alkylglycerone phosphate synthase in peroxisomes. The association in the current study of a SNP within the AGPS gene with the serum concentration of the choline plasmalogen PC ae C44:5 is therefore perfectly concordant with the known gene-disease link between AGPS and RCDP3. PC ae C44:5 also associated significantly with the new SNP rs7700133 located near the ACSL1 gene, encoding long-chain acyl CoA synthetase 1. Previous studies have shown links between genetic and transcriptional variation of ACSL1 and the metabolic syndrome^15,16. The new SNP rs12210538, located within the SLC22A16 gene, associated with the two acylcarnitines C18:1 and C18:2. This gene encodes a carnitine transporter that mediates the uptake of anticancer drugs such as bleomycin and doxorubicin into tumour cells, and its activity correlates with treatment response^17,18. Significant associations were found for two SNPs (rs17657817 and rs2246012) located inside the ARG1 gene (coding for the enzyme arginase) with serum concentrations of the amino acid ornithine that participates in the urea cycle. Importantly, the global arginine bioavailability ratio (that is, the ratio of arginine to ornithine and citrulline¹⁹) is of interest as a potential biomarker for endothelial dysfunction, which is a known risk factor for the development of cardiovascular disease²⁰. The two newly identified SNPs associated with serum ornithine levels might now be used as instrumental variables in cost-effective Mendelian randomization studies in large samples of individuals, to investigate the possible causal relationship among ornithine, endothelial dysfunction and subsequent cardiovascular disease²¹. Among three meta-analyses of coronary artery disease and myocardial infarction as carried out by the CARDIoGRAMplusC4D Consortium, the association with SNP rs2246012 was the strongest in the CARDIoGRAM GWA study (P=0.002) involving 22,233 cases and 64,762 controls²². This suggests that the link between genetic variation at SNP rs2246012 and variation in serum ornithine levels as identified in the current study will indeed be useful to further establish the possible link between the ARG1 gene and cardiovascular disease.

We compared the significant SNP-metabolite associations from the current study with those reported in 11 previous publications that employed high-resolution methods to assay the serum metabolome. For several SNPs that were associated with variation in metabolite levels in the previous studies, we identified new associations with individual metabolites. These new associations strengthen the evidence for the associations of these known SNPs with specific metabolites, demonstrating their extended effect on phenotypes that are closely related to the metabolites with which their association was discovered initially^4,23. Also, it has been demonstrated that GWA studies with the more refined metabolic phenotypes provided by metabolomics often yield effect sizes that are larger than those observed in GWA studies of composite measures such as high-density lipoprotein cholesterol, suggesting that these more refined metabolomic phenotypes provide better intermediate traits^8,9. In this context, it is interesting to note that the large proportion of explained heritability we observed for lysoPC a C20:4 (19%) was caused exclusively by the association with a single SNP in the FADS1–3 locus, an observation that is in line with the results from previous studies^4,5,9.

In conclusion, the results obtained in the current study contribute to the understanding of the genetic background of variation in serum metabolite levels, and are important for further progress in biomedical and pharmaceutical research.

Methods

Participants

The meta-analysis included GWA data from 7,478 participants from seven cohorts originating from five countries (the Netherlands, Germany, Australia, Estonia and the United Kingdom). The independent test sample (‘KORA S4 replication sample’) consisted of 1,182 additional KORA participants. The following local research ethics committees approved the individual studies: KORA, Ethics Committee of the Bavarian Medical Association (Bayerische Landes-ärztekammer); NTR, Central Ethics Committee on Research Involving Human Subjects of the VU University Medical Center, Amsterdam; EGCUT, Ethics Review Committee on Human Research of the University of Tartu; TwinsUK, St Thomas’ Hospital ethics committee; ERF, medical ethics board of the Erasmus MC Rotterdam, the Netherlands; LLS, Medical Ethical Committee of the Leiden University Medical Centre; QIMR, QIMR Human Research Ethics Committee. Informed consent was obtained from all participants. Sample characteristics for all cohorts included in this study and detailed study sample descriptions are given in Supplementary Table 1 and in the Supplementary Methods, respectively.

Biocrates metabolite quantification

Targeted metabolomic measurements were performed using electrospray–flow injection analysis–tandem mass spectrometry methods and the Biocrates AbsoluteIDQ p150 kit (BIOCRATES Life Sciences AG), which enables quantification of a total of 163 metabolites (see Supplementary Table 3 for an overview of all metabolites targeted by this kit)²⁴. The method of AbsoluteIDQ p150 kit has been proven to be in conformance with FDA Guideline ‘Guidance for Industry—Bioanalytical Method Validation (May 2001)’²⁵, which implies proof of reproducibility within a given error range. For all cohorts, metabolite measurements were carried out at the Metabolomics Platform of the Genome Analysis Center at the Helmholtz Zentrum München, Germany as per the manufacturer’s instructions^4,24,26,27. In brief, the used metabolomic measurement technique is based on a targeted profiling scheme that is used to quantitatively screen for known small-molecule metabolites by multiple reaction monitoring, neutral loss and precursor-ion scans. Internal standards served as reference for the calculation of all metabolite concentrations, which are reported as micromolar. Data evaluation for quantification of metabolite concentrations and quality assessment have been performed with the MetIDQ software package, which is an integral part of the AbsoluteIDQ kit. Stability of the assay was assessed using the measurement results of five aliquots of the same reference blood sample on every plate. Quality control of the Biocrates metabolite concentration measurement data was performed by each participating cohort as follows²⁷: for each cohort, metabolite profile measurements for all individuals were performed on multiple plates. For each metabolite i and plate j, the coefficient of variation (CV_i,j) was calculated as: , where the standard deviation (s.d.) and mean were calculated over all reference measurements per plate j (five per plate). Summary statistics for the metabolite concentration data for each cohort were compared with the measurement detection limit specifications as reported by the manufacturer of the AbsoluteIDQ p150 kit (BIOCRATES). A metabolite was excluded from further analyses for a particular cohort if its concentration measurement data did not meet all of the following criteria: (i) mean CV_i over all plates <25%; (ii) ≤5% missing values; (iii) median >lower limit of quantification (for metabolites reported as absolute concentrations) or >limit of detection (for semiquantitatively measured metabolites). Outlying metabolite concentration values (data points) and outlying samples were also removed, and the missing data points were imputed with the ‘R’²⁸ package ‘mice’²⁷. The resulting concentration data for each metabolite were natural log-transformed in order to attain a normal distribution. Throughout the article, names of lipids detected by the Biocrates AbsoluteIDQ p150 platform are abbreviated as follows: acylcarnitines, Cx:y; hydroxylacylcarnitines, C(OH)x:y; dicarboxylacylcarnitines, Cx:y-DC; sphingomyelins, SMx:y; N-hydroxylacyloylsphingosylphosphocholine, SM (OH) x:y; phosphatidylcholines, PC (aa=diacyl, ae=acyl-alkyl). Lipid side chain composition is abbreviated as Cx:y, where x denotes the number of carbons in the side chain and y the number of double bonds.

Association analyses and meta-analyses

Genome-wide SNP genotyping was performed in each cohort with standard genotyping technologies (see Supplementary Table 2 and Supplementary Methods). For the samples contributing to the stage 1 (discovery) meta-analysis, imputation was conducted with reference to HapMap phase 2 build 36 release 22 or 24 CEU (Utah residents of Northern and Western European ancestry)²⁹ phased genotypes. For the KORA S4 replication sample, SNP genotypes were imputed against the 1,000 g phase1 integrated haplotypes reference set. Association analysis was performed assuming a linear regression model for each SNP, adjusting for relatedness, age, sex and study specific (for example, ancestry-informative principal component scores) covariates as necessary (see Supplementary Table 2). Positions of all SNPs described in the current manuscript were mapped to those as in the HapMap 2 Build 36 release 24 reference set (hg18). Meta-analysis of GWA results obtained in the cohorts participating in stage 1 was performed as follows: the expected minor allele count (eMAC) was computed at the cohort level for each SNP as

where N is the study sample size, MAF is the minor allele frequency and I_A is the SNP genotype imputation quality measure. SNPs for which eMAC<25 were filtered out of the GWA results for the cohort under consideration. After applying genomic control at the individual cohort level, two independent analysts carried out additive model fixed-effects meta-analysis of association data for imputed autosomal SNPs, using two different software packages (METAL³⁰ and GWAMA³¹). For a given SNP, cohort-specific effect size estimates were weighted inversely with their variance. Throughout the manuscript, we report the P values as resulting from Z-tests of association as carried out by METAL. A P value equal to 5.0 × 10⁻⁸ was adopted as the threshold for genome-wide suggestive association between a SNP and the concentration of a metabolite, based on the approximate number of independent SNPs in samples of European ancestry³². To obtain a threshold for significant association, taking account of the number of metabolites tested and their intercorrelations, the threshold value for suggestive association was divided by the number of independent tests (Meffli) in the metabolomic data as estimated using the method of Li and Ji³³. The value of Meffli was estimated on the basis of the metabolite profiling data in two independent cohorts (ERF and NTR). Meffli was estimated to be equal to 46 in both ERF and NTR, rendering the P value threshold for genome- and metabolome-wide significance in the present study to be equal to 5.0 × 10⁻⁸/46=1.09 × 10⁻⁹.

Definition of loci and secondary signals analysis

Independent signals of association were identified in the GWA meta-analysis results for each metabolite separately, using the linkage disequilibrium-based ‘clumping’ procedure as implemented in PLINK³⁴. This procedure takes all SNPs that show a P value of association with a phenotype below a threshold (‘—clump-p1’), and forms clumps of these ‘index’ SNPs together with all other SNPs that are in linkage disequilibrium with (controlled by the parameter ‘—clump-r2’) and in physical proximity (controlled by parameter ‘—clump-kb’) to these index SNPs. For the current study, we used the following parameter settings: ‘—clump-p1’, 5.0 × 10⁻⁸; ‘—clump-r2’, 0.1; ‘—clump-kb’, 1,000. As input for the ‘clumping’ procedure, we used association P values from the discovery phase meta-analysis results and linkage disequilibrium patterns as estimated from the HapMap 2 Build 36 release 24 reference set. For each metabolite separately, secondary association signals at a locus were verified by approximate conditional analysis as implemented in GCTA³⁵. In this analysis, the association for the secondary association signal ‘top’ SNP (that is, the SNP with the lowest P value of association with variation in serum metabolite level at the secondary association signal) was conditioned on the top SNP for the locus and metabolite under consideration. As input for the approximate conditional analysis, we used the discovery phase GWA meta-analysis results and the imputed SNP genotype data from the NTR cohort as a reference for LD structure³⁶. We report only secondary signal top SNPs for which the P value remained <1.09 × 10⁻⁹ in this approximate conditional analysis. In the current manuscript, the term ‘lead metabolomic SNP’ refers to a top SNP at a locus or secondary association signal for one or more metabolites. We identified genomic loci significantly associating with metabolite levels by grouping lead metabolomic SNPs located within 1 Mb from each other over all metabolites.

Identification of new SNP–metabolite associations

The approach used in the current study to identify novel associations between SNPs and serum metabolite levels is described in Supplementary Fig. 6 and in the Supplementary Methods. In brief, we applied two complementary methods: the first method identified novel SNPs associated with variation in serum metabolite levels, and the second method identified novel SNP-metabolite associations with respect to 11 previous GWA studies that included at least one metabolite that was also included in the current study.

Replication analyses

Replication analyses were performed in the KORA S4 replication sample for the associations of the five new SNPs for serum metabolite levels (listed in Table 1) with their lead metabolites (that is, the metabolites that were most strongly associated with these SNPs in the discovery phase meta-analysis).

Lookup of association with cardiovascular disease

Data on coronary artery disease/myocardial infarction have been contributed by CARDIoGRAMplusC4D investigators and have been downloaded from www.CARDIOGRAMPLUSC4D.ORG. We performed a lookup of the associations with SNP rs2246012 in the results from all three meta-analyses as provided on this website^22,37,38.

Association and Manhattan plots

For each lead metabolomic SNP, the LocusZoom³⁹ tool was used to generate association plots in the region between 500 kb before the locus minimum position and 500 kb after the locus maximum position. Manhattan plots for each metabolite were generated based on the discovery phase meta-analysis results using in-house developed Python⁴⁰ code.

SNP annotation

To facilitate the manual process of selecting plausible candidate genes for each locus, we used an automated workflow developed in-house to generate reports containing the associated protein, enzyme, metabolic reaction, pathway and disease phenotypes of each gene within a ±500 kb window of each lead metabolomic SNP. SNPs within this window that were published in GWAS Catalog⁴¹ or in GTEx-eQTL (http://www.ncbi.nlm.nih.gov/gtex/GTEX2) were also listed. In detail, the reports created by our workflow were based on the NCBI-Gene (http://www.ncbi.nlm.nih.gov/gene), GTEx-eQTL, GWAS Catalog, ConsensusPathDB⁴², UniProtKB⁴³, OMIM⁴⁴, Gene Ontology⁴⁵, TCDB⁴⁶, ExPASy⁴⁷ and KEGG databases⁴⁸. These databases had been downloaded earlier from the respective File Transfer Protocol servers and have been integrated offline in MATLAB (R2009a, The Mathworks Inc.). Overlap of lead metabolomic SNPs with cis-eQTL SNPs was also used as evidence to support the annotation of likely causal genes to loci. In case no biologically plausible gene could be found, the locus was given the name of the nearest gene; a similar approach was followed in the study by Shin et al.¹¹

Variance explained

We estimated the proportion of phenotypic variance explained by each independent association signal (lead SNP for a locus or secondary association signal) as Pearson’s phi coefficient squared:

where ; N is the sample size in the discovery phase meta-analysis for the SNP–metabolite association under consideration; is the ordinary least-squares estimate of (that is, the regression coefficient for the SNP as estimated in the discovery phase meta-analysis); and is its standard error. For each metabolite, we added up the proportions of variance in metabolite level explained by independent association signals to estimate the total proportion of phenotypic variance explained. We also estimated for each metabolite the proportion of heritability of metabolite level variability explained by approximately independent association signals. As an estimate of heritability for this analysis we used the monozygotic twin correlations for each metabolite, based on data from 181 pairs from the NTR cohort¹³. The proportion of heritability in metabolite level variation explained by independent association signals was estimated by dividing the proportion of phenotypic variance explained by independent association signals, by the monozygotic twin correlation. The total proportion of heritability explained for a particular metabolite was estimated by adding up the proportions of variance explained by all approximately independent association signals for that metabolite.

PGS analysis

We investigated evidence for the polygenic nature of variation in serum metabolite levels by building a multi-SNP predictor from the meta-analysis results for each metabolite to predict the levels of the same metabolite in an independent target cohort (KORA S4 replication sample). Such a multi-SNP predictor, or PGS, reflects the weighted sum of multiple SNPs associated with a phenotype. The discovery meta-analysis forms the basis to select SNPs based on liberal significance thresholds (for example, 0.001, 0.01 and so on). In the target sample, PGSs are calculated for each individual for each set of SNPs by multiplying the number of effect alleles per SNP (0, 1 or 2) with the beta from the meta-analysis, summed over all SNPs in the set of SNPs. We performed the PGS analysis⁴⁹ using the regression coefficients (betas) from the discovery phase meta-analyses as weights. Analyses were performed for the 127 metabolites for which concentration data were available both in the discovery sample and in the KORA S4 target sample. For each of these metabolites, SNPs representing approximately independent association signals were selected in the discovery phase meta-analysis results using the PLINK clumping procedure. SNPs with P values of association with metabolite concentration levels in the discovery meta-analysis below the following thresholds were included: P<1.0 × 10⁻⁸; P<1.0 × 10⁻⁷; P<1.0 × 10⁻⁶; P<1.0 × 10⁻⁵; P<1.0 × 10⁻⁴; P<1.0 × 10⁻³; P<1.0 × 10⁻²; P<5.0 × 10⁻²; P<0.1; P<0.2; P<0.3; P<0.4; P<0.5; P<0.6; P<0.7; P<0.8; P<0.9; P<1.0. For each clump of SNPs, the index SNP was taken for possible inclusion in the score computation. From the resulting set of SNPs eligible for inclusion in the PGS analysis, A/T and G/C SNPs for which (0.35<MAF<0.50) were excluded because these SNPs are potentially ambiguous and therefore may lead to spurious association in the case of strand flips⁵⁰. From the imputed SNP genotype data for the KORA S4 target sample, the SNPs corresponding with the remaining clump index SNPs were selected. A PGS was constructed for each individual in the KORA S4 replication sample using the ‘—score’ procedure as implemented in PLINK v. 1.07 (http://pngu.mgh.harvard.edu/~purcell/plink/profile.shtml). The resulting PGS was included as a covariate in a multiple linear regression analysis that was similar to the regression that was carried out in the primary single SNP-based GWA analysis:

where y represents log(metabolite) values, and the study-specific covariates include adjustments for, for example, population stratification (and thus β₄ can be a vector). The Biocrates metabolite values were obtained and preprocessed using the same methods as described for the primary GWA in the Section ‘Biocrates metabolite quantification’. The proportion of variance explained by the PGSs was assessed by comparing the raw (that is, unadjusted) R² values for the ‘full’ model (that is, a model including the genetic score as a covariate) with the raw R² values when fitting a ‘reduced’ model that did not include the genetic score as a covariate⁵¹. The significance of the association of the PGSs with serum metabolite levels was estimated on the basis of the P value of association for β₁ according to the full model.

EQTL analyses

Data from two independent samples originating from the United Kingdom, the Netherlands and Estonia (the Dutch NTR-NESDA sample and the Fehrmann-EGCUT sample) were used for cis- and trans-eQTL mapping in whole blood. Details of these analyses, and of the integration of the cis-eQTL analysis results with the results from the metabolomics genome-wide meta-analysis, are provided in the Supplementary Methods. We also assessed the overlap of lead metabolomic SNPs with cis-eQTL signals in the liver as catalogued in the GTEx-eQTL database, following the method of Shin et al.¹¹: for each lead metabolomic SNP, we retrieved all SNPs with r²>0.8 in the 1000 Genomes Project pilot phase (CEU population). All cis-eQTLs within a 1-Mb window centred on the lead SNP were retrieved from the GTEx-eQTL database, and the best eQTL P value was noted. The cis-eQTL results for which overlap with lead metabolomic SNPs was shown and that displayed association P values <0.001 are given in Supplementary Data 7.

Meta-analysis

The full meta-analysis results for all metabolites are available at www.tweelingenregister.org/engagebiocratesgwama

Additional information

Accession codes. The metabolomic data described in this manuscript are uploaded to MetaboLights and are available under the accession code MTBLS192.

How to cite this article: Draisma, H.H.M. et al. Genome-wide association study identifies novel genetic variants contributing to variation in blood metabolite levels. Nat. Commun. 6:7208 doi: 10.1038/ncomms8208 (2015).

References

Suhre, K. & Gieger, C. Genetic variation in metabolic phenotypes: study designs and applications. Nat. Rev. Genet. 13, 759–769 (2012).
Article CAS PubMed Google Scholar
Suhre, K. et al. Human metabolic individuality in biomedical and pharmaceutical research. Nature 477, 54–60 (2011).
Article CAS ADS PubMed Google Scholar
Nicholson, G. et al. A genome-wide metabolic QTL analysis in Europeans implicates two loci shaped by recent positive selection. PLoS Genet. 7, e1002270 (2011).
Article CAS PubMed PubMed Central Google Scholar
Illig, T. et al. A genome-wide perspective of genetic variation in human metabolism. Nat. Genet. 42, 137–141 (2010).
Article CAS PubMed Google Scholar
Gieger, C. et al. Genetics meets metabolomics: a genome-wide association study of metabolite profiles in human serum. PLoS Genet. 4, e1000282 (2008).
Article PubMed PubMed Central Google Scholar
Demirkan, A. et al. Genome-wide association study identifies novel loci associated with circulating phospho- and sphingolipid concentrations. PLoS Genet. 8, e1002490 (2012).
Article CAS PubMed PubMed Central Google Scholar
Hicks, A. A. et al. Genetic determinants of circulating sphingolipid concentrations in European populations. PLoS Genet. 5, e1000672 (2009).
Article PubMed PubMed Central Google Scholar
Tukiainen, T. et al. Detailed metabolic and genetic characterization reveals new associations for 30 known lipid loci. Hum. Mol. Genet 21, 1444–1455 (2012).
Article CAS PubMed Google Scholar
Kettunen, J. et al. Genome-wide association study identifies multiple loci influencing human serum metabolite levels. Nat. Genet. 44, 269–276 (2012).
Article CAS PubMed PubMed Central Google Scholar
Rhee, E. P. et al. A genome-wide association study of the human metabolome in a community-based cohort. Cell Metab. 18, 130–143 (2013).
Article CAS PubMed PubMed Central Google Scholar
Shin, S. Y. et al. An atlas of genetic influences on human blood metabolites. Nat. Genet. 46, 543–550 (2014).
Article CAS PubMed PubMed Central Google Scholar
Yu, B. et al. Genetic determinants influencing human serum metabolome among African Americans. PLoS Genet. 10, e1004212 (2014).
Article PubMed PubMed Central Google Scholar
Draisma, H. H. et al. Familial resemblance for serum metabolite concentrations. Twin Res. Hum. Genet. 16, 948–961 (2013).
Article PubMed Google Scholar
Itzkovitz, B. et al. Functional characterization of novel mutations in GNPAT and AGPS, causing rhizomelic chondrodysplasia punctata (RCDP) types 2 and 3. Hum. Mutat. 33, 189–197 (2012).
Article CAS PubMed Google Scholar
Gertow, K. et al. Fatty acid handling protein expression in adipose tissue, fatty acid composition of adipose tissue and serum, and markers of insulin resistance. Eur. J. Clin. Nutr. 60, 1406–1413 (2006).
Article CAS PubMed Google Scholar
Phillips, C. M. et al. Gene-nutrient interactions with dietary fat modulate the association between genetic variation of the ACSL1 gene and metabolic syndrome. J. Lipid. Res. 51, 1793–1800 (2010).
Article CAS PubMed PubMed Central Google Scholar
Aouida, M., Poulin, R. & Ramotar, D. The human carnitine transporter SLC22A16 mediates high affinity uptake of the anticancer polyamine analogue bleomycin-A5. J. Biol. Chem. 285, 6275–6284 (2010).
Article CAS PubMed Google Scholar
Bray, J. et al. Influence of pharmacogenetics on response and toxicity in breast cancer patients treated with doxorubicin and cyclophosphamide. Br. J. Cancer 102, 1003–1009 (2010).
Article CAS PubMed PubMed Central Google Scholar
Tang, W. H., Wang, Z., Cho, L., Brennan, D. M. & Hazen, S. L. Diminished global arginine bioavailability and increased arginine catabolism as metabolic profile of increased cardiovascular risk. J. Am. Coll. Cardiol. 53, 2061–2067 (2009).
Article CAS PubMed PubMed Central Google Scholar
Tripolt, N. J. et al. Multifactorial risk factor intervention in patients with Type 2 diabetes improves arginine bioavailability ratios. Diabet. Med. 29, e365–e368 (2012).
Article CAS PubMed Google Scholar
Brion, M.-J. A., Benyamin, B., Visscher, P. M. & Davey Smith, G. Beyond the single SNP: emerging developments in Mendelian randomization in the ‘omics’ era. Curr. Epidemiol. Rep 1, 228–236 (2014).
Article Google Scholar
Schunkert, H. et al. Large-scale association analysis identifies 13 new susceptibility loci for coronary artery disease. Nat. Genet. 43, 333–338 (2011).
Article CAS PubMed PubMed Central Google Scholar
Dharuri, H. et al. Genetics of the human metabolome, what is next? Biochim. Biophys. Acta 1842, 1923–1931 (2014).
Article CAS PubMed Google Scholar
Römisch-Margl, W. et al. Procedure for tissue sample preparation and metabolite extraction for high-throughput targeted metabolomics. Metabolomics 8, 133–142 (2012).
Article Google Scholar
US Department of Health and Human Services,. F. a. D. A., Center for Drug Evaluation and Research (CDER), Center for Veterinary Medicine (CVM). Guidance for Industry. Bioanalytical Method Validation. (2001).
Mittelstrass, K. et al. Discovery of sexual dimorphisms in metabolic and genetic biomarkers. PLoS Genet. 7, e1002215 (2011).
Article CAS PubMed PubMed Central Google Scholar
Goek, O. N. et al. Serum metabolite concentrations and decreased GFR in the general population. Am. J. Kidney Dis. 60, 197–206 (2012).
Article CAS PubMed Google Scholar
R: A language and environment for statistical computing R Foundation for Statistical Computing (2012).
Frazer, K. A. et al. A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861 (2007).
Article CAS ADS PubMed Google Scholar
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
Article CAS PubMed PubMed Central Google Scholar
Mägi, R. & Morris, A. P. GWAMA: software for genome-wide association meta-analysis. BMC Bioinformatics 11, 288 (2010).
Article PubMed PubMed Central Google Scholar
Pe'er, I., Yelensky, R., Altshuler, D. & Daly, M. J. Estimation of the multiple testing burden for genomewide association studies of nearly all common variants. Genet. Epidemiol. 32, 381–385 (2008).
Article PubMed Google Scholar
Li, J. & Ji, L. Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity (Edinb) 95, 221–227 (2005).
Article CAS Google Scholar
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Article CAS PubMed PubMed Central Google Scholar
Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
Article CAS PubMed PubMed Central Google Scholar
Cornelis, M. C. et al. Genome-wide meta-analysis identifies six novel loci associated with habitual coffee consumption. Mol. Psychiatry 20, 647–656 (2015).
Article CAS PubMed Google Scholar
Peden, J. F. & Farrall, M. Thirty-five common variants for coronary artery disease: the fruits of much collaborative labour. Hum. Mol. Genet 20, R198–R205 (2011).
Article CAS PubMed PubMed Central Google Scholar
Deloukas, P. et al. Large-scale association analysis identifies new risk loci for coronary artery disease. Nat. Genet. 45, 25–33 (2013).
Article CAS PubMed Google Scholar
Pruim, R. J. et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26, 2336–2337 (2010).
Article CAS PubMed PubMed Central Google Scholar
Van Rossum, G. Python Tutorial Centrum voor Wiskunde en Informatica (CWI) (1995).
Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. USA 106, 9362–9367 (2009).
Article CAS ADS PubMed PubMed Central Google Scholar
Kamburov, A. et al. ConsensusPathDB: toward a more complete picture of cell biology. Nucleic Acids Res. 39, D712–D717 (2011).
Article CAS PubMed Google Scholar
Magrane, M. & Uniprot Consortium. UniProt Knowledgebase: a hub of integrated protein data. Database (Oxford) 2011, bar009 (2011).
Article Google Scholar
McKusick, V. Mendelian Inheritance in Man; A Catalog of Human Genes and Genetic Disorders Johns Hopkins Univ. (1998).
Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
Article CAS PubMed PubMed Central Google Scholar
Saier, M. H., Tran, C. V. & Barabote, R. D. TCDB: the Transporter Classification Database for membrane transport protein analyses and information. Nucleic Acids Res. 34, D181–D186 (2006).
Article CAS PubMed Google Scholar
Gasteiger, E. et al. ExPASy: the proteomics server for in-depth protein knowledge and analysis. Nucleic Acids Res. 31, 3784–3788 (2003).
Article CAS PubMed PubMed Central Google Scholar
Kanehisa, M. & Goto, S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 28, 27–30 (2000).
Article CAS PubMed PubMed Central Google Scholar
Purcell, S. M. et al. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460, 748–752 (2009).
CAS ADS PubMed Google Scholar
Demirkan, A. et al. Genetic risk profiles for depression and anxiety in adult and elderly cohorts. Mol. Psychiatry 16, 773–783 (2011).
Article CAS PubMed Google Scholar
Berndt, S. I. et al. Genome-wide meta-analysis identifies 11 new loci for anthropometric traits and provides insights into genetic architecture. Nat. Genet. 45, 501–512 (2013).
Article CAS PubMed PubMed Central Google Scholar
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

We thank all participants. This research was supported by the European Community’s Seventh Framework Programme (FP7/2007–2013), ENGAGE Consortium, grant agreement HEALTH-F4-2007-201413. The Cooperative Health Research in the Region Augsburg (KORA) study was supported by the Helmholtz Zentrum München—German Research Center for Environmental Health, which is funded by the German Federal Ministry of Education and Research (BMBF) and by the State of Bavaria. KORA research was supported within the Munich Center of Health Sciences (MC-Health), Ludwig-Maximilians-Universität, as part of LMUinnovativ. This work was financed by the German National Genome Research Network (NGFN; NGFNPlus, project number 01GS0834). K. Suhre is supported by Biomedical Research Program funds at the Weill Cornell Medical College in Qatar, funded by the Qatar Foundation. The Estonian Genome Center of University of Tartu (EGCUT) received support from the Estonian Government IUT20–60, IUT24-6, Estonian Research Roadmap through the Estonian Ministry of Education and Research (3.2.0304.11-0312), Center of Excellence in Genomics (EXCEGEN), Development Fund from the University of Tartu (SP1GVARENG), an EFSD New Horizons grant and the US National Institute of Health (R01DK075787). We acknowledge EGCUT personnel, especially Mr V. Soo. Data analyses were carried out in part in the High Performance Computing Center of the University of Tartu. TwinsUK receives support from the National Institute for Health Research (NIHR) Clinical Research Facility at Guy’s & St Thomas’ NHS Foundation Trust and NIHR Biomedical Research Centre based at Guy’s and St Thomas’ NHS Foundation Trust and King’s College London. The Netherlands Twin Register (NTR) was funded by the Netherlands Organization for Scientific Research (NWO: MagW/ZonMW grants 904-61-090; 985-10-002; 904-61-193; 480-04-004; 400-05-717; Middelgroot-911-09-032; 56-464-14192); Center for Medical Systems Biology (CMSB); Netherlands Bioinformatics Centre/BioAssist/RK(2008.024); Biobanking and Biomolecular Resources Research Infrastructure (BBMRI)–NL (184.021.007) and the European Science Council (ERC-230374). H.H.M.D. is supported by an EMGO+ Fellowship. This work was carried out on the Dutch national e-infrastructure with the support from the SURF Foundation. The Erasmus Rucphen Family study (ERF) is grateful to Petra Veraart for her help in genealogy, Jeannette Vergeer for supervision of the laboratory work and Peter Snijders for help in data collection. ERF was supported by grants from NWO, Erasmus MC and CMSB. For the Queensland Institute of Medical Research (QIMR) cohort, serum metabolite level assessment was co-funded by the European Community’s Seventh Framework Programme (FP7/2007–2013), ENGAGE project, grant agreement HEALTH-F4-2007-201413 and NHMRC-European Union Collaborative Research Grant 496739.

Author information

Harmen H. M. Draisma and René Pool: These authors contributed equally to this work
Cornelia M. van Duijn and Dorret I. Boomsma: These authors jointly supervised this work

Authors and Affiliations

Department of Biological Psychology, VU University Amsterdam, van der Boechorststraat 1, Amsterdam, 1081 BT, The Netherlands
Harmen H. M. Draisma, René Pool, Eco J. C. de Geus, Jouke Jan Hottenga, Gonneke Willemsen & Dorret I. Boomsma
The EMGO+ Institute for Health and Care Research, VU University Medical Center, Van der Boechorststraat 7, Amsterdam 1081 BT, The Netherlands
Harmen H. M. Draisma, René Pool, Eco J. C. de Geus, Jouke Jan Hottenga, Gonneke Willemsen, Brenda W. Penninx & Dorret I. Boomsma
Neuroscience Campus Amsterdam, De Boelelaan 1085, Amsterdam 1081 HV, The Netherlands
Harmen H. M. Draisma
BBMRI-NL: Infrastructure for the Application of Metabolomics Technology in Epidemiology (RP4), S4-P, Postbus 9600, Leiden 2300 RC, The Netherlands
René Pool, Anika A. M. Vaarhorst, P. Eline Slagboom, Cornelia M. van Duijn & Dorret I. Boomsma
Institute of Genetic Epidemiology, Helmholtz Zentrum München—German Research Center for Environmental Health, Ingolstädter Landstraße 1, Neuherberg, 85764, Germany
Michael Kobl, Ann-Kristin Petersen, Konstantin Strauch & Christian Gieger
Department of Psychiatry, VU University Medical Center, Neuroscience Campus Amsterdam, VUmc, A.J. Ernststraat 1187, Amsterdam, 1081 HL, The Netherlands
Rick Jansen
Department of Molecular Epidemiology, Leiden University Medical Center, PO Box 9600, Leiden, 2300 RC, The Netherlands
Anika A. M. Vaarhorst, Marian Beekman & P. Eline Slagboom
Netherlands Consortium for Healthy Aging, Leiden University Medical Center, Leiden, The Netherlands
Anika A. M. Vaarhorst & Cornelia M. van Duijn
Department of Twin Research and Genetic Epidemiology, King's College London, Westminster Bridge Road, London, SE1 7EH, UK
Idil Yet, Tim D. Spector, Jordana T. Bell & Massimo Mangino
Estonian Genome Center, University of Tartu, 23b Riia Street, Tartu, 51010, Estonia
Toomas Haller, Tõnu Esko & Andres Metspalu
Department of Human Genetics, Leiden University Medical Center, S4-P, PO Box 9600, Leiden, 2300 RC, The Netherlands
Ayşe Demirkan, Jan Bert van Klinken, Harish Dharuri, Ko Willems van Dijk, Peter A. C. ‘t Hoen & Gert-Jan B. van Ommen
Department of Epidemiology, Genetic Epidemiology Unit, Erasmus Medical Center, P.O. Box 2040, Rotterdam, 3000 CA, The Netherlands
Ayşe Demirkan, Elisabeth M. van Leeuwen, Najaf Amin, Aaron Isaacs & Cornelia M. van Duijn
Divisions of Endocrinology and Genetics and Center for Basic and Translational Obesity Research, Boston Children's Hospital, 300 Longwood Ave, Boston MA02115, Massachusetts, USA
Tõnu Esko
Medical and Population Genetics Program, Broad Institute of MIT and Harvard, Cambridge, Massachusetts 2142, USA
Tõnu Esko
Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, NRB 0330, Boston MA 02115, Massachusetts, USA
Tõnu Esko
Department of Genetics and Computational Biology, QIMR Berghofer Medical Research Institute, 300 Herston Road, Brisbane, 4006, Australia
Gu Zhu, Anjali K. Henders, Grant W. Montgomery, Dale R. Nyholt, John B. Whitfield & Nicholas G. Martin
Department of Medical Statistics and Bioinformatics, Leiden University Medical Center, PO Box 9600, Leiden, 2300 RC, The Netherlands
Stefan Böhringer
Institute of Bioinformatics and Systems Biology, Helmholtz Zentrum München, Ingolstädter Landstraße 1, Neuherberg, 85764, Germany
Werner Römisch-Margl & Karsten Suhre
Institute of Experimental Genetics, Genome Analysis Center, Helmholtz Zentrum München, Ingolstädter Landstraße1, Neuherberg, 85764, Germany
Cornelia Prehn & Jerzy Adamski
German Center for Diabetes Research at Helmholtz Zentrum München, Ingolstädter Landstr. 1, Neuherberg, 85764, Germany
Jerzy Adamski
Lehrstuhl für Experimentelle Genetik, Technische Universität München, Freising-Weihenstephan, 85350, Germany
Jerzy Adamski
Department of Gerontology and Geriatrics, Leiden University Medical Center, PO Box 9600, Leiden, 2300 RC, The Netherlands
Anton J. M. de Craen
Department of Genetics, CB50, University Medical Center Groningen, University of Groningen, P.O. Box 30001, Groningen 9700 RB, The Netherlands
Harm-Jan Westra & Lude Franke
Statistical and Genomic Epidemiology, Institute of Health and Biomedical Innovation, Queensland University of Technology, 60 Musk Ave, Kelvin Grove QLD 4059, Queensland, Australia
Dale R. Nyholt
Department of Psychiatry, VU University Medical Center, A.J. Ernststraat 1187, Amsterdam, 1081 HL, The Netherlands
Brenda W. Penninx
Department of Endocrinology, Leiden University Medical Center, S4-P, PO Box 9600, Leiden, 2300 RC, The Netherlands
Ko Willems van Dijk
Institute of Medical Informatics, Biometry and Epidemiology, Chair of Genetic Epidemiology, Ludwig-Maximilians-Universität, Munich, 81377, Germany
Konstantin Strauch
Research Unit of Molecular Epidemiology, Helmholtz Zentrum München, Ingolstädter Landstraße 1, Neuherberg, 85764, Germany
Thomas Illig & Christian Gieger
Hannover Unified Biobank, Hannover Medical School, Hannover, 30625, Germany
Thomas Illig
Institute for Human Genetics, Hannover Medical School, Carl-Neuberg-Strasse 1, Hannover, 30625, Germany
Thomas Illig
Faculty of Biology, Ludwig-Maximilians-Universität, Planegg-Martinsried, 82152, Germany
Karsten Suhre
Department of Physiology and Biophysics, Weill Cornell Medical College in Qatar (WCMC-Q), PO Box 24144, Education City—Qatar Foundation, Doha, Qatar
Karsten Suhre
Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK
Mark I. McCarthy
Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Churchill Hospital, Headington OX3 7LE, UK
Mark I. McCarthy
Oxford National Institute for Health Research Biomedical Research Centre, The Joint Research Office, Block 60, Churchill Hospital, Old Road, Headington OX3 7LE, UK
Mark I. McCarthy
Institute of Epidemiology II, Helmholtz Zentrum München—German Research Center for Environmental Health, Ingolstädter Landstraße 1, Neuherberg, 85764, Germany
Christian Gieger

Authors

Harmen H. M. Draisma
View author publications
You can also search for this author in PubMed Google Scholar
René Pool
View author publications
You can also search for this author in PubMed Google Scholar
Michael Kobl
View author publications
You can also search for this author in PubMed Google Scholar
Rick Jansen
View author publications
You can also search for this author in PubMed Google Scholar
Ann-Kristin Petersen
View author publications
You can also search for this author in PubMed Google Scholar
Anika A. M. Vaarhorst
View author publications
You can also search for this author in PubMed Google Scholar
Idil Yet
View author publications
You can also search for this author in PubMed Google Scholar
Toomas Haller
View author publications
You can also search for this author in PubMed Google Scholar
Ayşe Demirkan
View author publications
You can also search for this author in PubMed Google Scholar
Tõnu Esko
View author publications
You can also search for this author in PubMed Google Scholar
Gu Zhu
View author publications
You can also search for this author in PubMed Google Scholar
Stefan Böhringer
View author publications
You can also search for this author in PubMed Google Scholar
Marian Beekman
View author publications
You can also search for this author in PubMed Google Scholar
Jan Bert van Klinken
View author publications
You can also search for this author in PubMed Google Scholar
Werner Römisch-Margl
View author publications
You can also search for this author in PubMed Google Scholar
Cornelia Prehn
View author publications
You can also search for this author in PubMed Google Scholar
Jerzy Adamski
View author publications
You can also search for this author in PubMed Google Scholar
Anton J. M. de Craen
View author publications
You can also search for this author in PubMed Google Scholar
Elisabeth M. van Leeuwen
View author publications
You can also search for this author in PubMed Google Scholar
Najaf Amin
View author publications
You can also search for this author in PubMed Google Scholar
Harish Dharuri
View author publications
You can also search for this author in PubMed Google Scholar
Harm-Jan Westra
View author publications
You can also search for this author in PubMed Google Scholar
Lude Franke
View author publications
You can also search for this author in PubMed Google Scholar
Eco J. C. de Geus
View author publications
You can also search for this author in PubMed Google Scholar
Jouke Jan Hottenga
View author publications
You can also search for this author in PubMed Google Scholar
Gonneke Willemsen
View author publications
You can also search for this author in PubMed Google Scholar
Anjali K. Henders
View author publications
You can also search for this author in PubMed Google Scholar
Grant W. Montgomery
View author publications
You can also search for this author in PubMed Google Scholar
Dale R. Nyholt
View author publications
You can also search for this author in PubMed Google Scholar
John B. Whitfield
View author publications
You can also search for this author in PubMed Google Scholar
Brenda W. Penninx
View author publications
You can also search for this author in PubMed Google Scholar
Tim D. Spector
View author publications
You can also search for this author in PubMed Google Scholar
Andres Metspalu
View author publications
You can also search for this author in PubMed Google Scholar
P. Eline Slagboom
View author publications
You can also search for this author in PubMed Google Scholar
Ko Willems van Dijk
View author publications
You can also search for this author in PubMed Google Scholar
Peter A. C. ‘t Hoen
View author publications
You can also search for this author in PubMed Google Scholar
Konstantin Strauch
View author publications
You can also search for this author in PubMed Google Scholar
Nicholas G. Martin
View author publications
You can also search for this author in PubMed Google Scholar
Gert-Jan B. van Ommen
View author publications
You can also search for this author in PubMed Google Scholar
Thomas Illig
View author publications
You can also search for this author in PubMed Google Scholar
Jordana T. Bell
View author publications
You can also search for this author in PubMed Google Scholar
Massimo Mangino
View author publications
You can also search for this author in PubMed Google Scholar
Karsten Suhre
View author publications
You can also search for this author in PubMed Google Scholar
Mark I. McCarthy
View author publications
You can also search for this author in PubMed Google Scholar
Christian Gieger
View author publications
You can also search for this author in PubMed Google Scholar
Aaron Isaacs
View author publications
You can also search for this author in PubMed Google Scholar
Cornelia M. van Duijn
View author publications
You can also search for this author in PubMed Google Scholar
Dorret I. Boomsma
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

Jointly supervised the research: A.I., J.T.B., M.M., B.W.P., K.W.v.D., P.A.C.H., K. Strauch, G.J.B.v.O., T.I., J.T.B., M.M., M.I.M., C.G., A.I. and D.I.B. Contributed reagents/materials/analysis tools: N.A., A.J.M.C., P.E.S., H.-J.W., L.F., J.J.H., G.W. and D.R.N. Conceived and designed the experiments: C.M.v.D., C.G. and D.I.B. Performed statistical analysis: E.M.v.L., S.B., H.H.M.D., R.P., M.K., R.J., A.-K.P. and J.B.v.K. Performing metabolomics measurements (sample preparation, mass spectrometric measurements, quality assurance): C.P. and J.A. Measurements and data quality control: W.R.-M. Wrote the paper: A.D., C.M.v.D., H.H.M.D., R.P., R.J., A.-K.P., J.B.v.K., E.J.C.G., D.R.N., K.W.v.D., P.A.C.H., K.Suhre, C.M.v.D. and D.I.B. Principal investigator: T.D.S., A.M., P.E.S., K.W.v.D., P.A.C.H., K.Strauch, N.G.M., G.J.B.v.O., T.I., C.M.v.D. and D.I.B. Analysed data: I.Y., A.A.M.V., M.B., H.H.M.D., R.P., M.K., R.J., A.-K.P., T.H., G.Z., A.D., T.E., J.B.v.K., H.D. and K.Suhre. Genotyping: T.E. Directed wet lab: G.W.M. Directed biochemistry: J.B.W. Managed samples: A.K.H.

Corresponding author

Correspondence to Harmen H. M. Draisma.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Figures, Tables, Notes, Methods and References

Supplementary Figures 1-6, Supplementary Tables 1-6, Supplementary Note 1-2, Supplementary Methods and Supplementary References (PDF 18656 kb)

Supplementary Movie 1

Meta-analysis associations over all metabolites, filtered on P value < 1.09 ✕ 10-9. (MOV 11747 kb)

Supplementary Data 1

Exclusion reasons per cohort for each metabolite. (XLSX 15 kb)

Supplementary Data 2

All associations between metabolites and independent single nucleotide polymorphisms for which P value < 1.09 ✕ 10-9 in discovery stage meta-analysis (XLSX 24 kb)

Supplementary Data 3

Description of the 31 loci identified in the current study from 59 independent single nucleotide polymorphisms. (XLSX 12 kb)

Supplementary Data 4

For NTR-NESDA sample, results of cis expression quantitative trait locus (cis eQTL) analysis for lead metabolomic single nucleotide polymorphisms (SNPs) showing overlap with cis eQTL SNPs. (XLSX 9 kb)

Supplementary Data 5

For Fehrmann-EGCUT sample, results of cis expression quantitative trait locus (cis eQTL) analysis for lead metabolomic single nucleotide polymorphisms (SNPs) showing overlap with cis eQTL SNPs (XLSX 13 kb)

Supplementary Data 6

Significant (false discovery rate < 0.05) trans expression quantitative trait locus effects in Fehrmann-EGCUT sample for lead metabolomic single nucleotide polymorphisms (XLSX 7 kb)

Supplementary Data 7

Loci overlapping cis expression quantitative trait loci in whole blood and liver (XLSX 11 kb)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Draisma, H., Pool, R., Kobl, M. et al. Genome-wide association study identifies novel genetic variants contributing to variation in blood metabolite levels. Nat Commun 6, 7208 (2015). https://doi.org/10.1038/ncomms8208

Download citation

Received: 15 August 2014
Accepted: 20 April 2015
Published: 12 June 2015
DOI: https://doi.org/10.1038/ncomms8208

This article is cited by

Identification of blood metabolites associated with risk of Alzheimer’s disease by integrating genomics and metabolomics data
- Shuai Liu
- Hua Zhong
- Lang Wu
Molecular Psychiatry (2024)
Genetically predicted plasma levels of amino acids and metabolic dysfunction-associated fatty liver disease risk: a Mendelian randomization study
- Jian Zhao
- Jing Zeng
- Jian-Gao Fan
BMC Medicine (2023)
The use of class imbalanced learning methods on ULSAM data to predict the case–control status in genome-wide association studies
- R. Onur Öztornaci
- Hamzah Syed
- Bahar Taşdelen
Journal of Big Data (2023)
Uncovering the complex genetic architecture of human plasma lipidome using machine learning methods
- Miikael Lehtimäki
- Binisha H. Mishra
- Pashupati P. Mishra
Scientific Reports (2023)
Genome-wide association analysis of plasma lipidome identifies 495 genetic associations
- Linda Ottensmann
- Rubina Tabassum
- Matti Pirinen
Nature Communications (2023)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.

Subjects

Abstract

Similar content being viewed by others

Introduction

Results

Discovery meta-analysis of GWA scans

Replication analysis

Integration with gene expression analysis results

Variance explained

Discussion

Methods

Participants

Biocrates metabolite quantification

Association analyses and meta-analyses

Definition of loci and secondary signals analysis

Identification of new SNP–metabolite associations

Replication analyses

Lookup of association with cardiovascular disease

Association and Manhattan plots

SNP annotation

Variance explained

PGS analysis

EQTL analyses

Meta-analysis

Additional information

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Supplementary information

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Comments

Search

Quick links