Genetic studies of paired metabolomes reveal enzymatic and transport processes at the interface of plasma and urine

Schlosser, Pascal; Scherer, Nora; Grundner-Culemann, Franziska; Monteiro-Martins, Sara; Haug, Stefan; Steinbrenner, Inga; Uluvar, Burulça; Wuttke, Matthias; Cheng, Yurong; Ekici, Arif B.; Gyimesi, Gergely; Karoly, Edward D.; Kotsis, Fruzsina; Mielke, Johanna; Gomez, Maria F.; Yu, Bing; Grams, Morgan E.; Coresh, Josef; Boerwinkle, Eric; Köttgen, Michael; Kronenberg, Florian; Meiselbach, Heike; Mohney, Robert P.; Akilesh, Shreeram; Schmidts, Miriam; Hediger, Matthias A.; Schultheiss, Ulla T.; Eckardt, Kai-Uwe; Oefner, Peter J.; Sekula, Peggy; Li, Yong; Köttgen, Anna

doi:10.1038/s41588-023-01409-8


Article
Open access
Published: 05 June 2023


Nature Genetics volume 55, pages 995–1008 (2023)


Abstract

The kidneys operate at the interface of plasma and urine by clearing molecular waste products while retaining valuable solutes. Genetic studies of paired plasma and urine metabolomes may identify underlying processes. We conducted genome-wide studies of 1,916 plasma and urine metabolites and detected 1,299 significant associations. Associations with 40% of implicated metabolites would have been missed by studying plasma alone. We detected urine-specific findings that provide information about metabolite reabsorption in the kidney, such as aquaporin (AQP)-7-mediated glycerol transport, and different metabolomic footprints of kidney-expressed proteins in plasma and urine that are consistent with their localization and function, including the transporters NaDC3 (SLC13A3) and ASBT (SLC10A2). Shared genetic determinants of 7,073 metabolite–disease combinations represent a resource to better understand metabolic diseases and revealed connections of dipeptidase 1 with circulating digestive enzymes and with hypertension. Extending genetic studies of the metabolome beyond plasma yields unique insights into processes at the interface of body compartments.


Main

The human kidney clears small molecular waste products from plasma while retaining valuable solutes such as amino acids to maintain metabolic homeostasis. After glomerular filtration of plasma to primary urine ultrafiltrate, its composition is modified in a highly coordinated process along the nephron. Hundreds of highly specialized transport proteins move solutes across the membranes of the cells lining the nephron to reabsorb important molecules while actively excreting toxic or unnecessary ones¹. Many of these transport proteins as well as the enzymes responsible for generating or breaking down the transported metabolites have been identified through the study of human monogenic diseases. They represent attractive drug targets to treat not only kidney diseases but also metabolic diseases, as exemplified by inhibitors of the transporters SGLT2 and URAT1 (refs. 2,3). However, many transporters and enzymes, as well as their substrates and products in vivo, remain to be characterized. We hypothesized that linking information from human genetic studies to plasma and urine metabolomes would provide new insights into the roles of these proteins in health and disease.

Genetic effects on metabolite levels in urine can reflect systemic processes such as genotype-dependent intestinal metabolite uptake or hepatic transformation reactions that are detected in urine because of the respective metabolites’ filtration from plasma. They can also reflect kidney-specific processes, for example, the active production, reuptake or secretion of small molecules by the cells lining the nephron. Studies with paired plasma and urine metabolite measurements have the potential to distinguish between these processes.

Here, we study differences and similarities regarding genetic influences on metabolomes derived from two ‘matrices’, plasma and urine, to test the hypothesis that both provide complementary information. Through systematic integration of genome-wide genetic information with paired plasma and urine metabolite measurements from 5,023 participants in the German Chronic Kidney Disease (GCKD) study, we highlight underlying systemic as well as kidney-specific processes. We detect 1,299 genome-wide significant associations and show that studying plasma alone would have missed associations with almost 40% of metabolites. We highlight examples of urine-specific associations, of footprints that kidney-expressed transporters leave in plasma and urine metabolomes and of previously undescribed systemic roles of a kidney-enriched enzyme. This study generates a rich resource for future experimental validation of yet uncharacterized enzymatic and transport processes that may represent a molecular link between genetic variants and human traits and diseases.

Results

We performed genome-wide screens for genetic variants significantly associated with levels of 1,296 plasma and 1,399 urine metabolites (779 overlapping metabolites; Fig. 1). Metabolites were quantified by non-targeted mass spectrometry⁴ in plasma and urine specimens from 5,023 participants in the GCKD study (Methods and Supplementary Tables 1 and 2).

**Fig. 1: Overview of the study design.**

mGWAS identify 1,299 signals for 760 metabolites

Genome-wide association studies (GWAS) of plasma metabolite levels (mGWAS) yielded 677 regions that contained at least one significantly associated SNP (P value < 3.9 × 10⁻¹¹; Fig. 1 and Supplementary Table 3). For each metabolite and region, the SNP with the lowest association P value was chosen as the index SNP, termed metabolite quantitative trait locus (mQTL; regional association plots in Supplementary Data 1). While we have previously shown that genetic effects on the urine metabolome are of comparable magnitude in persons with and without reduced kidney function⁵, we now used data from the independent, population-based Atherosclerosis Risk in Communities (ARIC) study (Supplementary Table 1 and Supplementary Methods) to show that this also holds true for plasma mQTLs, as detailed in the Supplementary Results, Supplementary Table 4 and Extended Data Fig. 1.
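The significance threshold and the index SNP rule described above can be sketched in a few lines of Python. The sketch assumes that the threshold is the conventional genome-wide level of 5 × 10⁻⁸ divided by the number of metabolites tested, which reproduces the quoted 3.9 × 10⁻¹¹ for 1,296 plasma metabolites but is not stated in this section; the Methods remain the authoritative definition. The `Assoc` record and the function names are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Tuple

# Conventional GWAS significance level; an assumption, not quoted in this section.
GENOME_WIDE_ALPHA = 5e-8


def metabolome_wide_threshold(n_metabolites: int,
                              alpha: float = GENOME_WIDE_ALPHA) -> float:
    """Bonferroni-style correction for the number of metabolite GWAS."""
    return alpha / n_metabolites


class Assoc(NamedTuple):
    """One SNP-metabolite association (hypothetical record layout)."""
    metabolite: str
    region: str
    snp: str
    p_value: float


def index_snps(assocs: Iterable[Assoc],
               threshold: float) -> Dict[Tuple[str, str], Assoc]:
    """Keep the significant SNP with the lowest P value per (metabolite, region)."""
    best: Dict[Tuple[str, str], Assoc] = {}
    for a in assocs:
        if a.p_value >= threshold:
            continue  # not significant after correction
        key = (a.metabolite, a.region)
        if key not in best or a.p_value < best[key].p_value:
            best[key] = a
    return best


print(f"{metabolome_wide_threshold(1296):.1e}")  # 3.9e-11
```

Each entry returned by `index_snps` corresponds to one mQTL in the sense used here: one index SNP per metabolite and region.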

We next compared our findings with those from seven large genetic studies of the plasma or serum metabolome^{6,7,8,9,10,11,12} (Methods). We observed excellent correlations of genetic effects and high validation rates of published mQTLs with results from our study (Supplementary Table 5) and conversely of our plasma mQTLs with the results of the published studies (Supplementary Table 6 and Supplementary Fig. 1; details in the Supplementary Results). Not surprisingly, the majority (92.6%) of plasma mQTLs from our study were already reported in at least one of these up-to-17-fold-larger studies. There were, however, 50 mQTLs not reported as significant in any of these published studies, with 20 of them arising from previously unreported metabolites. Together, these comparisons underscored the validity and generalizability of our findings in plasma.

Across results from 1,399 GWAS of urine metabolite levels, we identified 622 mQTLs (P value < 3.6 × 10⁻¹¹; Fig. 1, Supplementary Table 3 and Supplementary Data 2). In comparison to our previous study of the urine metabolome⁵, 64% of the now detected mQTLs (399 of 622) were not reported before, and the number of unique metabolites with at least one urine mQTL more than doubled. Investigation of the detected urine mQTL in the seven mGWAS of the circulating metabolome^{6,7,8,9,10,11,12} underscored the additional discovery potential of urine: 56.6% (352 of 622) of urine mQTL were not significant in any of these studies, with 212 of these mQTL arising from urine metabolites not reported in the plasma or serum metabolomes (Supplementary Table 6). Comparisons of the urine mQTLs to their associations with levels of the respective circulating metabolites from both the GCKD as well as the seven published mGWAS and vice versa are detailed in the Supplementary Results, Supplementary Fig. 2 and Supplementary Table 5 and contain interesting examples of inversely correlated genetic effects that are consistent with the localization of the encoded proteins and enzymes at the apical membrane of kidney tubular epithelial cells.

Across both matrices, we identified 1,299 mQTLs from the results from 2,695 GWAS (Supplementary Table 3 and Fig. 2), 37 of which showed interaction with sex (P_interaction < 3.8 × 10⁻⁵; Supplementary Table 7) and are summarized in the Supplementary Results. Statistical fine mapping enabled prioritization of the most likely causal variants at each mQTL (Methods). Of 1,509 independent signals (Supplementary Table 8), 396 (26%) were fine mapped to credible sets of two to five SNPs, and 192 (13%) were mapped even to a single SNP, including 53 missense, one splice and one stop-lost variant (Supplementary Table 9). Smaller credible set size was significantly associated with lower minor allele frequency (MAF) of the independent index SNPs (P = 2.3 × 10⁻¹³) but not with the number of associated metabolites (P > 0.8). In summary, discovery GWAS of the plasma and urine metabolomes identified a wealth of significantly associated loci, the basis for subsequent characterizations.
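To illustrate what a credible set is, the following minimal sketch builds one from per-SNP posterior inclusion probabilities by adding SNPs in descending order until a target coverage is reached. The 95% coverage, the single-causal-variant assumption, the function name and the toy probabilities are all assumptions for illustration; the fine-mapping procedure actually used is the one described in the Methods.

```python
from typing import Dict, List


def credible_set(pips: Dict[str, float], coverage: float = 0.95) -> List[str]:
    """Smallest set of SNPs whose summed posterior probability reaches `coverage`.

    `pips` maps SNP identifiers to posterior inclusion probabilities for one
    independent signal (assumed to sum to roughly 1).
    """
    ranked = sorted(pips.items(), key=lambda item: item[1], reverse=True)
    chosen: List[str] = []
    cumulative = 0.0
    for snp, pip in ranked:
        chosen.append(snp)
        cumulative += pip
        if cumulative >= coverage:
            break
    return chosen


# Made-up probabilities: one signal resolves to a single SNP, the other to three.
single = credible_set({"rsA": 0.97, "rsB": 0.02, "rsC": 0.01})
several = credible_set({"rsA": 0.50, "rsB": 0.30, "rsC": 0.18, "rsD": 0.02})
print(len(single), len(several))  # 1 3
```

Under this construction, a signal counted as "mapped to a single SNP" is one where the top variant alone carries at least the target posterior mass.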

**Fig. 2: Circular presentation of the 1,299 identified genetic associations with metabolite levels in plasma and urine.**

Differences in plasma and urine mQTL

The 1,299 mQTLs arose from 760 unique metabolites, of which 301 (40%) only showed genetic associations with their levels in plasma, 275 (36%) only showed associations with their levels in urine and 184 (24%) showed associations with their levels in both matrices (Supplementary Table 3). Estimated genome-wide heritability was similar for most matched urine and plasma metabolites (Extended Data Fig. 2). There were 41% (213 of 517) plasma-specific, 30% (183 of 620) urine-specific and 47% (364 of 779) shared metabolites with an mQTL (Fig. 3a). Among the 364 shared metabolites with an mQTL, 49% (180) exclusively showed a significant genetic association in plasma (88 metabolites) or in urine (92 metabolites; Fig. 3a).
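The counts in this paragraph can be cross-checked against the numbers of quantified metabolites reported earlier (1,296 in plasma, 1,399 in urine, 779 overlapping). The short script below is only a bookkeeping aid; the function and variable names are hypothetical, and all figures are taken from the text.

```python
def tally() -> dict:
    """Recompute the metabolite breakdown from the counts quoted in the text."""
    plasma_total, urine_total, shared = 1296, 1399, 779

    plasma_specific = plasma_total - shared        # quantified in plasma only
    urine_specific = urine_total - shared          # quantified in urine only
    unique = plasma_total + urine_total - shared   # all distinct metabolites

    # Metabolites with at least one mQTL, by quantification group.
    mqtl_plasma_specific, mqtl_urine_specific, mqtl_shared = 213, 183, 364
    # Shared metabolites with an association in only one matrix.
    shared_plasma_only, shared_urine_only = 88, 92

    both = mqtl_shared - shared_plasma_only - shared_urine_only
    plasma_only = mqtl_plasma_specific + shared_plasma_only
    urine_only = mqtl_urine_specific + shared_urine_only
    total = plasma_only + urine_only + both

    return {
        "plasma_specific": plasma_specific,
        "urine_specific": urine_specific,
        "unique": unique,
        "plasma_only": plasma_only,
        "urine_only": urine_only,
        "both": both,
        "total": total,
        "pct": tuple(round(100 * n / total) for n in (plasma_only, urine_only, both)),
    }


result = tally()
print(result["plasma_only"], result["urine_only"], result["both"], result["total"])
# 301 275 184 760
print(result["pct"])  # (40, 36, 24)
```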

**Fig. 3: Comparative analyses of mQTLs from plasma and urine.**

Whereas plasma mQTLs more likely arose from lipid superpathway metabolites than urine mQTLs (301 versus 97 metabolites), consistent with the lack of glomerular filtration of many lipids, urine mQTLs were more likely connected to nucleotide, peptide or unnamed metabolites (Fig. 3b). The power to detect significant associations for almost all metabolite superpathways was similar for plasma and urine (Extended Data Fig. 3). The variance in metabolite levels explained by plasma mQTLs ranged from 0.18% to 50.9% (median 1.3%) and by urine mQTLs ranged from 0.55% to 61.4% (median 2.0%; Supplementary Table 3).
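For orientation, the variance explained by a single SNP is often approximated from its effect size and allele frequency. The sketch below uses the common formula 2 × MAF × (1 − MAF) × β² divided by the trait variance, which assumes an additive model and Hardy-Weinberg equilibrium; whether the reported values were obtained this way is not stated in this section, and the input values are made up.

```python
def variance_explained(beta: float, maf: float, trait_variance: float = 1.0) -> float:
    """Approximate fraction of trait variance explained by one additive SNP.

    beta: per-allele effect in trait units (SD units if the trait is standardized).
    maf: minor allele frequency, strictly between 0 and 1.
    trait_variance: variance of the trait (1.0 for a standardized trait).
    """
    if not 0.0 < maf < 1.0:
        raise ValueError("maf must lie strictly between 0 and 1")
    if trait_variance <= 0.0:
        raise ValueError("trait_variance must be positive")
    return 2.0 * maf * (1.0 - maf) * beta ** 2 / trait_variance


# Hypothetical SNP: MAF 0.30, effect of 0.2 SD per allele.
print(f"{100 * variance_explained(0.2, 0.30):.2f}%")  # 1.68%
```

The formula makes clear why low-frequency variants need large per-allele effects to explain a sizeable share of the variance.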

Plasma and urine mQTLs highlight distinct major genes

Pairwise colocalization testing between metabolite association signals at the same locus to detect shared genetic associations likely to arise from the same underlying causal variant identified 10,596 positive colocalizations (posterior probability for a shared causal variant (PP H₄) > 0.8; Methods) involving 1,162 mQTLs. Colocalizing associations were divided into four groups (Supplementary Table 10): those where the same genetic signal affected different metabolites in the same matrix ((1) ‘intraplasma’, n = 3,189; (2) ‘intraurine’, n = 3,155), the same metabolite in both plasma and urine ((3) ‘intermatrix, same metabolite’, n = 204) and different metabolites in plasma and urine ((4) ‘intermatrix, different metabolite’, n = 4,048).
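The grouping rule for colocalizing pairs can be written down compactly. The sketch below assumes each association signal is described by its metabolite and matrix and that a pair counts as positive when PP H₄ exceeds 0.8, as stated above; the `Signal` type and function name are hypothetical, and the group labels follow the text.

```python
from typing import NamedTuple, Optional

PP_H4_CUTOFF = 0.8  # positive colocalization requires PP H4 > 0.8 (from the text)


class Signal(NamedTuple):
    """One metabolite association signal at a locus (hypothetical layout)."""
    metabolite: str
    matrix: str  # "plasma" or "urine"


def classify(a: Signal, b: Signal, pp_h4: float) -> Optional[str]:
    """Assign a colocalizing pair to one of the four groups, or None if negative."""
    if pp_h4 <= PP_H4_CUTOFF:
        return None
    if a.matrix == b.matrix:
        return f"intra{a.matrix}"  # different metabolites, same matrix
    if a.metabolite == b.metabolite:
        return "intermatrix, same metabolite"
    return "intermatrix, different metabolite"


# The four reported group sizes add up to the reported total.
group_sizes = {
    "intraplasma": 3189,
    "intraurine": 3155,
    "intermatrix, same metabolite": 204,
    "intermatrix, different metabolite": 4048,
}
print(sum(group_sizes.values()))  # 10596
```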

We next asked whether there were central genes shaping the matrix-specific metabolome by assessing major differences between the genes underlying positive intramatrix colocalizations in plasma and urine. Combination of multiple complementary sources of evidence at each mQTL enabled prioritization of 282 most likely underlying genes^5,13,14 (Methods and Supplementary Table 11), of which the majority encode enzymes (n = 211, 75%), followed by transport proteins^15,16 (Fig. 2 and Supplementary Table 12). Whereas FADS1 and SLCO1B1 accounted for nearly half of the 3,189 intraplasma colocalizations, NAT8 and the solute carrier (SLC)17A genes (mostly SLC17A1) made up >50% of the 3,155 intraurine colocalizations (Fig. 3c). This is consistent with FADS1 encoding a central enzyme in polyunsaturated fatty acid metabolism¹⁷ and the predominance of these lipid metabolites in plasma and with NAT8 encoding an N-acetyltransferase highly expressed in the kidney that generates water-soluble molecules for excretion¹⁸ and the abundance of N-acetylated metabolites in urine. Similarly, the organic anion transporter encoded by SLCO1B1 and the solute transporters encoded by the SLC17A family show high and specific expression in liver and kidney, respectively, where they transport dozens of physiological and pharmacological substrates^19,20.

The direction and strength of association of almost all 204 ‘intermatrix, same metabolite’ mQTLs was nearly identical in plasma and urine (Extended Data Fig. 4), consistent with these metabolites’ filtration from plasma to urine. Observed differences in explained metabolite variance as well as effect direction are detailed in the Supplementary Results.

mQTL share genetic associations with biomarkers and diseases

Pairwise colocalization analysis of mQTL summary statistics with those of 2,942 unique clinical biomarkers and diseases from the UK Biobank (Methods) identified 7,073 positive colocalizations (Supplementary Table 13). The corresponding metabolites may represent a molecular link between genetic variants and clinical endpoints, as detailed for genetic variants at the CYP3A7 locus that colocalized with plasma androsterone sulfate levels and hypertension in the Supplementary Results.

With respect to kidney diseases, evidence for a shared genetic signal was detected between metabolite associations at GSTM1 and kidney cancer²¹; FMO4 and hypertensive chronic kidney disease (CKD); ALPL, CYP2D6 and SLC34A1 as well as ABCG2 and kidney stones; and ABCC4 and urine retention.

Many colocalizations were detected with continuous markers of kidney (14.8%) and liver function (7.7%). Creatinine-based estimated glomerular filtration rate (eGFR) (eGFRcrea, 127 colocalizations) and alanine aminotransferase (ALT, 86 colocalizations) as exemplary kidney and liver function markers often colocalized with metabolite levels in the expected matrix (Fig. 4a and Extended Data Fig. 5): for example, loci containing lipid metabolism-related genes such as FADS1 or LIPC showed evidence of shared genetic architecture between plasma lipid metabolite levels and liver but not kidney function. Likewise, several loci encoding transporters with important roles in the kidney such as SLC34A1 or SLC7A9 exhibited shared genetic architecture between urine levels of associated metabolites and kidney function. At the majority of loci, however, several mQTLs colocalized with kidney or liver function markers, some of which were detected from urine and some from plasma. These observations further emphasize the value of studying paired matrices. Metabolites most strongly connected to kidney function by correlation analyses and genetic evidence as well as Mendelian randomization studies are summarized in the Supplementary Results, Supplementary Table 14 and Extended Data Fig. 6.

**Fig. 4: Association of mQTLs and implicated genes with clinical biomarkers, diseases and phenotypes in genetically manipulated mice.**

We also searched for gene-level associations of the 282 prioritized genes (Supplementary Table 15) and for variant-level associations of the identified mQTL (Supplementary Table 16) with several thousands of phenotypes based on whole-exome-sequencing data from ~450,000 UK Biobank participants²² (Methods). At the gene level, putative damaging rare variants in 28 genes were associated with at least one of 437 phenotypes at P < 2 × 10⁻⁹ (Supplementary Table 15 and the Methods). We observed both trait- or risk-increasing and -decreasing associations upon the genes’ assumed loss of function (Fig. 4b), highlighting opportunities in which therapeutic target inhibition confers protection as exemplified by ANGPTL3 and dyslipidemia, for which new drugs have recently gained approval²³. While in this example plasma lipid levels can serve as the required intermediate biomarker for the clinical development pipeline, such biomarkers may be elusive for other potential targets and may be contained among the results of this study. At the variant level, 14 coding variants were associated with genitourinary traits at P < 1 × 10⁻⁵ (Supplementary Table 16), including experimentally confirmed positive controls such as p.Gln141Lys in the transporter ABCG2 and serum urate²⁴.

Tissue, pathway and murine phenotype enrichment

Over-representation analyses of the 282 prioritized genes revealed a large number of significant gene ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways (Supplementary Table 17), human tissues and cell types (Supplementary Tables 18 and 19), especially in kidney and liver, as well as metabolic homeostasis-related phenotypes in genetically manipulated mice (Supplementary Table 20 and Fig. 4c). The Supplementary Results contain details, including a focus on matrix-specific mQTLs (Extended Data Fig. 7).

mQTLs from urine-specific metabolites: the FUT2 locus

mQTLs arising from the 183 urine-specific metabolites may highlight kidney-specific processes or systemic processes only detected in urine (Fig. 5a). Regarding kidney-specific processes, there were multiple examples of associations between variants in genes encoding transporters at the apical membrane of tubular cells with the levels of metabolites that they reabsorb from the urine ultrafiltrate (Supplementary Results). An example in which urine-specific metabolites serve as a readout of systemic processes was provided by two mQTLs for galactosylglycerol and 1,6-anhydroglucose at FUT2. Both index SNPs are in high linkage disequilibrium (LD; r² > 0.8) with rs601338, at which the minor A allele encodes the stop-gain variant p.Trp154Ter (NP_000502.4) that was associated with higher levels of only these two urine metabolites. The encoded fucosyltransferase 2 is a ubiquitously expressed enzyme that mediates the inclusion of fucose into glycans on a variety of glycolipids and glycoproteins. Individuals homozygous for p.Trp154Ter have lower risk of several infectious diseases during childhood^25,26, a selective advantage. Indeed, we detected positive selection at this and other loci, including positive controls such as the LCT locus (Methods and Supplementary Table 21). The extended homozygosity of the haplotype carrying the minor, derived allele at the galactosylglycerol mQTL further supported positive selection (Fig. 5b).

**Fig. 5: Urine-specific mQTLs deliver insights into systemic and kidney-specific processes.**

Exploration of phenome-wide associations for fucosyltransferase 2 (FUT2) p.Trp154Ter in the UK Biobank (Methods) showed significant associations with dyslipidemia, hypertension and cholelithiasis (Fig. 5c). Colocalization confirmed a shared genetic basis of the two mQTL with these diseases, as well as of several plasma proteins (Supplementary Tables 13 and 22). These observations suggest that higher urine levels of galactosylglycerol and 1,6-anhydroglucose could reflect increased risk for FUT2 genotype-related cardiometabolic diseases of adult onset, motivating future studies.

Urine-specific mQTLs from shared metabolites: the AQP7 locus

An interesting example of a urine-specific mQTL arising from a matrix-shared metabolite (Fig. 5d) was detected at AQP7 with urine glycerol levels. The signal (Fig. 5e, P_urine = 9.93 × 10⁻⁵⁸; P_plasma = 0.53) was fine mapped to rs62542743, encoding the missense variant AQP-7 p.Gly264Val (NP_001161.1). The channel AQP-7 mainly transports water and glycerol and, in the rat and mouse kidney, localizes to proximal straight tubules²⁷. Aqp7-knockout mice show glycerol loss in urine, supporting a role in glycerol reabsorption²⁸. The minor A allele (p.264Val) was associated with higher urine glycerol levels, which is in agreement with the knockout mouse findings when assuming loss of function (Fig. 5f). The mutant valine carries two more methyl residues than wild-type glycine, which may decrease the channel’s passing ability for glycerol. Moreover, a previous case report exists of three children homozygous for p.264Val who presented with normoglycerolemic hyperglyceroluria²⁹. Indeed, we confirmed a recessive effect, with persons homozygous for the A allele showing >64-fold higher urine but not plasma glycerol levels (Fig. 5g), thereby confirming a single case report through evidence from population studies.

Different matrices implicate distinct variants at SLC10A2

At SLC10A2, fine mapping revealed a single, yet different, underlying variant for the plasma mQTL of the secondary bile acid glycodeoxycholate 3-sulfate and the urine mQTL of the primary bile acid glycocholate (Supplementary Table 8). SLC10A2 encodes the primary transporter for bile acid uptake in the distal ileum³⁰, ASBT, which is also responsible for bile acid reabsorption in the proximal kidney tubules^31,32. The fingerprints of the plasma index SNP rs55971546 and the urine index SNP rs16961281 on all quantified bile acid metabolites differed markedly (Fig. 6a,b): the minor T allele at rs55971546, encoding p.Val98Ile, was associated with higher plasma levels of several sulfated bile acids, which was propagated to urine (inner two bands). The ASBT 98Ile protein has experimentally been shown to result in a partial loss of function in one but not another in vitro system^33,34, and the corresponding T allele was associated with higher risk of gallstone disease³⁵.

**Fig. 6: Plasma and urine implicate distinct causal variants and bile acid metabolites in the *SLC10A2* locus.**

On the other hand, the minor A allele at the urine index SNP rs16961281 showed a urine-specific fingerprint mainly as lower levels of glycocholate but also of cholate, glycoursodeoxycholate and glycodeoxycholate. These metabolites are known substrates of ASBT³⁶, whereas sulfated bile acids are not³⁷. Thus, the urine bile acid profile may reveal genetic variants more directly related to ASBT function, whereas plasma levels may reflect secondary changes. The minor A allele at rs16961281 was significantly associated with higher SLC10A2 gene expression in publicly available kidney expression quantitative trait locus (eQTL) data (P = 8.7 × 10⁻³⁴)³⁸. Colocalization supported a shared variant underlying lower urine glycocholate levels and higher renal SLC10A2 expression (PP H₄ = 1; Fig. 6d).

We generated assay for transposase-accessible chromatin with sequencing (ATAC-seq) and RNA-seq data from manually dissected primary human kidney tissues to annotate prioritized variants (Methods) and found that rs16961281 mapped into highly accessible chromatin in the kidney cortex (Fig. 6c) and specifically in proximal tubular cells (Extended Data Fig. 8), supporting its regulatory function. Higher ASBT abundance should increase substrate reabsorption and result in lower urine levels, as observed (Fig. 6d). Colocalization of genetic associations with gallstone disease and urine glycocholate levels (concordant direction, PP H₄ = 1) as well as SLC10A2 expression (inverse direction, PP H₄ = 1) supported the idea that higher ASBT abundance results in lower risk of gallstone disease (Fig. 6d). The occurrence of gallstones as a potential adverse effect of the new class of SLC10A2 inhibitors, such as odevixibat to treat cholestasis or elobixibat to treat constipation (Supplementary Table 12), therefore deserves attention. In fact, data from emerging clinical trials describe increased rates of cholelithiasis in the treatment group (https://clinicaltrials.gov identifier NCT03566238).

Metabolome footprints of kidney-enriched proteins: SLC13A3

NaDC3, a kidney-enriched transport protein encoded by SLC13A3, also exemplifies a different metabolic ‘footprint’ in plasma and urine (Fig. 7a,b). We observed significant genetic associations with levels of plasma (P = 1.2 × 10⁻²³) and urine (P = 3.7 × 10⁻²⁵) methylsuccinoylcarnitine as well as plasma malate (P = 4.7 × 10⁻¹⁴) and fumarate (P = 1.1 × 10⁻¹¹; Supplementary Table 3). Functional annotation using our ATAC-seq and RNA-seq data and publicly available single-nucleus ATAC-seq data from the human kidney (Methods) showed that only rs6124828 of eight fine-mapped SNPs (Supplementary Tables 8 and 9) mapped into highly accessible chromatin, specifically in the kidney cortex (Fig. 7c) and in proximal tubule cells³⁹ (Extended Data Fig. 9). A potential regulatory function of rs6124828 was supported by histone chromatin immunoprecipitation followed by sequencing (ChIP–seq)-based chromatin state prediction from primary human kidney tissue (Methods) that showed active enhancer function at the variant’s position. Screening of the presence of binding motifs of 517 kidney-expressed transcription factors at this position showed an intersection only with hepatocyte nuclear factor (HNF)1A and HNF1B. These master regulators of renal gene expression programs have been shown to bind at this position based on publicly available ChIP–seq data (Fig. 7c and the Methods). The minor A allele at rs6124828 was predicted to significantly reduce the binding probability of HNF1A and HNF1B (Fig. 7c), motivating an investigation of allele-specific binding in future ChIP datasets from primary human tissue.

**Fig. 7: Primary human kidney tissue permits prioritization of causal variants in kidney-enriched genes implicated by mQTLs.**

NaDC3 is an Na⁺–dicarboxylate transporter in the basolateral membrane of proximal tubule cells^40,41. It transports a variety of substrates when overexpressed in cellular assays, including tricarboxylic acid cycle intermediates such as α-ketoglutarate, (methyl)succinate, malate and fumarate, that are used for mitochondrial energy generation^42,43. Thus, genetic variants leading to lower expression of NaDC3 and hence lower intracellular substrate uptake, for example, via the presumed mechanism involving the regulatory A allele at rs6124828, are consistent with our observation of higher plasma levels of malate and fumarate and of lower levels of the resulting intracellular downstream metabolites such as methylsuccinoylcarnitine (Fig. 7d). The metabolomic signatures of SLC13A3 shed light on physiological functions of NaDC3 in humans and permit identification of a likely causal regulatory allele.

Systemic roles of dipeptidase 1: digestive enzymes and diseases

DPEP1 encodes dipeptidase 1 (DPEP1), an ectoenzyme localized on the apical membrane of tubular kidney cells. It has a role in dipeptide hydrolysis, including glutathione metabolism breakdown products such as cysteinyl-bis-glycine⁴⁴. DPEP1 has been well studied in the kidney because it metabolizes several β-lactam antibiotics⁴⁵. We detected several DPEP1 mQTLs for glutathione pathway metabolites as well as for dipeptides in urine. For example, the index SNP for urine prolylglycine (P = 9.0 × 10⁻³⁶⁹) explained 25% of its variance. Although the high renal expression of DPEP1 and its apical localization may be expected to primarily affect the urine metabolome, we also observed six plasma mQTLs for glutathione-related (cysteinylglycine, oxidized cysteinylglycine, cysteinylglycine disulfide*) and other (picolinate, picolinoylglycine, X-25244) metabolites, suggesting extra-renal roles (Fig. 8a and Supplementary Table 3).

**Fig. 8: DPEP1 influences plasma levels of major digestive enzymes.**

DPEP1 is also highly expressed but less well studied in the small intestine, pancreas and testis (Fig. 8b). We therefore explored additional, systemic roles of DPEP1 through colocalization analysis using data from a recently published GWAS of the circulating plasma proteome (Methods)⁴⁶. Eight of 4,907 protein readouts contained significant associations with SNPs in the DPEP1 locus, including the DPEP1 protein itself. Strikingly, all of the seven other proteins are digestive enzymes or zymogens produced primarily in the exocrine pancreas and secreted into the small intestine (Supplementary Table 23). Positive colocalizations supported a shared genetic basis of DPEP1-related metabolites and of circulating readouts of DPEP1 and the digestive enzymes (Fig. 8c and Supplementary Fig. 3). Higher urine prolylglycine, that is, lower inferred DPEP1 function, was associated with lower plasma levels of DPEP1 and all seven digestive enzymes. These observations point toward an underappreciated role of DPEP1 and motivate experimental studies to identify the underlying mechanisms.

We also detected multiple, pleiotropic colocalizations of DPEP1-related mQTL, especially with osteoarthrosis, hypertension and intake of blood pressure medication (Supplementary Table 13 and Fig. 8d). Colocalization supported inverse associations between genetically higher levels of glutathione-related metabolites, that is, lower DPEP1 activity, and lower risk of arthropathies, consistent with a reported beneficial effect of glutathione on osteoarthritis⁴⁷. Conversely, genetically predicted higher levels of urine and plasma picolinate and picolinoylglycine showed a positive relationship with osteoarthrosis. An opposite pattern was observed with respect to hypertension (Fig. 8d). Modification of DPEP1 function, for example, with its specific inhibitor cilastatin, is therefore expected to have opposing effects on osteoarthritis and hypertension risk.

Discussion

This large-scale comparative study of the genetic footprint on the plasma and urine metabolomes uncovered numerous associations not reported previously and yielded several principal findings: first, the number of detected mQTLs is similarly large in plasma and urine, while the underlying metabolites show differences. Second, multi-matrix studies deliver many more associations than the individual analysis of similarly sized plasma or urine studies. Third, differences in metabolomic footprints between plasma and urine can deliver insights into the physiological function and localization of proteins operating at compartment interfaces and implicate different disease-related mechanisms. Fourth, the detected mQTLs and their colocalizing traits and diseases constitute a rich resource for the formulation of biologically plausible hypotheses regarding the in vivo physiological function of transporters and enzymes for future experimental studies.

Genetic studies of the metabolome using multiple matrices can provide information that cannot be obtained from studies using a single matrix such as plasma. Not only were ~60% of the metabolites quantified in just one matrix, but the combined study of paired metabolomes also allowed for the distinction of kidney-specific and systemic processes. In fact, 49% of mQTLs arising from a metabolite quantified in both matrices were detected exclusively in plasma or in urine, underscoring the fact that plasma and urine contain complementary information on the handling of metabolites by different organs. This is exemplified by the effect of AQP-7 p.Gly264Val on glycerol levels: urine is in direct contact with the apical membrane of tubular epithelial cells, where this glycerol transporter is expressed. Urine therefore is the appropriate matrix to capture the function of this transporter in the kidney in vivo, as was true for a urine-specific association between bile acids and a regulatory variant affecting renal SLC10A2 expression. Conversely, the detection of plasma-specific effects of SLC13A3 variants on malate and fumarate levels can be explained by the basolateral localization of the encoded NaDC3 transporter in kidney epithelial cells. More generally, the study of similarities and differences of the paired plasma and urine metabolome is especially informative for functions of the kidney. Paired studies of the plasma metabolome and other matrices such as intestinal fluids or breath air could provide new insights about specific functions of the digestive organs and lungs, respectively.

Our study confirms that common genetic variants, mQTLs, sometimes explain >50% of the observed metabolite variance. Although this translates into much smaller effects on complex diseases such as hypertension, arthropathies or gallstone disease, colocalization can nominate shared pathophysiological mechanisms and inform about potential therapeutic targets, repurposing opportunities and potential side effects of approved drugs. Our study includes numerous such examples, supported by the recent launch of new drugs such as evinacumab, a monoclonal antibody targeting angiopoietin-like 3 (ANGPTL3) to treat dyslipidemia, or the SLC10A2 inhibitor odevixibat to treat cholestasis. Even if a target implicated by metabolites in our study is not desirable or amenable for therapeutic modulation, disease-associated metabolites may represent valuable intermediate biomarkers for risk prediction or response to treatment.

Some limitations warrant mention: while we show here and in prior work⁵ that genetic effects on metabolites are of comparable direction and magnitude in persons with and without reduced eGFR, future studies are required to examine whether our findings are generalizable to persons of non-European ancestry. Our study did not test the effects of rare and ultra-rare coding variants that may have particularly large effects, which could address remaining uncertainties in the assignment of the underlying causal gene(s) inherent to GWAS. Our gene-prioritization workflow incorporated information on gene expression from dozens of tissues. In addition to differences in tissue sample size, such prioritization can implicate several genes and tissues in a given locus, including scenarios in which different genes in one locus receive support from different tissues⁴⁸. Moreover, our workflow prioritized coding genes over noncoding ones such as long noncoding RNA. Although many GWAS loci in which a causal gene has been experimentally validated implicate coding genes, noncoding genes are also recognized mediators of association signals with complex human traits such as cardiovascular diseases^49,50. Lastly, our study employed semi-quantitative metabolite quantification, while additional targeted studies with absolute quantification are required to study fractional metabolite excretion and for clinical translation.

In conclusion, this genetic study of the paired plasma and urine metabolome emphasizes the role of multi-matrix studies to gain new insights into in vivo metabolic processes in general and the function of the kidney in particular. The results provide a rich resource for the experimental validation of yet unknown enzymatic and transport processes that may represent a molecular link between genetic variants and human traits and diseases.

Methods

Study design and participants

The GCKD study is an ongoing prospective observational study that enrolled 5,217 adult persons with CKD between 2010 and 2012. Patients regularly seen by nephrologists with eGFR between 30 and 60 ml min⁻¹ per 1.73 m² or eGFR >60 ml min⁻¹ per 1.73 m² with UACR > 300 mg per g (or urinary protein/creatinine ratio > 500 mg per g) were included⁵³. This study used biomaterials collected at the baseline visit, shipped frozen to a central biobank and stored at −80 °C⁵⁴. A more detailed description of the study design, standard operating procedures and the recruited study population has been published^53,55. The GCKD study was registered in the national registry for clinical studies (DRKS 00003971) and approved by local ethics committees of the participating institutions (universities or medical faculties of Aachen, Berlin, Erlangen, Freiburg, Hannover, Heidelberg, Jena, München and Würzburg)⁵³. All participants provided written informed consent. For this project, metabolites were quantified from stored EDTA plasma and spot urine. Information on genome-wide genotypes, covariates and metabolites was available for 4,960 (plasma) and 4,912 (urine) persons.

Genotyping and imputation

Genotyping and data cleaning in the GCKD study were conducted as follows^5,56. Genomic DNA from GCKD participants was genotyped at 2,612,357 variants using Illumina Omni2.5Exome BeadChip arrays and imputed using minimac3 version 2.0.1 at the Michigan Imputation Server⁵⁷ and the Haplotype Reference Consortium haplotype version r1.1 and Eagle 2.3 for phasing. On the variant level, SNPs with <96% call rate, imputation quality of r² ≤ 0.3, MAF < 1% or deviating from Hardy–Weinberg equilibrium (P < 1 × 10⁻¹⁰) and all multi-allelic SNPs were removed. The cleaned genotype dataset contained 5,034 individuals and 7,724,508 high-quality autosomal variants for GWAS. Genotyping of ARIC samples was performed on the Affymetrix 6.0 DNA microarray and filtered for call rates <90% and Hardy–Weinberg equilibrium P values < 10⁻⁶. SNPs were then imputed to the TOPMed Freeze 5b reference panel and filtered for r² ≤ 0.1 (imputation quality).

Metabolite identification and quantification

Non-targeted mass spectrometry analysis was performed at Metabolon, and sample preparation was carried out as published by Schlosser et al.⁵. Automated comparison of the ion features in the experimental samples to a reference library of chemical standard entries (>4,500 purified standards) was used for metabolite identification. Known metabolites reported in this study conformed to confidence level 1 (the highest confidence level of identification) of the Metabolomics Standards Initiative^58,59, unless otherwise denoted with an asterisk. Additional mass spectral entries have been created for compounds of unknown structural identity (unnamed biochemicals; >2,750 in the Metabolon library), which have been identified by virtue of their recurrent nature (both chromatographic and mass spectral). Peaks were quantified using the area under the curve and normalized to correct for variation resulting from instrument interday tuning differences by the median value for each run day. Likewise, metabolites in the ARIC replication sample were also quantified with the Metabolon HD4 platform.

Data cleaning of quantified metabolites

An in-house pipeline was set up for data quality control, filtering and normalization of metabolite concentrations. Four pairs of urine specimens, and no plasma specimens, were removed because they showed a Pearson correlation coefficient greater than 0.9 despite differing sample IDs. Four plasma specimens and no urine specimens were removed for >50% missing data. A total of 130 plasma and 131 urine metabolites were removed because fewer than 300 genotyped samples were available.

To account for urine dilution, concentrations of each metabolite were probabilistic quotient (PQ) normalized based on endogenous metabolites with <1% missing values (n_metabolites = 309)⁶⁰. Of the log₂-transformed metabolites, 15 plasma metabolites were excluded for low variance (<0.01), and none were excluded for too many outliers (>5% of samples outlying >5 s.d.). Three plasma samples and one urine sample were outliers (>5 s.d.) along one of the first 15 principal components based on metabolites with complete information. The final dataset consisted of 1,296 plasma and 1,401 urine log₂-transformed traits for subsequent GWAS. Supplementary Table 2 provides detailed annotation of the metabolites, including heritability estimates for metabolites with at least one genetic association. Over the course of this project, two pairs of formerly distinct urine metabolites were merged because each pair represented the same molecule: X-12739 and X-24527 were merged to the glutamine conjugate of C₆H₁₀O₂ (1)*, and X-23667 and X-24759 were merged to (2-butoxyethoxy)acetic acid.
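The dilution correction can be illustrated with a generic sketch. It assumes that the normalization follows the usual probabilistic quotient procedure, with the per-metabolite median across samples as the reference profile; the function name `pq_normalize` and the toy data are hypothetical, and the in-house pipeline may differ in detail.

```python
import math
from statistics import median

def pq_normalize(samples):
    """Probabilistic quotient normalization of a samples x metabolites table.

    samples: list of equal-length lists of positive concentrations
    (toy version: reference metabolites only, no missing values).
    Returns the per-sample dilution factors and the normalized table.
    """
    n_met = len(samples[0])
    # Reference profile: median of each metabolite across samples (assumption).
    reference = [median(row[j] for row in samples) for j in range(n_met)]
    factors, normalized = [], []
    for row in samples:
        # The median quotient to the reference estimates the sample's dilution.
        factor = median(x / r for x, r in zip(row, reference))
        factors.append(factor)
        normalized.append([x / factor for x in row])
    return factors, normalized

# Toy data: the three samples differ only by overall concentration.
toy = [[1.0, 2.0, 4.0], [2.0, 4.0, 8.0], [1.5, 3.0, 6.0]]
factors, normalized = pq_normalize(toy)
# log2 transformation as applied before the variance and outlier filters.
log2_levels = [[math.log2(x) for x in row] for row in normalized]
```

In a real dataset the dilution factor would be estimated from the reference metabolites only (here, those with <1% missing values) and then applied to all metabolites of the same sample.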

Definition of additional variables

In the GCKD study, an IDMS-traceable enzymatic assay (Creatinine Plus, Roche) was used to measure serum creatinine levels, for estimating GFR by means of the CKD-EPI formula⁶¹, and to measure urine creatinine levels. The Tina-quant Albumin assay (Roche) was used to measure serum and urine albumin, for adjustment and calculation of the UACR, respectively. The GFR was estimated in the ARIC study from serum creatinine and cystatin C using the CKD-EPI formula⁶².

Genome-wide association study of metabolite levels

Based on log₂-transformed metabolite levels, residuals adjusted for age, sex and the first three genetic principal components were generated (similar to previous mGWAS^5,6,56,63,64), with plasma levels additionally adjusted for ln(eGFR) and serum albumin. GWAS analyses of these residuals were performed with SNPTEST version 2.5.2 (https://www.well.ox.ac.uk/~gav/snptest/), using imputed genotype dosages and linear regression under additive modeling. Statistical significance was defined as genome-wide significance after correcting for multiple testing by a Bonferroni procedure (3.9 × 10⁻¹¹ = 5 × 10⁻⁸ ÷ 1,296 plasma traits; 3.6 × 10⁻¹¹ = 5 × 10⁻⁸ ÷ 1,401 urine traits).
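The Bonferroni-corrected thresholds quoted above follow directly from dividing the conventional genome-wide level by the number of traits per matrix. The short check below reproduces that arithmetic; the variable names are illustrative only.

```python
GENOME_WIDE_ALPHA = 5e-8                      # conventional genome-wide significance level
N_TRAITS = {"plasma": 1296, "urine": 1401}    # traits analyzed per matrix

# One Bonferroni-corrected threshold per matrix.
thresholds = {matrix: GENOME_WIDE_ALPHA / n for matrix, n in N_TRAITS.items()}

for matrix, thr in thresholds.items():
    print(f"{matrix}: P < {thr:.1e}")
# prints "plasma: P < 3.9e-11" and "urine: P < 3.6e-11"
```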

Significantly associated SNPs were assigned to mQTL by selecting, for each trait, the SNP with the lowest P value as the index SNP, defining the corresponding locus as a 1-Mb interval centered on the index SNP and repeating the procedure for unassigned SNPs until no further genome-wide significant SNP remained. For each trait, overlapping intervals were combined into mQTL. The extended MHC region (chromosome 6, 25.5–34 Mb) was treated as one region. For associations with MAF < 3%, mQTLs were only kept if the index SNP remained significant with inverse-normal-transformed metabolite data. A regional association plot centered on the index SNP for each mQTL was generated using LocusZoom (version 1.3) and LD information from GCKD study genotypes⁵². Circular plots were created using Circos version 0.69-6 (ref. ⁶⁵). The variance in metabolite levels explained by the index SNP of an mQTL was computed independently of other covariates.
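The locus definition described above can be sketched as a greedy procedure for a single trait. This is a minimal illustration, not the pipeline used in the study: the function name `define_loci`, the tuple layout and the toy positions are hypothetical, and the special handling of the extended MHC region and of low-MAF associations is omitted.

```python
HALF_WINDOW = 500_000  # half of the 1-Mb interval centered on an index SNP

def define_loci(snps, threshold):
    """Greedy locus definition for one trait.

    snps: list of (chromosome, position, p_value) tuples.
    Returns merged loci as (chromosome, start, end, index_snps) tuples.
    """
    remaining = sorted((s for s in snps if s[2] < threshold), key=lambda s: s[2])
    windows = []
    while remaining:
        chrom, pos, p = remaining[0]  # lowest remaining P value becomes an index SNP
        windows.append((chrom, pos - HALF_WINDOW, pos + HALF_WINDOW, [(chrom, pos, p)]))
        # SNPs inside the new window are assigned to it and dropped.
        remaining = [s for s in remaining
                     if not (s[0] == chrom and abs(s[1] - pos) <= HALF_WINDOW)]
    # Combine overlapping windows on the same chromosome.
    merged = []
    for chrom, start, end, idx in sorted(windows, key=lambda w: (w[0], w[1])):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end), last[3] + idx)
        else:
            merged.append((chrom, start, end, idx))
    return merged

# Toy example with hypothetical positions and P values.
toy_snps = [(1, 1_000_000, 1e-20), (1, 1_300_000, 1e-15), (1, 1_800_000, 1e-12),
            (1, 5_000_000, 1e-13), (2, 1_000_000, 1e-9)]
loci = define_loci(toy_snps, threshold=3.9e-11)
```

In the toy data, the windows around the SNPs at 1.0 Mb and 1.8 Mb overlap and are combined into one locus, while the SNP on chromosome 2 does not pass the threshold.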

We compared our findings to those from seven large studies of the plasma–serum metabolome that were published in peer-reviewed journals and shared their summary statistics^{6,7,8,9,10,11,12}. These studies were selected to maximize overlap with our findings as studies of EA participants with large sample size that examined the effects of common SNPs on plasma–serum metabolite levels quantified with the Metabolon assay, rather than on rare variant association studies, or GWAS of metabolites quantified by different methods and/or in other populations, for example, refs. ^{66,67,68,69,70}. Metabolites were matched by compound or chemical ID, if available, and biochemical name (ones not identical were checked manually). First, for each mQTL identified in one of the published plasma–serum studies mentioned above, available index SNPs were extracted from GWAS of the corresponding metabolite in plasma and urine, and effect direction and statistical significance were assessed at different levels of statistical significance (P value < 0.05, <0.05 ÷ no. mQTLs in the previous study, <5 × 10⁻⁸ and <5 × 10⁻⁸ ÷ no. mQTLs in the previous study). Validation required effect-direction consistency for comparisons involving results from the GCKD plasma mGWAS. Second, for each mQTL identified in this study in plasma or urine, availability of the corresponding index SNP and metabolite in the summary statistics of the previously published plasma–serum studies was assessed. If the index SNP was missing, we searched for proxy SNPs in high LD (r² > 0.8) within a window of ±500 kb around the index SNP based on genetic data from the 1000 Genomes Project phase 3 version 5 of European ancestry using https://snipa.helmholtz-muenchen.de/snipa/?task=proxy_search. For each study, the best available proxy SNP in terms of maximal LD and minimal distance was selected.
Summary statistics were downloaded from https://metabolomics.helmholtz-muenchen.de/gwas/index.php?task=download (Shin et al.⁶), http://www.hli-opendata.com/Metabolome (Long et al.⁷, only summary statistics with P value < 10⁻⁵), https://omicscience.org/apps/crossplatform/ (Lotta et al.⁸), https://pheweb.org/metsim-metab/ (Yin et al.¹⁰), https://omicscience.org/apps/mgwas/mgwas.table.php (Surendran et al.¹¹) and http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/; accession numbers for European GWAS are GCST90199621–GCST90201020 (Chen et al.¹²). Hysi et al.⁹ shared their summary statistics upon request.

To determine the number of urine mQTLs not reported in our earlier study⁵, we examined for each mQTL from this study whether an associated SNP within a window of ±500 kb for the corresponding metabolite was identified in the earlier study.

Replication analyses in the ARIC study were performed using log₂-transformed metabolite levels and the same covariables. Statistical significance was defined by a Bonferroni procedure (P value < 0.05 ÷ 459 and 0.05 ÷ 430 association tests with matching data for EA and AA, respectively) and consistent effect directions as in the GCKD study.

We included an interaction term between the mQTL and sex in a linear regression model with the same adjustments as before to test for potential differences of the 1,299 mQTLs in men and women. For significant interactions (P value < 0.05 ÷ 1,299), we performed sex-stratified analyses (Supplementary Table 7).

Heritability estimation

A genetic relationship matrix was calculated from all autosomal SNPs with an imputation quality of r² > 0.6 using GCTA-GRM⁷¹. GCTA-GREML⁷² was then used to estimate the proportion of variation in log₂-transformed and, in the case of urine, pq-normalized metabolite levels that can be explained by the SNPs for all metabolites that gave rise to an mQTL.

Independent SNP selection and statistical fine mapping

We identified independent signals within mQTL using approximate conditional analyses, with LD information estimated from our study sample. The fine-mapping regions of mQTL were aligned within matrices across metabolites, if index SNPs were in LD (r² > 0.8). For each mQTL, the GCTA-COJO Slct algorithm version 1.91.6 (ref. ⁷³) was used to identify independent genome-wide significant SNPs (P_conditional < 3.9 × 10⁻¹¹), using a collinearity cutoff of 0.1. For mQTL with multiple independent SNPs, approximate conditional analyses were carried out conditioning on the other independent SNPs in the region using the GCTA-COJO Cond algorithm to estimate conditional effect sizes. Statistical fine mapping was performed for all independent SNPs per mQTL. In loci with a single independent SNP, approximate Bayes factors (ABFs) were calculated from the original GWAS effect estimates using Wakefield’s formula⁷⁴ with a standard deviation prior of 1.33. For mQTL with multiple independent SNPs, ABFs were derived from the conditional effect estimates. The SNP’s ABF was used to calculate the posterior probability for the variant driving the association signal (PPA, ‘causal variant’). Credible sets were calculated by summing the PPA across PPA-ranked variants until the cumulative PPA was >99%. log₂-transformed credible set sizes were regressed on the MAFs of independent index SNPs.
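The fine-mapping step can be illustrated with a short sketch. It assumes the common formulation in which Wakefield's approximate Bayes factor is expressed in favor of association and every variant in the region has the same prior probability of being causal; the function names and the toy effect estimates are hypothetical.

```python
import math

PRIOR_SD = 1.33  # standard deviation of the effect-size prior, as stated in the text

def log_abf(beta, se, prior_sd=PRIOR_SD):
    """Natural log of the approximate Bayes factor in favor of association."""
    v = se ** 2            # variance of the effect estimate
    w = prior_sd ** 2      # prior variance of the true effect
    z2 = (beta / se) ** 2
    shrink = w / (v + w)
    # log of sqrt(v / (v + w)) * exp(z^2 / 2 * w / (v + w))
    return 0.5 * math.log(1.0 - shrink) + 0.5 * z2 * shrink

def credible_set(variants, coverage=0.99):
    """variants: dict of SNP id -> (beta, se). Returns (PPA dict, credible set)."""
    labf = {snp: log_abf(b, s) for snp, (b, s) in variants.items()}
    top = max(labf.values())
    # Subtract the maximum before exponentiating to avoid overflow.
    weights = {snp: math.exp(l - top) for snp, l in labf.items()}
    total = sum(weights.values())
    ppa = {snp: wgt / total for snp, wgt in weights.items()}
    selected, cumulative = [], 0.0
    for snp in sorted(ppa, key=ppa.get, reverse=True):
        selected.append(snp)
        cumulative += ppa[snp]
        if cumulative > coverage:   # stop once cumulative PPA exceeds 99%
            break
    return ppa, selected

# Toy region with hypothetical effect estimates (beta, standard error).
toy_region = {"snp_a": (0.50, 0.05), "snp_b": (0.45, 0.05), "snp_c": (0.10, 0.05)}
ppa, cs = credible_set(toy_region)
```

For loci with several independent signals, the same calculation would be run once per signal on the conditional effect estimates rather than on the original GWAS estimates.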

Pairwise colocalization tests of plasma and urine mQTL

To examine whether association patterns with metabolites measured in plasma and/or urine are shared across or within matrices, we conducted pairwise colocalization analyses between mQTL. When the windows of ±500 kb around the index SNPs for two mQTLs overlapped, colocalization was performed within the region of the merged windows using a version of Giambartolomei’s colocalization method⁷⁵ as implemented with the ‘coloc.fast’ function from the R package ‘gtx’ (https://github.com/tobyjohnson/gtx) with default parameters and prior definitions. To visualize the effect sizes and explained variance for colocalizing signals for mQTLs detected for the same metabolite across matrices (Extended Data Fig. 4), we used the R package ‘circlize’ (ref. ⁷⁶).
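The rule for deciding which pairs of mQTLs enter colocalization testing, and over which region, can be sketched as follows. The helper name `merged_coloc_region` and the example positions are hypothetical, and treating windows that merely touch as overlapping is an assumption.

```python
FLANK = 500_000  # ±500 kb around each index SNP

def merged_coloc_region(index_a, index_b, flank=FLANK):
    """index_a, index_b: (chromosome, position) of two mQTL index SNPs.

    Returns (chromosome, start, end) of the merged window if the two
    ±flank windows overlap, otherwise None (pair is not tested).
    """
    chrom_a, pos_a = index_a
    chrom_b, pos_b = index_b
    if chrom_a != chrom_b or abs(pos_a - pos_b) > 2 * flank:
        return None
    start = max(0, min(pos_a, pos_b) - flank)
    end = max(pos_a, pos_b) + flank
    return (chrom_a, start, end)

# Two index SNPs 600 kb apart share a merged 1.6-Mb test region.
region = merged_coloc_region((9, 33_000_000), (9, 33_600_000))
# Index SNPs 1.2 Mb apart do not overlap and are not tested.
no_region = merged_coloc_region((9, 33_000_000), (9, 34_200_000))
```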

Annotation

SNP annotation was performed by querying the SNiPA database version 3.4 (released 13 November 2020)¹³, based on the 1000 Genomes phase 3 version 5 and Ensembl version 87 datasets. The retrieved combined annotation-dependent depletion (CADD) score was based on CADD version 1.3. The Ensembl VEP tool was used for the effect prediction of SNPs. SNiPA was used to collect the following annotations for each index SNP: gene hit or close by, regulated genes, CADD score, SnpEff effect impact (exonic and noncoding), mQTL, pQTL, GWAS Catalog, cis eQTL, disease genes (based on ClinVar, OMIM, HGMD and Drugbank) and UK Biobank associations.

To select the most likely causal gene for each mQTL, the following steps were carried out: first, we compiled the ‘genes’ and ‘evidence’ information based on SNiPA¹³. Index SNPs were queried for association with differential expression of a nearby gene in tubulointerstitial kidney portions (cis eQTL) from 187 patients with CKD using the NephQTL browser⁷⁷ and GTEx version 8 eQTL data⁷⁸. Similarly, SNPs were queried for associations with differential levels of nearby proteins in plasma (2,751 unique proteins represented by 3,022 SOMAmers) in data from Sun et al.⁷⁹ downloaded from http://www.phpc.cam.ac.uk/ceu/proteins/. Second, when one or more cis eQTL or cis pQTL associations with P < 0.05 ÷ 409 (plasma, 409 unique index SNPs) or P < 0.05 ÷ 410 (urine, 410 unique index SNPs), respectively, was identified within ±100 kb of an index SNP, colocalization analyses of the respective metabolite(s)’ mQTL and each of the eQTL and/or pQTL association(s) were performed within the eQTL–pQTL cis window in the underlying study (gene region ±500 kb) using the method outlined above. Positive colocalizations with gene expression received equal weight for all investigated tissues to maximize the opportunity to detect processes in tissues interacting with blood and being filtered to urine. Sensitivity analyses assigning 1.5-fold and twofold greater weight to colocalizations arising from kidney or liver tissue or from kidney tissue only yielded almost identical results. The evidence codes h, r, e, p, m and c based on SNiPA¹³ correspond to gene hit or close by, regulated genes, cis eQTL, cis pQTL, missense variants and disease genes based on pathogenic variants known to cause monogenic diseases, respectively. The evidence code E designated genes with evidence for colocalization with gene expression genome-wide association, and P designated those with protein-level genome-wide association. Evidence codes were collected and summed for each gene, where Ee and Pp only counted as one. 
The gene with the highest sum of scores within each locus was assigned as the most likely causal gene. In the case of ties, genes with evidence for gene expression colocalization were prioritized, followed by protein-level colocalization, followed by genes for which an inborn error of metabolism with the corresponding metabolite is known. When ties still remained, Ee scores were prioritized over E scores and Pp scores were prioritized over P scores. In all other cases, ties were resolved by prioritizing the closest gene; prioritization by distance determined the assigned most likely causal gene at 17% (221 of 1,299) of mQTLs. Lastly, the prioritized gene list was manually reviewed for biological plausibility based on published evidence and at colocalizing mQTLs as outlined in the Supplementary Methods. In case of a clear biological fit to another scored gene (that is, corresponding monogenic disease or animal model), the prioritized gene was reassigned. This final gene list (n = 282) was used as input for downstream gene-based analyses. Known drugs were annotated for each gene and the corresponding indication and status of approval based on https://platform.opentargets.org/.
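The scoring and tie-breaking rules can be summarized in a small sketch. It is one possible reading of the description: each evidence code is assumed to count at most once per gene, the candidate records and gene names are hypothetical, and the final manual review for biological plausibility is not modeled.

```python
def evidence_score(codes):
    """Count evidence codes for one gene; E/e and P/p each count only once."""
    codes = set(codes)
    score = len(codes & {"h", "r", "m", "c"})
    score += 1 if codes & {"E", "e"} else 0
    score += 1 if codes & {"P", "p"} else 0
    return score

def prioritize(candidates):
    """candidates: list of dicts with keys 'gene', 'codes', 'iem', 'distance'.

    Returns the gene ranked first by score, then by the tie-breakers in the text.
    """
    def key(c):
        codes = set(c["codes"])
        return (
            evidence_score(codes),
            "E" in codes,            # gene expression colocalization
            "P" in codes,            # protein-level colocalization
            c["iem"],                # known inborn error of metabolism
            {"E", "e"} <= codes,     # Ee preferred over E alone
            {"P", "p"} <= codes,     # Pp preferred over P alone
            -c["distance"],          # closest gene resolves remaining ties
        )
    return max(candidates, key=key)["gene"]

# Two hypothetical genes with equal scores; expression colocalization decides.
candidates = [
    {"gene": "GENE_A", "codes": "hrE", "iem": False, "distance": 40_000},
    {"gene": "GENE_B", "codes": "hrm", "iem": False, "distance": 5_000},
]
best = prioritize(candidates)
```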

Relation of mQTLs to plasma proteins in trans and phenotypes

We also performed colocalization analyses of mQTLs with disease outcomes and biomarker measurements in the UK Biobank, with two representative kidney function traits and with trans pQTLs using the precomputed pQTL data from Sun et al.⁷⁹ to gain insights into clinical consequences and potential molecular mediators of mQTLs. Association summary statistics between SNPs and 30 biomarkers from the UK Biobank baseline examination, including the liver function markers AST, ALT, GGT, bilirubin and albumin, were computed using BOLT-LMM⁸⁰ (application no. 20272) in the same subset of European-ancestry participants as previous studies⁸¹. Precomputed GWAS summary statistics of diseases as ascertained in the UK Biobank and analyzed using phecodes were obtained from https://www.leelabsg.org/resources (1,403 binary traits) and from https://yanglab.westlake.edu.cn/data/ukb_fastgwa/imp_binary/ (2,325 of 2,989 binary traits⁸²; traits containing job-coding terms were excluded from the analysis). There were 816 phecodes analyzed in both, but only unique phecodes were counted for positive colocalizations. We used GWAS summary statistics of creatinine-based and cystatin C-based eGFR (eGFRcrea and eGFRcys) from Stanzick et al.⁸³, who meta-analyzed kidney function GWAS from the CKDGen Consortium and the UK Biobank. The GWAS summaries were downloaded from the CKDGen data website at https://ckdgen.imbi.uni-freiburg.de. Colocalization testing between mQTL and trans pQTL was performed within a window of ±500 kb around the mQTL’s index SNP when at least one trans pQTL association with P < 0.05 ÷ 409 ÷ 3,000 for plasma and P < 0.05 ÷ 410 ÷ 3,000 for urine was present within a window of ±100 kb around the index SNP. 
Similarly, colocalization analysis between mQTL and biomarkers, diseases and kidney function traits was performed within ±500 kb of the index SNP when there were one or more associated variants with MAF > 0.01 and P < 0.05 ÷ 409 or P < 0.05 ÷ 410, respectively, within ±100 kb of the index SNP, using the method outlined above.

Moreover, we investigated whether the most likely mQTL-related genes contained rare, putatively damaging variants that in aggregate are associated with clinical traits and diseases. Gene–phenotype associations based on whole-exome-sequencing data from ~450,000 UK Biobank participants were obtained on 4 February 2022 from the AstraZeneca PheWAS Portal (https://azphewas.com/) for the 274 available genes of the 282 mQTL-related genes²². We identified 2,745 distinct suggestive (P < 1 × 10⁻⁵) gene–phenotype associations for 115 of those genes. The significance threshold as derived for the PheWAS was 2 × 10⁻⁹ (ref. ²²). Only the most significant collapsing model per trait was retained for Fig. 4b. In addition, the exome-wide variant-level results were downloaded on 26 August 2022. The 17,493 analyzed phenotypes were queried for significant (P value < 0.05 ÷ 17,493) associations with mQTL index SNPs (Supplementary Table 16).

We further performed colocalization testing of independent signals for all the 12 identified mQTLs within the DPEP1 genomic region and plasma proteins with a reported pQTL in the DPEP1 locus⁴⁶. Metabolite and plasma protein summary statistics were extracted with a 500-kb flanking region around DPEP1 and the DPEP1 mQTL index SNP for the proteins CPB1, AMY2B, PNLIP, AMY2A, REG3G, CTRB2 and PNLIPRP1. First, independent association signals were identified based on approximate conditional analyses via the GCTA-COJO Slct algorithm (P value < 5 × 10⁻⁸; collinearity threshold = 0.1)⁷³. For each conditionally independent SNP, conditional summary statistics were computed by conditioning on all other independent SNPs in the region using the GCTA-COJO Cond algorithm (collinearity threshold = 0.1)⁷³. Subsequently, colocalization analyses were conducted for all pairwise combinations of the conditionally independent mQTL and pQTL associations as outlined above. For the gallstone disease GWAS⁸⁴ and urine glycocholate, we performed colocalization analysis of signals conditioning on the plasma index SNP (rs55971546) within ±500 kb of the SLC10A2 urine mQTL index SNP (rs16961281). The same conditional mQTL summary statistics were colocalized with kidney eQTL³⁸. Marginal statistics were used for these, as rs55971546 was not available (FDR > 0.01).

Processing of gene expression data from tissue and cell types

To test for over-representation of plasma or urine mQTL-related genes among those highly expressed in specific tissues and cell types, we compiled bulk and single-cell gene expression (RNA-seq) datasets. These included GTEx version 8 (ref. ⁷⁸), the Human Liver Cell Atlas⁸⁵, a single-cell dataset and a single-nucleus dataset from the human kidney^86,87, a single-cell dataset from the mouse kidney⁸⁸, a single-cell dataset from the human intestine⁸⁹ and a single-nucleus dataset from the kidneys of patients with CKD from the Kidney Precision Medicine Project (KPMP)⁹⁰. Except for the KPMP, data sources and processing followed the workflow published by Cheng et al.⁹¹. KPMP data were downloaded from the KPMP Kidney Tissue Atlas repository at https://atlas.kpmp.org/repository as Seurat-format files and were subsequently processed in Seurat⁹² similar to the other datasets. For generation of the top 10% highly expressed genes for each tissue and cell type in each dataset, we followed the workflow published by Schlosser et al.⁵.

GO, KEGG, tissue and cell type enrichment analyses

Enrichment testing of the 282 identified genes was performed as follows. The number of independent SNPs per gene was computed using GCKD genotypes (PLINK version 1.90 (ref. ⁹³)), and a database of Entrez gene identifiers based on org.Hs.eg.db version 3.8.2 was generated. Gene annotation included the number of independent SNPs per gene, gene length, GO terms⁹⁴ and KEGG pathways⁹⁵, as well as being Human Protein Atlas tissue or group enriched⁹⁶; Human Protein Atlas cell type enhanced, enriched or group enriched⁹⁷; being a VIP gene from PharmGKB (accessed 5 December 2020)⁹⁸; being a gene with an actionable drug interaction from the Clinical Pharmacogenetics Implementation Consortium (levels A, A/B and B; accessed 13 January 2021)⁹⁹; and being among the top 10% highly expressed genes in each GTEx version 8 tissue⁷⁸ and human^{85,86,87,89,90} and murine cell types⁸⁸. We performed 100 million random draws of an equal number of genes as contained in the respective source list (combined mQTLs, 282; plasma mQTLs, 214; urine mQTLs, 195; plasma-only mQTLs, 87; urine-only mQTLs, 68), matched for deciles of the number of independent SNPs and deciles of gene length and compared any overlap with cell types, tissues and terms with the ones identified for the original source list. Multiple-testing correction was performed using the Benjamini–Hochberg procedure¹⁰⁰.

Lastly, we tested for over-representation of certain phenotypes among mice in which the implicated genes had been genetically manipulated. The phenotype terms ‘abnormal homeostasis’ (MP:0001764) and ‘abnormal metabolism’ (MP:0005266), all of their child terms and all genes associated with these terms were downloaded from MouseMine¹⁰¹ on 9 December 2021. Mouse genes were mapped to their human homologs using the getLDS function from the biomaRt package¹⁰². Human and mouse genes that did not map to a homolog in the respective other species were excluded from the analysis. This excluded 861 of 6,051 abnormal homeostasis genes, 61 of 952 abnormal metabolism genes and ten of 282 mGWAS genes (PYCRL, GBA3, PPDPFL, CETP, NAT16, ZNF680, ENOSF1, ACSM6, FUT3, ZNF675). The genes identified from urine, plasma or both were tested for over-representation among the genes belonging to each of the phenotype terms using Fisher’s exact test (with the universe set to 13,151 genes, the number of mouse genes that mapped to human homologs in the Mouse Genome Informatics database), followed by Benjamini–Hochberg correction for multiple testing.
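The over-representation test and the multiple-testing correction can be sketched with the standard library alone. The one-sided alternative is an assumption, as the text does not state the sidedness of the test, and the term size and overlap in the usage example are hypothetical; only the universe of 13,151 genes and the 272 mapped mGWAS genes (282 minus the ten without a homolog) come from the text.

```python
from math import comb

def fisher_overrepresentation(k, n_list, n_term, universe):
    """One-sided Fisher's exact P value: probability of at least k term genes
    among n_list genes drawn from a universe containing n_term term genes."""
    upper = min(n_list, n_term)
    denom = comb(universe, n_list)
    numer = sum(comb(n_term, i) * comb(universe - n_term, n_list - i)
                for i in range(k, upper + 1))
    return numer / denom

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):     # walk from the largest to the smallest P value
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical phenotype term: 300 annotated genes, 12 of them among the mGWAS genes.
p_term = fisher_overrepresentation(k=12, n_list=272, n_term=300, universe=13_151)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
```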

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Genome-wide summary statistics are available through the NHGRI-EBI GWAS Catalog (GCST90264176–GCST90266872, https://www.ebi.ac.uk/gwas/).

Code availability

We have indicated each software package used, where applicable, and provided information on the options chosen (Methods and Reporting Summary).

References

Boron, W. F. & Boulpaep, E. L. Medical Physiology (Elsevier, 2017).
Gyimesi, G., Pujol-Gimenez, J., Kanai, Y. & Hediger, M. A. Sodium-coupled glucose transport, the SLC5 family, and therapeutically relevant inhibitors: from molecular discovery to clinical application. Pflugers Arch. 472, 1177–1206 (2020).
Anzai, N. & Endou, H. Urate transporters: an evolving field. Semin. Nephrol. 31, 400–409 (2011).
Evans, A. M. et al. High resolution mass spectrometry improves data quantity and quality as compared to unit mass resolution mass spectrometry in high-throughput profiling metabolomics. Metabolomics 4, 132 (2014).
Schlosser, P. et al. Genetic studies of urinary metabolites illuminate mechanisms of detoxification and excretion in humans. Nat. Genet. 52, 167–176 (2020).
Shin, S. Y. et al. An atlas of genetic influences on human blood metabolites. Nat. Genet. 46, 543–550 (2014).
Long, T. et al. Whole-genome sequencing identifies common-to-rare variants associated with human blood metabolites. Nat. Genet. 49, 568–578 (2017).
Lotta, L. A. et al. A cross-platform approach identifies genetic regulators of human metabolism and health. Nat. Genet. 53, 54–64 (2021).
Hysi, P. G. et al. Metabolome genome-wide association study identifies 74 novel genomic regions influencing plasma metabolites levels. Metabolites 12, 61 (2022).
Yin, X. et al. Genome-wide association studies of metabolites in Finnish men identify disease-relevant loci. Nat. Commun. 13, 1644 (2022).
Surendran, P. et al. Rare and common genetic determinants of metabolic individuality and their effects on human health. Nat. Med. 28, 2321–2332 (2022).
Chen, Y. et al. Genomic atlas of the plasma metabolome prioritizes metabolites implicated in human diseases. Nat. Genet. 55, 44–53 (2023).
Arnold, M., Raffler, J., Pfeufer, A., Suhre, K. & Kastenmuller, G. SNiPA: an interactive, genetic variant-centered annotation browser. Bioinformatics 31, 1334–1336 (2015).
Schlosser, P. et al. Netboost: boosting-supported network analysis improves high-dimensional omics prediction in acute myeloid leukemia and Huntington’s disease. IEEE/ACM Trans. Comput. Biol. Bioinform. 18, 2635–2648 (2021).
Meixner, E. et al. A substrate-based ontology for human solute carriers. Mol. Syst. Biol. 16, e9652 (2020).
Gyimesi, G. & Hediger, M. A. Systematic in silico discovery of novel solute carrier-like proteins from proteomes. PLoS ONE 17, e0271062 (2022).
Reynolds, L. M. et al. FADS genetic and metabolomic analyses identify the Δ5 desaturase (FADS1) step as a critical control point in the formation of biologically important lipids. Sci. Rep. 10, 15873 (2020).
Veiga-da-Cunha, M. et al. Molecular identification of NAT8 as the enzyme that acetylates cysteine S-conjugates to mercapturic acids. J. Biol. Chem. 285, 18888–18898 (2010).
Konig, J., Seithel, A., Gradhand, U. & Fromm, M. F. Pharmacogenomics of human OATP transporters. Naunyn Schmiedebergs Arch. Pharm. 372, 432–443 (2006).
Reimer, R. J. SLC17: a functionally diverse family of organic anion transporters. Mol. Aspects Med. 34, 350–359 (2013).
Cheng, H. Y., You, H. Y. & Zhou, T. B. Relationship between GSTM1/GSTT1 null genotypes and renal cell carcinoma risk: a meta-analysis. Ren. Fail. 34, 1052–1057 (2012).
Wang, Q. et al. Rare variant contribution to human disease in 281,104 UK Biobank exomes. Nature 597, 527–532 (2021).
Raal, F. J. et al. Evinacumab for homozygous familial hypercholesterolemia. N. Engl. J. Med. 383, 711–720 (2020).
Woodward, O. M. et al. Identification of a urate transporter, ABCG2, with a common functional polymorphism causing gout. Proc. Natl Acad. Sci. USA 106, 10338–10342 (2009).
Bustamante, M. et al. A genome-wide association meta-analysis of diarrhoeal disease in young children identifies FUT2 locus and provides plausible biological pathways. Hum. Mol. Genet. 25, 4127–4142 (2016).
Barton, S. J. et al. FUT2 genetic variants and reported respiratory and gastrointestinal illnesses during infancy. J. Infect. Dis. 219, 836–843 (2019).
Nielsen, S. et al. Aquaporins in the kidney: from molecules to medicine. Physiol. Rev. 82, 205–244 (2002).
Sohara, E. et al. Defective water and glycerol transport in the proximal tubules of Aqp7 knockout mice. Am. J. Physiol. Renal Physiol. 289, F1195–F1200 (2005).
Goubau, C. et al. Homozygosity for aquaporin 7 G264V in three unrelated children with hyperglyceroluria and a mild platelet secretion defect. Genet. Med. 15, 55–63 (2013).
Dawson, P. A., Lan, T. & Rao, A. Bile acid transporters. J. Lipid Res. 50, 2340–2357 (2009).
Wilson, F. A., Burckhardt, G., Murer, H., Rumrich, G. & Ullrich, K. J. Sodium-coupled taurocholate transport in the proximal convolution of the rat kidney in vivo and in vitro. J. Clin. Invest. 67, 1141–1150 (1981).
Craddock, A. L. et al. Expression and transport properties of the human ileal and renal sodium-dependent bile acid transporter. Am. J. Physiol. 274, G157–G169 (1998).
Ho, R. H. et al. Functional characterization of genetic variants in the apical sodium-dependent bile acid transporter (ASBT; SLC10A2). J. Gastroenterol. Hepatol. 26, 1740–1748 (2011).
Love, M. W. et al. Analysis of the ileal bile acid transporter gene, SLC10A2, in subjects with familial hypertriglyceridemia. Arterioscler. Thromb. Vasc. Biol. 21, 2039–2045 (2001).
Ferkingstad, E. et al. Genome-wide association meta-analysis yields 20 loci associated with gallstone disease. Nat. Commun. 9, 5101 (2018).
Grosser, G., Muller, S. F., Kirstgen, M., Doring, B. & Geyer, J. Substrate specificities and inhibition pattern of the solute carrier family 10 members NTCP, ASBT and SOAT. Front. Mol. Biosci. 8, 689757 (2021).
St-Pierre, M. V., Kullak-Ublick, G. A., Hagenbuch, B. & Meier, P. J. Transport of bile acids in hepatic and non-hepatic tissues. J. Exp. Biol. 204, 1673–1686 (2001).
Liu, H. et al. Epigenomic and transcriptomic analyses define core cell types, genes and targetable mechanisms for kidney disease. Nat. Genet. 54, 950–962 (2022).
Sheng, X. et al. Mapping the genetic architecture of human traits to cell types in the kidney identifies mechanisms of disease and potential treatments. Nat. Genet. 53, 1322–1333 (2021).
Breljak, D. et al. Distribution of organic anion transporters NaDC3 and OAT1–3 along the human nephron. Am. J. Physiol. Renal Physiol. 311, F227–F238 (2016).
Chen, X., Tsukaguchi, H., Chen, X. Z., Berger, U. V. & Hediger, M. A. Molecular and functional analysis of SDCT2, a novel rat sodium-dependent dicarboxylate transporter. J. Clin. Invest. 103, 1159–1168 (1999).
Wang, H. et al. Structure, function, and genomic organization of human Na⁺-dependent high-affinity dicarboxylate transporter. Am. J. Physiol. Cell Physiol. 278, C1019–C1030 (2000).
Pajor, A. M., Gangula, R. & Yao, X. Cloning and functional characterization of a high-affinity Na⁺/dicarboxylate cotransporter from mouse brain. Am. J. Physiol. Cell Physiol. 280, C1215–C1223 (2001).
McIntyre, T. & Curthoys, N. P. Renal catabolism of glutathione. Characterization of a particulate rat renal dipeptidase that catalyzes the hydrolysis of cysteinylglycine. J. Biol. Chem. 257, 11915–11921 (1982).
Nitanai, Y., Satow, Y., Adachi, H. & Tsujimoto, M. Crystal structure of human renal dipeptidase involved in β-lactam hydrolysis. J. Mol. Biol. 321, 177–184 (2002).
Ferkingstad, E. et al. Large-scale integration of the plasma proteome with genetics and disease. Nat. Genet. 53, 1712–1721 (2021).
Setti, T. et al. The protective role of glutathione in osteoarthritis. J. Clin. Orthop. Trauma 15, 145–151 (2021).
Xu, X. et al. Molecular insights into genome-wide association studies of chronic kidney disease-defining traits. Nat. Commun. 9, 4800 (2018).
Kumar, V. et al. Human disease-associated genetic variation impacts large intergenic non-coding RNA expression. PLoS Genet. 9, e1003201 (2013).
Giral, H., Landmesser, U. & Kratzer, A. Into the wild: GWAS exploration of non-coding RNAs. Front. Cardiovasc. Med. 5, 181 (2018).
Gagliano Taliun, S. A. et al. Exploring and visualizing large-scale genetic associations by using PheWeb. Nat. Genet. 52, 550–552 (2020).
Pruim, R. J. et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26, 2336–2337 (2010).
Eckardt, K. U. et al. The German Chronic Kidney Disease (GCKD) study: design and methods. Nephrol. Dial. Transplant. 27, 1454–1460 (2012).
Prokosch, H. U. et al. Designing and implementing a biobanking IT framework for multiple research scenarios. Stud. Health Technol. Inform. 180, 559–563 (2012).
Titze, S. et al. Disease burden and risk profile in referred patients with moderate chronic kidney disease: composition of the German Chronic Kidney Disease (GCKD) cohort. Nephrol. Dial. Transplant. 30, 441–451 (2015).
Li, Y. et al. Genome-wide association studies of metabolites in patients with CKD identify multiple loci and illuminate tubular transport mechanisms. J. Am. Soc. Nephrol. 29, 1513–1524 (2018).
Das, S. et al. Next-generation genotype imputation service and methods. Nat. Genet. 48, 1284–1287 (2016).
Sumner, L. W. et al. Proposed minimum reporting standards for chemical analysis Chemical Analysis Working Group (CAWG) Metabolomics Standards Initiative (MSI). Metabolomics 3, 211–221 (2007).
Schrimpe-Rutledge, A. C., Codreanu, S. G., Sherrod, S. D. & McLean, J. A. Untargeted metabolomics strategies—challenges and emerging directions. J. Am. Soc. Mass. Spectrom. 27, 1897–1905 (2016).
Dieterle, F., Ross, A., Schlotterbeck, G. & Senn, H. Probabilistic quotient normalization as robust method to account for dilution of complex biological mixtures. Application in ¹H NMR metabonomics. Anal. Chem. 78, 4281–4290 (2006).
Levey, A. S. et al. A new equation to estimate glomerular filtration rate. Ann. Intern. Med. 150, 604–612 (2009).
Inker, L. A. et al. New creatinine- and cystatin C-based equations to estimate GFR without race. N. Engl. J. Med. 385, 1737–1749 (2021).
Suhre, K. et al. Human metabolic individuality in biomedical and pharmaceutical research. Nature 477, 54–60 (2011).
Marchini, J. & Howie, B. Genotype imputation for genome-wide association studies. Nat. Rev. Genet. 11, 499–511 (2010).
Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).
Konig, E. et al. Whole exome sequencing enhanced imputation identifies 85 metabolite associations in the Alpine CHRIS Cohort. Metabolites 12, 604 (2022).
Bomba, L. et al. Whole-exome sequencing identifies rare genetic variants associated with human plasma metabolites. Am. J. Hum. Genet. 109, 1038–1054 (2022).
Feofanova, E. V. et al. A genome-wide association study discovers 46 loci of the human metabolome in the Hispanic Community Health Study/Study of Latinos. Am. J. Hum. Genet. 107, 849–863 (2020).
Yousri, N. A. et al. Whole-exome sequencing identifies common and rare variant metabolic QTLs in a Middle Eastern population. Nat. Commun. 9, 333 (2018).
Li-Gao, R. et al. Genetic studies of metabolomics change after a liquid meal illuminate novel pathways for glucose and lipid metabolism. Diabetes 70, 2932–2946 (2021).
Yang, J., Lee, S. H., Goddard, M. E. & Visscher, P. M. GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet. 88, 76–82 (2011).
Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height. Nat. Genet. 42, 565–569 (2010).
Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, S1–S3 (2012).
Wakefield, J. Bayes factors for genome-wide association studies: comparison with P-values. Genet. Epidemiol. 33, 79–86 (2009).
Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).
Gu, Z., Gu, L., Eils, R., Schlesner, M. & Brors, B. circlize implements and enhances circular visualization in R. Bioinformatics 30, 2811–2812 (2014).
Gillies, C. E. et al. An eQTL landscape of kidney tissue in human nephrotic syndrome. Am. J. Hum. Genet. 103, 232–244 (2018).
The GTEx Consortium et al. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
Sun, B. B. et al. Genomic atlas of the human plasma proteome. Nature 558, 73–79 (2018).
Loh, P. R., Kichaev, G., Gazal, S., Schoech, A. P. & Price, A. L. Mixed-model association for biobank-scale datasets. Nat. Genet. 50, 906–908 (2018).
Sinnott-Armstrong, N. et al. Genetics of 35 blood and urine biomarkers in the UK Biobank. Nat. Genet. 53, 185–194 (2021).
Jiang, L., Zheng, Z., Fang, H. & Yang, J. A generalized linear mixed model association tool for biobank-scale data. Nat. Genet. 53, 1616–1621 (2021).
Stanzick, K. J. et al. Discovery and prioritization of variants and genes for kidney function in >1.2 million individuals. Nat. Commun. 12, 4350 (2021).
Kurki, M. I. et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 613, 508–518 (2023).
Aizarani, N. et al. A human liver cell atlas reveals heterogeneity and epithelial progenitors. Nature 572, 199–204 (2019).
Wu, H. et al. Comparative analysis and refinement of human PSC-derived kidney organoid differentiation with single-cell transcriptomics. Cell Stem Cell 23, 869–881 (2018).
Stewart, B. J. et al. Spatiotemporal immune zonation of the human kidney. Science 365, 1461–1466 (2019).
Park, J. et al. Single-cell transcriptomics of the mouse kidney reveals potential cellular targets of kidney disease. Science 360, 758–763 (2018).
Wang, Y. et al. Single-cell transcriptome analysis reveals differential nutrient absorption functions in human intestine. J. Exp. Med. 217, e20191130 (2020).
Hansen, J. et al. A reference tissue atlas for the human kidney. Sci. Adv. 8, eabn4965 (2022).
Cheng, Y. et al. Rare genetic variants affecting urine metabolite levels link population variation to inborn errors of metabolism. Nat. Commun. 12, 964 (2021).
Stuart, T. et al. Comprehensive integration of single-cell data. Cell 177, 1888–1902 (2019).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7 (2015).
Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000).
Uhlen, M. et al. Proteomics. Tissue-based map of the human proteome. Science 347, 1260419 (2015).
Karlsson, M. et al. A single-cell type transcriptomics map of human tissues. Sci. Adv. 7, eabh2169 (2021).
Whirl-Carrillo, M. et al. An evidence-based framework for evaluating pharmacogenomics knowledge for personalized medicine. Clin. Pharmacol. Ther. 110, 563–572 (2021).
Relling, M. V. et al. The Clinical Pharmacogenetics Implementation Consortium: 10 years later. Clin. Pharmacol. Ther. 107, 171–175 (2020).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B Stat. Methodol. 57, 289–300 (1995).
Motenko, H., Neuhauser, S. B., O’Keefe, M. & Richardson, J. E. MouseMine: a new data warehouse for MGI. Mamm. Genome 26, 325–330 (2015).
Durinck, S., Spellman, P. T., Birney, E. & Huber, W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat. Protoc. 4, 1184–1191 (2009).

Acknowledgements

The work of Y.L. was supported by grant KO 3598/4-2 (to A.K.). The work of P. Schlosser was supported by the German Research Foundation (DFG) project ID 1050086601 (SCHL 2292/2-1, Walter Benjamin Fellowship) and the EQUIP Program for Medical Scientists, Faculty of Medicine, University of Freiburg. The work of S.H., M.W., P. Sekula, M.K., M.S. and A.K. was supported by the DFG project ID 431984000 (SFB 1453). The work of F. Kotsis and U.T.S. was supported by the German Federal Ministry of Education and Research (BMBF) within the framework of the e:Med research and funding concept (grant 01ZX1912B). The work of A.K., M.K. and M.S. was supported by Germany’s Excellence Strategy (CIBSS, EXC-2189, project ID 390939984). The work of M.S. was supported by the ERC starting grant TREATCilia, grant agreement no. 716344. The work of P. Sekula was supported by the DFG SE 2407/3-1. The work of Y.C. was supported by the DFG SFB 1479 (project ID 441891347-S1). The work of M.E.G. and J.C. was supported by grant R01DK124399. Genotyping and urine metabolomics were supported by Bayer Pharma. This project has received funding from the Innovative Medicines Initiative 2 Joint Undertaking (JU) under grant agreement no. 115974. The JU receives support from the European Union’s Horizon 2020 research and innovation program and the EFPIA and the JDRF. Any dissemination of results reflects only the authors’ view; the JU is not responsible for any use that may be made of the information it contains. The GCKD study was and is supported by the BMBF (FKZ 01ER 0804, 01ER 0818, 01ER 0819, 01ER 0820 and 01ER 0821) and the KfH Foundation for Preventive Medicine. Unregistered grants to support the study were provided by corporate sponsors (listed at https://gckd.org). We are grateful for the willingness of the patients to participate in the GCKD study. The enormous effort of the study personnel of the various regional centers is highly appreciated. 
We thank the large number of nephrologists who provide routine care for the patients and collaborate with the GCKD study. The GCKD investigators are listed in the Supplementary Note. The ARIC study has been funded in whole or in part with federal funds from the National Heart, Lung, and Blood Institute, National Institutes of Health, Department of Health and Human Services, under contract nos. 75N92022D00001, 75N92022D00002, 75N92022D00003, 75N92022D00004 and 75N92022D00005. Funding was also supported by R01HL087641 and R01HL086694, National Human Genome Research Institute contract U01HG004402 and National Institutes of Health contract HHSN268200625226C. Infrastructure was partly supported by grant no. UL1RR025005, a component of the National Institutes of Health Roadmap for Medical Research. We thank the staff and participants of the ARIC study for their important contributions. The metabolite data at ARIC visit 5 were supported by R01 HL141824. We thank M. Meier for her support with data transfer.

Author information

These authors contributed equally: Pascal Schlosser, Nora Scherer.

Authors and Affiliations

Institute of Genetic Epidemiology, Faculty of Medicine and Medical Center - University of Freiburg, Freiburg, Germany
Pascal Schlosser, Nora Scherer, Franziska Grundner-Culemann, Sara Monteiro-Martins, Stefan Haug, Inga Steinbrenner, Burulça Uluvar, Matthias Wuttke, Yurong Cheng, Fruzsina Kotsis, Ulla T. Schultheiss, Peggy Sekula, Yong Li & Anna Köttgen
Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
Pascal Schlosser, Josef Coresh & Anna Köttgen
Spemann Graduate School of Biology and Medicine, University of Freiburg, Freiburg, Germany
Nora Scherer
Institute of Human Genetics, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen–Nürnberg, Erlangen, Germany
Arif B. Ekici
Membrane Transport Discovery Lab, Department of Nephrology and Hypertension and Department of Biomedical Research, University of Bern, Bern, Switzerland
Gergely Gyimesi & Matthias A. Hediger
Metabolon, Inc., Morrisville, NC, USA
Edward D. Karoly & Robert P. Mohney
Department of Medicine IV—Nephrology and Primary Care, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany
Fruzsina Kotsis, Michael Köttgen & Ulla T. Schultheiss
Research and Early Development, Pharmaceuticals Division, Bayer AG, Wuppertal, Germany
Johanna Mielke
Department of Clinical Sciences in Malmö, Lund University Diabetes Centre, Lund University, Lund, Sweden
Maria F. Gomez
Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX, USA
Bing Yu & Eric Boerwinkle
New York University Grossman School of Medicine, New York, NY, USA
Morgan E. Grams
Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
Eric Boerwinkle
Centre for Integrative Biological Signalling Studies (CIBSS), Albert-Ludwigs-University Freiburg, Freiburg, Germany
Michael Köttgen, Miriam Schmidts & Anna Köttgen
Institute of Genetic Epidemiology, Department of Genetics, Medical University of Innsbruck, Innsbruck, Austria
Florian Kronenberg
Department of Nephrology and Hypertension, University Hospital Erlangen, Friedrich-Alexander-Universität Erlangen–Nürnberg, Erlangen, Germany
Heike Meiselbach & Kai-Uwe Eckardt
Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
Shreeram Akilesh
Freiburg University Faculty of Medicine, Center for Pediatrics and Adolescent Medicine, University Hospital Freiburg, Freiburg, Germany
Miriam Schmidts
Department of Nephrology and Medical Intensive Care, Charité—Universitätsmedizin Berlin, Berlin, Germany
Kai-Uwe Eckardt
Institute of Functional Genomics, University of Regensburg, Regensburg, Germany
Peter J. Oefner

Authors

Pascal Schlosser
Nora Scherer
Franziska Grundner-Culemann
Sara Monteiro-Martins
Stefan Haug
Inga Steinbrenner
Burulça Uluvar
Matthias Wuttke
Yurong Cheng
Arif B. Ekici
Gergely Gyimesi
Edward D. Karoly
Fruzsina Kotsis
Johanna Mielke
Maria F. Gomez
Bing Yu
Morgan E. Grams
Josef Coresh
Eric Boerwinkle
Michael Köttgen
Florian Kronenberg
Heike Meiselbach
Robert P. Mohney
Shreeram Akilesh
Miriam Schmidts
Matthias A. Hediger
Ulla T. Schultheiss
Kai-Uwe Eckardt
Peter J. Oefner
Peggy Sekula
Yong Li
Anna Köttgen

Consortia

GCKD Investigators

Fruzsina Kotsis, Florian Kronenberg, Heike Meiselbach, Ulla T. Schultheiss, Kai-Uwe Eckardt, Peter J. Oefner & Anna Köttgen

Contributions

Design of this study: A.K., P. Schlosser. Recruitment for and management of the study: K.-U.E., A.K., F. Kotsis, H.M., U.T.S., B.Y., M.E.G., J.C., E.B. Genotyping: A.B.E., E.B. Metabolite quantification: N.S., P. Schlosser, E.D.K., R.P.M. Bioinformatics and statistical analysis: A.K., B.G., F.G.-C., I.S., M.W., N.S., P. Schlosser, P. Sekula, S.A., S.H., S.M.-M., Y.L. Interpretation of results: A.K., M.K., N.S., P. Schlosser, P. Sekula, S.A., S.H., Y.C., G.G., E.D.K., F. Kronenberg, M.S., M.A.H., P.J.O. Wrote the manuscript: A.K., N.S., P. Schlosser. Critically read and approved the manuscript: P. Schlosser, N.S., F.G.-C., S.H., S.M.-M., I.S., B.G., M.W., Y.C., A.B.E., G.G., E.D.K., F. Kotsis, J.M., M.F.G., M.K., F. Kronenberg, H.M., R.P.M., S.A., M.S., M.A.H., P. Sekula, U.T.S., K.-U.E., P.J.O., Y.L., A.K., B.Y., M.E.G., J.C., E.B.

Corresponding authors

Correspondence to Pascal Schlosser or Anna Köttgen.

Ethics declarations

Competing interests

R.P.M. and E.D.K. are employees of Metabolon and, as such, have affiliations or financial involvement with Metabolon. J.M. is an employee of Bayer. All other authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Nasa Sinnott-Armstrong and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Evaluation of genetic associations of plasma mQTLs from CKD patients in a multi-ethnic, population-based sample.

Each point represents the index SNP of one of 459 (EA) and 430 (AA) associations that could be matched between the Metabolon platforms of the GCKD and ARIC studies (see Supplementary Table 6). Data are presented as effect size estimates ± 1.96 × standard error in each study, and the dot size is proportional to the two-sided −log₁₀(P-value) in GCKD (N_{GCKD} = 4,960; N_{ARIC EA} = 3,603; N_{ARIC AA} = 818).
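As a minimal sketch of the quantities plotted here, the interval and the −log₁₀(P-value) for a single association can be derived from its effect estimate and standard error. The function name `wald_summary` and the input values are hypothetical, and the two-sided P-value is assumed to come from a normal (Wald) approximation, which may differ from the test used in the study.

```python
import math


def wald_summary(beta: float, se: float) -> dict:
    """Return the 95% interval and two-sided -log10(P) for one effect estimate.

    Assumes a normal (Wald) approximation; extremely large |beta / se|
    would underflow the P-value to 0 and is not handled in this sketch.
    """
    z = beta / se
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return {
        "ci_low": beta - 1.96 * se,
        "ci_high": beta + 1.96 * se,
        "neg_log10_p": -math.log10(p_two_sided),
    }


# Hypothetical effect size and standard error, for illustration only.
example = wald_summary(beta=0.20, se=0.02)
print(example)
```

An interval that excludes zero in both studies and shares its sign indicates a directionally consistent association.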

Extended Data Fig. 2 Comparison of the heritability for 184 matched plasma and urine metabolites with at least one mQTL.

The positive correlation between the estimated heritabilities for a given metabolite’s plasma and urine levels is consistent with the metabolites’ filtration from plasma to urine, without substantial additional genetic influences on their tubular handling. The blue line is the linear regression line and the gray shaded area represents the 95% confidence interval. Differences in estimated heritability between plasma and urine (instances with a difference of >25% are labeled with the associated metabolite and most likely gene; error bars represent h² variance) can contain interesting biological information: for example, three metabolites with larger estimated heritabilities in urine than in plasma are N-acetylated amino acids, all of which have an mQTL at NAT8. NAT8 is highly and selectively expressed in the kidney, where the encoded enzyme N-acetylates molecules to make them water soluble for subsequent excretion.

Extended Data Fig. 3 Post-hoc power analyses for plasma and urine mQTLs by metabolite super-pathway.

Power analyses assume a sample size of 5,000 and the genome-wide statistical significance thresholds used in our study, and are conducted across a range of minor allele frequencies. For each matrix and super-pathway subgroup, the median observed effect size across mQTLs and the median standard deviation of the metabolites with an mQTL within the group were used.
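A calculation of this kind can be sketched as follows, under stated assumptions: an additive genetic model with Hardy-Weinberg genotype variance 2·MAF·(1 − MAF), a two-sided 1-degree-of-freedom test, and a residual variance approximately equal to the metabolite variance. The function name `mqtl_power` and all input values are hypothetical, and the default `alpha=5e-8` is a conventional genome-wide placeholder rather than the thresholds used in the study.

```python
from statistics import NormalDist


def mqtl_power(beta: float, sd: float, maf: float,
               n: int = 5000, alpha: float = 5e-8) -> float:
    """Approximate power to detect an additive SNP effect on a metabolite.

    beta  : per-allele effect size (same units as the metabolite)
    sd    : standard deviation of the metabolite
    maf   : minor allele frequency
    n     : sample size
    alpha : two-sided significance threshold (placeholder default)
    """
    std_normal = NormalDist()
    genotype_var = 2.0 * maf * (1.0 - maf)  # additive coding under HWE
    # Expected test statistic (square root of the non-centrality parameter).
    ncp_root = abs(beta) / sd * (n * genotype_var) ** 0.5
    z_crit = -std_normal.inv_cdf(alpha / 2.0)
    # Probability that the statistic falls beyond either critical value.
    return std_normal.cdf(ncp_root - z_crit) + std_normal.cdf(-ncp_root - z_crit)


# Hypothetical effect size and standard deviation across a range of MAFs.
for maf in (0.01, 0.05, 0.10, 0.30):
    print(maf, round(mqtl_power(beta=0.2, sd=1.0, maf=maf), 3))
```

Because genotype variance shrinks as the minor allele becomes rarer, power at a fixed effect size falls towards the low end of the frequency range.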

Extended Data Fig. 4 Comparison of direction of genetic associations and explained variance at inter-matrix mQTLs.

Comparison of effect sizes and explained variance for colocalization signals for mQTLs detected for the same metabolite in both plasma and urine (N = 204; only the 99 mQTLs for which the explained variance in metabolite levels is >3% in at least one of the two matrices are shown). The two inner bands represent the effect size of the mQTL in plasma (framed in red) and urine (framed in blue). Shades of orange indicate positive effect sizes, shades of aquamarine negative ones. The two outer bands represent the variance in metabolite levels in plasma and urine explained by the index SNP of the corresponding mQTL, where a darker shade of green corresponds to a greater explained variance.

Extended Data Fig. 5 Colocalization of mQTLs with selected clinical markers of kidney and liver function.

The mQTLs are represented by the implicated genes in the rows and the colocalized clinical markers in the columns. Liver function markers include alanine aminotransferase (ALT), aspartate aminotransferase (AST), gamma glutamyltransferase (GGT), albumin and bilirubin. Kidney function markers include eGFRcrea, eGFRcys and urea. The size of each pie represents the total number of colocalizations, grouped into four categories. The red and blue slices in each pie represent the proportion of colocalizations of plasma and urine mQTLs, respectively, with the clinical marker.

Extended Data Fig. 6 Proportion of metabolite variance explained by eGFR.

The proportion of a metabolite’s variance explained by eGFR is represented on the x-axis. All metabolites quantified from plasma and urine are shown along the y-axis, ordered by the maximum variance explained across plasma (red color) and urine (blue color). The metabolite with the largest amount of variance explained by eGFR was plasma creatinine.

Extended Data Fig. 7 Enrichment of mQTL-related genes among GO terms, KEGG pathways, tissues, and cell types.

Similarities and differences (a) between terms and pathways enriched for genes identified by all plasma vs. all urine mQTLs; (b) between terms and pathways enriched for genes identified by mQTLs found exclusively in plasma vs. exclusively in urine; (c) between tissues enriched for genes identified by all plasma vs. all urine mQTLs; and (d) between cell types enriched for genes identified by all plasma vs. all urine mQTLs. Terms significantly (adjusted P-value < 0.05) enriched for genes identified by mQTLs from only one matrix are colored in red (plasma) and blue (urine), and terms significantly enriched for genes from both matrices are colored in purple. OR: odds ratio.
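The two statistics named in this caption can be illustrated with a minimal sketch. It assumes that the odds ratio comes from a 2×2 table of mQTL-related versus background genes and that P-values are adjusted with the Benjamini–Hochberg procedure; the caption itself does not state the test or the adjustment method, so both are assumptions. The function names and all counts and P-values are hypothetical.

```python
def odds_ratio(hit_in_term: int, hit_not_in_term: int,
               bg_in_term: int, bg_not_in_term: int) -> float:
    """Odds ratio for term membership among mQTL genes vs. background genes."""
    return (hit_in_term * bg_not_in_term) / (hit_not_in_term * bg_in_term)


def benjamini_hochberg(p_values: list) -> list:
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical enrichment P-values for four terms.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
print(adj, significant)
```

A term would be colored as significant for a matrix when its adjusted P-value falls below 0.05, and the odds ratio conveys the strength of the enrichment.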

Extended Data Fig. 8 Extended view of the SLC10A2 region.

The upper part of the figure shows the same RNA-seq, ATAC-seq, chromatin state and histone ChIP-seq tracks as Fig. 6. The RNA-seq and ATAC-seq tracks show the overlaid signal from tissue of three different donors. The index SNP rs16961281, which is associated with urine glycocholate, is located at the vertical dashed line. The bottom part shows publicly available single-nucleus (sn)ATAC-seq data for different kidney cell types, which were derived from primary human kidney samples³⁸. The position of rs16961281 is nearly exclusively accessible in cells of all proximal tubule segments (PT-S1, PT-S2, PT-S3). PTs are the predominant cell type in the kidney cortex, underscoring the consistency of the snATAC-seq data and the bulk ATAC-seq data. Other cell types shown include endothelial cells (Endo), podocytes (Podo), loop of Henle cells (LOH), distal convoluted tubule cells (DCT), collecting duct principal cells (PC), collecting duct intercalated cells (IC), stroma cells (Stroma), immune cells (Immune) and lymph cells (Lymph).

Extended Data Fig. 9 Extended view of the SLC13A3 region.

The upper part of the figure shows the same RNA-seq, ATAC-seq, chromatin state and histone ChIP-seq tracks as Fig. 7. The index SNP rs6124828, which is associated with malate, fumarate and methylsuccinoylcarnitine in plasma as well as with methylsuccinoylcarnitine in urine, is located at the second vertical dashed line from the left. The bottom part shows single-nucleus (sn)ATAC-seq data for different kidney cell types, which were derived from primary human kidney samples³⁹. The position of rs6124828 is nearly exclusively accessible in proximal tubule cells (PT). PTs are the predominant cell type in the kidney cortex, underscoring the consistency of the snATAC-seq data and the bulk ATAC-seq data. Other cell types shown include endothelial cells (Endo), podocytes (Podo), loop of Henle cells (LOH), distal convoluted tubule cells (DCT), collecting duct principal cells (CDPC), collecting duct intercalated cells (CDIC) and immune cells (Immune).

Supplementary information

Supplementary Information

Supplementary Results, Note, Methods and Figs. 1–3

Reporting Summary

Peer Review File

Supplementary Data 1

Regional association plots for mQTL identified in mGWAS of plasma metabolite levels.

Supplementary Data 2

Regional association plots for mQTL identified in mGWAS of urine metabolite levels.

Supplementary Tables 1–23

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Schlosser, P., Scherer, N., Grundner-Culemann, F. et al. Genetic studies of paired metabolomes reveal enzymatic and transport processes at the interface of plasma and urine. Nat Genet 55, 995–1008 (2023). https://doi.org/10.1038/s41588-023-01409-8

Received: 23 May 2022
Accepted: 26 April 2023
Published: 05 June 2023
Issue Date: June 2023
DOI: https://doi.org/10.1038/s41588-023-01409-8
