Differential and shared genetic effects on kidney function between diabetic and non-diabetic individuals

Reduced glomerular filtration rate (GFR) can progress to kidney failure. Risk factors include genetics and diabetes mellitus (DM), but little is known about their interaction. We conducted genome-wide association meta-analyses for estimated GFR based on serum creatinine (eGFR), separately for individuals with or without DM (nDM = 178,691, nnoDM = 1,296,113). Our genome-wide searches identified (i) seven eGFR loci with significant DM/noDM-difference, (ii) four additional novel loci with suggestive difference and (iii) 28 further novel loci (including CUBN) by allowing for potential difference. GWAS on eGFR among DM individuals identified 2 known and 27 potentially responsible loci for diabetic kidney disease. Gene prioritization highlighted 18 genes that may inform reno-protective drug development. We highlight the existence of DM-only and noDM-only effects, which can inform about the target group, if respective genes are advanced as drug targets. Largely shared effects suggest that most drug interventions to alter eGFR should be effective in DM and noDM.

I mpaired kidney function can progress to kidney failure requiring dialysis or transplantation, which implicates high early mortality 1 and public health burden 2 . The most common kidney function measure is the glomerular filtration rate (GFR). Chronic kidney disease (CKD) can be defined as GFR < 60 ml/ min per 1.73 m 2 , and has a prevalence of~10% in the population 3 . In large-scale epidemiological studies, GFR can be estimated from serum creatinine (eGFR) using the CKD-Epi formula 4 .
Diabetes mellitus (DM) is a major risk factor for impaired kidney function 5 and is present in 30% of individuals starting dialysis in Europe 6 . Most current pharmacotherapy in DM has substantial side effects [7][8][9][10] . Recent work showed that SGLT2 inhibitors and GLP1R are reno-protective for individuals with DM 11,12 and early evidence indicates that this might be also true for individuals without DM (noDM) 13 . The biological processes related to lower kidney function might differ between individuals with DM and individuals without DM. Understanding the mechanisms of kidney function variability within and between these groups is pivotal for understanding pathogenesis. Genomewide association studies (GWAS) have pinpointed hundreds of loci for eGFR [14][15][16] . These loci help understand kidney function variability in the overall population. However, commonalities and differences in genetic kidney function effects between DM and noDM individuals are not well understood.
Genes underneath GWAS loci have been shown to double the success rate in drug development pipelines 17,18 . This renders each gene in eGFR loci a drug target to alter kidney function, particularly when the gene maps to association-driving variants that are relevant for the protein or for the gene's expression in kidney tissue 19 . This opens up a clinically important question in the context of differential or shared genetic kidney function effects between DM and noDM individuals: when a locus identified for eGFR in the overall population turns out to have no effect in DM individuals, the underlying causal gene might not be informative for reno-protection in DM individuals. This is not an unlikely scenario, since only 10% of individuals in previous populationbased GWAS on eGFR overall had DM 16,20 .
Previous DM-/noDM-stratified searches for genetic eGFReffects have been limited 21 and without identification of a significant DM/noDM-difference. It is unclear whether this was due to limited power or lack of such differential effects. GWAS for eGFR-based kidney function outcomes in DM, used as one approach to identify the genetics of Diabetic Kidney Disease (DKD), identified four loci in~40,000 type 1 or type 2 DM patients 22 and four other loci in~19,500 type 1 DM patients 23 . When genetic effects on eGFR in DM versus noDM have little overlap, such focused searches will remain the best option. For shared genetics, GWAS on eGFR overall will have much better power.
The quest for understanding commonalities and differences of genetic eGFR-effects between DM and noDM has two aspects: a search for eGFR-associated loci with DM/noDM-difference will help understand whether such differential effects exist and a search for SNP-association on eGFR allowing for DM/noDMdifference may detect novel eGFR loci. Statistically, DM/noDMdifference in eGFR association and genetics-by-DM-status interaction on eGFR association are equivalent. There are various approaches to do so 24 . These approaches have been applied successfully for other phenotypes [25][26][27][28][29] , but not yet for GWAS interaction analyses for kidney function.
We thus set out to search for eGFR loci with DM/noDMdifference and for novel eGFR loci allowing for difference. For this, we gathered GWAS data on eGFR separately for 178,691 DM and 1,296,113 noDM individuals. We prioritized genes underneath identified loci using in-silico functional evidence. We also evaluated the impact of DM-/noDM-specific weights on the genetic risk score (GRS) for eGFR in data independent from the variant-and weight-identifying step 30 .

Results
Overview of the GWAS meta-analyses. We first analyzed GWAS data on 7,046,926 single nucleotide polymorphisms (SNPs) and their association with logarithm-transformed eGFR separately for 109,993 DM (type 2 or type 1) and 1,070,999 noDM individuals (stage 1, including 72 studies from CKDGen Consortium 31 and UK Biobank, UKB 32 ; mostly European-ancestry; Supplementary Data 1, Methods, Fig. 1a). These DM-/noDM-stratified GWAS summary statistics allowed us to apply the difference test, the joint test and the stratified tests (Methods). Based on these tests, we searched for eGFR-associated loci with DM/noDM-difference and for novel eGFR-associated loci allowing for DM/noDMdifference.
We sought replication of identified loci in independent DM-/ noDM-stratified data (stage 2, n DM = 68,698, n noDM = 225,114; Million Veterans Program, MVP 33 , Michigan Genomics Initiative, MGI 34 , Trøndelag health study, HUNT 35 ; all European-ancestry; Supplementary Data 1, Methods). While the discovery+replication design augments confidence in identified loci, this comes at the cost of lower power 36 . We also conducted GWAS searches in a combined stage design (stage 1 + 2: n DM = 178,691, n noDM = 1,296,113), to yield significant loci exploiting the full sample size, yet without independent replication.
The two stages of data were also used to separate the variant identification and weight quantification for the GRS (stage 1 data) from the GRS association analyses (stage 2 data). The analysis workflow is shown in Fig. 1.
Seven eGFR loci identified with DM-/noDM-differential effects. To search for eGFR loci with DM/noDM-difference, we applied two approaches 24,37 (Fig. 1b): (i) a genome-wide difference test (P Diff < 5 × 10 −8 , difference test approach), and (ii) a search for genome-wide significant association with overall eGFR followed by a difference test in the same data (P Overall < 5 × 10 −8 and P Diff < 0.05/k, k = number of followed SNPs, overall+difference test approach, Methods).
Approximate conditional analyses using the difference test (combined stage 1 + 2, European only, Methods) did not identify any further independent variants within the 7 identified difference loci (regional difference plots in Supplementary Fig. 4). Interestingly, there was also no further independent variant with DM/noDM-difference in the UMOD-PDILT locus besides rs77924715, despite the multiple independent variants known for eGFR association overall 16,20 (Supplementary Fig. 4a, b). Sensitivity analyses confirmed that DM/noDM-difference was also observed without log-transformation of eGFR and also after adjusting for interaction with age or other eGFR risk factors (Supplementary Note 2, Supplementary Data 4, Supplementary Table 1).
With respect to DM-/noDM specificity and magnitude of effects, we found the following (Fig. 2b): (i) six of the seven variants identified with differential eGFR-effects showed more pronounced effects in DM versus noDM (near UMOD-PDILT, TPPP, DCDC5, NRIP1, and SLC22A2) or a DM-only effect on eGFR (near CSRNP1) and (ii) one variant showed a noDM-only effect (near MED1-NEUROD2). These patterns were consistent in stage 1 and stage 2 separately ( Supplementary Fig. 5). Of note, with the combined stage 1 + 2 sample size, the difference detectable for a common variant of 30% minor allele frequency at 80% power was 0.0050 or 0.0037 log(ml/min/1.73 m²) for a DM-only or noDM-only effect, respectively (Supplementary Note 3, Supplementary Fig. 6).
In summary, we established seven eGFR loci with DM/noDMdifference (near UMOD-PDILT, TPPP, MED1-NEUROD2, CSRNP1, DCDC5, NRIP1, and SLC22A2) including one with DM-only (CSRNP1) and one with noDM-only effect (MED1-NEUROD2). While all seven difference loci were known loci for eGFR 16 , these are the first eGFR loci reported with significant DM/noDM-difference from a genome-wide search for the difference to the best of our knowledge.
Allowing for difference identifies 32 novel eGFR-associated loci including CUBN. To search for novel eGFR-associated loci allowing for the difference between DM and noDM, we applied two approaches (Fig. 1c, Methods): (i) a genome-wide joint test 38 , which considers the main SNP effect on eGFR and the SNP-by-DM-status interaction effect jointly (joint-test approach) and (ii) GWAS separately in DM or noDM (stratified tests approach).
Approximate conditional analyses (combined stage, European only, Methods) identified two additional variants independently associated with eGFR at two joint-test loci (near SERTAD2 and PIK3CG, P Joint_cond < 5 × 10 −8 , Supplementary Table 2). This raised the total number of newly identified eGFR-associated variants to 34.
Among the 34 novel eGFR variants, four showed a suggestive DM/noDM-difference for eGFR (P Diff < 0.0015 = 0.05/34, Table 2, Fig. 3b): one was a DM-only effect (near SH3BP4) and three showed more pronounced effects in DM (near LOXL4, ALPL, and PIK3CG). These patterns were consistent also in stage 1 and stage 2 separately ( Supplementary Fig. 5). Of note, the difference test is mathematically dependent on the joint test and the stratified tests, but not on the overall test 24 . Therefore, the observed difference in the overall+difference test approach applied to the same data is an established significant difference, while the observed difference in loci established by the joint or stratified tests is termed here as suggestive difference.
In summary, the joint test and stratified tests identified 34 genome-wide significant independent eGFR variants across 32 novel eGFR loci. These included four loci with suggestive DM/ noDM-difference, which yielded a total of 11 eGFR loci with established or suggestive DM/noDM-difference.
Interaction with Hba1C and overlap with associations for DM risk, glycemic and other traits. We explored the lead variants of the 11 eGFR loci identified with DM/noDM-difference (i.e., established/suggestive SNP-by-DM interaction) for SNP-by-HbA1c interaction in UK Biobank (n = 368,005), making full use of the continuous variable HbA1c instead of binary DM status. For all 11 variants, the SNP main effects and SNP-by-HbA1c interaction effect sizes were directionally consistent with main and interaction effects sizes in the SNP-by-DM interaction analysis (Supplementary Data 7, Methods). This underscored again the negative interaction effect for the 10 of the 11 variants with larger effects in DM or DM-only effects, while the one variant (near MED1/NEUROD2) showed a negative interaction effect for the SNP-by-DM as well as the SNP-by-HbA1c interaction.
We were also interested in whether any of the 11 loci with DM/ noDM-difference overlapped with known genome-wide significant loci for type 1 DM 42 , type 2 DM 19 , or glycemic traits 43 . None of the 11 loci overlapped with type 1 DM, type 2 DM, or glycemic traits loci (all 11 lead variants P > 5.0 × 10 −8 in published GWAS for DM risk, glucose, or insulin levels, Supplementary Data 8). None of the 11 variants was associated with type 1 DM judged at Bonferronicorrected significance (P > 0.05/11 = 4.5 × 10 −3 ). Three of the 11 variants were associated with type 2 DM (P < 0.05/11 = 4.5 × 10 −3 , rs77924615 near UMOD-PDILT, rs55722796 near MED1-NEU-ROD2) or fasting glucose (rs963837 near DCDC5). For all three variants, the eGFR-lowering allele was associated with decreased type 2 DM risk or fasting glucose. Thus, the SNP effect on eGFR cannot be fully explained by DM status, which would have yielded a DM risk increasing or glucose-increasing effect by the eGFRlowering allele. The observation is in line with a pleiotropic effect on DM and eGFR, but with adverse effects for one of the two (eGFR or DM/glucose) and a beneficial effect for the other, which should be considered in drug design when applicable. When taking-intoaccount the DM/noDM-stratum where the eGFR effect was more pronounced, the three variants consisted of (i) two variants with stronger eGFR-effects in DM (still significant eGFR effect in noDM, near UMOD/PDILT and DCDC5) and (ii) one variant with a noDM-only effect on eGFR (near MED1-NEUROD2).
We also queried the Open Targets Genetics database 44 for associations of the 11 variants with other traits. We found 126 genome-wide significant associations (P < 5 × 10 −8 , Supplementary Data 9), particularly for hypertension and blood counts: for 3 of the 11 variants, the eGFR-decreasing alleles were associated with increased risk of hypertension (near UMOD/PDILT, DCDC5, and PIK3CG). The variant rs55722796 near MED1/ NEUROD2, solely associated with eGFR in noDM individuals, was not associated with hypertension, but the eGFR-decreasing allele (in noDM) was associated with decreased blood counts (i.e., decreased red blood cells, hemoglobin, and hematocrit).
GWAS for eGFR in individuals with DM identified 29 genomewide significant loci. Previous searches for SNP-effects on eGFRbased kidney function outcome in DM patients had identified eight genome-wide significant loci (i.e., eGFR-based DKD loci; four identified in type 1 DM patients 23 , four other in type 1 or type 2 DM patients 22 ). We wanted to understand the overlap of these with the genome-wide significant loci in our DM-only GWAS on eGFR (n DM = 178,691, combined stage, Methods): we observed 29 genome-wide significant eGFR loci in DM (P DM < 5 × 10 −8 , Fig. 5, Supplementary Data 10). These included: (i) 27 novel eGFR loci in DM, and (ii) two of the four previously identified DKD loci among type 1 or type 2 DM patients 22 (near UMOD/PDILT and PRKAG2; r 2 = 0.60 and 0.94, respectively, between eGFR and DKD lead variants). None of the previously identified DKD loci among type 1 DM patients 23 were identified. Two of the 27 novel eGFR loci in DM were detected here as completely novel eGFR loci (near SH3BP4 and LOXL4; also detected by the joint test). The 29 identified eGFR loci in DM included 6 of our 11 difference loci (near UMOD/PDILT, TPPP, DCDC5, SH3BP4, NRIP1, and LOXL4).
In silico evidence prioritized 18 genes by protein-or expression-altering variants or as human kidney disease monogene. Genes at GWAS loci for eGFR might pinpoint relevant drug targets for kidney function 18 . Dissecting eGFR loci as DM-only, noDM-only, or shared will help define the target population for potential therapy. Gene PrioritiSation (GPS) was conducted previously 16 for all genes underneath the 424 eGFRassociated loci in a meta-analysis of CKDGen and UKB (i.e., our stage 1 studies) focused on European-ancestry. Since all 7 difference loci were among these 424 loci, we used this established GPS tool to extract in-silico functional evidence for the 159 genes Seven eGFR loci with differential effects by diabetes status. We searched for DM/noDM-differential genetic associations on eGFR using the difference test approach and the overall+difference approach in combined stage 1 + 2 (CKDGen, UKB, MVP, MGI, and HUNT; n DM = 178,691; total n noDM = 1,296,113). Seven difference loci were identified. a Shown are difference test P-values over chromosomal base position (Manhattan plot) highlighting the six loci identified by the difference test approach (red, P Diff < 5 × 10 −8 ) and the one locus identified by the overall+difference test approach (orange, 610 variants 16 with stage 1 P Overall < 5 × 10 −8 , P Diff < 0.05/610 = 8.2 × 10 −5 ). Loci are annotated by the name(s) of the nearest gene(s); asterix indicates loci that were also identified by the discovery+replication design (Table 1, Supplementary Fig. 1). b Shown is a comparison of DM-/noDM-specific eGFR-effect sizes for the seven identified difference lead variants based on combined stage 1 + 2 data. Effect sizes are aligned to the eGFR-decreasing alleles in noDM except for CSRNP1 (aligned to eGFR-decreasing allele in DM). Error bars reflect 95% confidence intervals of the estimated genetic effect.   Fig. 3 Accounting for potential DM-/noDM-differences identified 32 novel eGFR loci. We searched for novel loci associated with eGFR allowing for DM-/noDM-difference using the joint test or DM-/noDM-stratified tests approaches in combined stage 1 + 2 (n DM = 178,691; total n noDM = 1,296,113). We found 32 novel genome-wide significant eGFR loci (P < 5 × 10 −8 , >500 kB distant of known eGFR loci compared to previous work 16,20  Loci were annotated by nearest genes if P Diff < 0.10 or if they were also identified by the discovery+replication design (the latter also indicated by asterix, Table 2). b Shown is a comparison of DM-/noDM-specific eGFR-effect sizes for the 32 novel eGFR locus lead variants. Highlighted in red are the locus names of loci with suggestive DM/noDM-difference (P Diff < 0.05/34; corrected for 34 independent variants across 32 loci). Effect sizes are aligned to the eGFR-decreasing alleles in noDM. Error bars reflect 95% confidence intervals of the estimated genetic effect. Fig. 4 The variants associated with eGFR in the CUBN locus differ from those associated with urinary albumin-to-creatinine ratio. Shown are P-values for associations at the wider (top) and more narrow (bottom) CUBN locus region for a eGFR (joint test P-values, n DM = 178,691 and n noDM = 1,296,113) and b urinary albumin-to-creatinine ratio (UACR; P-values from ref. 39 45 , i.e., CADD score ≥15) or (ii) for expression in kidney tissue (eQTL, false-discovery rate ≥5%) 46,47 or (iii) when the gene was a known kidney disease monogene in human (Methods). We found 6 prioritized genes (Fig. 6, Supplementary  Fig. 8): (i) PGAP3 mapping to the locus with noDM-only effect on eGFR (near MED1/NEUROD2) and (ii) five genes in loci with more pronounced eGFR-effects in DM but direction-consistent effects in noDM (TPPP, UMOD, SLC6A19, NRIP1, and SLC22A2).
We also used the same approach as in the previous GPS tool 16 to prioritize the genes underneath the 32 novel eGFR loci (34 independent signals, Supplementary Data 11-15, Supplementary  Supplementary Fig. 9): (i) among the 40 genes in the four novel loci with suggestive DM/noDM-difference, 2 genes were human kidney disease monogenes (HPS1 and HPSE2, Fig. 7a); (ii) for the 341 genes in the other 28 novel eGFR loci (i.e., noDM/ noDM-difference), 10 genes were prioritized (Fig. 7b): 6 genes contained an association-driving variant that was proteindeleterious (AUTS2, CUBN, DVL2, RASSF6, SLC2A4, and ZFP36L1), one gene mapped to an eQTL in glomerular tissue (TNIK), and 3 additional genes were human kidney disease monogenes (SLC2A2, SLC30A9, and SLC7A7). Particularly interesting was an association-driving variant, rs1801232, in CUBN that was a missense variant (r² = 0.73 to eGFR lead variant rs11254238). The missense variant rs17804499 in RASSF6 had a particularly high probability of being the association-driving variant (posterior probability of association = 86%), which rendered this variant and gene a compelling candidate for  178,691). This DM-specific analysis identified 29 independent eGFR-associated loci in DM. Compared to known DKD loci 22,23 (i.e., association with eGFR or CKD in type 1 and/or type 2 DM individuals) and known overall eGFR loci 16,20 , 2 loci are novel for eGFR overall and novel for DKD (red), 24 are novel for DKD but known for eGFR (orange), and 3 are known DKD and known eGFR loci (purple).  (Table 1, Supplementary Fig. 8). We highlighted six genes based on association-driving variants (PPA > 5%) that were deleteriously protein-relevant or expression-modulating, genes that were known as human kidney monogenes (OMIM or ref. 78 with subsequent manual curation) and in addition SLC22A2 due to its known link to metformin response. functional follow-up. Two genes, SLC2A2 and SLC2A4, encode the glucose transporters GLUT2 and GLUT4 and reside in two novel eGFR loci.
In summary, across the identified seven loci identified with established or suggestive DM-/noDM-difference in eGFR-effects and the 32 novel eGFR loci, we prioritized 18 genes with evidence of protein-deleterious or expression-modulating variants or being human kidney disease monogenes. These genes are compelling candidates for functional follow-up and drug development pipelines.
DM/noDM-stratified weights for GRS show similar explained variance as overall weights. We assessed whether the GRS for eGFR, computed as an effect size weighted sum of eGFR-lowering alleles, could be improved by using DM/noDM-specific eGFR-effects as weights rather than overall eGFR-effects. We built the GRS based on the previously identified 634 eGFR variants 16 (i.e., identified in stage 1 studies) or based on the seven variants with established DM/noDM-difference using different weighting schemes (overall effects, DM-/noDM-specific effects, or a combination; all weights from stage 1 data). We computed the GRS for each individual in the stage 2 study HUNT (unrelated, European-ancestry; n DM = 3023, n noDM = 25,484, Fig. 1d, Methods).
Across all weighting schemes, we found no significant DM/ noDM-difference in the 634-variant GRS association with eGFR (P Diff > 0.05, Table 3). The 634-variant GRS explained more of the eGFR variance in noDM compared to DM (e.g., R 2 = 6.0% vs. 4.0% using overall-effect weights). Since the absolute 634-variant GRS effect was similar in DM and noDM (e.g., beta per sd GRS = −2.54 versus −2.84 ml/min/1.73 m 2 using overall-effect weights), this larger relative GRS effect in noDM can be attributed  Table 2, Supplementary  Fig. 9): a for the four novel eGFR loci with suggestive difference, and b for the 28 other novel eGFR loci. We highlighted 12 genes based on associationdriving variants (PPA > 5%) that were deleteriously protein-relevant or expression-modulating or genes that were known as human kidney monogenes (OMIM or ref. 78 with subsequent manual curation).
-at least in part-to the smaller eGFR variance in noDM versus DM (HUNT standard deviation = 11.8 versus 13.4 ml/min/ 1.73 m 2 , respectively). Yet, the 7-variant GRS explained more of the eGFR variance in DM compared to noDM (e.g., R 2 = 0.98% vs. 0.62% using overall-effect weights). The reason for this is that six of the seven variants had larger eGFR-effects in DM than in noDM, which accumulated to a larger absolute GRS effect in DM (e.g., beta per sd GRS = −1.02 in DM compared to −0.68 in noDM using overall-effect weights).
When comparing different weighting schemes, we found no notable improvement in R 2 values by using DM-/noDM-specific weights compared to overall weights (Table 3). In the 634-variant GRS, using DM-specific weights even reduced R 2 in DM individuals (R² = 3.1% versus 4.0%). This may be attributable to the larger uncertainty in DM-specific weights estimated from the meta-analysis restricted to DM individuals. For the 7-variant GRS, DM-specific weights slightly increased R 2 in DM individuals, but not markedly (R 2 = 1.0% versus 0.98%). These findings underscore a similar performance of overall weights and DM-/ noDM-specific weights when building a GRS for eGFR. The comparison of the 634-variant with the 7-variant GRS suggests that more differential effect loci and higher precision in DMspecific weights might substantially improve the genetically explained variance also in DM individuals.

Discussion
In this GWAS on eGFR in~180,000 individuals with DM and 1.3 million individuals without DM, we established seven loci with significant DM/noDM-differential effects on eGFR near UMOD/PDILT, TPPP, MED1/NEUROD2, CSRNP1, DCDC5, NRIP1, and SLC22A2. We also identified 32 novel eGFRassociated loci when allowing for potential difference, including four loci with a suggestive difference near SH3BP4, LOXL4, ALPL, and PIK3CG. The 11 loci with established or suggestive difference included two loci with DM-only effects on eGFR (near CSRNP1, SH3BP4), one with noDM-only effect (near MED1/NEUROD2), and all others showed more pronounced effects in DM compared to noDM. Our GWAS focused on DM individuals confirmed two known DKD loci, but also identified 27 novel loci for eGFR in DM, which make these potential new DKD loci. The DM-/ noDM-stratified GRS analyses showed no improvement in explained trait variance when using DM-/noDM-stratified weights, but the seven identified DM/noDM-differential eGFReffects explained more of the eGFR variance in DM individuals than in noDM individuals.
Our gene prioritization at the 11 loci with DM/noDM-difference and at the 28 other novel eGFR loci highlighted 18 genes: 10 genes mapped to eGFR-association-driving variants that were deleteriously protein-relevant or eQTLs in kidney tissue, which suggests them as potential drug targets to alter eGFR. Further eight genes were prioritized because they were known as human kidney disease monogenes, which made them plausible causal genes for these common variant findings. Particularly interesting was CUBN in a novel eGFR locus. The CUBN locus is the GWAS locus with the strongest effect on UACR 39 and microalbuminuria 40 and identified here for eGFR at genome-wide significance, but respective lead variants for eGFR and UACR are uncorrelated. The lead variant for eGFR, rs11254238, is highly correlated with a variant that alters the encoded protein, cubilin (rs1801232, r 2 = 0.91). We observed a twice as large effect on eGFR in DM compared to noDM in both discovery and replication data, but not statistically significantly different. This is in line with previously observed larger CUBN variants' effects on microalbuminuria 40 and UACR 41,48 in DM, which provides further evidence for an interaction of CUBN with DM status on kidney function. In fact, our previous sequencing of the CUBN gene found rare variants in CUBN that were associated with higher UACR and with better eGFR 49 . Our study now reports such a parallel association between UACR and eGFR now also for a common variant at the genome-wide significance levels. Together, our results provide further support for the importance to assess the physiological role of CUBN by functional studies not only for microalbuminuria, but also for finding pathophysiological explanations related to impaired filtration rate.
The RASSF6 gene has been studied in relation to the kidney as well: RASSF6 mediates apoptosis in various cells 50 . It is shown to be expressed in the slit diaphragm in glomeruli and the apical membranes in proximal renal tubular epithelial cells of rat kidney. The same animal study suggests the involvement of the RASSF6 pathway in contrast-induced nephropathy. Thus, the novel detection of the RASSF6 locus for eGFR here and the lead variant, rs17804499, being protein-altering and with 86% probability causal render RASSF6 a compelling candidate for the functional follow-up to help understand its role in kidney function that is elusive so far. Another interesting gene was PGAP3 in the MED1-NEUROD2 locus, which mapped to an expression-modulating variant that was associated with eGFR in noDM, but not in DM individuals. PGAP3 is a known gene for CKD and eGFR: eQTL analyses showed colocalization of the association signal with PGAP3 expression 51 . PGAP3 knockout mice developed larger glomeruli with deposition of immunoglobulins 52 , although we acknowledge that this does not provide a mechanistic explanation for its association with eGFR. Eventually, if causal genes at eGFR loci with significant differences by DM status will be advanced as drug targets for reno-protective therapies, the knowledge about DMor noDM specificity of the eGFR association can help define the target group for potential subsequent therapy, as DM-only, noDM-only, or both.
The difference in SNP effects between DM and noDM can be interpreted as SNP-by-DM-status interaction effect. To our knowledge, our study is the first genome-wide search for SNP-by-DM-status interaction effects for eGFR. Sensitivity analyses indicated that the observed differences were neither biased from the stratified modeling, nor from log-transformed eGFR, and the SNP-by-DM-status interaction was not explained by SNPinteraction with age, sex, hypertension, or body-mass-index 53 . The seven loci with an established difference are known loci for eGFR 16,20 . To our knowledge, the significant DM/noDM-differential effects for eGFR or eGFR-based CKD were identified here for the first time, except for the UMOD-PDILT locus 21 . At UMOD-PDILT, a well-known kidney function locus 16,20 , the twofold higher eGFR effect in DM compared to noDM confirmed previous observations for eGFR 21,54 and CKD 55 . Very interesting was the TPPP locus, where the three-fold higher eGFR effect in DM now draws attention to a locus that was just one of the hundreds of small effects eGFR loci. The lead variant for the difference, rs434215, modulates TPPP expression in tubulointerstitial tissue 46 , but the role of TPPP in diabetes and kidney disease is yet unknown. Previous studies on TPPP focused on the nervous system: TPPP is highly expressed in the brain and shown to affect neural microtubules 56 .
Our GWAS on eGFR in DM provides a link to the genetics of DKD. Previous GWAS for DKD analyzed eGFR or eGFR-based CKD in type 1 and/or type 2 DM individuals 22,23 . Our GWAS in >178,000 DM individuals was 4-to 7-fold larger. These DM individuals were mostly from population-based studies of adult individuals. As such, the proportion of type 1 DM among DM individuals analyzed here reflects the proportion of type 1 DM of 5-10% among adult DM individuals 57 . Due to the substantially larger number of type 2 DM individuals, the identified DM/noDMdifferential effects on eGFR may mostly reflect differences between type 2 DM versus noDM. A better distinction by type of DM would require more granular data focusing on the distinction between these two major diabetes groups and substantially larger data on type 1 DM. While the use of self-reports or one-time measurements of glucose or HbA1c to define DM here is typical also for GWAS on DM 19,58 , this implies heterogeneity in the DM definition and may include some individuals without clinically manifest DM.
Yet, our 29 loci with genome-wide significant association with eGFR in DM confirmed two of eight previously identified DKD loci 22,23 and the 27 other are compelling new candidate loci for further analyses in studies specialized on DKD patients. A large sample size eGFR GWAS with a more heterogeneous spectrum of DM can be a powerful complementary approach to focused searches in DKD patients 22,23 . Several prioritized genes mapped to these novel eGFR loci in DM: TPPP already mentioned above and SLC6A19, NRIP1, HPS1 as well as HPSE2 with strong monogenic impact on kidney. All these genes reside in loci with more pronounced eGFR-effects in DM compared to noDM, but none of these effects was DM-only.
Our GWAS allowing for DM/noDM-differential effects identified 32 novel eGFR-associated loci compared to previous GWAS 16,20 . Some of our novel loci might be identified due to increased power by~20% increased sample size or due to chance by using alternative statistical tests. However, four of the novel loci showed suggestive DM/noDM-difference, suggesting that their identification was facilitated by using tests allowing for differential effects 59 . The four suggestive difference loci included one locus, near SH3BP4, with eGFR effect only in DM.
In contrast to hundreds of loci found in GWAS for eGFR, the eGFR loci with significant DM/noDM-difference were few. This suggests a largely shared genetics of kidney function between DM and noDM individuals, which has an important implication for drug development: most drug interventions aimed at altering eGFR should thus be effective among persons with and without DM. This is mirroring what is observed for SGLT2 inhibitors-a medication originally developed for individuals with DM that is now also being tested for reno-protection among individuals without DM 13 . While our GWAS here is, to date, the largest for eGFR in DM individuals, we still might have missed loci with effects in DM-only. Power for interaction effects is generally smaller than for overall effects 60 , and particularly reduced when one subgroup, like DM, is substantially smaller (~10%) than the other 28 . Future work with an increased sample size might detect more eGFR loci with differences by DM status, especially those with effects in DM-only. This GWAS included mostly European individuals and our findings require replication in non-European ancestries 61 , particularly because of the large differences in DM prevalence across ancestries 62 .
In summary, our results highlight the existence of DM-and noDM-specific genetic effects on kidney function, but emphasize that the majority of eGFR locus associations do not differ between individuals with and without DM. Larger DM-/noDM-stratified data on eGFR in the future will improve the detectability of differential effect loci and the precision of DM-specific weights. This might also improve DM-/noDM-stratified GRS prediction of eGFR. The identified eGFR loci with difference between DM-and noDM individuals include loci with effects only in DM as well as loci with effects in noDM. This has highly relevant implications, if the respective genes are advanced as a drug target: the specificity of the association might help sharpen the target group for such potentially arising drug therapies.

Methods
Definition of the outcome eGFR and study-specific participant information. GFR was estimated in all study participants in all studies using serum creatinine measurements via the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) formula 20 . Study-specific information on the utilized assay and year of measurement is given in Supplementary Data 1. We used the R package nephro 63 , winsorized at 15 and 200 ml min -1 per 1.73 m 2 , and logarithmized using a natural logarithm. For a better interpretation of effect size, we there used eGFR on the original scale in GRS analyses. Study-specific information on study design, sample size, sex and age, utilized serum creatinine assay, year of measurement, and average eGFR is given in Supplementary Data 1.
Each study is conducted according to the declaration of Helsinki; local ethics committees approved research protocols and participants provided written informed consent. as non-fasting plasma glucose ≥200 mg/dl (11.0 mmol/L) or diabetes therapy, or (iii) (glucose unavailable) as self-reported diabetes. For UKB, DM was defined as HbA1c≥48 mmol/mol (≥6.5%) or diabetes therapy (i.e., A10 ATC codes obtained from 64 ).
Study-specific GWAS analyses stratified by DM status. We distributed an analysis plan to the 72 participating studies of the CKDGen consortium. Each study conducted analyses separately for individuals with DM and individuals without DM. All studies imputed genotypes to the Haplotype Reference Consortium v1.1 (HRC) or 1000 Genomes Project phase 3 v5 (1000Gp3v5) ALL or phase 1 v3 (1000Gp1v3) ALL panel. Each study conducted linear regression GWAS on log(eGFR) using natural logarithm, an additive genotype model as well as adjusted for sex, age, and other study-specific covariates. Details on study-specific genotyping, imputation, and GWAS were described previously for CKDGen 20 . For UKB, we utilized the fastGWA tool 65 to conduct GWAS for log(eGFR) based on linear mixed models while accounting for sex, age, age × sex, age 2 , age 2 × sex, and 20 genetic principal components and assuming an additive genetic model, which allowed to include related individuals in the GWAS 65 . For stage 2 studies, MVP, MGI, and HUNT, the same analysis plan was distributed. Details on genotyping, imputation, and GWAS in UKB as well as for stage 2 studies are shown in Supplementary Data 1. For quality control, we excluded variants with low imputation quality, Info < 0.6, or rare variants with minor allele frequency, MAF < 0.1%. We utilized the software packages GWAtoolbox 66 and EasyQC 67 for the quality control of study-specific GWAS results. We conducted a correction for genomic control lambda of the results stratified by diabetes status.
DM/noDM-stratified GWAS meta-analyses. In stage 1 of our analysis, separately in DM (n DM = 88,829) and noDM (n noDM = 620,665) strata, we conducted fixed-effect inverse-variance weighted meta-analyses of 72 GWAS of log(eGFR) using metal 68 , and then meta-analyzed these results with DM/noDM-stratified GWAS from UKB (n DM = 21,040; n noDM = 414,628; European only). To adjust for population stratification within studies, we applied genomic control (GC) correction 69 to each study prior to the meta-analysis. We applied a second GC correction to the DM-and noDM stage 1 meta-analysis results (GC lambda = 1.02 and 1.20, respectively). We excluded variants that were present only in ≤36 stage 1 studies (≤50%) and variants with a cumulative minor allele count of <400 in the stage 1 meta-analyses. In summary, 109,869 individuals with DM and 1,035,190 with noDM were included in stage 1. We followed variants identified at stage 1 in independent stage 2 meta-analyses. For stage 2, we included DM/noDM-stratified GWAS on log(eGFR) from MVP (n DM = 57,430, n noDM = 122,966, hospitalbased), MGI (n DM = 7469, n noDM = 36,558, hospital-based) and HUNT (n DM = 3799, n noDM = 65,590, population-based), totalling 68,698 individuals with DM and 225,114 with noDM, all of European-ancestry. Again, we applied studyspecific GC correction prior to the meta-analysis and a second GC correction 69 to the stage 2 meta-analysis results (GC lambdas = 1.00 and 1.02 for DM-and noDM, respectively). To maximize power for locus identification, we combined the double GC-corrected stage 1 and 2 meta-analysis separately by DM status via fixed-effect inverse-variance weighted meta-analyses of the two sources using metal 68 . The GC lambda in this final meta-analysis was comparable to previous GWAS 16,70 (GC lambdas = 1.03 and 1.15, respectively). The DM-/noDM-specific summary statistics (stratified GWAS) for each variant genome-wide served to investigate potential DM-/noDM-differential genetic effects on log(eGFR) without making any assumptions on the DM association with any other covariate 71 . These DM-/noDMspecific summary statistics on genetic variants associated with log(eGFR) enabled the implementation of all the following statistical tests to search for DM/noDMdifference loci or novel loci allowing for difference as described below.
Approaches to search for DM/noDM-difference in genetic effects on eGFR. We used the meta-analyzed SNP-specific summary statistics, to test for difference in eGFR-effects between DM and noDM. For this, we applied a difference test for each variant using whereβ DM andβ noDM are the genetic effect estimates for eGFR in DM or noDM from the stratified GWAS meta-analysis, respectively, and corresponding standard errors se DM and se noDM . The term r diab reflects the correlation betweenβ DM and β noDM across all variants (Spearman correlation coefficient, r diab ¼ 0:14).
To search genome-wide for eGFR loci with DM-/noDM-differential effects, we applied two approaches 24,37 : (i) a genome-wide difference test (P Diff < 5 × 10 −8 , difference test approach), and (ii) a search for genome-wide significant association with overall eGFR followed by a difference test in the same data (P Overall < 5 × 10 −8 and P Diff < 0.05/k, k = number of followed SNPs, overall+difference test approach). The two approaches complement each other in terms of power to detect difference loci that depend on the magnitude and DM-/noDM specificity 24,37 . We implemented the approaches in two designs: first, we applied a discovery +replication design, where we searched for significant differences in the stage 1 meta-analysis and moved the selected SNPs to a replication stage using the stage 2 meta-analysis (applying a Bonferroni-corrected alpha-level accounting for the SNPs tested). Second, to make full use of stage 1 and stage 2 data, we searched for differences in stage 1 + 2 meta-analysis combined (combined stage design). We utilized the R package EasyStrata 72 to apply the difference test approaches to the DM/noDM-stratified meta-analysis results.
Approaches to search for eGFR loci allowing for DM/noDM-differences. By allowing for differences between DM and noDM in SNP-effects on eGFR, one can possibly detect novel loci that have been masked in analyses on overall eGFR (without DM-status stratification). Again, we used the meta-analyzed SNP-specific summary statistics to apply two approaches to search for novel eGFR loci allowing for difference: (i) a screen using a two degrees of freedom joint test (P joint < 5 × 10 −8 , joint-test approach) 38 derived from a χ 2 -test using 38 whereβ DM andβ noDM are the genetic effect estimates for eGFR in DM or noDM from the stratified GWAS meta-analysis, respectively, and corresponding standard errors se DM and se noDM ; and (ii) a on eGFR association in individuals with or without DM separately (stratified tests approach, P DM < 5 × 10 −8 or P noDM < 5 × 10 −8 ). Analogously to the search for difference loci, we applied twostage designs, the discovery+replication, and the combined stage designs. We utilized the R package EasyStrata 72 to apply the joint and stratified test approaches to the DM/noDM-stratified meta-analysis results.
Variant selection and region definition. In order to derive non-overlapping locus regions and locus lead variants, we clumped genome-wide significant variants from the respective test results (P Diff < 5 × 10 −8 , P Joint < 5 × 10 −8 , P DM < 5 × 10 −8 , or P noDM < 5 × 10 −8 ) as done previously 16 : the most significant variant was selected genome-wide (first locus lead variant) and the corresponding locus was defined as the smallest physical interval on the corresponding chromosome containing this variant such that there were no genome-wide significant variants within 500 kb outside the two borders. Omitting the identified locus, we repeated the procedure until no further genome-wide significant variants were detected. By this, a locus region is defined by adding ±250 kb to the first and last genome-wide significant variant of an identified locus. This procedure also ensured non-overlapping loci. A locus was considered to be known for eGFR from previous GWAS, if it overlapped with one of the 427 known eGFR loci (424 from ref. 16 , 3 additional from ref. 20 ). If a locus is not known, we call it a novel eGFR locus. For the identified variants, we assessed between-study heterogeneity based on the CKDGen meta-analysis using a Chi-Squared test and an I 2 statistic 73 and verified the association statistics with regards to abnormal or unusually large effect sizes.
GCTA analyses to identify independent secondary signals within loci. To evaluate whether there were multiple independent signals within locus, we conducted approximate conditional analyses with GCTA 74 for each identified locus. These analyses were based on European-ancestry individuals, since appropriate trans-ethnic linkage disequilibrium (LD) reference panels were limited. As LD reference panel, we used a random subset of 20,000 unrelated individuals of European-ancestry from UKB, as done previously 16 . To obtain independent signals for associations derived by the difference test, the joint test, or DM-or noDMspecific association tests, we applied a stepwise approach: we conditioned via GCTA analysis on the, respectively, observed lead variant in DM and noDM separately and, second, derived DM/noDM-specific conditioned results for the locus. Separately for DM and noDM, we ensured whether any of the conditioned DM-or noDM-specific associations in the locus showed genome-wide significant association at P DM_Cond < 5 × 10 −8 or P noDM_Cond < 5 × 10 −8 . For loci derived by the difference test or the joint test, we applied the difference test or joint test, respectively, to the conditioned DM/noDM estimates to infer whether additional signals showed significant differences or joint effects (P Diff_Cond < 5 × 10 −8 or P Joint_Cond < 5 × 10 −8 ).
Prioritization of variants and genes. For each variant within each identified signal, we derived the variants that were the most likely to drive the association. For this, we calculated approximate Bayes factor and posterior probabilities of association (PPA) based on Z-scores using the Kichaev method 75 . We calculated PPAs based on unconditioned or conditioned summary statistics depending on whether the locus showed only one or multiple independent signals, respectively. Then, we obtained 99% credible sets of variants by sorting the variants within each signal by descending PPA and then summing up PPAs until a cumulative PPA of 99% was reached. To prioritize genes, we used the results from the Gene PrioritiSation (GPS) published previously 16 for known eGFR loci and generated the GPS de novo for novel eGFR loci accordingly: we queried each gene underneath identified loci and prioritized the gene when it mapped to a 99% credible set variant that was protein-relevant with high predicted deleteriousness (CADD 45 PHRED-Score ≥ 15) or expression-relevant in kidney tissue (eQTL, NEPTUNE 46 , or GTEx v7 76 , falsediscovery rate, FDR < 5% for the eQTL) and for splice quantitative trait loci in kidney tissue (sQTL) 76 (FDR < 5% for the sQTL). We also prioritized genes that were known as human kidney disease monogene: for this, we queried each gene at identified loci for a documented kidney phenotype in human as done previously 16 (Online Mendelian Inheritance in Man® database, OMIM 77 or Groopman et al. 78 ) with additional manual curation by expert review to focus on kidney disease monogenes.
GRS analyses. GRS analyses based on the previously identified 634 eGFR variants 16 and based on the variants identified for DM-/noDM-difference on eGFR were conducted in unrelated European-ancestry individuals from HUNT (n DM = 3023, n noDM = 25,484, stage 2 study). For each individual, the GRS was computed as the weighted sum of eGFR-lowering alleles across the 634 variants applying three different weighting schemes: (i) weighted by the respective overall per-variant effect as derived previously (i.e., estimated in CKDGen and UK Biobank, our stage 1 data 16 ) (ii) weighted by the variant's DM-/noDM-specific effects as derived here in stage 1 depending on whether the individual in HUNT had DM or noDM, respectively (and vice versa), (iii) weighted by the variant's DM-/noDMspecific effect from stage 1 for HUNT individuals with DM/noDM, respectively, when the variant was among the seven with an identified significant difference, and weighted by the overall-effect size otherwise. By this, the GRS association analyses conducted in a stage 2 study were independent of the variant identification and weight estimation, which were based on stage 1 studies. Separately for individuals with and without DM, we estimated the association of the GRS on eGFR by linear regression (original scale; adjusted by sex, age, and genetic principal components). We judged the GRS effect per standard deviation of the GRS and the eGFR variance explained by the GRS separately in DM and noDM separately.
Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
Summary genetic association results for the DM-status-specific meta-analyses for log(eGFRcrea) can be downloaded from https://ckdgen.imbi.uni-freiburg.de/. All other data are available from the corresponding author on reasonable request.

Code availability
The analysis plan can be downloaded from https://ckdgen.eurac.edu/mediawiki/index. php/CKDGen_Round_4_EPACTS_analysis_plan and the phenotype command line script from https://github.com/genepi-freiburg/ckdgen-pheno. All other code is available from the corresponding author on reasonable request.