Main

Genetic factors are important determinants of glucose homeostasis and type 2 diabetes (T2D) susceptibility. Heritability of both fasting glucose (FG) and T2D is high, at 35–40%1 and 30–60%2, respectively. To date, more than 400 genetic loci have been associated with T2D3,4. Genome-wide association studies (GWAS) for glycemic traits in individuals without diabetes have identified genetic predictors of blood glucose, insulin and other metabolic responses during fasting or after oral or intravenous glucose challenge tests5,6,7,8. However, physiological glucose regulation involves responses to diverse nutritional and other stimuli that were, by design, omitted from such studies. Blood glucose is frequently measured at different times throughout the day in clinical practice and research studies (random glucose (RG)). While RG is inherently more variable than standardized measures, we reasoned that, across a very large number of individuals, it gives a more comprehensive representation of complex glucoregulatory processes occurring in different organ systems. Therefore, to identify and functionally validate genetic effects influencing RG, explore its relationships with other traits and diseases, and use these data to provide pathways for T2D treatment stratification, we performed a large-scale cross-ancestry GWAS meta-analysis for RG in individuals without diabetes.

Results

RG GWAS expands the catalog of glycemia-related genetic associations

We undertook RG GWAS in 476,326 individuals without diabetes of European (n = 459,772) and other ancestries (n = 16,554) with adjustment for age, sex and time since last meal (where available), along with the exclusion of extreme hyperglycemia (RG > 20 mmol l−1) and individuals with diabetes (Supplementary Table 1). The covariate selection was based on extensive phenotype modeling (Supplementary Note, Supplementary Table 2 and Extended Data Fig. 1a). We identified 150 distinct signals (P < 10−5) by fine mapping through conditional analysis within 120 loci reaching genome-wide significance (P < 5.0 × 10−8; Fig. 1a and Supplementary Tables 3 and 4). Fifty-three RG signals are reported for glycemic traits for the first time, greatly expanding our knowledge about the genetics of glycemia (Tables 1 and 2 and Supplementary Table 3). Adjustment for last meal timing (Extended Data Fig. 1b) did not change effect size estimates while enabling better power for the analysis. Application of glycated hemoglobin (HbA1c) cut point for diagnosing diabetes (HbA1c ≥ 6.5%) highlighted stronger associations at G6PC2 and GCK lead RG loci (Extended Data Fig. 1c), suggesting their roles in glucose set-point in normoglycemia9. Neither adjustment for body mass index (BMI), nor a more stringent hyperglycemia cut-off (RG > 11.1 mmol l−1; Extended Data Fig. 1d,e) materially changed the magnitude and significance of the RG effect estimates, although when all covariate models were individually applied, 11 additional signals at genome-wide significance were identified (Table 2 and Supplementary Table 5). Despite previous misconceptions that RG is of limited value for genetic discovery because of its inherent variability, our RG GWAS demonstrates that this trait variability has a clear genetic component.

Fig. 1: Summary of all RG loci identified in this study.

a, Circular Manhattan plot summarizing findings from this study. In the outermost layer, gene names of the 133 distinct RG signals are labeled with different colors indicating the following three clusters defined in cluster analysis: 1a/1b, metabolic syndrome; 2a/2b, insulin release versus insulin action (with additional effects on inflammatory bowel disease for cluster 2a) and 3, defects of insulin secretion. Asterisks annotate RG signals that are new for glycemic traits. Track 1 shows RG Manhattan plot reporting −log10(P value) for RG GWAS meta-analysis. Signals reaching genome-wide significance (P < 5.0 × 10−8) are colored in red. Crosses annotate loci that show evidence of sex heterogeneity (Psex-dimorphic < 5.0 × 10−8 and Psex-heterogeneity < 0.05); blue crosses for larger effects in men, green crosses for larger effects in women. Track 2 shows the effects of the 133 independent RG signals on four GIP/GLP-1-related traits GWAS. The colors of the dotted lines indicate four GIP/GLP-1-related traits: gray dot, signals reaching P < 0.010 for a GIP/GLP-1-related trait; red dot, lead SNP has a significant effect on GIP/GLP-1-related trait (Bonferroni corrected P < 1.0 × 10−4). Track 3 shows the effects (−log10(P value)) of the 133 independent RG signals on 113 glycan PheWAS. Track 4 shows the effects (−log10(P value)) of the 133 independent RG signals on 210 gut-microbiome PheWAS. Track 5 shows MetaXcan results for ten selected tissues for RG GWAS meta-analysis; signals colocalizing with genes (Bonferroni corrected P < 9.0 × 10−7) are plotted for each tissue. All P values were calculated from the two-sided z statistics computed by dividing the estimated coefficients by the estimated standard error, without adjustment. b, Credible set analysis of RG associations in the European ancestry meta-analysis. Variants from each of the RG signal credible sets are grouped based on their posterior probability (the percentiles labeled on the sides of the bar). 
SNP variants with posterior probability >80%, along with their locus names, are provided. All variants from the credible set of lead signals are highlighted in bold.

Table 1 New signals for glycemic traits discovered in GWAS meta-analysis of RG levels in up to 459,772 individuals of European ancestries without diabetes
Table 2 New signals for glycemic traits discovered through UK Biobank (UKBB) (European ancestry only) GWAS in other RG models, UKBB (European ancestry only) GWAS on rare variants and cross-ancestry meta-analysis of up to 476,326 individuals of European or other ancestries (Black, Indian, Pakistani and Chinese) in UKBB

A number of signals identified in individuals of European ancestry showed nominal significance (P < 0.05) in other ancestry groups, including new loci MANSC4/KLHL42 in African, FAM46C and ACVR1C in Indian and RBMS1 in Chinese ancestry groups (Supplementary Table 3). All such signals, except rs540524 at G6PC2, rs183606969 at GCK and rs6006399 at MTMR3/HORMAD2, were directionally concordant across ancestries. At GCK, rs2908286 (r² (1000 Genomes, all ancestries) = 0.83 with rs2971670 lead in European ancestry individuals) was genome-wide significant in the African ancestry individuals alone (Supplementary Table 6). Cross-ancestry meta-analyses combining European and the other four ancestral groups revealed two new RG signals at RRNAD1 and PROX1 (Table 2 and Supplementary Table 6). Overall, while being only 16,554 individuals larger in sample size than the European ancestry meta-analysis, the cross-ancestry analysis expanded the new locus discovery for RG, confirming the potential of cross-ancestry studies for complex trait genetics.

The strongest associations with RG were detected at G6PC2 (P < 1.0 × 10−746) and GCK (P < 3.7 × 10−277), established loci for FG and with key roles in gluconeogenesis10 and glucose sensing11, respectively (Supplementary Table 3). Notably, only two-thirds of RG signals overlapped with T2D-associated loci (Extended Data Fig. 1f), including three new loci for glycemia (SCD5, RNF6 and TSHZ2). The direction of effects at these loci between RG, T2D and homeostasis model assessment of β-cell function/insulin resistance (HOMA-B/HOMA-IR)6 (Extended Data Figs. 1f,g and 2 and Supplementary Table 7) were consistent with their epidemiological correlation. We also discovered sex dimorphism at 13 RG loci, including male-specific PRDM16 and RSPO3, and female-specific SGIP1, SRRM3 and SLC43A2 (Table 2, Fig. 1a and Supplementary Tables 3 and 8). We conclude that sex dimorphism, characterizing over one-tenth of RG-associated loci, is a widespread feature of glucose metabolism.

Coding, rare and causal variants in RG variability

The lead variants at two new RG loci (NMT1 and RFX1) and three previously reported loci for FG (TET2, THADA and RREB1) were all coding common (minor allele frequency (MAF) ≥ 5%) variants (Supplementary Table 3 and Extended Data Fig. 3). Additionally, lead RG-associated SNPs at glucagon-like peptide-1 receptor (GLP1R), neuronal differentiation 1 (NEUROD1) and ER degradation enhancing α-mannosidase like protein 3 (EDEM3) loci in our analysis were low-frequency (5% > MAF ≥ 1%) coding variants (Table 1, Supplementary Table 3 and Extended Data Fig. 3). NEUROD1 and EDEM3 are plausible candidates for glucose homeostasis, with the former reported for glucosuria12 and the latter linked to renal function13,14. Within the rare allele frequency range (1% > MAF ≥ 0.001%), we first identified 30 RG loci and validated seven in whole-exome sequencing (WES) UK Biobank (UKBB) data (Supplementary Note). These included noncoding variant associations, such as rs2096313127 at CAMK2B (Supplementary Table 9), and the synonymous variant rs2232324 in G6PC2 (Table 2 and Supplementary Table 9). We expanded the annotation of coding nonsynonymous independent (r² (1000 Genomes, all ancestries) < 0.0010) rare variant signals associated with RG to nondeleterious new rs146886108 (Arg187Gln) in ANKH15, and deleterious, including three in G6PC2 with predicted (rs2232326) and established (rs138726309, rs2232323)16 effects (Supplementary Table 9). Thus, a range of coding and rare variants contributes to RG level variability and can be detected in very large genetic studies.

Next, we sought to pinpoint the most plausible set of causal variants by calculating 99% credible sets for each RG locus. In the European ancestry-only analysis, 15 RG signals were explained by one variant with a posterior probability of ≥99% of being causal, including low-frequency variants in GLP1R, G6PC2, MECOM and CCND2 (ref. 17), and common variants in LMO1 and CACNA2D3 (Fig. 1b and Supplementary Table 10a). For another 16 signals, such as at RMST, FOXN3 and ADRA2A, a lead variant had a posterior probability ≥80%. Credible sets at WIPI1, GCKR, TET2, RREB1 and RFX6 included coding common variants. RREB1 and RFX6 encode transcription factors implicated in the development and function of pancreatic β cells18,19. The credible sets were narrowed down for several signals in cross-ancestry RG meta-analysis (European ancestry median credible set size = 12.0 and cross-ancestry = 12.0), with improvements observed at DGKB and TP53INP1 lead signals (Supplementary Table 10b,c). These analyses highlight examples of validated and potential targets for therapeutic development15.
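As an illustration of the credible set idea described above, the following minimal Python sketch builds a 99% credible set for one locus. It assumes a single causal variant per signal, equal prior probabilities and per-variant log Bayes factors as input. The function name, variant labels and Bayes factor values are hypothetical, and this is not necessarily the procedure used in the study.

```python
import math

def credible_set(log_bayes_factors, coverage=0.99):
    """Return (variant, posterior) pairs forming the smallest set reaching `coverage`.

    Assumes one causal variant per signal and equal priors (an assumption of this sketch).
    """
    max_log = max(log_bayes_factors.values())
    # Subtract the maximum before exponentiating for numerical stability
    unnormalized = {v: math.exp(lbf - max_log) for v, lbf in log_bayes_factors.items()}
    total = sum(unnormalized.values())
    posteriors = {v: u / total for v, u in unnormalized.items()}
    # Rank variants by posterior probability and accumulate until coverage is reached
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for variant, posterior in ranked:
        selected.append((variant, posterior))
        cumulative += posterior
        if cumulative >= coverage:
            break
    return selected

# Hypothetical natural-log Bayes factors for three variants at one locus
locus = {"variant_a": 40.0, "variant_b": 30.0, "variant_c": 12.0}
single_variant_set = credible_set(locus)
```

In this toy locus one variant carries nearly all of the posterior probability, so the credible set contains that variant alone, analogous to the signals explained by a single variant with posterior probability ≥99%.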

Characterization of RG-associated GLP1R coding variants provides a framework for T2D treatment stratification

Following annotation and definition of likely causal variants, for functional studies, we prioritized GLP1R, which encodes a class B1 GPCR (GLP-1R) important in blood glucose and appetite regulation and a well-established target of the T2D drugs exenatide (exendin-4) and semaglutide20. We used RG data to validate an experimental framework for predicting individual responses to GLP-1R agonists, as this would be a major asset in clinical practice and is currently lacking. Within GLP1R, the lead missense variant at rs10305492 (A316T) has a strong (0.058 mmol l−1 per allele) RG-lowering effect, second by size only to G6PC2 locus variants, and is also associated with FG/T2D21,22.

We functionally tested the impact of rs10305492 (A316T) and 16 other GLP1R coding variants detected in the UKBB dataset, with effect allele frequency ranging from common (G168S, rs6923761, PRG GWAS meta-analysis = 5.20 × 10−5) to rare (R421W, rs146868158, PRG GWAS meta-analysis = 0.036), by measuring GLP-1-induced recruitment of mini-Gαs23 in HEK293 cells stably expressing wild-type (WT) or variant GLP-1R. This approach captures the most proximal part of the Gαs-adenylate cyclase-cyclic adenosine monophosphate pathway, which links GLP-1R activation to insulin secretion. With correction for differences in cell surface expression determined using SNAP-tag labeling24, mini-Gs-coupling efficiency was indeed predictive of the RG effect for these variants (Fig. 2a and Supplementary Table 11), thereby linking experimentally measured GLP-1R function in vitro to blood glucose homeostasis. This relationship was assessed in UKBB WES data (Supplementary Note and Extended Data Fig. 4).
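The relationship between mini-Gs coupling and RG effect is tested in Fig. 2a with a minor allele frequency-weighted linear regression. A minimal Python sketch of a weighted least-squares fit with one predictor is given here for orientation. The function name and the numbers are hypothetical, and using the weights directly as observation weights is an assumption of this sketch rather than a description of the study's exact model.

```python
def weighted_linear_fit(x, y, weights):
    """Weighted least squares for y = intercept + slope * x.

    Returns (intercept, slope, r2), where r2 is the weighted coefficient of determination.
    """
    total = sum(weights)
    x_mean = sum(w * xi for w, xi in zip(weights, x)) / total
    y_mean = sum(w * yi for w, yi in zip(weights, y)) / total
    sxx = sum(w * (xi - x_mean) ** 2 for w, xi in zip(weights, x))
    syy = sum(w * (yi - y_mean) ** 2 for w, yi in zip(weights, y))
    sxy = sum(w * (xi - x_mean) * (yi - y_mean) for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    r2 = sxy ** 2 / (sxx * syy)
    return intercept, slope, r2

# Hypothetical inputs: expression-corrected coupling (x), RG effect (y), MAF as weight
coupling = [-1.0, -0.5, 0.0, 0.5, 1.0]
rg_effect = [0.03, 0.02, 0.00, -0.01, -0.03]
maf = [0.001, 0.002, 0.3, 0.01, 0.015]
intercept, slope, r2 = weighted_linear_fit(coupling, rg_effect, maf)
```

With these made-up values the slope is negative, matching the direction reported in Fig. 2a, where stronger coupling goes with lower RG.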

Fig. 2: Functional and structural analysis of coding GLP1R variants.

a, Minor allele frequency-weighted linear regression was used to test if mini-Gs response to GLP-1 stimulation substantially predicted point estimates of GLP1R variant effect on RG levels (AST20 βRG as estimated in the UKBB study, nmax = 401,810). Mini-Gs response to GLP-1 stimulation was corrected for variant surface expression (nmax = 22, exact n for each variant is provided in Supplementary Table 11). Error bars extend one standard error above and below the point estimate. Size of the dots is proportional to the weight applied in the regression model. The regression results (coefficient of determination R² = 0.74, F(1, 15) = 47.5, P = 5.1 × 10−6) suggest that mini-Gs coupling in response to GLP-1 stimulation predicts the effect of these coding variants on RG levels (AST20 βRG = −0.030; 95% confidence interval (CI) = −0.039 to −0.020; P = 5.1 × 10−6). The gray shaded area around the regression line corresponds to the 95% CI of predictions from the model. Variants in red showed no detectable surface expression (NDE) and are not included in regression analysis. b, Mean GLP1R variant mini-Gs coupling and receptor endocytosis, with surface expression correction, in response to GLP-1, OXM, glucagon (GCG), exendin-4 (Ex4), semaglutide (Sema) and tirzepatide (TZP), n = 6. Positive deviation indicates variant gain-of-function, with statistical significance inferred when the 95% CIs shown do not cross zero. Responses are also compared between pathways by unpaired t test, with an asterisk indicating statistically significant differences. c, Architecture of the complex formed between the agonist-bound GLP-1R and Gs; the likely effect triggered by residues involved in GLP-1R isoforms A316T, G168S and R421W (in magenta) are reported. d, Distributions of the distance between Y2423.45 side chain and P3125.42 backbone computed during molecular dynamics simulations of GLP-1R WT and A316T; the cut-off distance for hydrogen bond is shown. 
e, Difference in the hydrogen bond network between GLP-1R WT and A316T. f, Analysis of water molecules within the TMD of GLP-1R WT and A316T suggests minor changes in the local hydration of position 5.46 (unperturbed structural water molecule). Also, a stabilizing role for the water molecules at the binding site of the G protein (water cluster α5) cannot be ruled out. g, Distributions of the distance between position 1681.63 and Y1782.48 during molecular dynamics simulations of GLP-1R WT and G168S. h, During molecular dynamics simulations, the GLP-1R isoform S168G showed increased flexibility of ICL1 and H8 compared to WT, suggesting a different influence on G-protein intermediate states. i, Contact differences between Gs and GLP-1R WT or W421R; the C terminal of W421R H8 made more interactions with the N terminal segment of the Gs β subunit. j, Mini-Gs and GLP-1R endocytosis responses to 20 nM exendin-4, plotted against surface GLP-1R expression, from 196 missense GLP1R variants transiently transfected in HEK293T cells (n = 5 repeats per assay), with data represented as mean ± s.e.m. after normalization to WT response and log10-transformation. Variants are categorized as ‘LoF1’ when the response 95% CI falls below zero or ‘LoF2’ where the expression-normalized 95% CI falls below zero. k, GLP-1R snake plot created using gpcr.com summarizing the functional impact of missense variants; for residues with >1 variant, classification is applied as LoF2 > LoF1 > tolerated.

Focusing on the two directly genotyped GLP1R missense variants in UKBB, we also measured mini-Gs responses to several endogenous and pharmacological GLP-1R agonists, observing that A316T (rs10305492-A) showed increased responses and R421W (rs146868158-T) showed reduced responses, to all ligands except exendin-4 (both variants) and semaglutide (A316T only), in line with their RG effects (Fig. 2b). Interestingly, for late-stage T2D candidate tirzepatide, which has pronounced ‘biased agonism’ at GLP-1R25, the difference between A316T and R421W amounted to nearly tenfold difference in activity. The common G168S variant, with a relatively small RG-lowering effect (β = −0.0013, s.e. = 3.1 × 10−4), also showed increases in function with pharmacological agonist stimulation. As GLP-1R undergoes extensive agonist-induced endocytosis, a process that modulates the subcellular origin and temporal dynamics of receptor signaling26, we also assessed the endocytic characteristics of A316T, G168S and R421W variants using high content microscopy. Here the most notable observation was that agonist-induced GLP-1R endocytosis with R421W was normal despite its signaling deficit, suggesting a specific alteration to how this variant couples to downstream effectors24. These results, supported by RG data and clinical observations27,28, suggest that in vitro assessments can provide valuable insights into the optimal selection of GLP-1R treatment according to genotype.

Next, we performed molecular dynamics (MD) simulations of human GLP-1R bound to oxyntomodulin (OXM)29 to gain structural insights into the above-described GLP1R variant effects. A316T has a single amino acid substitution in the core of the receptor transmembrane (TM) domain (Fig. 2c) that leads to an alteration of the nearby hydrogen bond network that normally serves to stabilize the GLP-1R inactive state (Supplementary Video 1). Specifically, in A316T, residue T3165.46 replaces Y2423.45 (superscripts follow the generic class B1 GPCR numbering system described in ref. 30, where the number before the dot indicates the TM helix and the number after the dot gives the sequence position relative to the most conserved residue of that helix, which is assigned the number 50) in a persistent hydrogen bond with the backbone of P3125.42, one turn of the helix above T3165.46 (Fig. 2d,e and Supplementary Video 1). This triggers a local structural rearrangement that could transmit to the intracellular G-protein-binding site through TM3 and TM5, thereby enhancing G-protein coupling. A water molecule is close to position 5.46 in both A316T and WT (water cluster α5; Fig. 2f). Notably, the same water bridges the backbone of Y2413.44 and A3165.46 in WT or the backbone of Y2413.44 and the side chain of T3165.46 in A316T. Given the importance of conserved water networks in the activation of class A GPCRs31,32, the stability of the hydrated spot close to position 5.46 corroborates the importance of this site for GLP-1R effects. In analogy with A316T, simulations with the G168S variant indicated the formation of a stable new hydrogen bond between the side chain of residue S1681.63 and A1641.59, one turn above on the same helix (Fig. 2g and Supplementary Video 2). This moves the C-terminal end of TM1 closer to TM2 and reduces the overall flexibility of intracellular loop 1 (ICL1; Fig. 2h), altering the role of ICL1 in G-protein activation. 
In contrast to A316T and G168S, the site of variant R421W is consistent with persistent interactions with the G protein, and simulations predicted a propensity of R421W to interact with a different region of the G-protein β-subunit compared to WT (Fig. 2i). These results capture the full range of structural features in the current active GLP-1R models and provide clear clues about the dynamics of A316T and other GLP-1R variants, compared to early models that did not benefit from the structural insights obtained from cryo-electron microscopy22.

For a broader view of the impact of GLP1R coding variation, we screened an additional 178 missense variants identified from exome sequencing33 for exendin-4-induced mini-Gs coupling and endocytosis by transient transfection in HEK293 cells (Supplementary Note, Fig. 2j,k and Supplementary Table 12). In total, 110 variants showed a reduced response in either or both pathways (‘LoF1’) and 67 displayed a specific response deficit that was not fully explained by differences in GLP-1R surface expression (‘LoF2’). Many of these defects were larger than in the analysis in Fig. 2a, with a major loss of GLP-1R function a likely consequence, meaning that patients carrying these variants are less likely to benefit from GLP-1R agonist drug treatment.

Functional annotation of RG associations and intestinal health

Previous T2D and glycemic trait GWAS have primarily implicated pancreatic, adipose and liver tissues3. We performed a range of complementary functional annotation analyses by leveraging our RG GWAS results to identify additional cell and tissue types with etiological roles in glucose metabolism. Data-driven expression prioritized integration for complex traits (DEPICT)34, which predicts enriched tissue types from prioritized gene sets, highlighted intestinal tissues including ileum and colon, as well as pancreas, adrenal glands5, adrenal cortex and cartilage (false discovery rate, FDR < 0.20; Fig. 3a,b and Supplementary Table 13). Similarly, CELL type expression-specific integration for complex traits (CELLECT)35, which facilitates cell type prioritization based on single-cell RNA-sequencing (scRNA-seq) datasets, identified large intestinal tissue as second-ranked only to pancreatic cell types (Fig. 4 and Supplementary Table 14). Interestingly, RG variants were related particularly to enriched expression in pancreatic polypeptide cells, exceeding even the more conventionally implicated insulin-secreting β cells. Supporting evidence was obtained from transcriptome-wide association study (TWAS) analysis, where we identified a total of 216 (119 unique) significant genetically driven associations across the ten tested tissues (Supplementary Table 15a); 51 (25 unique) of highlighted genes are located at genome-wide significant RG loci (Supplementary Table 15). TWAS signals in skeletal muscle5 showed the largest overlap with RG signals, such as GPSM1 (ref. 36) and WARS. The combined results from ileum and colon also showed high enrichment, including the new NMT1 and the established FADS1/3 and MADD genes (Fig. 1a and Supplementary Table 15). 
Expression quantitative trait locus (eQTL) colocalization analyses, using eQTLGen whole blood expression data from 31,684 individuals37,38 and the COLOC2 approach, identified 14 loci with strong links (posterior probability >70%) to gene expression data, including TET2 (ref. 39), KCNJ11, KLHL42, IKBKAP and CAMK1D, with transcriptional effects in pancreatic islets and kidney mesangial cells (Supplementary Table 16). Similar analyses of human pancreatic islets regulatory variation in the translational human pancreatic islet genotype tissue-expression resource (TIGER) dataset38 defined 58 loci with strong statistical support for colocalization of the effects on RG and tissue expression of ADCY5, RNF6, FADS1, MADD and STARD10 (ref. 40), in addition to KLHL42 and CAMK1D, with the latter overlapping in whole blood. Moreover, epigenetic annotations using the GARFIELD tool highlighted significant (P < 2.5 × 10−5) enrichment of RG-associated variants in the fetal large intestine, as well as blood, liver and other tissues (Extended Data Fig. 5 and Supplementary Table 17). Adult intestinal tissues are not available in GARFIELD except for colon. Prompted by multiple analyses highlighting a potential role for the digestive tract in glucose regulation, we assessed the overlap between our signals and those from the latest gut-microbiome GWAS41 and identified two genera sharing signals and direction of effect with RG at one locus: Collinsella and Lachnospiraceae-FCS020 at ABO-FUT2 (Fig. 1a and Supplementary Table 18). The ABO-FUT2 locus effects on RG could be mediated by the abundance of Collinsella/Lachnospiraceae-FCS020, producing glucose from lactose and galactose42. Collinsella genus affects gut permeability via interleukin-17A43 and shows higher abundance in individuals with T2D compared to those with normal glucose tolerance and individuals with prediabetes44. Moreover, weight loss decreases Collinsella among obese individuals with T2D45. 
Higher prevalence of the Lachnospiraceae family is associated with metabolic disorders, while genus Lachnospiraceae-FCS020 abundance shows an inverse correlation with serum triglycerides46. However, the mechanism of their enrichment has yet to be studied. This multi-omics annotation provided strong evidence for links between RG and intestinal health.

Fig. 3: Deterioration of glucose homeostasis progressing into T2D and leading to complications in multiple organs and tissues.

Established (left, in peach) and new (right, in green). a, A human figure illustrating the main causes of hyperglycemia (a combination of lifestyle and genetic factors), and how hyperglycemia affects many organs and tissues. Complications on the left panel are well-established for T2D. Those on the right panel are emerging ones and are supported by our current analyses. Figure created with BioRender.com. b, DEPICT prioritization of 134 tissues from the GTEx Project highlights the ileum and pancreas (shown in red, one-sided empirical P value with FDR < 0.05 determined against randomized phenotypes in a null GWAS).

Fig. 4: Cell type prioritization across 17 tissues identified large intestinal tissue ranked second only to pancreatic cell types.

CELLECT prioritization of 115 cell types from Tabula Muris highlights pancreatic polypeptide (PP) cells (shown in black, one-sided Wilcoxon rank-sum test with significance threshold depicted by a dotted line indicating cell types with a nominal PS-LDSC < 4.3 × 10−4).

Finally, we observed associations at HNF1A47 with nine total plasma N-glycome traits48 at a Bonferroni-corrected threshold (Fig. 1a and Supplementary Table 19). These traits represent highly branched galactosylated sialylated glycans (attached to α1-acid glycoprotein, an acute-phase protein49), known to lead to chronic low-grade inflammation50,51 and an increased risk of T2D52,53,54 that might be explained by the role of N-glycan branching of the glucagon receptor in glucose homeostasis55. In addition, ten glycans showed association with five RG loci (HNF1A, BAG1, PLUT) at a suggestive level of significance (Fig. 1a). Among them, three are attached to immunoglobulin G molecules49, and their increased relative abundances are associated with a lower risk of T2D56 and diminished inflammation status57. These observations suggest an overlap between networks regulating RG homeostasis and plasma-protein N-glycosylation.

Genetic relationships between RG and other metabolic or nonmetabolic traits

Using linkage-disequilibrium score regression analyses, we estimated the genetic correlations between RG and other phenotypes to quantify the shared genetic contribution. We detected positive genetic correlations between RG and squamous cell lung cancer (rg = 0.28, P = 0.0015) and lung cancer (rg = 0.12, P = 0.037; Fig. 5 and Supplementary Table 20), as well as inverse genetic correlations with lung function related traits, such as forced vital capacity (FVC, rg = −0.090, P = 0.0059) and forced expiratory volume in 1 second (FEV1, rg = −0.054, P = 0.017; Figs. 3a and 5 and Supplementary Table 20). To investigate this further, we conducted bidirectional Mendelian randomization (MR) analysis, which suggested a causal effect of RG and T2D on lung function, including FEV1 (βMR–RG = −0.66, P = 9.6 × 10−5; βMR–T2D = −0.049, P = 1.3 × 10−13) and FVC (βMR–RG = −0.60, P = 1.5 × 10−4; βMR–T2D = −0.062, P = 1.4 × 10−21), but not vice versa (RG: βMR–FEV1 = −0.0048, P = 0.42; βMR–FVC = −0.01, P = 0.17; T2D: βMR–FEV1 = −0.18, P = 0.040; βMR–FVC = −0.21, P = 0.040; Supplementary Table 21a,b). External factors, such as smoking or sedentary lifestyle, could cause lung function to decline, independent of RG and T2D effects. We implemented multivariable MR (MVMR) and found (Supplementary Table 21c) that RG and T2D causal effects on FVC are independent of both cigarettes smoked per day (CPD; that is, proxy for smoking58) and leisure screen time (LST; that is, proxy for physical activity59). This is important as previous observational studies have highlighted worsening lung function, as defined by FVC, in patients with T2D, but whether this was a causal relationship was not clear60,61. More recently, it was shown that patients with diabetes are at an increased risk of death from the viral infection COVID-19 (ref. 62), with pulmonary dysfunction contributing to mortality63. 
Our data confirm the causal effect of glycemic dysregulation on a decline in lung function as a new complication of diabetes.

Fig. 5: Genome-wide genetic correlation between RG and a range of traits and diseases.

The x axis provides the estimated rg genetic correlation values for traits or diseases (y axis) reaching at least nominal significance (P < 0.05). Correlations reaching P < 0.010 are labeled with the prime symbol, and those P < 2.1 × 10−4 are labeled with the asterisk symbol. P values were calculated from the two-sided z statistics computed by dividing the estimated rg by the estimated standard error, without adjustment. Each error bar represents the standard error of the estimate.

Genome-wide genetic correlation analyses also showed a strong positive genetic correlation of RG with FG (rg = 0.88, P = 6.93 × 10−61; Fig. 5 and Supplementary Table 20). We meta-analyzed RG studies other than UKBB with FG GWAS summary statistics64, observing 79 signals reaching nominal significance that were directionally consistent in both UKBB and RG + FG (Supplementary Table 3), providing additional support to our RG findings. Given the large genetic overlap between RG, other glycemic traits and T2D, we evaluated the ability of a trait-specific polygenic risk score (PRS) to predict RG, T2D and HbA1c levels using UKBB effect estimates and the Vanderbilt cohort. The RG PRS explained 0.58% of the variance in RG levels when individuals with T2D were included (Supplementary Table 22), and 0.71% of the variance after excluding those who developed T2D within 1 year of their last RG measurement. The RG PRS performance was comparable to that of the FG loci PRS (0.38% versus 0.42% for T2D; 0.40% versus 0.44% for HbA1c), indicating shared genetic variability determining glycemic traits.
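To make the PRS evaluation concrete, a minimal Python sketch follows. It computes a score as the weighted sum of effect-allele dosages and measures variance explained as the squared Pearson correlation between score and phenotype. The variant labels, weights and phenotype values are hypothetical, and the study may have estimated variance explained differently, for example from a covariate-adjusted regression model.

```python
def polygenic_score(dosages, weights):
    """Sum of effect-allele dosages (0 to 2) weighted by per-allele effect estimates."""
    return sum(dosages[variant] * beta for variant, beta in weights.items())

def variance_explained(scores, phenotype):
    """Squared Pearson correlation between the score and the phenotype."""
    n = len(scores)
    score_mean = sum(scores) / n
    pheno_mean = sum(phenotype) / n
    cov = sum((s - score_mean) * (p - pheno_mean) for s, p in zip(scores, phenotype))
    score_var = sum((s - score_mean) ** 2 for s in scores)
    pheno_var = sum((p - pheno_mean) ** 2 for p in phenotype)
    return cov ** 2 / (score_var * pheno_var)

# Hypothetical per-allele weights (mmol/l) and dosages for four individuals
weights = {"variant_1": 0.05, "variant_2": -0.02}
cohort = [
    {"variant_1": 0, "variant_2": 2},
    {"variant_1": 1, "variant_2": 1},
    {"variant_1": 2, "variant_2": 0},
    {"variant_1": 1, "variant_2": 2},
]
scores = [polygenic_score(person, weights) for person in cohort]
r2 = variance_explained(scores, [4.9, 5.3, 5.2, 5.0])
```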

We previously highlighted diverse effects of FG and T2D loci on pathophysiological processes related to T2D development by grouping associated loci in relation to their effects on multiple phenotypes6. Cluster analysis of the RG signals with 45 related phenotypes identified three separate clusters (Fig. 1a, Supplementary Table 23 and Extended Data Figs. 6 and 7), including ‘metabolic syndrome’ cluster 1, with 28 loci also leading to higher waist-to-hip ratio, blood pressure, plasma triglycerides, insulin resistance (HOMA-IR) and coronary artery disease risk, as well as lower sex hormone binding globulin levels in both sexes and testosterone in males. Cluster 3 was characterized, in particular, by insulin secretory defects6. Cluster 2 showed a primary effect on insulin release versus insulin action3, but included a subcluster of 11 loci, which exert protective effects on inflammatory bowel disease, a relationship not previously reported. Moreover, cluster 2 was notable for generally reduced T2D risk in comparison to clusters 1 and 3, shaping the partial overlap between genetic determinants of glycemia and T2D that is known to exist65. This RG loci grouping gave innovative insights into the etiology of glucose regulation and associated disease states.

Discussion

Leveraging data from 476,326 individuals, we have expanded by 44 the number of loci associated with glycemic traits. By using RG, our analysis integrates genetic contributions across a wider range of physiological states, which thus far was not possible with standardized glycemic measures. Moreover, the greater statistical power obtained from large cross-ancestry meta-analysis improves confidence in identifying potentially causal variants, thereby helping to prioritize genes for more detailed functional analyses in the future. Our comprehensive functional characterization of GLP1R coding variation validates its role in blood glucose regulation and, more importantly, shows how GLP-1R-targeting drug responses depend on genetic variation. Notably, additional islet-expressed class B1 GPCRs identified in our current analysis and other glycemic trait/T2D GWAS, including GIPR, GLP2R (refs. 3,66) and SCTR21, are investigational targets for T2D treatment, which should be subjected to similar analysis. Our functional annotation analyses point to underexplored tissue mediators of glycemic regulation, with new evidence highlighting the role of the intestine. This observation supports the profound effects of gastric bypass surgery on T2D resolution67, as well as links between the intestinal microbiome and responses to several diabetes drugs68. In the near future, larger well-phenotyped datasets will enable high-dimensional GWAS investigations, disentangling the role of diet composition, physical activity and lifestyle on RG level variability in relation to genetic effects. Finally, through MR, we identified a causal effect of glucose levels and T2D on lung function, demonstrating the utility of this approach for corroborating findings from observational studies and elevating lung dysfunction as a new complication of diabetes.

Methods

Ethics

All participating studies were approved by their appropriate institutional review boards or committees, and written informed consent was obtained from all study participants. For all the participating studies, approval was received to use their data in the present work. Study-specific ethics statements are provided in the references listed in Supplementary Table 1.

Phenotype definition and model selection for RG GWAS

We used RG (mmol l−1) measured in plasma or in whole blood (corrected to plasma level using the correction factor of 1.13). Individuals were excluded from the analysis if they had a diagnosis of T2D or were on diabetes treatment (oral or insulin). Individual studies applied further sample exclusions, including pregnancy, fasting plasma glucose ≥7 mmol l−1 in a separate visit, when available, and having type 1 diabetes (Supplementary Table 1). Details about RG modeling in the first set of six available cohorts (Supplementary Table 2) can be found in the Supplementary Note. For the GWAS, we included individuals based on the following two RG cut-offs: <20 mmol l−1 (20) to account for the effect of extreme RG values and <11.1 mmol l−1 (11), which is an established threshold for T2D diagnosis. We then evaluated the following six different models in GWAS according to covariates included and cut-offs used: (1) age (A) and sex (S), RG < 20 mmol l−1 (AS20); (2) age, sex and BMI (B), RG < 20 mmol l−1 (ASB20); (3) age and sex, RG < 11.1 mmol l−1 (AS11); (4) age, sex and BMI, RG < 11.1 mmol l−1 (ASB11); (5) age, sex, time since last meal (accounted for as T, T2 and T3), RG < 20 mmol l−1 (AST20) and (6) age, sex, T, T2 and T3 and BMI, RG < 20 mmol l−1 (ASTB20). Apart from the above, additional adjustments for study site and geographical covariates were also applied.

RG meta-analyses

The GWAS meta-analysis of RG consisted of the following five components: (1) 37,239 individuals from ten European ancestry GWAS imputed up to the HapMap 2 reference panel; (2) 3,156 individuals from three European ancestry GWAS with Metabochip coverage; (3) 21,083 individuals from two European ancestry GWAS imputed up to 1000 Genomes reference panel; (4) 380,432 individuals of white European ancestry from the UKBB and (5) 16,983 individuals from the Vanderbilt cohort imputed to the HRC panel (Supplementary Note). We imputed the GWAS meta-analysis summary statistics of each component to all-ancestries 1000 Genomes reference panel69 using the summary statistics imputation method implemented in the SS-Imp v0.5.5 software70. SNPs with imputation quality scores <0.7 were excluded. We then conducted inverse-variance meta-analyses to combine the association summary statistics from all components using METAL v2011-03-25 (ref. 71). We focused our meta-analyses on models AS20 (17 cohorts, nmax = 459,772) and AST20 (when time from last meal was available in the cohort; 12 cohorts, nmax = 417,290). For the FHS cohort, where no information was available for individuals with RG > 11.1 (an established threshold for 2hGlu concentration, which is a criterion for T2D diagnosis), AS11 model results were used. We also performed a meta-analysis using cohorts with time from the last meal available (AST20 model, 12 cohorts) combined with those lacking this information (AS20, five cohorts) to maximize the association power while taking into account T. We refer to this analysis as AS20 + AST20 in the following text (17 cohorts, nmax = 458,862). A signal was considered to be associated with RG if it reached genome-wide significance (P < 5.0 × 10−8) in the meta-analysis of UKBB and other cohorts in either of our two models of interest (AS20 or AST20) or in their combination (AS20 + AST20).
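The inverse-variance combination described above can be illustrated for a single SNP with the standard fixed-effect estimator. This is a minimal sketch, not METAL's code; the function name and the two example cohort estimates are hypothetical.

```python
import math

def inverse_variance_meta(betas, ses):
    """Fixed-effect inverse-variance weighted meta-analysis for one SNP.

    betas, ses: per-cohort effect estimates and standard errors
    (hypothetical inputs; effects must be aligned to the same allele).
    """
    weights = [1.0 / se ** 2 for se in ses]       # w_i = 1 / SE_i^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)                   # SE of the pooled effect
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))        # two-sided normal P value
    return beta, se, z, p

# Two hypothetical cohorts with equal precision
beta, se, z, p = inverse_variance_meta([0.020, 0.030], [0.010, 0.010])
genome_wide_significant = p < 5.0e-8
```

With equal standard errors the pooled effect is simply the mean of the two estimates, and the pooled standard error shrinks by a factor of √2.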

Of 133 signals detected in the European ancestry subset (Supplementary Note), 105 were directionally consistent in the UK Biobank and other contributing studies grouped together, providing the discovery validation (Supplementary Table 3). We report the P value from the combined model unless otherwise stated. Full results from all models are provided in Supplementary Table 3. We checked for nominal significance (P < 0.05) and directional consistency of the effect sizes for the selected lead SNPs in the combined model in UKBB results versus other cohort results. We further extended the check between UKBB results and meta-analysis of other cohorts including FG GWAS meta-analysis64, excluding overlapping cohorts. This meta-analysis conducted in METAL v2011-03-25 was sample size and P value based due to the measures being at different scales (natural logarithm-transformed RG and untransformed FG).

Cross-ancestry analyses and meta-analysis

We performed GWAS in non-European ancestry populations within UKBB that had a sample size of at least 1,500 individuals. These were Black (n = 7,644), Indian (n = 5,660), Pakistani (n = 1,747) and Chinese (n = 1,503). We further meta-analyzed our European ancestry cohorts with the cross-ancestry UKBB cohorts. The analyses were performed with BOLT-LMM v2.3 (ref. 72) and METAL v2011-03-25.

Sex-dimorphic analysis

To evaluate sex dimorphism in our results, we meta-analyzed the UKBB and the Vanderbilt cohort with the GWAMA v2.1 software73, which provides a 2 degrees of freedom (df) test of association assuming different effect sizes between the sexes. We evaluated the evidence for heterogeneity of allelic effects between sexes using Cochran’s Q statistic73,74. We considered a signal to show evidence of sex dimorphism if the sex-dimorphic P value was <5.0 × 10−8 and if the sex heterogeneity P value (1 df) was <0.05.
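The two tests can be sketched from sex-specific summary statistics as follows. This assumes the 2 df statistic is the sum of the male and female squared z scores and that heterogeneity is Cochran's Q across the two sexes; whether this matches GWAMA's implementation in every detail is an assumption, and the function name and inputs are hypothetical.

```python
import math

def sex_dimorphism_tests(beta_m, se_m, beta_f, se_f):
    """Return (P value of the 2 df association test, P value of the 1 df heterogeneity test)."""
    z_m, z_f = beta_m / se_m, beta_f / se_f
    chi2_2df = z_m ** 2 + z_f ** 2
    p_2df = math.exp(-chi2_2df / 2.0)             # chi-square survival function, 2 df

    w_m, w_f = 1.0 / se_m ** 2, 1.0 / se_f ** 2
    pooled = (w_m * beta_m + w_f * beta_f) / (w_m + w_f)
    q = w_m * (beta_m - pooled) ** 2 + w_f * (beta_f - pooled) ** 2  # Cochran's Q
    p_het = math.erfc(math.sqrt(q / 2.0))         # chi-square survival function, 1 df
    return p_2df, p_het

# Hypothetical signal with a larger effect in males than in females
p_2df, p_het = sex_dimorphism_tests(0.06, 0.01, 0.02, 0.01)
sex_dimorphic = p_2df < 5.0e-8 and p_het < 0.05
```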

Clumping and conditional analysis

We performed a standard clumping analysis (PLINK v1.90 (ref. 75); criteria: P ≤ 5 × 10−8, r2 = 0.01, window-size = 1 Mb, 1000 Genomes Phase 3 data as linkage disequilibrium (LD) reference panel) to select a list of near-independent signals. We then performed a stepwise model selection analysis (approximate conditional analysis) to replicate the analysis using GCTA v1.93.0 (ref. 76) with the following parameters: P ≤ 5 × 10−8 and window-size = 1 Mb. We further checked for additional distinct signals by using a region-wide threshold of P ≤ 1.0 × 10−5 for statistical significance. For validation and comparison, we also performed direct conditional analyses using BOLT-LMM v2.3 (Supplementary Note). We filtered the direct conditional analysis results and BOLT-LMM results by checking the LD between all the variants within the same locus and keeping only independent signals (r2 < 0.01). LD was calculated from European reference haplotypes from the 1000 Genomes Project using the LDlinkR v1.1.2 library.

GLP-1R pharmacological and structural analysis

Mini-Gs recruitment assay

Where stable cell lines were used (that is, Fig. 2a,b), WT or variant T-REx-SNAP-GLP-1R-SmBiT cells (Supplementary Note) were seeded in 12-well plates and transfected with 1 µg per well LgBiT-mini-Gs23 (a gift from N. Lambert, Medical College of Georgia). The following day, GLP-1R expression was induced by the addition of tetracycline (0.2 µg ml−1) to the culture medium for 24 h. For transient transfection assays (that is, Fig. 2j), HEK293T cells in poly-d-lysine-coated white 96-well plates were transfected using Lipofectamine 2000 with 0.05 µg per well WT or variant SNAP-GLP-1R-SmBiT plus 0.05 µg per well LgBiT-mini-Gs and the assay performed 24 h later. Cells were then resuspended in Hank’s balanced salt solution + furimazine (Promega) diluted 1:50 and seeded in 96-well half-area white plates, or the same reagent added to adherent cells for transient transfection assays. Baseline luminescence was measured over 5 min using a Flexstation 3 plate reader at 37 °C before the addition of ligand or vehicle. Agonists were applied at a series of concentrations spanning the response range. After agonist addition, luminescent signal was serially recorded over 30 min, and ligand-induced effects were quantified by subtracting individual well baselines. Signals were corrected for differences in cell number as determined by bicinchoninic acid assay.

Analysis of pharmacological data

Technical replicates within the same assay were averaged to give one biological replicate. For concentration-response assays (Fig. 2a,b), ligand-induced responses were analyzed by three-parameter fitting in Prism 8.0 (GraphPad Software). As a composite measure of agonism77, log10-transformed Emax/half maximal effective concentration (EC50) values were obtained for each ligand/variant response. The WT response was subtracted from the variant response to give ∆log(max/EC50), a measure of gain- or loss-of-function for the variant relative to WT. Log10-transformed surface expression levels were obtained for each variant relative to WT; these were then used to correct mini-Gs ∆log(max/EC50) values for differences in variant GLP-1R surface expression levels, by subtraction with error propagation. GLP-1R internalization responses were already normalized to surface expression within each assay. Statistical significance between WT and variant responses was inferred if the 95% confidence intervals for ∆log(max/EC50) did not cross zero77. Changes to the profile of receptor response between mini-Gs recruitment and GLP-1R internalization were inferred if P < 0.05 with unpaired t test analysis, with Holm–Sidak correction for multiple comparisons. For transient transfection assays (Fig. 2j), responses were normalized to WT response and log10 transformed to give Log ∆ responses. Additionally, the impact of differences in the surface expression on functional responses was determined by subtracting the log-transformed normalized expression level from the log-transformed normalized response.
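The ∆log(max/EC50) calculation and its correction for surface expression can be sketched as below. The sketch assumes that errors are independent and combine in quadrature, and that a normal approximation (1.96 × s.e.) gives the 95% confidence interval; both are assumptions, as are the function names and all numerical values.

```python
import math

def log_emax_ec50(emax, ec50):
    """Composite agonism measure: log10(Emax / EC50)."""
    return math.log10(emax / ec50)

def subtract_with_error(a, a_se, b, b_se):
    """Return a - b and its standard error, assuming independent errors."""
    return a - b, math.sqrt(a_se ** 2 + b_se ** 2)

# Hypothetical WT and variant fits (Emax in arbitrary units, EC50 in mol/l)
wt = log_emax_ec50(100.0, 1e-9)
variant = log_emax_ec50(50.0, 1e-8)
delta, delta_se = subtract_with_error(variant, 0.1, wt, 0.1)

# Correct for variant surface expression (hypothetical: 50% of WT)
log_expr = math.log10(0.5)
corrected, corrected_se = subtract_with_error(delta, delta_se, log_expr, 0.05)

ci_low = corrected - 1.96 * corrected_se
ci_high = corrected + 1.96 * corrected_se
significant = ci_low > 0 or ci_high < 0   # interval does not cross zero
```

In this example the variant loses about 1.3 log units of response relative to WT, of which roughly 0.3 is attributable to lower surface expression.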

Variance explained in RG effects by mini-Gs recruitment at coding GLP1R variants

RG (AST20 model) effects estimated in the UKBB study at 16 independent (r2 < 0.02) coding GLP1R variants (Supplementary Table 11) were regressed on mini-Gs coupling in response to glucagon-like peptide-1 (GLP-1) stimulation (corrected for surface expression) giving more weight to variants with higher minor allele frequency.
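A weighted regression of this kind can be sketched in a few lines. The exact weighting scheme is not stated here, so using minor allele frequency directly as the weight is an assumption, and the function name and data are hypothetical.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least-squares fit of y on x; returns slope, intercept and weighted R^2."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)               # share of weighted variance explained
    return slope, intercept, r2

# Hypothetical inputs: mini-Gs response (x), RG effect (y), minor allele frequency (w)
mini_gs = [0.0, 1.0, 2.0]
rg_effect = [1.0, 3.0, 5.0]
maf = [0.1, 0.2, 0.3]
slope, intercept, r2 = weighted_linear_fit(mini_gs, rg_effect, maf)
```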

Computational methods including MD simulations

The active state structure of GLP-1R in complex with OXM29 and Gs protein was used to simulate WT GLP-1R and G168S, A316T and R421W. The WT systems and variants were prepared for MD simulations and equilibrated as reported78. AceMD3 3.3.0 (ref. 79) was used for production runs (four MD replicas of 500 ns each). AquaMMapS v1 analysis80 was performed on 10 ns-long MD simulations of GLP-1R(WT) and GLP-1R(A316T) in complex with OXM, with all the α carbons restrained; coordinates were written every 10 ps of simulation.

Credible set analysis

After selecting the signals within each region based on different meta-analysis results from AS20, AST20 and AS20 + AST20 models, we further performed a credible set analysis to obtain a list of potential causal variants for each of the 133 selected signals (Supplementary Note). We also calculated credible sets for the cross-ancestry meta-analysis and compared the results between the European ancestry-only and cross-ancestry meta-analyses.

DEPICT analysis

DEPICT uses GWAS summary statistics and computes a prioritization of genes in associated loci, which are used to prioritize tissues via enrichment analysis. DEPICT v1_rel 194 was used with default settings and RG GWAS summary statistics as input against a genetic background of SNPsnap data81 derived from the 1000 Genomes Project Phase 3 (ref. 82) to prioritize genes (Supplementary Note).

CELLECT analysis

CELLECT35 v1.0.0 and Cell type EXpression-specificity35 v1.0.0 are two toolkits for genetic identification of likely etiologic cell types using GWAS summary statistics and scRNA-seq data. Tabula Muris gene expression data83, a scRNA-seq dataset derived from 20 organs from adult male and female mice, was preprocessed as described in the Supplementary Note.

Genetically regulated gene expression analysis

We used MetaXcan (S-PrediXcan) v0.6.10 (ref. 84) to identify genes whose genetically predicted gene expression levels are associated with RG in a number of tissues. The tested tissues were chosen based on their involvement in glucose metabolism. Those were adipose visceral omentum, adipose subcutaneous, skeletal muscle, liver, pancreas and whole blood. Additionally, we tested ileum, transverse colon, sigmoid colon and adrenal gland because they were highlighted by DEPICT analysis. The models for the tissues of interest were trained with GTEx Version 7 transcriptome data from individuals of European ancestry85. The tissue transcriptome models and 1000 Genomes86 based covariance matrices of the SNPs used within each model were downloaded from PredictDB Data Repository. The association statistics between predicted gene expression and RG were estimated from the effects and their standard errors coming from the AS20 + AST20 model. Only statistically significant associations after Bonferroni correction for the number of genes tested across all tissues (P ≤ 9.0 × 10−7) were included in the table. Genes for which fewer than 80% of the SNPs used in the model were found in the GWAS summary statistics were excluded due to the low reliability of the association result.

GARFIELD analysis

We applied the GWAS analysis of regulatory or functional information enrichment with LD correction (GARFIELD) tool v2 (ref. 87) on the RG AS20 + AST20 meta-analysis results to assess the enrichment of the RG-associated variants within functional and regulatory features. GARFIELD integrates various types of data from a number of publicly available cell lines. Those include genetic annotations, chromatin states, DNaseI hypersensitive sites, transcription factor binding sites, FAIRE-seq elements and histone modifications. We considered enrichment to be statistically significant if the RG GWAS P value reached 1 × 10−8 and the enrichment analysis P value was <2.5 × 10−5 (Bonferroni corrected for 2,040 annotations).

Genetic association with gut microbiome

We assessed the genetic overlap between RG GWAS results and those for gut microbiome. GWAS of microbiome profiles were publicly available and downloaded from https://mibiogen.gcc.rug.nl/. For each of the 210 taxa, the corresponding P values for the 133 RG GWAS SNPs and their proxies were extracted.

Genetic association with GLP-1 and gastric inhibitory polypeptide (GIP)

We assessed the genetic overlap between RG GWAS results and those for GLP-1 and GIP measured at 0 min and 120 min. We extracted the results for the 133 RG signals from the GWAS summary statistics for GLP-1 and GIP88.

eQTL colocalization analysis

We further performed colocalization analysis using whole blood gene eQTL data provided by eQTLGen37 and human pancreatic islets eQTLs provided by TIGER38 for all 133 RG signals. We used meta-analysis results from AS20, AST20 or AS20 + AST20 depending on the degree of association of each signal. Only cis-eQTL data from eQTLGen/TIGER were incorporated to reduce the computational burden. The COLOC2 Bayesian-based method89 was used to interrogate the potential colocalization between RG GWAS signals and the genetic control of gene expression. First, for each signal, depending on which model (AS20, AST20 or AS20 + AST20) had the lowest GWAS P value, we extracted the RG GWAS test statistics of all SNPs within ±1 Mb region around the 133 RG signals. Then, for each RG signal, we matched the eQTLGen/TIGER results with the RG results and performed COLOC2 analysis evaluating the posterior probability of the following five hypotheses for each region: H0, no association; H1, GWAS association only; H2, eQTL association only; H3, both GWAS and eQTL association, but not colocalized and H4, both GWAS and eQTL association and colocalized. Only GWAS signals with at least one nearby gene/probe reaching posterior probability (H4) ≥ 0.5 were reported. We considered signals to have strong evidence of colocalization if posterior probability (H4) > 0.7.

Genetic association with human blood plasma N-glycosylation

We assessed genetic associations between 133 RG signals and 113 human blood plasma N-glycome traits using previously published genome-wide association summary statistics90. The description of the analyzed traits and details of the association analysis can be found elsewhere48. We considered associations to be significant when P < 0.05/113/133 = 3.3 × 10−6 (after Bonferroni correction). Association was considered suggestive when P < 10−4.
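The Bonferroni threshold quoted above follows from dividing the nominal level by the number of trait-signal pairs. A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def bonferroni_threshold(alpha, *test_counts):
    """Divide the nominal alpha by the product of the given numbers of tests."""
    n_tests = 1
    for count in test_counts:
        n_tests *= count
    return alpha / n_tests

# 113 N-glycome traits tested against 133 RG signals
threshold = bonferroni_threshold(0.05, 113, 133)
formatted = f"{threshold:.1e}"
```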

Genetic correlation analysis

We investigated the shared genetic component between RG and other traits, including glycemic ones, by performing genetic correlation analysis using the bivariate LD score regression method (LDSC v1.0.0)91. To reduce multiple testing burden, only the GWAS results of the AS20 + AST20 model were used. We used GWAS summary statistics available in LDhub92 and the Meta-Analysis of Glucose and Insulin-related Traits Consortium (MAGIC) website (https://www.magicinvestigators.org) for several traits including FG/FI64, HOMA-B/HOMA-IR93. In total, 228 different traits were included in the genetic correlation analysis with RG. We considered P ≤ 2.2 × 10−4 (Bonferroni correction for 228 traits) as the statistical significance level and P ≤ 0.05 as the nominal level.

MR analysis

We applied a bidirectional two-sample MR strategy (Supplementary Note) to investigate causality between RG and lung function, as well as T2D and lung function using independent genetic variants as instruments. We looked for evidence for the presence of a causal effect of RG and T2D on the following two lung function phenotypes: FEV1 and FVC in a two-sample MR setting. Genome-wide summary statistics for the lung function phenotypes were available94, involving cohorts from the SpiroMeta consortium and the UKBB study. T2D susceptibility variants and their effects were obtained from the largest-to-date T2D GWAS4.

To avoid confounding due to sample overlap, lung function summary statistics used as outcome data were those estimated in the SpiroMeta consortium alone. Similarly, when testing the effect of lung function on RG, RG genetic effects used as outcome data were estimated in all cohorts except UK Biobank. There was no sample overlap between the lung function and the T2D GWAS, thus allowing the use of T2D effects estimated in all contributing European ancestry studies. Genome-wide T2D summary statistics were available from a previous study3 to test for the causal effect of lung function on T2D. All analyses were conducted using the R software package TwoSampleMR v0.5.4 (ref. 95).

Causal effects were estimated using the inverse-variance weighted method, which combines the causal estimates of individual instrumental variants (Wald ratios; Supplementary Note) in a random-effects meta-analysis96. Instrument heterogeneity Q statistic P values are reported. As a sensitivity analysis, we used MR-Egger regression (Supplementary Note) to test for the presence of horizontal pleiotropy and obtain causal estimates that are more robust to the inclusion of invalid instruments97.
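The inverse-variance weighted estimate can be sketched from per-variant summary statistics. The sketch assumes first-order standard errors for the Wald ratios and a multiplicative random-effects model in which the standard error is inflated only when Cochran's Q exceeds its degrees of freedom; it is not the TwoSampleMR code, and the function name and instrument values are hypothetical.

```python
import math

def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Return (causal estimate, standard error, Cochran's Q) from Wald ratios."""
    ratios = [bo / be for bo, be in zip(beta_outcome, beta_exposure)]
    ratio_ses = [so / abs(be) for so, be in zip(se_outcome, beta_exposure)]
    weights = [1.0 / s ** 2 for s in ratio_ses]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    k = len(ratios)
    # Multiplicative random effects: scale the fixed-effect SE by sqrt(Q / (k - 1)), never below 1
    scale = math.sqrt(max(1.0, q / (k - 1))) if k > 1 else 1.0
    se = math.sqrt(1.0 / total) * scale
    return estimate, se, q

# Three hypothetical instruments that all imply the same causal effect
estimate, se, q = ivw_mr([0.1, 0.2, 0.1], [0.05, 0.10, 0.05], [0.01, 0.01, 0.01])
```

Because the three Wald ratios agree, Q is zero and the random-effects standard error equals the fixed-effect one.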

MVMR is an extension of MR that can be applied with either individual or summary-level data to estimate the effect of multiple, potentially related, exposures on an outcome98. We used the MVMR v0.3 R package to test whether the causal effects of RG and T2D on FVC are independent of possible confounders, such as physical activity and smoking. The same instrument selection criteria as described for the main MR analysis were used. CPD was instrumented by 54 (available out of the 58 in total) independent genome-wide significant variants, obtained from the GWAS discussed in ref. 58. LST served as a continuous proxy phenotype for physical activity from the recent study discussed in ref. 59 with 66 (available out of the 88 in total) independent genome-wide significant variants.

PRS analysis

We tested the ability of the RG genetic effects to predict RG, T2D and HbA1c. We compared that to the predictive power of T2D and FG genetic instruments by computing PRS for RG, T2D and FG and assessing their performance in predicting RG, T2D and HbA1c. PRS analyses require base and target data from independent populations. The base datasets in our analyses were UKBB-only estimates from the present RG GWAS, meta-analysis estimates of 32 studies for T2D15 and meta-analysis estimates from MAGIC for FG64. We used the second largest cohort, the Vanderbilt University Medical Center, as our target dataset. PRS construction and model evaluation (Supplementary Note) were done using the software PRSice v2.2.3 (ref. 99).

Clustering of the RG signals with results for 45 other phenotypes

We looked up the z scores (regression coefficient β divided by the standard error) of the distinct 133 RG signals in publicly available summary statistics of 45 relevant phenotypes (Supplementary Table 23). All variant effects were aligned to the RG risk allele. HapMap 2-based summary statistics were imputed using SS-Imp v0.5.5 (ref. 70) to minimize missingness. Remaining missing summary statistics values were imputed via mean imputation. The resulting variant–trait association matrix was truncated to 2 s.d. to minimize the effect of outliers. We used agglomerative hierarchical clustering with Ward’s method to partition the variants into groups by their effects on the considered outcomes. The clustering analysis was performed in R using the hclust() function from the built-in stats package.
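The preprocessing and clustering steps can be illustrated with a Python analogue of the R workflow, using NumPy and SciPy rather than hclust(). The sketch assumes that mean imputation is done per trait, that truncation means clipping z scores to ±2, and that SciPy's "ward" linkage is an acceptable stand-in for the Ward variant used in R; the function name and the toy matrix are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_variants(z_scores, n_clusters, limit=2.0):
    """Cluster variants (rows) by their z scores across traits (columns)."""
    z = np.asarray(z_scores, dtype=float)
    trait_means = np.nanmean(z, axis=0)              # per-trait mean over observed values
    filled = np.where(np.isnan(z), trait_means, z)   # mean imputation of missing entries
    truncated = np.clip(filled, -limit, limit)       # limit the influence of outliers
    tree = linkage(truncated, method="ward")         # agglomerative clustering, Ward's method
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Toy matrix: 4 variants x 3 traits, one missing value
toy = [
    [3.0, 2.5, np.nan],
    [2.8, 2.9, 1.9],
    [-2.5, -3.0, -2.2],
    [-2.9, -2.4, -2.0],
]
labels = cluster_variants(toy, n_clusters=2)
```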

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.