Conventional measurements of fasting and postprandial blood glucose levels investigated in genome-wide association studies (GWAS) cannot capture the effects of DNA variability on ‘around the clock’ glucoregulatory processes. Here we show that GWAS meta-analysis of glucose measurements under nonstandardized conditions (random glucose (RG)) in 476,326 individuals of diverse ancestries and without diabetes enables locus discovery and innovative pathophysiological observations. We discovered 120 RG loci represented by 150 distinct signals, including 13 with sex-dimorphic effects, two cross-ancestry and seven rare frequency signals. Of these, 44 loci are new for glycemic traits. Regulatory, glycosylation and metagenomic annotations highlight ileum and colon tissues, indicating an underappreciated role of the gastrointestinal tract in controlling blood glucose. Functional follow-up and molecular dynamics simulations of lower frequency coding variants in glucagon-like peptide-1 receptor (GLP1R), a type 2 diabetes treatment target, reveal that optimal selection of GLP-1R agonist therapy will benefit from tailored genetic stratification. We also provide evidence from Mendelian randomization that lung function is modulated by blood glucose and that pulmonary dysfunction is a diabetes complication. Our investigation yields new insights into the biology of glucose regulation, diabetes complications and pathways for treatment stratification.
Genetic factors are important determinants of glucose homeostasis and type 2 diabetes (T2D) susceptibility. Heritability of both fasting glucose (FG) and T2D is high, at 35–40%1 and 30–60%2, respectively. To date, more than 400 genetic loci have been associated with T2D3,4. Genome-wide association studies (GWAS) for glycemic traits in individuals without diabetes have identified genetic predictors of blood glucose, insulin and other metabolic responses during fasting or after oral or intravenous glucose challenge tests5,6,7,8. However, physiological glucose regulation involves responses to diverse nutritional and other stimuli that were, by design, omitted from such studies. Blood glucose is frequently measured at different times throughout the day in clinical practice and research studies (random glucose (RG)). While RG is inherently more variable than standardized measures, we reasoned that, across a very large number of individuals, it gives a more comprehensive representation of complex glucoregulatory processes occurring in different organ systems. Therefore, to identify and functionally validate genetic effects influencing RG, explore its relationships with other traits and diseases, and use these data to provide pathways for T2D treatment stratification, we performed a large-scale cross-ancestry GWAS meta-analysis for RG in individuals without diabetes.
RG GWAS expands the catalog of glycemia-related genetic associations
We undertook RG GWAS in 476,326 individuals without diabetes of European (n = 459,772) and other ancestries (n = 16,554). We adjusted for age, sex and time since last meal (where available) and excluded individuals with diabetes or extreme hyperglycemia (RG > 20 mmol l−1) (Supplementary Table 1). Covariates were selected after extensive phenotype modeling (Supplementary Note, Supplementary Table 2 and Extended Data Fig. 1a). Fine mapping by conditional analysis identified 150 distinct signals (P < 1.0 × 10−5) within 120 loci reaching genome-wide significance (P < 5.0 × 10−8; Fig. 1a and Supplementary Tables 3 and 4). Fifty-three RG signals are reported for glycemic traits for the first time, greatly expanding our knowledge of the genetics of glycemia (Tables 1 and 2 and Supplementary Table 3). Adjustment for time since last meal (Extended Data Fig. 1b) did not change effect size estimates but increased statistical power. Applying the glycated hemoglobin (HbA1c) cut point for diagnosing diabetes (HbA1c ≥ 6.5%) showed stronger associations at the lead RG signals at G6PC2 and GCK (Extended Data Fig. 1c), consistent with their roles in setting the glucose set point in normoglycemia9. Neither adjustment for body mass index (BMI) nor a more stringent hyperglycemia cut-off (RG > 11.1 mmol l−1; Extended Data Fig. 1d,e) materially changed the magnitude or significance of the RG effect estimates, although applying each covariate model individually identified 11 additional genome-wide significant signals (Table 2 and Supplementary Table 5). Despite previous misconceptions that RG is of limited value for genetic discovery because of its inherent variability, our RG GWAS demonstrates that this variability has a clear genetic component.
A number of signals identified in individuals of European ancestry showed nominal significance (P < 0.05) in other ancestry groups, including the new loci MANSC4/KLHL42 in African, FAM46C and ACVR1C in Indian and RBMS1 in Chinese ancestry groups (Supplementary Table 3). All such signals except rs540524 at G6PC2, rs183606969 at GCK and rs6006399 at MTMR3/HORMAD2 were directionally concordant across ancestries. At GCK, rs2908286 (r² = 0.83 with the European ancestry lead variant rs2971670 in 1000 Genomes, all ancestries) was genome-wide significant in African ancestry individuals alone (Supplementary Table 6). Cross-ancestry meta-analyses combining European and the other four ancestry groups revealed two new RG signals, at RRNAD1 and PROX1 (Table 2 and Supplementary Table 6). Overall, although its sample size was only 16,554 individuals larger than that of the European ancestry meta-analysis, the cross-ancestry analysis expanded new locus discovery for RG, confirming the potential of cross-ancestry studies for complex trait genetics.
The strongest associations with RG were detected at G6PC2 (P < 1.0 × 10−746) and GCK (P < 3.7 × 10−277), established FG loci with key roles in gluconeogenesis10 and glucose sensing11, respectively (Supplementary Table 3). Notably, only two-thirds of RG signals overlapped with T2D-associated loci (Extended Data Fig. 1f); the overlapping loci included three new loci for glycemia (SCD5, RNF6 and TSHZ2). The directions of effect at these loci for RG, T2D and homeostasis model assessment of β-cell function/insulin resistance (HOMA-B/HOMA-IR)6 (Extended Data Figs. 1f,g and 2 and Supplementary Table 7) were consistent with the epidemiological correlations between these traits. We also discovered sex dimorphism at 13 RG loci, including male-specific PRDM16 and RSPO3 and female-specific SGIP1, SRRM3 and SLC43A2 (Table 2, Fig. 1a and Supplementary Tables 3 and 8). We conclude that sex dimorphism, which characterizes over one-tenth of RG-associated loci, is a widespread feature of glucose metabolism.
Coding, rare and causal variants in RG variability
The lead variants at two new RG loci (NMT1 and RFX1) and three previously reported FG loci (TET2, THADA and RREB1) were all common (minor allele frequency (MAF) ≥ 5%) coding variants (Supplementary Table 3 and Extended Data Fig. 3). In addition, the lead RG-associated SNPs at the glucagon-like peptide-1 receptor (GLP1R), neuronal differentiation 1 (NEUROD1) and ER degradation enhancing α-mannosidase like protein 3 (EDEM3) loci were low-frequency (5% > MAF ≥ 1%) coding variants (Table 1, Supplementary Table 3 and Extended Data Fig. 3). NEUROD1 and EDEM3 are plausible candidates for glucose homeostasis: the former has been reported for glucosuria12 and the latter linked to renal function13,14. Within the rare allele frequency range (1% > MAF ≥ 0.001%), we first identified 30 RG loci and validated seven in UK Biobank (UKBB) whole-exome sequencing (WES) data (Supplementary Note). These included associations with noncoding variants, such as rs2096313127 at CAMK2B, and with synonymous variants, such as rs2232324 in G6PC2 (Table 2 and Supplementary Table 9). We further annotated independent (r² < 0.001 in 1000 Genomes, all ancestries) rare nonsynonymous coding variant signals associated with RG. These included the new nondeleterious variant rs146886108 (Arg187Gln) in ANKH15 and deleterious variants, among them three in G6PC2 with predicted (rs2232326) or established (rs138726309, rs2232323)16 effects (Supplementary Table 9). Thus, a range of coding and rare variants contributes to variability in RG levels, and these variants can be detected in very large genetic studies.
Next, we sought to pinpoint the most plausible causal variants by calculating 99% credible sets for each RG locus. In the European ancestry-only analysis, 15 RG signals were explained by a single variant with a posterior probability of being causal of ≥99%. These included low-frequency variants in GLP1R, G6PC2, MECOM and CCND2 (ref. 17) and common variants in LMO1 and CACNA2D3 (Fig. 1b and Supplementary Table 10a). For another 16 signals, such as those at RMST, FOXN3 and ADRA2A, the lead variant had a posterior probability ≥80%. Credible sets at WIPI1, GCKR, TET2, RREB1 and RFX6 included common coding variants. RREB1 and RFX6 encode transcription factors implicated in the development and function of pancreatic β cells18,19. The median credible set size was unchanged (12.0 in both the European ancestry and the cross-ancestry analyses). Even so, cross-ancestry RG meta-analysis narrowed the credible sets for several signals, with improvements at the DGKB and TP53INP1 lead signals (Supplementary Table 10b,c). These analyses highlight examples of validated and potential targets for therapeutic development15.
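This section does not state how posterior probabilities were computed. A common approach, assumed here, is Wakefield's approximate Bayes factor (ABF), with one causal variant per signal and equal prior probabilities across variants. The sketch below builds a 99% credible set this way. The prior standard deviation `prior_sd` and the function names are illustrative assumptions, not the study's pipeline.

```python
import math


def wakefield_log_abf(beta, se, prior_sd=0.15):
    """Log approximate Bayes factor (association vs. null) for one variant.

    Assumes a normal prior N(0, prior_sd**2) on the true effect (Wakefield 2009);
    prior_sd is an assumed value, not one taken from the study.
    """
    v = se ** 2              # sampling variance of the estimate
    w = prior_sd ** 2        # prior variance of the true effect
    r = w / (v + w)
    z = beta / se
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def credible_set(variants, coverage=0.99, prior_sd=0.15):
    """Smallest set of variants whose posterior probabilities sum to >= coverage.

    variants: list of (variant_id, beta, se) for one signal.
    Assumes a single causal variant and equal priors across variants.
    Returns (variant_id, posterior_probability) pairs, highest first.
    """
    log_abfs = [wakefield_log_abf(b, s, prior_sd) for _, b, s in variants]
    top = max(log_abfs)                         # subtract max for numerical stability
    weights = [math.exp(x - top) for x in log_abfs]
    total = sum(weights)
    ranked = sorted(
        ((vid, w / total) for (vid, _, _), w in zip(variants, weights)),
        key=lambda item: item[1],
        reverse=True,
    )
    selected, cumulative = [], 0.0
    for vid, pp in ranked:
        selected.append((vid, pp))
        cumulative += pp
        if cumulative >= coverage:
            break
    return selected
```

For a toy signal in which one variant has a much larger z-score than its neighbors, the set collapses to that single variant. Variants with identical statistics split the posterior evenly, so all of them are retained.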
Characterization of RG-associated GLP1R coding variants provides a framework for T2D treatment stratification
Following annotation and definition of likely causal variants, we prioritized GLP1R for functional studies. GLP1R encodes a class B1 GPCR (GLP-1R) that is important in blood glucose and appetite regulation and is a well-established target of the T2D drugs exenatide (exendin-4) and semaglutide20. We used RG data to validate an experimental framework for predicting individual responses to GLP-1R agonists, because such a framework would be a major asset in clinical practice and is currently lacking. Within GLP1R, the lead missense variant rs10305492 (A316T) has a strong RG-lowering effect (0.058 mmol l−1 per allele), second in size only to G6PC2 locus variants, and is also associated with FG/T2D21,22.
We functionally tested the impact of rs10305492 (A316T) and 16 other GLP1R coding variants detected in the UKBB dataset. Their effect allele frequencies ranged from common (G168S, rs6923761; P = 5.20 × 10−5 in the RG GWAS meta-analysis) to rare (R421W, rs146868158; P = 0.036 in the RG GWAS meta-analysis). For each variant, we measured GLP-1-induced recruitment of mini-Gαs23 in HEK293 cells stably expressing wild-type (WT) or variant GLP-1R. This approach captures the most proximal part of the Gαs-adenylate cyclase-cyclic adenosine monophosphate pathway, which links GLP-1R activation to insulin secretion. After correction for differences in cell surface expression determined by SNAP-tag labeling24, mini-Gs coupling efficiency was predictive of the RG effect of these variants (Fig. 2a and Supplementary Table 11), linking GLP-1R function measured in vitro to blood glucose homeostasis. This relationship was further assessed in UKBB WES data (Supplementary Note and Extended Data Fig. 4).
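One simple way to express "coupling corrected for surface expression" is to divide each variant's maximal mini-Gs response (relative to WT) by its SNAP-tag surface expression (relative to WT). The log-transformed values can then be correlated with per-allele RG effects. The sketch below does this under stated assumptions: the ratio-based correction, the log2 scale and the toy inputs in the test are hypothetical and are not the study's values.

```python
import math


def corrected_coupling(emax, surface, emax_wt, surface_wt):
    """Variant coupling relative to WT, divided by relative surface expression.

    Assumed correction: (Emax_variant / Emax_WT) / (surface_variant / surface_WT).
    """
    return (emax / emax_wt) / (surface / surface_wt)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coupling_vs_rg(variants, emax_wt, surface_wt):
    """variants: list of (emax, surface, rg_beta). Returns r(log2 coupling, RG beta)."""
    log_coupling = [math.log2(corrected_coupling(e, s, emax_wt, surface_wt))
                    for e, s, _ in variants]
    betas = [b for _, _, b in variants]
    return pearson_r(log_coupling, betas)
```

A negative correlation would mean that variants with higher expression-corrected coupling tend to lower RG, which is the direction described for A316T.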
Focusing on the two directly genotyped GLP1R missense variants in UKBB, we also measured mini-Gs responses to several endogenous and pharmacological GLP-1R agonists. A316T (rs10305492-A) showed increased responses and R421W (rs146868158-T) showed reduced responses to all ligands except exendin-4 (both variants) and semaglutide (A316T only), in line with their RG effects (Fig. 2b). Interestingly, for the late-stage T2D drug candidate tirzepatide, which shows pronounced ‘biased agonism’ at GLP-1R25, activity differed nearly tenfold between A316T and R421W. The common G168S variant, which has a relatively small RG-lowering effect (β = −0.0013, s.e. = 3.1 × 10−4), also showed increased function under pharmacological agonist stimulation. GLP-1R undergoes extensive agonist-induced endocytosis, a process that modulates the subcellular origin and temporal dynamics of receptor signaling26. We therefore also assessed the endocytic characteristics of the A316T, G168S and R421W variants using high-content microscopy. The most notable observation was that agonist-induced endocytosis of R421W was normal despite its signaling deficit, suggesting a specific alteration in how this variant couples to downstream effectors24. These results, supported by RG data and clinical observations27,28, suggest that in vitro assessments can provide valuable insights for selecting GLP-1R agonist treatment according to genotype.
Next, we performed molecular dynamics (MD) simulations of human GLP-1R bound to oxyntomodulin (OXM)29 to gain structural insights into the GLP1R variant effects described above. A316T is a single amino acid substitution in the core of the receptor transmembrane (TM) domain (Fig. 2c). It alters the nearby hydrogen bond network that normally stabilizes the GLP-1R inactive state (Supplementary Video 1). Specifically, in A316T, residue T316^5.46 replaces Y242^3.45 in a persistent hydrogen bond with the backbone of P312^5.42, one helical turn above T316^5.46 (Fig. 2d,e and Supplementary Video 1). Superscripts follow the class B1 generic GPCR numbering system described in ref. 30. In this system, the number before the dot indicates the TM helix, and the number after the dot gives the sequence position relative to the most conserved residue in that helix, which is numbered 50. The substitution triggers a local structural rearrangement that could be transmitted to the intracellular G-protein-binding site through TM3 and TM5, thereby enhancing G-protein coupling. A water molecule lies close to position 5.46 in both A316T and WT (water cluster α5; Fig. 2f). Notably, this water bridges the backbone of Y241^3.44 and A316^5.46 in WT, and the backbone of Y241^3.44 and the side chain of T316^5.46 in A316T. Conserved water networks are important in the activation of class A GPCRs31,32, so the stability of this hydrated site close to position 5.46 supports the importance of this site for GLP-1R effects. By analogy with A316T, simulations of the G168S variant indicated the formation of a stable new hydrogen bond between the side chain of S168^1.63 and A164^1.59, one turn above on the same helix (Fig. 2g and Supplementary Video 2). This moves the C-terminal end of TM1 closer to TM2 and reduces the overall flexibility of intracellular loop 1 (ICL1; Fig. 2h), which may alter the role of ICL1 in G-protein activation.
In contrast to A316T and G168S, R421W lies at a site that forms persistent interactions with the G protein. Simulations predicted that R421W tends to interact with a different region of the G-protein β-subunit than WT (Fig. 2i). These results capture the range of structural features in current active-state GLP-1R models. They also provide insights into the dynamics of A316T and other GLP-1R variants beyond those offered by earlier models, which did not benefit from structural information obtained by cryo-electron microscopy22.
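In MD analyses, "persistent" hydrogen bonds are usually quantified as an occupancy: the fraction of trajectory frames that meet geometric criteria. The sketch below computes occupancy from per-frame donor, hydrogen and acceptor coordinates. It assumes a donor–acceptor distance of at most 3.5 Å and a donor–H–acceptor angle of at least 120°. These cut-offs and the function names are common conventions assumed for illustration, not necessarily the study's settings; real analyses would read coordinates from trajectories with an MD analysis package.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def _angle_deg(a, b, c):
    """Angle ABC in degrees, with the vertex at b."""
    ba, bc = _sub(a, b), _sub(c, b)
    cos_angle = sum(x * y for x, y in zip(ba, bc)) / (_norm(ba) * _norm(bc))
    cos_angle = max(-1.0, min(1.0, cos_angle))     # guard against rounding
    return math.degrees(math.acos(cos_angle))


def hbond_occupancy(frames, max_dist=3.5, min_angle=120.0):
    """Fraction of frames in which a hydrogen bond is present.

    frames: iterable of (donor_xyz, hydrogen_xyz, acceptor_xyz) in Å.
    max_dist, min_angle: assumed geometric cut-offs (Å, degrees).
    """
    n_frames = n_bonded = 0
    for donor, hydrogen, acceptor in frames:
        n_frames += 1
        close_enough = _norm(_sub(acceptor, donor)) <= max_dist
        linear_enough = _angle_deg(donor, hydrogen, acceptor) >= min_angle
        if close_enough and linear_enough:
            n_bonded += 1
    return n_bonded / n_frames if n_frames else 0.0
```

An occupancy close to 1 over a long trajectory is what a "persistent" bond would look like under these assumed criteria.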
For a broader view of the impact of GLP1R coding variation, we screened an additional 178 missense variants identified by exome sequencing33. Each variant was tested for exendin-4-induced mini-Gs coupling and endocytosis after transient transfection in HEK293 cells (Supplementary Note, Fig. 2j,k and Supplementary Table 12). In total, 110 variants showed a reduced response in either or both pathways (‘LoF1’), and 67 displayed a specific response deficit that was not fully explained by differences in GLP-1R surface expression (‘LoF2’). Many of these defects were larger than those of the variants analyzed in Fig. 2a and are likely to cause a major loss of GLP-1R function. This suggests that carriers of these variants may be less likely to benefit from GLP-1R agonist treatment.
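The LoF1 and LoF2 definitions can be expressed as a simple rule on WT-normalized responses. In the sketch below, the 0.8 threshold, the use of the lower of the two pathway responses for LoF2 and the function name are all hypothetical. The actual criteria are described in the Supplementary Note and may rely on statistical testing rather than a fixed cut-off.

```python
def classify_lof(coupling, endocytosis, surface, threshold=0.8):
    """Label a GLP-1R variant from responses expressed relative to WT (WT = 1.0).

    coupling, endocytosis: agonist-induced responses in the two pathways.
    surface: relative cell surface expression (e.g., from SNAP-tag labeling).
    threshold: hypothetical cut-off below which a response counts as reduced.
    """
    labels = set()
    # LoF1: reduced response in either or both pathways.
    if coupling < threshold or endocytosis < threshold:
        labels.add("LoF1")
    # LoF2: a deficit that remains after dividing out surface expression.
    if surface > 0 and min(coupling, endocytosis) / surface < threshold:
        labels.add("LoF2")
    return labels
```

Under this rule, a variant whose responses fall in proportion to its surface expression is LoF1 only. A variant with normal expression but weak coupling is both LoF1 and LoF2.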
Functional annotation of RG associations and intestinal health
Previous T2D and glycemic trait GWAS have primarily implicated pancreatic, adipose and liver tissues3. We applied a range of complementary functional annotation analyses to our RG GWAS results to identify additional cell and tissue types with etiological roles in glucose metabolism. Data-driven expression prioritized integration for complex traits (DEPICT)34, which predicts enriched tissue types from prioritized gene sets, highlighted intestinal tissues, including ileum and colon, as well as pancreas, adrenal glands5, adrenal cortex and cartilage (false discovery rate (FDR) < 0.20; Fig. 3a,b and Supplementary Table 13). Similarly, CELL type expression-specific integration for complex traits (CELLECT)35, which prioritizes cell types using single-cell RNA-sequencing (scRNA-seq) datasets, ranked large intestinal tissue second only to pancreatic cell types (Fig. 4 and Supplementary Table 14). Interestingly, RG associations were most strongly related to genes with enriched expression in pancreatic polypeptide cells, exceeding even the more conventionally implicated insulin-secreting β cells. Supporting evidence came from transcriptome-wide association study (TWAS) analysis, which identified a total of 216 (119 unique) significant genetically driven associations across the ten tested tissues (Supplementary Table 15a). Of the highlighted genes, 51 (25 unique) are located at genome-wide significant RG loci (Supplementary Table 15). TWAS signals in skeletal muscle5, such as GPSM1 (ref. 36) and WARS, showed the largest overlap with RG signals. The combined results from ileum and colon also showed strong enrichment, including the new NMT1 and the established FADS1/3 and MADD genes (Fig. 1a and Supplementary Table 15).
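This section does not name the TWAS software. FUSION-type methods, assumed here, use a summary-statistic TWAS statistic that combines GWAS z-scores at the SNPs in an expression model. The z-scores are weighted by the model's SNP weights and scaled by the LD among those SNPs: z_TWAS = wᵀz / √(wᵀRw), where w are the weights, z the GWAS z-scores and R the LD correlation matrix. The sketch below is an illustration under that assumption.

```python
import math


def twas_z(weights, z_scores, ld):
    """Summary-statistic TWAS z-score, w'z / sqrt(w' R w).

    weights: expression-model SNP weights w (length m).
    z_scores: GWAS z-scores for the same SNPs, aligned to the same effect allele.
    ld: m x m LD correlation matrix R as a list of lists.
    """
    m = len(weights)
    if len(z_scores) != m or len(ld) != m:
        raise ValueError("weights, z_scores and ld must have matching dimensions")
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(m) for j in range(m))
    return numerator / math.sqrt(variance)


def two_sided_p(z):
    """Two-sided standard normal P value for a z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

With one SNP the statistic reduces to that SNP's GWAS z-score. With two perfectly correlated SNPs it does not double-count the shared signal.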
Expression quantitative trait locus (eQTL) colocalization analyses used eQTLgen whole blood expression data from 31,684 individuals37,38 and the COLOC2 approach. They identified 14 loci with strong links (posterior probability >70%) to gene expression, including TET2 (ref. 39), KCNJ11, KLHL42, IKBKAP and CAMK1D, which have transcriptional effects in pancreatic islets and kidney mesangial cells (Supplementary Table 16). We ran similar analyses of human pancreatic islet regulatory variation in the translational human pancreatic islet genotype tissue-expression resource (TIGER) dataset38. These defined 58 loci with strong statistical support for colocalization of the effects on RG and on islet expression of ADCY5, RNF6, FADS1, MADD and STARD10 (ref. 40), in addition to KLHL42 and CAMK1D, the latter two also colocalizing in whole blood. Moreover, epigenetic annotation using the GARFIELD tool highlighted significant (P < 2.5 × 10−5) enrichment of RG-associated variants in fetal large intestine, as well as in blood, liver and other tissues (Extended Data Fig. 5 and Supplementary Table 17). Apart from colon, adult intestinal tissues are not available in GARFIELD. Multiple analyses thus highlighted a potential role for the digestive tract in glucose regulation. Prompted by this, we assessed the overlap between our signals and those from the latest gut microbiome GWAS41. At one locus, ABO-FUT2, we identified two genera sharing signals and direction of effect with RG: Collinsella and Lachnospiraceae-FCS020 (Fig. 1a and Supplementary Table 18). The effects of the ABO-FUT2 locus on RG could be mediated by the abundance of Collinsella and Lachnospiraceae-FCS020, which produce glucose from lactose and galactose42. The Collinsella genus affects gut permeability via interleukin-17A43 and is more abundant in individuals with T2D than in individuals with normal glucose tolerance or prediabetes44. Moreover, weight loss decreases Collinsella abundance among obese individuals with T2D45.
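COLOC2 itself is not reproduced here. As a simpler illustration of how colocalization posteriors arise, the sketch below implements the single-causal-variant approximate Bayes factor framework of the original coloc method. It uses that method's commonly used default priors (p1 = p2 = 1e-4, p12 = 1e-5), which are an assumption here. Inputs are per-SNP log ABFs for two traits (for example, RG and gene expression) over the same SNPs. The output gives posteriors for H0 to H4, where H3 means distinct causal variants and H4 means a shared causal variant.

```python
import math


def _logsum(values):
    """log(sum(exp(values))), computed stably."""
    top = max(values)
    if top == float("-inf"):
        return top
    return top + math.log(sum(math.exp(v - top) for v in values))


def _logdiff(x, y):
    """log(exp(x) - exp(y)); returns -inf when exp(x) <= exp(y)."""
    if x <= y:
        return float("-inf")
    return x + math.log1p(-math.exp(y - x))


def coloc_posteriors(lbf1, lbf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [PP.H0, ..., PP.H4] under a single-causal-variant model.

    lbf1, lbf2: per-SNP log approximate Bayes factors for trait 1 and trait 2,
    aligned to the same SNPs. p1, p2, p12: assumed per-SNP prior probabilities.
    """
    s1, s2 = _logsum(lbf1), _logsum(lbf2)
    s12 = _logsum([a + b for a, b in zip(lbf1, lbf2)])
    log_h = [
        0.0,                                                   # H0: no association
        math.log(p1) + s1,                                     # H1: trait 1 only
        math.log(p2) + s2,                                     # H2: trait 2 only
        math.log(p1) + math.log(p2) + _logdiff(s1 + s2, s12),  # H3: distinct variants
        math.log(p12) + s12,                                   # H4: shared variant
    ]
    total = _logsum(log_h)
    return [math.exp(v - total) for v in log_h]
```

A strong signal at the same SNP for both traits drives PP.H4 toward 1. Strong signals at different SNPs favor H3 instead.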
Higher prevalence of the Lachnospiraceae family is associated with metabolic disorders, whereas abundance of the genus Lachnospiraceae-FCS020 is inversely correlated with serum triglycerides46. However, the mechanisms underlying these associations have yet to be studied. Together, these multi-omics annotations provide strong evidence for links between RG and intestinal health.
Finally, we observed associations at HNF1A47 with nine total plasma N-glycome traits48 at a Bonferroni-corrected threshold (Fig. 1a and Supplementary Table 19). These traits represent highly branched, galactosylated and sialylated glycans attached to α1-acid glycoprotein, an acute-phase protein49. Such glycans are known to be associated with chronic low-grade inflammation50,51 and an increased risk of T2D52,53,54. The latter might be explained by the role of N-glycan branching of the glucagon receptor in glucose homeostasis55. In addition, ten glycans were associated with five RG loci (HNF1A, BAG1, PLUT) at a suggestive level of significance (Fig. 1a). Three of these glycans are attached to immunoglobulin G molecules49, and higher relative abundances of them are associated with a lower risk of T2D56 and reduced inflammation57. These observations suggest an overlap between the networks regulating RG homeostasis and plasma protein N-glycosylation.
Genetic relationships between RG and other metabolic or nonmetabolic traits
Using linkage disequilibrium score regression, we estimated genetic correlations between RG and other phenotypes to quantify their shared genetic contribution. We detected positive genetic correlations between RG and squamous cell lung cancer (rg = 0.28, P = 0.0015) and lung cancer (rg = 0.12, P = 0.037; Fig. 5 and Supplementary Table 20). We also detected inverse genetic correlations with lung function-related traits, such as forced vital capacity (FVC; rg = −0.090, P = 0.0059) and forced expiratory volume in 1 second (FEV1; rg = −0.054, P = 0.017; Figs. 3a and 5 and Supplementary Table 20). To investigate this further, we conducted bidirectional Mendelian randomization (MR) analysis. This suggested causal effects of RG and T2D on lung function, including FEV1 (RG: βMR = −0.66, P = 9.6 × 10−5; T2D: βMR = −0.049, P = 1.3 × 10−13) and FVC (RG: βMR = −0.60, P = 1.5 × 10−4; T2D: βMR = −0.062, P = 1.4 × 10−21). It did not support effects in the reverse direction (FEV1 on RG: βMR = −0.0048, P = 0.42; FVC on RG: βMR = −0.01, P = 0.17; FEV1 on T2D: βMR = −0.18, P = 0.040; FVC on T2D: βMR = −0.21, P = 0.040; Supplementary Table 21a,b). External factors, such as smoking or a sedentary lifestyle, could cause lung function to decline independently of RG and T2D. Using multivariable MR (MVMR), we found that the causal effects of RG and T2D on FVC are independent of both cigarettes smoked per day (CPD; a proxy for smoking58) and leisure screen time (LST; a proxy for physical activity59) (Supplementary Table 21c). This is important because previous observational studies have reported worsening lung function, as measured by FVC, in patients with T2D, but whether this relationship was causal was unclear60,61. More recently, patients with diabetes were shown to be at increased risk of death from the viral infection COVID-19 (ref. 62), with pulmonary dysfunction contributing to mortality63.
Our data support a causal effect of glycemic dysregulation on lung function decline, identifying reduced lung function as a new complication of diabetes.
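This section does not specify the MR estimator. The inverse-variance weighted (IVW) estimator, a standard first-line MR method assumed here, combines per-variant Wald ratios (βY/βX) weighted by βX²/seY². This is equivalent to a weighted regression of outcome associations on exposure associations through the origin. MVMR extends the idea by regressing outcome associations on several exposures jointly. The fixed-effect, first-order-weight version below is a sketch, not the study's implementation.

```python
import math


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate, its standard error and two-sided P value.

    beta_exposure: variant-exposure associations (e.g., variant effects on RG).
    beta_outcome, se_outcome: variant-outcome associations (e.g., on FEV1).
    All inputs must be aligned to the same effect alleles.
    """
    weights = [bx ** 2 / sy ** 2 for bx, sy in zip(beta_exposure, se_outcome)]
    numerator = sum(bx * by / sy ** 2
                    for bx, by, sy in zip(beta_exposure, beta_outcome, se_outcome))
    total_weight = sum(weights)
    estimate = numerator / total_weight
    se = 1.0 / math.sqrt(total_weight)
    p = math.erfc(abs(estimate / se) / math.sqrt(2.0))
    return estimate, se, p
```

When every variant gives the same Wald ratio, the IVW estimate equals that ratio, and instruments with larger βX²/seY² tighten the standard error.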
Genome-wide genetic correlation analyses also showed a strong positive genetic correlation between RG and FG (rg = 0.88, P = 6.93 × 10−61; Fig. 4 and Supplementary Table 20). We meta-analyzed RG studies other than UKBB with FG GWAS summary statistics64. In this analysis, 79 signals reached nominal significance and were directionally consistent in both UKBB and RG + FG (Supplementary Table 3), providing additional support for our RG findings. Given the large genetic overlap between RG, other glycemic traits and T2D, we evaluated how well a trait-specific polygenic risk score (PRS) predicts RG, T2D and HbA1c levels, using UKBB effect estimates and the Vanderbilt cohort. The RG PRS explained 0.58% of the variance in RG levels when individuals with T2D were included (Supplementary Table 22). After excluding those who developed T2D within 1 year of their last RG measurement, it explained 0.71%. The RG PRS performed comparably to the FG loci PRS (0.38% versus 0.42% of variance explained for T2D; 0.40% versus 0.44% for HbA1c), indicating shared genetic determinants of glycemic traits.
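A PRS of the kind described is a weighted sum of allele dosages, and "variance explained" is often reported as the squared correlation (R²) between the score and the phenotype. The sketch below assumes that simple definition. Real analyses typically also account for covariates, for example as incremental R² over a covariate-only model, which is omitted here. The function names are illustrative.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0 to 2) for one individual.

    weights: per-variant effect estimates (e.g., UKBB betas), aligned to the
    same effect alleles as the dosages.
    """
    return sum(d * w for d, w in zip(dosages, weights))


def variance_explained(scores, phenotype):
    """Squared Pearson correlation between scores and phenotype values."""
    n = len(scores)
    ms, mp = sum(scores) / n, sum(phenotype) / n
    sxy = sum((s - ms) * (p - mp) for s, p in zip(scores, phenotype))
    sxx = sum((s - ms) ** 2 for s in scores)
    syy = sum((p - mp) ** 2 for p in phenotype)
    return sxy ** 2 / (sxx * syy)
```

Values such as the 0.58% reported above correspond to an R² of 0.0058 on this scale.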
We previously highlighted diverse effects of FG and T2D loci on pathophysiological processes related to T2D development by grouping associated loci according to their effects on multiple phenotypes6. Cluster analysis of the RG signals with 45 related phenotypes identified three separate clusters (Fig. 1a, Supplementary Table 23 and Extended Data Figs. 6 and 7). ‘Metabolic syndrome’ cluster 1 comprised 28 loci that were also associated with higher waist-to-hip ratio, blood pressure, plasma triglycerides, insulin resistance (HOMA-IR) and coronary artery disease risk, and with lower sex hormone binding globulin levels in both sexes and lower testosterone in males. Cluster 3 was characterized, in particular, by insulin secretory defects6. Cluster 2 showed a primary effect on insulin release versus insulin action3. However, it included a subcluster of 11 loci with protective effects on inflammatory bowel disease, a relationship not previously reported. Moreover, cluster 2 was notable for generally lower T2D risk than clusters 1 and 3, which may contribute to the known partial overlap between the genetic determinants of glycemia and T2D65. This grouping of RG loci provides new insights into the etiology of glucose regulation and associated disease states.
Leveraging data from 476,326 individuals, we have added 44 new loci to those associated with glycemic traits. By using RG, our analysis captures genetic contributions across a wider range of physiological states than was previously possible with standardized glycemic measures. Moreover, the greater statistical power of large cross-ancestry meta-analysis improves confidence in identifying potentially causal variants, helping to prioritize genes for more detailed functional analyses in the future. Our comprehensive functional characterization of GLP1R coding variation validates its role in blood glucose regulation. More importantly, it shows how responses to GLP-1R-targeting drugs depend on genetic variation. Notably, other islet-expressed class B1 GPCRs identified in our analysis and in other glycemic trait/T2D GWAS, including GIPR, GLP2R (refs. 3,66) and SCTR21, are investigational targets for T2D treatment and should be subjected to similar analyses. Our functional annotation analyses point to underexplored tissue mediators of glycemic regulation, with new evidence highlighting the role of the intestine. This observation is consistent with the profound effects of gastric bypass surgery on T2D resolution67 and with links between the intestinal microbiome and responses to several diabetes drugs68. In the near future, larger well-phenotyped datasets will enable high-dimensional GWAS that disentangle the roles of diet composition, physical activity and lifestyle in RG variability in relation to genetic effects. Finally, through MR, we identified a causal effect of glucose levels and T2D on lung function. This finding demonstrates the utility of MR for corroborating findings from observational studies and supports lung dysfunction as a new complication of diabetes.
All participating studies were approved by their appropriate institutional review boards or committees, and written informed consent was obtained from all study participants. For all the participating studies, approval was received to use their data in the present work. Study-specific ethics statements are provided in the references listed in Supplementary Table 1.
Phenotype definition and model selection for RG GWAS
We used RG (mmol l−1) measured in plasma or in whole blood (corrected to plasma level using a correction factor of 1.13). Individuals were excluded from the analysis if they had a diagnosis of T2D or were on diabetes treatment (oral agents or insulin). Individual studies applied further sample exclusions, including pregnancy, fasting plasma glucose ≥7 mmol l−1 at a separate visit (when available) and type 1 diabetes (Supplementary Table 1). Details about RG modeling in the first set of six available cohorts (Supplementary Table 2) can be found in the Supplementary Note. For the GWAS, we included individuals based on two RG cut-offs: <20 mmol l−1 (20), to account for the effect of extreme RG values, and <11.1 mmol l−1 (11), an established threshold for T2D diagnosis. We then evaluated six GWAS models that differed in the covariates included and the cut-off used:

(1) age (A) and sex (S), RG < 20 mmol l−1 (AS20);
(2) age, sex and BMI (B), RG < 20 mmol l−1 (ASB20);
(3) age and sex, RG < 11.1 mmol l−1 (AS11);
(4) age, sex and BMI, RG < 11.1 mmol l−1 (ASB11);
(5) age, sex and time since last meal (modeled as T, T² and T³), RG < 20 mmol l−1 (AST20);
(6) age, sex, T, T², T³ and BMI, RG < 20 mmol l−1 (ASTB20).

In addition to the above, models were also adjusted for study site and geographical covariates.
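The phenotype rules above can be sketched as a per-record filter. The following are assumptions, labeled here and in the comments: the whole-blood correction is applied before the cut-off, time since last meal is in hours, and the natural-log transform (used for the UKBB-based analyses) is stored alongside the raw value. Field and function names are hypothetical.

```python
import math

WHOLE_BLOOD_TO_PLASMA = 1.13  # correction factor stated in the text


def prepare_rg_record(rg, sample_type, has_diabetes, hours_since_meal=None, cutoff=20.0):
    """Return model inputs for one individual, or None if the individual is excluded.

    rg: random glucose in mmol/l; sample_type: "plasma" or "whole_blood".
    cutoff: 20.0 for the *20 models or 11.1 for the *11 models (RG must be below it).
    hours_since_meal: assumed to be in hours; None if not recorded.
    """
    if has_diabetes:  # diagnosed T2D or on diabetes treatment
        return None
    # Assumption: correction to plasma level happens before applying the cut-off.
    rg_plasma = rg * WHOLE_BLOOD_TO_PLASMA if sample_type == "whole_blood" else rg
    if rg_plasma >= cutoff:
        return None
    record = {"rg": rg_plasma, "ln_rg": math.log(rg_plasma)}
    if hours_since_meal is not None:
        t = hours_since_meal
        record.update({"T": t, "T2": t ** 2, "T3": t ** 3})  # cubic time adjustment
    return record
```

Records without time since last meal carry no T terms, matching the AS20 and AS11 models; records with it can enter the AST20 model.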
The GWAS meta-analysis of RG consisted of the following five components: (1) 37,239 individuals from ten European ancestry GWAS imputed up to the HapMap 2 reference panel; (2) 3,156 individuals from three European ancestry GWAS with Metabochip coverage; (3) 21,083 individuals from two European ancestry GWAS imputed up to the 1000 Genomes reference panel; (4) 380,432 individuals of white European ancestry from UKBB; and (5) 16,983 individuals from the Vanderbilt cohort imputed to the HRC panel (Supplementary Note). We imputed the GWAS meta-analysis summary statistics of each component to the all-ancestries 1000 Genomes reference panel69 using the summary statistics imputation method implemented in the SS-Imp v0.5.5 software70. SNPs with imputation quality scores <0.7 were excluded. We then combined the association summary statistics from all components by inverse-variance meta-analysis using METAL v2011-03-25 (ref. 71). We focused our meta-analyses on models AS20 (17 cohorts, nmax = 459,772) and AST20 (cohorts with time since last meal available; 12 cohorts, nmax = 417,290). The FHS cohort had no information for individuals with RG > 11.1 mmol l−1 (the established 2-h glucose threshold used as a criterion for T2D diagnosis), so AS11 model results were used for this cohort. We also performed a meta-analysis combining cohorts with time since last meal available (AST20 model, 12 cohorts) and those lacking this information (AS20 model, five cohorts), to maximize association power while accounting for T. We refer to this analysis as AS20 + AST20 (17 cohorts, nmax = 458,862). A signal was considered to be associated with RG if it reached genome-wide significance (P < 5.0 × 10−8) in the meta-analysis of UKBB and other cohorts in either of our two models of interest (AS20 or AST20) or in their combination (AS20 + AST20).
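METAL's standard-error-based scheme performs inverse-variance fixed-effect meta-analysis, pooling each variant's effect estimates with weights 1/se². The minimal sketch below handles one variant and assumes the per-study estimates are already aligned to the same effect allele. It illustrates the calculation and is not METAL's code.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis for one variant.

    betas, ses: per-study effect estimates and standard errors.
    Returns (pooled beta, pooled standard error, z, two-sided P).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p
```

Two studies with equal precision simply average their estimates, and the pooled standard error shrinks by a factor of √2.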
Of the 133 signals detected in the European ancestry subset (Supplementary Note), 105 were directionally consistent between UKBB and the other contributing studies grouped together, validating the discovery (Supplementary Table 3). We report the P value from the combined model unless otherwise stated. Full results from all models are provided in Supplementary Table 3. For the selected lead SNPs in the combined model, we checked nominal significance (P < 0.05) and directional consistency of effect sizes between UKBB results and results from the other cohorts. We further extended this check to UKBB results versus a meta-analysis of the other cohorts together with the FG GWAS meta-analysis64, excluding overlapping cohorts. Because the measures were on different scales (natural logarithm-transformed RG and untransformed FG), this meta-analysis, conducted in METAL v2011-03-25, was based on sample sizes and P values rather than effect sizes.
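Because effect sizes on different scales cannot be pooled directly, a sample-size-weighted Z-score scheme (as in METAL's sample-size-based option) combines only P values, effect directions and sample sizes. Each study contributes z = sign × Φ⁻¹(1 − P/2) with weight √N. The sketch below covers one variant and uses Python's `statistics.NormalDist` for the inverse normal CDF. It is an illustration, not METAL's code.

```python
import math
from statistics import NormalDist


def sample_size_z_meta(pvalues, directions, sample_sizes):
    """Combine two-sided P values across studies using sqrt(N) weights.

    pvalues: per-study two-sided P values.
    directions: per-study effect signs (+1 or -1) for the same effect allele.
    sample_sizes: per-study sample sizes N.
    Returns (combined z, two-sided combined P).
    """
    normal = NormalDist()
    numerator, weight_sq = 0.0, 0.0
    for p, sign, n in zip(pvalues, directions, sample_sizes):
        z = normal.inv_cdf(1.0 - p / 2.0) * (1 if sign > 0 else -1)
        w = math.sqrt(n)
        numerator += w * z
        weight_sq += w * w
    z_meta = numerator / math.sqrt(weight_sq)
    p_meta = math.erfc(abs(z_meta) / math.sqrt(2.0))
    return z_meta, p_meta
```

Two equally sized studies with P = 0.05 in the same direction combine to z ≈ 1.96 × √2. Opposite directions cancel.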
Cross-ancestry analyses and meta-analysis
We performed GWAS in non-European ancestry populations within UKBB with a sample size of at least 1,500 individuals: Black (n = 7,644), Indian (n = 5,660), Pakistani (n = 1,747) and Chinese (n = 1,503). We further meta-analyzed our European ancestry cohorts with these non-European ancestry UKBB groups. The analyses were performed with BOLT-LMM v2.3 (ref. 72) and METAL v2011-03-25.
To evaluate sex dimorphism in our results, we meta-analyzed the UKBB and the Vanderbilt cohort with the GWAMA v2.1 software73, which provides a 2-degrees-of-freedom (df) test of association that allows effect sizes to differ between the sexes. We evaluated the evidence for heterogeneity of allelic effects between the sexes using Cochran’s Q statistic73,74. We considered a signal to show evidence of sex dimorphism if the sex-differentiated (2 df) P value was <5.0 × 10−8 and the sex-heterogeneity (1 df) P value was <0.05.
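For a single SNP with one male and one female estimate (for example, after within-sex meta-analysis), the two statistics can be sketched as below: the 2 df test sums the male and female association chi-squares, and Cochran's Q for two groups reduces to the squared z score of the male minus female difference. This is a simplified illustration, not GWAMA code, and the function names are hypothetical.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square statistic with 2 df."""
    return math.exp(-x / 2.0)


def sex_dimorphism_test(beta_m, se_m, beta_f, se_f):
    """Return (sex-differentiated 2 df P, sex-heterogeneity 1 df P)."""
    z_m, z_f = beta_m / se_m, beta_f / se_f
    chi2_2df = z_m ** 2 + z_f ** 2  # allows different male and female effects
    # Cochran's Q for two groups: squared z of the difference between sexes.
    q = (beta_m - beta_f) ** 2 / (se_m ** 2 + se_f ** 2)
    return chi2_sf_2df(chi2_2df), chi2_sf_1df(q)


def is_sex_dimorphic(p_2df, p_het, gws=5.0e-8, het_alpha=0.05):
    """Apply the two criteria stated in the text."""
    return p_2df < gws and p_het < het_alpha
```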
Clumping and conditional analysis
We performed a standard clumping analysis in PLINK v1.90 (ref. 75) (criteria: P ≤ 5 × 10−8, r2 = 0.01, window size = 1 Mb, with 1000 Genomes Phase 3 data as the linkage disequilibrium (LD) reference panel) to select a list of near-independent signals. We then replicated this selection with a stepwise model selection analysis (approximate conditional analysis) in GCTA v1.93.0 (ref. 76), using P ≤ 5 × 10−8 and window size = 1 Mb, and checked for additional distinct signals using a region-wide significance threshold of P ≤ 1.0 × 10−5. For validation and comparison, we also performed direct conditional analyses using BOLT-LMM v2.3 (Supplementary Note). We filtered the conditional analysis results by checking the LD between all variants within the same locus and keeping only independent signals (r2 < 0.01). LD was calculated from European reference haplotypes from the 1000 Genomes Project using the LDlinkR v1.1.2 library.
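The clumping step can be sketched as a greedy, P-value-ordered procedure: the most significant remaining SNP becomes an index SNP and absorbs nearby SNPs in LD with it. This is a simplified illustration, not PLINK's implementation; the `r2` callable stands in for a lookup against a reference panel such as 1000 Genomes Phase 3, and the function name is hypothetical.

```python
def clump(snps, r2, p_threshold=5e-8, r2_threshold=0.01, window_bp=1_000_000):
    """Greedy P-value-ordered LD clumping.

    snps: iterable of (snp_id, chromosome, position_bp, p_value)
    r2: callable (snp_a, snp_b) -> LD r^2 from a reference panel
    Returns index SNP ids, most significant first.
    """
    candidates = sorted((s for s in snps if s[3] <= p_threshold), key=lambda s: s[3])
    absorbed = set()
    index_snps = []
    for snp_id, chrom, pos, _ in candidates:
        if snp_id in absorbed:
            continue
        index_snps.append(snp_id)
        for other_id, other_chrom, other_pos, _ in candidates:
            if other_id == snp_id or other_id in absorbed or other_id in index_snps:
                continue
            same_window = other_chrom == chrom and abs(other_pos - pos) <= window_bp
            # Treated here as r2 >= threshold (an assumption about the boundary case).
            if same_window and r2(snp_id, other_id) >= r2_threshold:
                absorbed.add(other_id)
    return index_snps
```

With r2 = 0.01, only SNPs in near-zero LD with every more significant index SNP in the window survive, which matches the "near-independent" description above.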
GLP-1R pharmacological and structural analysis
Mini-Gs recruitment assay
For assays with stable cell lines (Fig. 2a,b), WT or variant T-REx-SNAP-GLP-1R-SmBiT cells (Supplementary Note) were seeded in 12-well plates and transfected with 1 µg per well LgBiT-mini-Gs23 (a gift from N. Lambert, Medical College of Georgia). The following day, GLP-1R expression was induced by adding tetracycline (0.2 µg ml−1) to the culture medium for 24 h. For transient transfection assays (Fig. 2j), HEK293T cells in poly-d-lysine-coated white 96-well plates were transfected using Lipofectamine 2000 with 0.05 µg per well WT or variant SNAP-GLP-1R-SmBiT plus 0.05 µg per well LgBiT-mini-Gs, and the assay was performed 24 h later. Stable-line cells were then resuspended in Hank’s balanced salt solution containing furimazine (Promega; diluted 1:50) and seeded in 96-well half-area white plates; for transient transfection assays, the same reagent was added directly to the adherent cells. Baseline luminescence was measured over 5 min using a Flexstation 3 plate reader at 37 °C before addition of ligand or vehicle. Agonists were applied at a series of concentrations spanning the response range. After agonist addition, the luminescent signal was recorded serially over 30 min, and ligand-induced effects were quantified by subtracting each well’s baseline. Signals were corrected for differences in cell number as determined by bicinchoninic acid assay.
Analysis of pharmacological data
Technical replicates within the same assay were averaged to give one biological replicate. For concentration-response assays (Fig. 2a,b), ligand-induced responses were fitted with a three-parameter model in Prism 8.0 (GraphPad Software). As a composite measure of agonism77, log10-transformed Emax/half maximal effective concentration (EC50) values were obtained for each ligand/variant response. The WT response was subtracted from the variant response to give ∆log(max/EC50), a measure of gain or loss of function of the variant relative to WT. Log10-transformed surface expression levels were obtained for each variant relative to WT and used to correct mini-Gs ∆log(max/EC50) values for differences in variant GLP-1R surface expression, by subtraction with error propagation. GLP-1R internalization responses were already normalized to surface expression within each assay. A variant response was considered significantly different from WT if the 95% confidence interval for ∆log(max/EC50) did not cross zero77. Changes in the profile of receptor response between mini-Gs recruitment and GLP-1R internalization were inferred if P < 0.05 in unpaired t tests with Holm–Sidak correction for multiple comparisons. For transient transfection assays (Fig. 2j), responses were normalized to the WT response and log10 transformed to give Log ∆ responses. The impact of differences in surface expression on functional responses was determined by subtracting the log-transformed normalized expression level from the log-transformed normalized response.
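The arithmetic behind ∆log(max/EC50), the surface-expression correction and the confidence-interval criterion can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis script: curve fitting itself is done in Prism and is not reproduced; the three-parameter model is assumed to have a Hill slope fixed at 1; errors of the subtracted quantities are assumed independent; and all function names are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def three_param_response(log_conc, bottom, top, log_ec50):
    """Three-parameter concentration-response curve with Hill slope fixed at 1."""
    return bottom + (top - bottom) / (1.0 + 10.0 ** (log_ec50 - log_conc))


def log_max_over_ec50(emax, log_ec50):
    """Composite agonism measure log10(Emax/EC50) = log10(Emax) - log10(EC50)."""
    return math.log10(emax) - log_ec50


def delta_with_se(x_variant, se_variant, x_wt, se_wt):
    """Variant minus WT, with standard errors propagated in quadrature."""
    return x_variant - x_wt, math.hypot(se_variant, se_wt)


def expression_corrected(delta, se_delta, log_rel_expr, se_log_rel_expr):
    """Subtract log10 surface expression of the variant relative to WT."""
    return delta - log_rel_expr, math.hypot(se_delta, se_log_rel_expr)


def differs_from_wt(delta, se, z=Z_95):
    """True when the 95% confidence interval of delta excludes zero."""
    lower, upper = delta - z * se, delta + z * se
    return not (lower <= 0.0 <= upper)
```

For a variant expressed at half the WT surface level (log10 0.5 ≈ −0.30), the correction raises ∆log(max/EC50) by about 0.30, so part of an apparent loss of function is attributed to reduced expression rather than reduced coupling.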
Variance explained in RG effects by mini-Gs recruitment at coding GLP1R variants
We regressed the RG effects (AST20 model) estimated in the UKBB at 16 independent (r2 < 0.02) coding GLP1R variants (Supplementary Table 11) on mini-Gs coupling in response to glucagon-like peptide-1 (GLP-1) stimulation (corrected for surface expression), using a weighted regression that gave more weight to variants with higher minor allele frequency.
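A weighted least-squares fit of this kind, with the variance explained read off as the weighted R², can be sketched as below. The exact weighting scheme is not stated in the text; using the minor allele frequency itself as the weight is an assumption, and the input values are hypothetical placeholders rather than data from Supplementary Table 11.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least squares fit y = a + b * x.

    Returns (intercept a, slope b, weighted R^2), where R^2 is the share of
    weighted variance in y explained by x.
    """
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy * sxy / (sxx * syy)


# Hypothetical inputs for illustration only:
mini_gs_delta = [-0.8, -0.3, 0.0, 0.1, -1.2]      # expression-corrected delta log(max/EC50)
rg_effect = [0.004, 0.001, 0.000, -0.001, 0.006]  # RG effect per minor allele
maf = [0.010, 0.050, 0.200, 0.020, 0.002]         # weights: minor allele frequency (assumed)
intercept, slope, r2 = weighted_linear_fit(mini_gs_delta, rg_effect, maf)
```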
Computational methods including MD simulations
The active state structure of GLP-1R in complex with oxyntomodulin (OXM)29 and Gs protein was used to simulate WT GLP-1R and the G168S, A316T and R421W variants. The WT and variant systems were prepared for MD simulations and equilibrated as reported previously78. Production runs (four MD replicas of 500 ns each) were performed with AceMD3 3.3.0 (ref. 79). AquaMMapS v1 analysis80 was performed on 10-ns-long MD simulations of GLP-1R(WT) and GLP-1R(A316T) in complex with OXM, with all α carbons restrained; coordinates were written every 10 ps of simulation.
Credible set analysis
After selecting the signals within each region on the basis of the AS20, AST20 and AS20 + AST20 meta-analysis results, we performed a credible set analysis to obtain a list of potential causal variants for each of the 133 selected signals (Supplementary Note). We also calculated credible sets for the cross-ancestry meta-analysis and compared the results between the European ancestry-only and cross-ancestry meta-analyses.
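The exact procedure is given in the Supplementary Note; as an illustration only, a common approach assumes a single causal variant per signal, converts each SNP's association into a Wakefield approximate Bayes factor (ABF), normalizes these into posterior probabilities, and adds SNPs in decreasing order of posterior until the target coverage is reached. The prior variance of 0.04, the 99% coverage and the function names below are assumptions.

```python
import math


def log_abf(beta, se, prior_var=0.04):
    """Wakefield log approximate Bayes factor (association versus null)."""
    r = prior_var / (prior_var + se ** 2)
    z2 = (beta / se) ** 2
    return 0.5 * (math.log(1.0 - r) + r * z2)


def credible_set(snps, coverage=0.99, prior_var=0.04):
    """snps: list of (snp_id, beta, se). Returns [(snp_id, posterior)] in the set."""
    labf = [log_abf(b, s, prior_var) for _, b, s in snps]
    top = max(labf)
    denom = sum(math.exp(v - top) for v in labf)  # stable normalization
    posteriors = sorted(
        ((snp_id, math.exp(v - top) / denom) for (snp_id, _, _), v in zip(snps, labf)),
        key=lambda item: item[1],
        reverse=True,
    )
    selected, cumulative = [], 0.0
    for snp_id, pp in posteriors:
        selected.append((snp_id, pp))
        cumulative += pp
        if cumulative >= coverage:
            break
    return selected
```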
DEPICT uses GWAS summary statistics to prioritize genes in associated loci, which are then used to prioritize tissues via enrichment analysis. We ran DEPICT v1_rel 194 with default settings, using the RG GWAS summary statistics as input against a genetic background of SNPsnap data81 derived from the 1000 Genomes Project Phase 3 (ref. 82), to prioritize genes (Supplementary Note).
CELLECT and CELLEX (Cell type EXpression-specificity), both v1.0.0 (ref. 35), are two toolkits for the genetic identification of likely etiologic cell types using GWAS summary statistics and single-cell RNA sequencing (scRNA-seq) data. Tabula Muris gene expression data83, an scRNA-seq dataset derived from 20 organs of adult male and female mice, were preprocessed as described in the Supplementary Note.
Genetically regulated gene expression analysis
We used MetaXcan (S-PrediXcan) v0.6.10 (ref. 84) to identify genes whose genetically predicted expression levels are associated with RG in a number of tissues. The tissues were chosen for their involvement in glucose metabolism: visceral (omentum) adipose, subcutaneous adipose, skeletal muscle, liver, pancreas and whole blood. We additionally tested ileum, transverse colon, sigmoid colon and adrenal gland because they were highlighted by the DEPICT analysis. The models for the tissues of interest were trained with GTEx Version 7 transcriptome data from individuals of European ancestry85. The tissue transcriptome models and the covariance matrices (based on 1000 Genomes86) of the SNPs used in each model were downloaded from the PredictDB Data Repository. The association statistics between predicted gene expression and RG were estimated from the effects and their standard errors from the AS20 + AST20 model. Only associations that were statistically significant after Bonferroni correction for the number of genes tested across all tissues (P ≤ 9.0 × 10−7) were included in the table. Genes for which fewer than 80% of the model SNPs were found in the GWAS summary statistics were excluded because of the low reliability of the association result.
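S-PrediXcan computes a gene-level z score from summary statistics as z_g ≈ Σ_l w_lg (σ_l / σ_g)(β̂_l / se(β̂_l)), where w_lg are the prediction-model weights, σ_l² are the SNP variances and σ_g² = wᵀΓw is the variance of predicted expression under the reference LD covariance Γ. The sketch below implements that formula together with the 80% SNP-coverage filter described above; the dictionary-based data layout and the function name are hypothetical, and it is not the MetaXcan code.

```python
import math


def spredixcan_z(model_weights, gwas_z, snp_cov, min_coverage=0.8):
    """Gene-level z score from GWAS summary statistics (S-PrediXcan formula).

    model_weights: {snp: w_lg} from the expression prediction model
    gwas_z: {snp: beta_l / se_l} from the RG GWAS, aligned to the model allele
    snp_cov: {(snp_a, snp_b): covariance} from the LD reference panel; the
             diagonal entries (snp, snp) are the SNP variances sigma_l^2
    Returns None when fewer than min_coverage of the model SNPs are in the GWAS.
    """
    used = [snp for snp in model_weights if snp in gwas_z]
    if not used or len(used) < min_coverage * len(model_weights):
        return None

    def cov(a, b):
        return snp_cov.get((a, b), snp_cov.get((b, a), 0.0))

    # Variance of predicted expression, w' * Gamma * w, over the SNPs used.
    var_g = sum(model_weights[a] * model_weights[b] * cov(a, b) for a in used for b in used)
    sigma_g = math.sqrt(var_g)
    return sum(
        model_weights[snp] * math.sqrt(cov(snp, snp)) / sigma_g * gwas_z[snp]
        for snp in used
    )
```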
We applied the GWAS analysis of regulatory or functional information enrichment with LD correction (GARFIELD) tool v2 (ref. 87) to the RG AS20 + AST20 meta-analysis results to assess the enrichment of RG-associated variants within functional and regulatory features. GARFIELD integrates various types of data from a number of publicly available cell lines, including genetic annotations, chromatin states, DNaseI hypersensitive sites, transcription factor binding sites, FAIRE-seq elements and histone modifications. We considered enrichment statistically significant at a GWAS P-value threshold of 1 × 10−8 when the enrichment analysis P value was <2.5 × 10−5 (Bonferroni correction for 2,040 annotations).
Genetic association with gut microbiome
We assessed the genetic overlap between the RG GWAS results and those for the gut microbiome. GWAS summary statistics for microbiome profiles are publicly available and were downloaded from https://mibiogen.gcc.rug.nl/. For each of the 210 taxa, we extracted the P values for the 133 RG GWAS SNPs and their proxies.
Genetic association with GLP-1 and gastric inhibitory polypeptide (GIP)
We assessed the genetic overlap between RG GWAS results and those for GLP-1 and GIP measured at 0 min and 120 min. We extracted the results for the 133 RG signals from the GWAS summary statistics for GLP-1 and GIP88.
eQTL colocalization analysis
We further performed colocalization analysis for all 133 RG signals using whole blood eQTL data from eQTLGen37 and human pancreatic islet eQTLs from TIGER38. To reduce the computational burden, only cis-eQTL data from eQTLGen/TIGER were incorporated. The COLOC2 Bayesian method89 was used to interrogate potential colocalization between RG GWAS signals and the genetic control of gene expression. First, for each signal, we extracted the RG GWAS test statistics of all SNPs within ±1 Mb of the signal from the meta-analysis model (AS20, AST20 or AS20 + AST20) that gave the lowest GWAS P value. Then, for each RG signal, we matched the eQTLGen/TIGER results with the RG results and performed COLOC2 analysis, evaluating the posterior probability of the following five hypotheses for each region: H0, no association; H1, GWAS association only; H2, eQTL association only; H3, both GWAS and eQTL association, but with distinct causal variants (not colocalized) and H4, both GWAS and eQTL association with a shared causal variant (colocalized). Only GWAS signals with at least one nearby gene/probe reaching a posterior probability for H4 ≥ 0.5 were reported, and we considered signals with a posterior probability for H4 > 0.7 to have strong evidence of colocalization.
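How the five posterior probabilities arise can be sketched with the single-causal-variant ABF formulation popularized by the coloc R package (priors p1 = p2 = 1 × 10−4 and p12 = 1 × 10−5, prior effect s.d. 0.15). This is an illustration only: COLOC2 and the method in ref. 89 may differ in their modeling assumptions; effects are assumed to be on a standardized scale; and the function names are hypothetical.

```python
import math


def _logsumexp(values):
    """log(sum(exp(v))) computed stably; ignores -inf terms."""
    finite = [v for v in values if v != float("-inf")]
    if not finite:
        return float("-inf")
    top = max(finite)
    return top + math.log(sum(math.exp(v - top) for v in finite))


def _logdiff(a, b):
    """log(exp(a) - exp(b)); returns -inf when b >= a."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))


def log_abf(beta, se, prior_sd=0.15):
    """Wakefield log approximate Bayes factor at one SNP."""
    prior_var = prior_sd ** 2
    r = prior_var / (prior_var + se ** 2)
    z = beta / se
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def coloc_abf(gwas, eqtl, p1=1e-4, p2=1e-4, p12=1e-5, prior_sd=0.15):
    """Posterior probabilities [H0, H1, H2, H3, H4] for one region.

    gwas, eqtl: {snp_id: (beta, se)}; at most one causal variant per trait.
    """
    shared = sorted(set(gwas) & set(eqtl))
    l1 = [log_abf(*gwas[s], prior_sd=prior_sd) for s in shared]
    l2 = [log_abf(*eqtl[s], prior_sd=prior_sd) for s in shared]
    sum1, sum2 = _logsumexp(l1), _logsumexp(l2)
    joint = _logsumexp([a + b for a, b in zip(l1, l2)])
    log_h = [
        0.0,                                                         # H0: no association
        math.log(p1) + sum1,                                         # H1: GWAS only
        math.log(p2) + sum2,                                         # H2: eQTL only
        math.log(p1) + math.log(p2) + _logdiff(sum1 + sum2, joint),  # H3: distinct variants
        math.log(p12) + joint,                                       # H4: shared variant
    ]
    total = _logsumexp(log_h)
    return [math.exp(v - total) for v in log_h]


def classify(pp_h4_by_gene):
    """Reporting rule from the text: report if any PP.H4 >= 0.5, strong if > 0.7."""
    best = max(pp_h4_by_gene.values())
    if best > 0.7:
        return "strong"
    return "reported" if best >= 0.5 else "not reported"
```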
Genetic association with human blood plasma N-glycosylation
We assessed genetic associations between the 133 RG signals and 113 human blood plasma N-glycome traits using previously published genome-wide association summary statistics90. The analyzed traits and the details of the association analysis are described elsewhere48. Associations were considered significant at P < 0.05/(113 × 133) = 3.3 × 10−6 (Bonferroni correction) and suggestive at P < 10−4.
Genetic correlation analysis
We investigated the shared genetic component between RG and other traits, including glycemic traits, by genetic correlation analysis using the bivariate LD score regression method (LDSC v1.0.0)91. To reduce the multiple testing burden, only the GWAS results of the AS20 + AST20 model were used. We used GWAS summary statistics available in LD Hub92 and on the Meta-Analysis of Glucose and Insulin-related Traits Consortium (MAGIC) website (https://www.magicinvestigators.org), including those for FG/FI64 and HOMA-B/HOMA-IR93. In total, 228 traits were included in the genetic correlation analysis with RG. We considered P ≤ 2.2 × 10−4 (Bonferroni correction for 228 traits) as the statistical significance level and P ≤ 0.05 as the nominal significance level.
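The Bonferroni thresholds quoted in this section (GARFIELD, N-glycome and genetic correlation analyses) follow from dividing a family-wise α of 0.05 by the number of tests; the short check below reproduces them. The function name is illustrative.

```python
def bonferroni(alpha, n_tests):
    """Per-test significance threshold controlling the family-wise error rate."""
    return alpha / n_tests


thresholds = {
    "GARFIELD, 2,040 annotations": bonferroni(0.05, 2040),
    "N-glycome, 113 traits x 133 signals": bonferroni(0.05, 113 * 133),
    "LDSC genetic correlation, 228 traits": bonferroni(0.05, 228),
}
for label, threshold in thresholds.items():
    print(f"{label}: P < {threshold:.2e}")
```

This prints thresholds of 2.45e-05, 3.33e-06 and 2.19e-04, which round to the 2.5 × 10−5, 3.3 × 10−6 and 2.2 × 10−4 values used in the text.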
We applied a bidirectional two-sample MR strategy (Supplementary Note), using independent genetic variants as instruments, to investigate causality between RG and lung function and between T2D and lung function. We tested for causal effects of RG and T2D on two lung function phenotypes, forced expiratory volume in 1 s (FEV1) and forced vital capacity (FVC). Genome-wide summary statistics for the lung function phenotypes were available94 from cohorts of the SpiroMeta consortium and the UKBB study. T2D susceptibility variants and their effects were obtained from the largest T2D GWAS to date4.
To avoid bias due to sample overlap, the lung function summary statistics used as outcome data were those estimated in the SpiroMeta consortium alone. Similarly, when testing the effect of lung function on RG, the RG genetic effects used as outcome data were estimated in all cohorts except UK Biobank. There was no sample overlap between the lung function and the T2D GWAS, which allowed the use of T2D effects estimated in all contributing European ancestry studies. Genome-wide T2D summary statistics from a previous study3 were used to test for the causal effect of lung function on T2D. All analyses were conducted using the R software package TwoSampleMR v0.5.4 (ref. 95).
Causal effects were estimated using the inverse-variance weighted (IVW) method, which combines the causal estimates of individual instrumental variants (Wald ratios; Supplementary Note) in a random-effects meta-analysis96. We report P values for the Q statistic of instrument heterogeneity. As a sensitivity analysis, we used MR-Egger regression (Supplementary Note) to test for the presence of horizontal pleiotropy and to obtain causal estimates that are more robust to the inclusion of invalid instruments97.
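The Wald ratio, IVW and MR-Egger estimators can be sketched as follows. The random-effects IVW is implemented here as multiplicative random effects (the fixed-effect standard error inflated by √(Q/(k − 1)) when that ratio exceeds 1); this choice, the function names and the floor of 1 on the MR-Egger residual scale are assumptions, and TwoSampleMR's implementation may differ in detail. The heterogeneity P value would be obtained from a chi-square distribution with k − 1 df (for example, `scipy.stats.chi2.sf(q, df)`), which is left out to keep the sketch dependency-free.

```python
import math


def wald_ratio(bx, by, se_by):
    """Per-variant causal estimate by/bx with its first-order standard error."""
    return by / bx, se_by / abs(bx)


def ivw_estimate(bx, by, se_by):
    """IVW estimate: regression of by on bx through the origin, weights 1/se_by^2.

    Requires k >= 2 instruments.
    Returns (estimate, SE, Cochran's Q, degrees of freedom).
    """
    w = [1.0 / s ** 2 for s in se_by]
    sxx = sum(wi * x * x for wi, x in zip(w, bx))
    beta = sum(wi * x * y for wi, x, y in zip(w, bx, by)) / sxx
    q = sum(wi * (y - beta * x) ** 2 for wi, x, y in zip(w, bx, by))
    df = len(bx) - 1
    se = math.sqrt(max(1.0, q / df) / sxx)  # multiplicative random effects
    return beta, se, q, df


def mr_egger(bx, by, se_by):
    """MR-Egger: weighted regression of by on bx with an intercept (k >= 3).

    Instruments are oriented so that every bx is positive. A non-zero
    intercept indicates directional horizontal pleiotropy.
    Returns (slope, SE slope, intercept, SE intercept).
    """
    xs = [abs(x) for x in bx]
    ys = [y if x >= 0 else -y for x, y in zip(bx, by)]
    w = [1.0 / s ** 2 for s in se_by]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    slope = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys)) / sxx
    intercept = my - slope * mx
    rss = sum(wi * (y - intercept - slope * x) ** 2 for wi, x, y in zip(w, xs, ys))
    phi = max(1.0, rss / (len(xs) - 2))  # residual scale, floored at 1
    se_slope = math.sqrt(phi / sxx)
    se_intercept = math.sqrt(phi * (1.0 / sw + mx ** 2 / sxx))
    return slope, se_slope, intercept, se_intercept
```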
Multivariable MR (MVMR) is an extension of MR that can be applied with either individual-level or summary-level data to estimate the effects of multiple, potentially related, exposures on an outcome98. We used the MVMR v0.3 R package to test whether the causal effects of RG and T2D on FVC are independent of possible confounders, such as physical activity and smoking. The same instrument selection criteria as described for the main MR analysis were used. Cigarettes per day (CPD) was instrumented by 54 independent genome-wide significant variants (of the 58 in total, those available) from the GWAS reported in ref. 58. Leisure screen time (LST), from the recent study reported in ref. 59, served as a continuous proxy phenotype for physical activity and was instrumented by 66 independent genome-wide significant variants (of the 88 in total, those available).
We tested the ability of the RG genetic effects to predict RG, T2D and HbA1c, and compared it with the predictive power of T2D and FG genetic instruments, by computing polygenic risk scores (PRS) for RG, T2D and FG and assessing their performance in predicting RG, T2D and HbA1c. PRS analyses require base and target data from independent populations. The base datasets in our analyses were the UKBB-only estimates from the present RG GWAS, meta-analysis estimates from 32 studies for T2D15 and MAGIC meta-analysis estimates for FG64. We used the second-largest cohort, from Vanderbilt University Medical Center, as the target dataset. PRS construction and model evaluation (Supplementary Note) were performed with PRSice v2.2.3 (ref. 99).
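The core PRS arithmetic can be sketched as selecting base-GWAS SNPs below a P-value threshold, scoring each target individual as the dosage-weighted sum of base effects and measuring predictive performance. This is a simplification under stated assumptions, not PRSice: clumping is assumed to have been done already, the score is an unnormalized sum (PRSice offers other scoring options), and performance for a quantitative outcome is shown as a squared correlation, whereas a binary outcome such as T2D would need a logistic model. Function names are hypothetical.

```python
def threshold_weights(base, p_threshold):
    """Keep base-GWAS effects of (already clumped) SNPs with P <= threshold.

    base: {snp_id: (beta, p_value)}
    """
    return {snp: beta for snp, (beta, p) in base.items() if p <= p_threshold}


def polygenic_score(dosages, weights):
    """Sum over scored SNPs of effect-allele dosage (0-2) times base-GWAS beta."""
    return sum(beta * dosages[snp] for snp, beta in weights.items() if snp in dosages)


def r_squared(scores, trait):
    """Squared Pearson correlation between scores and a quantitative trait."""
    n = len(scores)
    mx, my = sum(scores) / n, sum(trait) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(scores, trait))
    sxx = sum((a - mx) ** 2 for a in scores)
    syy = sum((b - my) ** 2 for b in trait)
    return sxy * sxy / (sxx * syy)
```

Repeating the scoring over a grid of P-value thresholds and keeping the best-performing one is the usual clumping-and-thresholding strategy; the chosen threshold should then be interpreted with the resulting optimism in mind.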
Clustering of the RG signals with results for 45 other phenotypes
We looked up the z scores (regression coefficient β divided by its standard error) of the 133 distinct RG signals in publicly available summary statistics for 45 relevant phenotypes (Supplementary Table 23). All variant effects were aligned to the RG risk allele. HapMap 2-based summary statistics were imputed using SS-Imp v0.5.5 (ref. 70) to minimize missingness, and the remaining missing values were filled by mean imputation. The resulting variant–trait association matrix was truncated at 2 s.d. to limit the influence of outliers. We then used agglomerative hierarchical clustering with Ward’s method to partition the variants into groups according to their effects on the considered outcomes, using the hclust() function from the built-in R stats package.
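The preprocessing and clustering steps can be sketched as follows. Several details are assumptions: truncation is applied per trait column at the column mean ± 2 sample s.d.; Ward's criterion is implemented with the Lance–Williams update on squared Euclidean distances (to our understanding, this gives the merge order of hclust's "ward.D2" option, and the text does not say which Ward variant was used); and the tree is cut into a chosen number of clusters. Function names are hypothetical.

```python
import math


def align_to_rg_risk(rg_z, trait_z_row):
    """Flip a variant's trait z scores so that they refer to the RG-raising allele."""
    sign = 1.0 if rg_z >= 0 else -1.0
    return [None if v is None else sign * v for v in trait_z_row]


def mean_impute(matrix):
    """Replace missing entries (None) with the mean of the observed column values."""
    means = []
    for col in zip(*matrix):
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in matrix]


def truncate_columns(matrix, n_sd=2.0):
    """Clip each trait column to mean +/- n_sd sample standard deviations."""
    clipped_cols = []
    for col in zip(*matrix):
        mu = sum(col) / len(col)
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / (len(col) - 1))
        lo, hi = mu - n_sd * sd, mu + n_sd * sd
        clipped_cols.append([min(max(v, lo), hi) for v in col])
    return [list(row) for row in zip(*clipped_cols)]


def ward_clusters(points, k):
    """Ward agglomerative clustering cut at k clusters; returns one label per point."""
    clusters = {i: [i] for i in range(len(points))}
    dist = {}  # keys are (smaller id, larger id); values are squared distances
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dist[(i, j)] = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
    next_id = len(points)
    while len(clusters) > k:
        i, j = min(dist, key=dist.get)  # closest pair of current clusters
        n_i, n_j, d_ij = len(clusters[i]), len(clusters[j]), dist[(i, j)]
        merged = clusters.pop(i) + clusters.pop(j)
        updated = {}
        for m, members in clusters.items():
            n_m = len(members)
            d_im = dist[tuple(sorted((i, m)))]
            d_jm = dist[tuple(sorted((j, m)))]
            # Lance-Williams update for Ward's criterion.
            updated[m] = ((n_i + n_m) * d_im + (n_j + n_m) * d_jm - n_m * d_ij) / (n_i + n_j + n_m)
        dist = {key: v for key, v in dist.items() if i not in key and j not in key}
        for m, v in updated.items():
            dist[(m, next_id)] = v  # next_id exceeds every existing id
        clusters[next_id] = merged
        next_id += 1
    labels = [0] * len(points)
    for label, members in enumerate(sorted(clusters.values())):
        for idx in members:
            labels[idx] = label
    return labels
```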
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Meta-analysis summary statistics for the GWAS presented in this manuscript are available on the MAGIC website (magicinvestigators.org) and through the NHGRI-EBI GWAS Catalog (https://www.ebi.ac.uk/gwas/downloads/summary-statistics, GCP ID: GCP000666; with study accession codes for Europeans-only meta-analysis: GCST90271557; cross-ancestry meta-analysis: GCST90271558; and sex-dimorphic meta-analysis: GCST90271559). UK Biobank individual-level data can be obtained through a data access application available at https://www.ukbiobank.ac.uk/. In this study, we made use of data made available by: 1000 Genomes project (https://www.genome.gov/27528684/1000-genomes-project); SNPsnap (https://data.broadinstitute.org/mpg/snpsnap/index.html); Tabula Muris (https://www.czbiohub.org/tabula-muris/); GTEx Consortium (https://gtexportal.org/home/); microbiome GWAS (https://mibiogen.gcc.rug.nl/); Human Gut Microbiome Atlas (https://www.microbiomeatlas.org); eQTLGen Consortium (https://www.eqtlgen.org/); TIGER expression data (http://tiger.bsc.es/) and LDHub database (http://ldsc.broadinstitute.org/ldhub/).
Santos, R. L. et al. Heritability of fasting glucose levels in a young genetically isolated population. Diabetologia 49, 667–672 (2006).
Almgren, P. et al. Heritability and familiality of type 2 diabetes and related quantitative traits in the Botnia study. Diabetologia 54, 2811–2819 (2011).
Scott, R. A. et al. An expanded genome-wide association study of type 2 diabetes in Europeans. Diabetes 66, 2888–2902 (2017).
Vujkovic, M. et al. Discovery of 318 new risk loci for type 2 diabetes and related vascular outcomes among 1.4 million participants in a multi-ancestry meta-analysis. Nat. Genet. 52, 680–691 (2020).
Chen, J. et al. The trans-ancestral genomic architecture of glycemic traits. Nat. Genet. 53, 840–860 (2021).
Dimas, A. S. et al. Impact of type 2 diabetes susceptibility variants on quantitative glycemic traits reveals mechanistic heterogeneity. Diabetes 63, 2158–2171 (2014).
Ingelsson, E. et al. Detailed physiologic characterization reveals diverse mechanisms for novel genetic loci regulating glucose and insulin metabolism in humans. Diabetes 59, 1266–1275 (2010).
Scott, R. A. et al. Large-scale association analyses identify new loci influencing glycemic traits and provide insight into the underlying biological pathways. Nat. Genet. 44, 991–1005 (2012).
Bahl, V. et al. G6PC2 controls glucagon secretion by defining the setpoint for glucose in pancreatic α-cells. Preprint at bioRxiv https://doi.org/10.1101/2023.05.23.541901 (2023).
Bosma, K. J. et al. Pancreatic islet β cell-specific deletion of G6pc2 reduces fasting blood glucose. J. Mol. Endocrinol. 64, 235–248 (2020).
Rutter, G. A., Georgiadou, E., Martinez-Sanchez, A. & Pullen, T. J. Metabolic and functional specialisations of the pancreatic β cell: gene disallowance, mitochondrial metabolism and intercellular connectivity. Diabetologia 63, 1990–1998 (2020).
Benonisdottir, S. et al. Sequence variants associating with urinary biomarkers. Hum. Mol. Genet. 28, 1199–1211 (2019).
Teumer, A. et al. Genome-wide association meta-analyses and fine-mapping elucidate pathways influencing albuminuria. Nat. Commun. 10, 4130 (2019).
Wuttke, M. et al. A catalog of genetic loci associated with kidney function from analyses of a million individuals. Nat. Genet. 51, 957–972 (2019).
Mahajan, A. et al. Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps. Nat. Genet. 50, 1505–1513 (2018).
Mahajan, A. et al. Identification and functional characterization of G6PC2 coding variants influencing glycemic traits define an effector transcript at the G6PC2-ABCB11 locus. PLoS Genet. 11, e1004876 (2015).
Pullen, T. J. & Rutter, G. A. Roles of lncRNAs in pancreatic β cell identity and diabetes susceptibility. Front. Genet. 5, 193 (2014).
Deng, Y. N., Xia, Z., Zhang, P., Ejaz, S. & Liang, S. Transcription factor RREB1: from target genes towards biological functions. Int. J. Biol. Sci. 16, 1463–1473 (2020).
Piccand, J. et al. Rfx6 maintains the functional identity of adult pancreatic β cells. Cell Rep. 9, 2219–2232 (2014).
Tomkin, G. H. Treatment of type 2 diabetes, lifestyle, GLP1 agonists and DPP4 inhibitors. World J. Diabetes 5, 636–650 (2014).
Spracklen, C. N. et al. Identification of type 2 diabetes loci in 433,540 East Asian individuals. Nature 582, 240–245 (2020).
Wessel, J. et al. Low-frequency and rare exome chip variants associate with fasting glucose and type 2 diabetes susceptibility. Nat. Commun. 6, 5897 (2015).
Wan, Q. et al. Mini G protein probes for active G protein-coupled receptors (GPCRs) in live cells. J. Biol. Chem. 293, 7466–7473 (2018).
Jones, B. et al. Targeting GLP-1 receptor trafficking to improve agonist efficacy. Nat. Commun. 9, 1602 (2018).
Willard, F. S. et al. Tirzepatide is an imbalanced and biased dual GIP and GLP-1 receptor agonist. JCI Insight 5, e140532 (2020).
Marzook, A., Tomas, A. & Jones, B. The interplay of glucagon-like peptide-1 receptor trafficking and signalling in pancreatic β cells. Front. Endocrinol. (Lausanne) 12, 678055 (2021).
Chedid, V. et al. Allelic variant in the glucagon-like peptide 1 receptor gene associated with greater effect of liraglutide and exenatide on gastric emptying: a pilot pharmacogenetics study. Neurogastroenterol. Motil. 30, e13313 (2018).
De Luis, D. A., Diaz Soto, G., Izaola, O. & Romero, E. Evaluation of weight loss and metabolic changes in diabetic patients treated with liraglutide, effect of rs6923761 gene variant of glucagon-like peptide 1 receptor. J. Diabetes Complications 29, 595–598 (2015).
Deganutti, G. et al. Dynamics of GLP-1R peptide agonist engagement are correlated with kinetics of G protein activation. Nat. Commun. 13, 92 (2022).
Wootten, D., Simms, J., Miller, L. J., Christopoulos, A. & Sexton, P. M. Polar transmembrane interactions drive formation of ligand-specific and signal pathway-biased family B G protein-coupled receptor conformations. Proc. Natl Acad. Sci. USA 110, 5211–5216 (2013).
Venkatakrishnan, A. J. et al. Diverse GPCRs exhibit conserved water networks for stabilization and activation. Proc. Natl Acad. Sci. USA 116, 3288–3293 (2019).
Yuan, S., Filipek, S., Palczewski, K. & Vogel, H. Activation of G-protein-coupled receptors correlates with the formation of a continuous internal water pathway. Nat. Commun. 5, 4733 (2014).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Pers, T. H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat. Commun. 6, 5890 (2015).
Timshel, P. N., Thompson, J. J. & Pers, T. H. Genetic mapping of etiologic brain cell types for obesity. eLife 9, e55851 (2020).
Ding, Q. et al. Genome-wide meta-analysis associates GPSM1 with type 2 diabetes, a plausible gene involved in skeletal muscle function. J. Hum. Genet. 65, 411–420 (2020).
Võsa, U. et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 53, 1300–1310 (2021).
Alonso, L. et al. TIGER: the gene expression regulatory variation landscape of human pancreatic islets. Cell Rep. 37, 109807 (2021).
Yang, L. et al. Effect of TET2 on the pathogenesis of diabetic nephropathy through activation of transforming growth factor β1 expression via DNA demethylation. Life Sci. 207, 127–137 (2018).
Van de Bunt, M. et al. Transcript expression data from human islets links regulatory signals from genome-wide association studies for type 2 diabetes and glycemic traits to their downstream effectors. PLoS Genet. 11, e1005694 (2015).
Kurilshikov, A. et al. Large-scale association analyses identify host factors influencing human gut microbiome composition. Nat. Genet. 53, 156–165 (2021).
Lopera-Maya, E. A. et al. Effect of host genetics on the gut microbiome in 7,738 participants of the Dutch Microbiome Project. Nat. Genet. 54, 143–151 (2022).
Carmichael, A. J., Arroyo, C. M. & Cockerham, L. G. Reaction of disodium cromoglycate with hydrated electrons. Free Radic. Biol. Med. 4, 215–218 (1988).
Zhang, X. et al. Human gut microbiota changes reveal the progression of glucose intolerance. PLoS ONE 8, e71108 (2013).
Frost, F. et al. A structured weight loss program increases gut microbiota phylogenetic diversity and reduces levels of Collinsella in obese type 2 diabetics: a pilot study. PLoS ONE 14, e0219489 (2019).
Vojinovic, D. et al. Relationship between gut microbiota and circulating metabolites in population-based cohorts. Nat. Commun. 10, 5813 (2019).
Segerstolpe, A. et al. Single-cell transcriptome profiling of human pancreatic islets in health and type 2 diabetes. Cell Metab. 24, 593–607 (2016).
Sharapov, S. Z. et al. Defining the genetic control of human blood plasma N-glycome using genome-wide association study. Hum. Mol. Genet. 28, 2062–2077 (2019).
Clerc, F. et al. Human plasma protein N-glycosylation. Glycoconj. J. 33, 309–343 (2016).
Novokmet, M. et al. Changes in IgG and total plasma protein glycomes in acute systemic inflammation. Sci. Rep. 4, 4347 (2014).
Schmidt, M. I. et al. Markers of inflammation and prediction of diabetes mellitus in adults (Atherosclerosis Risk in Communities study): a cohort study. Lancet 353, 1649–1652 (1999).
Dotz, V. et al. Plasma protein N-glycan signatures of type 2 diabetes. Biochim. Biophys. Acta Gen. Subj. 1862, 2613–2622 (2018).
Keser, T. et al. Increased plasma N-glycome complexity is associated with higher risk of type 2 diabetes. Diabetologia 60, 2352–2360 (2017).
Wittenbecher, C. et al. Plasma N-glycans as emerging biomarkers of cardiometabolic risk: a prospective investigation in the EPIC-Potsdam cohort study. Diabetes Care 43, 661–668 (2020).
Johswich, A. et al. N-glycan remodeling on glucagon receptor is an effector of nutrient sensing by the hexosamine biosynthesis pathway. J. Biol. Chem. 289, 15927–15941 (2014).
Lemmers, R. F. H. et al. IgG glycan patterns are associated with type 2 diabetes in independent European populations. Biochim. Biophys. Acta Gen. Subj. 1861, 2240–2249 (2017).
Liu, D. et al. Ischemic stroke is associated with the pro-inflammatory potential of N-glycosylated immunoglobulin G. J. Neuroinflammation 15, 123 (2018).
Liu, M. et al. Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use. Nat. Genet. 51, 237–244 (2019).
Wang, Z. et al. Genome-wide association analyses of physical activity and sedentary behavior provide insights into underlying mechanisms and roles in disease prevention. Nat. Genet. 54, 1332–1344 (2022).
Kopf, S. et al. Breathlessness and restrictive lung disease: an important diabetes-related feature in patients with type 2 diabetes. Respiration 96, 29–40 (2018).
Sonoda, N. et al. A prospective study of the impact of diabetes mellitus on restrictive and obstructive lung function impairment: the Saku study. Metabolism 82, 58–64 (2018).
Abdi, A., Jalilian, M., Sarbarzeh, P. A. & Vlaisavljevic, Z. Diabetes and COVID-19: a systematic review on the current evidences. Diabetes Res. Clin. Pract. 166, 108347 (2020).
Zhu, L. et al. Association of blood glucose control and outcomes in patients with COVID-19 and pre-existing type 2 diabetes. Cell Metab. 31, 1068–1077 (2020).
Lagou, V. et al. Sex-dimorphic genetic effects and novel loci for fasting glucose and insulin variability. Nat. Commun. 12, 24 (2021).
Marullo, L., El-Sayed Moustafa, J. S. & Prokopenko, I. Insights into the genetic susceptibility to type 2 diabetes from genome-wide association studies of glycaemic traits. Curr. Diab. Rep. 14, 551 (2014).
Saxena, R. et al. Genetic variation in GIPR influences the glucose and insulin responses to an oral glucose challenge. Nat. Genet. 42, 142–148 (2010).
Mingrone, G. et al. Metabolic surgery versus conventional medical therapy in patients with type 2 diabetes: 10-year follow-up of an open-label, single-centre, randomised controlled trial. Lancet 397, 293–304 (2021).
Whang, A., Nagpal, R. & Yadav, H. Bi-directional drug-microbiome interactions of anti-diabetics. EBioMedicine 39, 591–602 (2019).
1000 Genomes Project Consortium et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Rueger, S., McDaid, A. & Kutalik, Z. Evaluation and application of summary statistic imputation to discover new height-associated loci. PLoS Genet. 14, e1007371 (2018).
Willer, C. J., Li, Y. & Abecasis, G. R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–2191 (2010).
Loh, P. R. et al. Efficient Bayesian mixed-model analysis increases association power in large cohorts. Nat. Genet. 47, 284–290 (2015).
Magi, R. & Morris, A. P. GWAMA: software for genome-wide association meta-analysis. BMC Bioinformatics 11, 288 (2010).
Magi, R., Lindgren, C. M. & Morris, A. P. Meta-analysis of sex-specific genome-wide association studies. Genet. Epidemiol. 34, 846–853 (2010).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7 (2015).
Yang, J. et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat. Genet. 44, 369–375 (2012).
Kenakin, T. A scale of agonism and allosteric modulation for assessment of selectivity, bias, and receptor mutation. Mol. Pharmacol. 92, 414–424 (2017).
Zhao, P. et al. Activation of the GLP-1 receptor by a non-peptidic agonist. Nature 577, 432–436 (2020).
Harvey, M. J., Giupponi, G. & Fabritiis, G. D. ACEMD: accelerating biomolecular dynamics in the microsecond time scale. J. Chem. Theory Comput. 5, 1632–1639 (2009).
Cuzzolin, A., Deganutti, G., Salmaso, V., Sturlese, M. & Moro, S. AquaMMapS: an alternative tool to monitor the role of water molecules during protein-ligand association. ChemMedChem 13, 522–531 (2018).
Pers, T. H., Timshel, P. & Hirschhorn, J. N. SNPsnap: a web-based tool for identification and annotation of matched SNPs. Bioinformatics 31, 418–420 (2015).
1000 Genomes Project Consortium et al. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).
Tabula Muris Consortium et al. Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris. Nature 562, 367–372 (2018).
Barbeira, A. N. et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat. Commun. 9, 1825 (2018).
Gamazon, E. R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091–1098 (2015).
Delaneau, O. & Marchini, J., the 1000 Genomes Project Consortium. Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel. Nat. Commun. 5, 3934 (2014).
Iotchkova, V. et al. Discovery and refinement of genetic loci associated with cardiometabolic risk using dense imputation maps. Nat. Genet. 48, 1303–1312 (2016).
Almgren, P. et al. Genetic determinants of circulating GIP and GLP-1 concentrations. JCI Insight 2, e93306 (2017).
Wallace, C. A more accurate method for colocalisation analysis allowing for multiple causal variants. PLoS Genet. 17, 1–11 (2021).
Sharapov, S. et al. Genome-wide association summary statistics for human blood plasma glycome. Zenodo. https://doi.org/10.5281/zenodo.1298406 (2018).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Zheng, J. et al. LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis. Bioinformatics 33, 272–279 (2017).
Fedko, I. O. et al. Genetics of fasting indices of glucose homeostasis using GWIS unravels tight relationships with inflammatory markers. Preprint at bioRxiv https://doi.org/10.1101/496802 (2018).
Shrine, N. et al. New genetic signals for lung function highlight pathways and chronic obstructive pulmonary disease associations across multiple ancestries. Nat. Genet. 51, 481–493 (2019).
Hemani, G. et al. The MR-base platform supports systematic causal inference across the human phenome. eLife 7, e34408 (2018).
Burgess, S. et al. Guidelines for performing Mendelian randomization investigations. Wellcome Open Res. 4, 186 (2019).
Bowden, J., Davey Smith, G. & Burgess, S. Mendelian randomization with invalid instruments: effect estimation and bias detection through Egger regression. Int. J. Epidemiol. 44, 512–525 (2015).
Sanderson, E., Spiller, W. & Bowden, J. Testing and correcting for weak and pleiotropic instruments in two-sample multivariable Mendelian randomization. Stat. Med. 40, 5434–5452 (2021).
Choi, S. W. & O’Reilly, P. F. PRSice-2: Polygenic Risk Score software for biobank-scale data. GigaScience 8, giz082 (2019).
This research has been conducted using the UK Biobank Resource, project number 37685. We are supported by the following: the Medical Research Council (grants MR/L01341X/1 to P.E., MR/R010676/1 to B.J., MR/R010676/1 to A.T.); PHE (to P.E.); the UK Dementia Research Institute (to P.E. and I.T.); the Alzheimer’s Society (to P.E. and I.T.); Alzheimer’s Research UK (to P.E. and I.T.); the National Health and Medical Research Council (NHMRC) Fellowship Schemes (552498 to B.B., 339446 and 619667 to G.M.); the NHMRC Ideas (grant 1184726 to P.M.S. and D.W.); the NHMRC (grants 1150083 to P.M.S. and D.W., 1154434 to P.M.S., 1155302 to D.W.); the Swedish Research Council (grants 2017-02688, 2020-02191 to E. Ahlqvist and 2019-01417 to M.d.H.); the Swedish Heart-Lung Foundation (grants 20200781 and 20200602 to M.d.H.); the British Heart Foundation (to A.G.); the European Commission (grants LSHM-CT-2007-037273 and HEALTH-F2-2013-601456 to A.G.); the Novo Nordisk Foundation (grants NNF15CC0018486 to A.G. and NNF18CC0034900 to T.H.P.); the Lundbeck Foundation (grant R190-2014-3904 to T.H.P.); VIAgenomics (grant SP/19/2/344612 to A.G.); the Wellcome Trust (grants 090532/Z/09/Z, 203141/Z/16/Z to H.W., 104955/Z/14/Z to A. David, 090532 to M.I.M., 098381 to M.I.M., 106130 to M.I.M., 203141 to M.I.M., 212259 to M.I.M., 205915/Z/17/Z to I.P.); UKRI Innovation-HDR-UK Fellowship (grant MR/S003061/1 to R.J.S.); the European Union’s Horizon 2020 research and innovation program LONGITOOLS (grant H2020-SC1-2019-874739 to M.A.K., A.U., Z.B. and I.P.); the European Foundation for the Study of Diabetes (to B.J. and M.A.K.); the Imperial Post-CCT Post-Doctoral Fellowship (to B.J.); the Academy of Medical Sciences (to B.J.); the National Institute for Health Research Imperial NIHR Biomedical Research Center (to B.J. and T.M.T.); the Engineering and Physical Sciences Research Council (to B.J.); the Society for Endocrinology (to B.J.); the British Society for Neuroendocrinology (to B.J.); Research England ‘Expanding excellence in England’ (to I.B.); the Research Foundation-Flanders (to V.L.); Diabetes UK (to V. Salem and A.T.; BDA, 20/0006307 to I.P.); the Russian Science Foundation (grant 19-15-00115 to S.S.); the NIDDK (grant U01-DK105535 to M.I.M.); European Federation for the Study of Diabetes (to A.T.); the Agence Nationale de la Recherche (PreciDIAB, grant ANR-18-IBHU-0001 to J.G.M. and I.P.); the University of Lille mobility grant (to J.G.M.); the People-Centered Artificial Intelligence Institute, University of Surrey (to Z.B., M.A.K., A. Demirkan and I.P.); the World Cancer Research Fund (to I.P.); the World Cancer Research Fund International (grant 2017/1641 to I.P.); the Royal Society (grant IEC\R2\181075 to I.P. and C.A.R.); the European Union through the ‘Fonds européen de développement régional’ (FEDER; to I.P.); the ‘Conseil Régional des Hauts-de-France’ (Hauts-de-France Regional Council; to I.P.); the ‘Métropole Européenne de Lille’ (MEL, European Metropolis of Lille; to I.P.).
A.T. has received grant funding from Sun Pharmaceuticals and Eli Lilly. J.B.M. is an academic associate for Quest Diagnostics, which makes an HbA1c assay. I.R.C. is an employee of New England Biolabs, a manufacturer and vendor of reagents for life science research. M.J.C. is Chief Scientist for Genomics England, a UK Government company. The views expressed in this article are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health. M.I.M. has served on advisory panels for Pfizer, Novo Nordisk and Zoe Global, has received honoraria from Merck, Pfizer, Novo Nordisk and Eli Lilly, and has received research funding from Abbvie, AstraZeneca, Boehringer Ingelheim, Eli Lilly, Janssen, Merck, Novo Nordisk, Pfizer, Roche, Sanofi Aventis, Servier and Takeda. As of June 2019, M.I.M. is an employee of Genentech and a holder of Roche stock. P.M.S. received grant funding from Laboratoires Servier. P.M.S. and D.W. receive funding from Astex Pharmaceuticals and Novo Nordisk. They are both shareholders of Septerna, where P.M.S. is also a founder. P.M.S. is the director and D.W. the Monash Node leader of the Australian Research Council Center for Cryo-Electron Microscopy of Membrane Proteins, which includes the following as Partner Organizations that provide cash or in-kind funding: Astex Pharmaceuticals, AstraZeneca, Boehringer Ingelheim, Catalyst Therapeutics, Dimerix Bioscience, Genentech, Novo Nordisk, Pfizer, Sanofi Aventis, Servier and Thermo Fisher Scientific. T.J. is now a GSK employee. W. Koenig reports consulting fees from AstraZeneca, Novartis, Pfizer, The Medicines Company, DalCor, Kowa, Amgen, Corvidia, Daiichi-Sankyo, Genentech, Novo Nordisk, Esperion, OMEICOS and LIB Therapeutics; speaker honoraria from Amgen, Novartis, Berlin-Chemie, Sanofi and Bristol-Myers Squibb; and grants and nonfinancial support from Abbott, Roche Diagnostics, Beckmann and Singulex, all outside the submitted work. Y.S.A. is the owner of Maatschap PolyOmica and PolyKnomics BV, private organizations providing services, research and development in the field of computational, statistical and quantitative (gen)omics. G.T., U.T. and K.S. are employees of deCODE genetics/Amgen. The other authors declare no competing interests.
Peer review information
Nature Genetics thanks Marijana Vujkovic and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Extended Data Fig. 1 RG trait models tested and sensitivity plots showing the correlations between association analyses beta coefficients and Z-scores from RG models in UKBB.
a, The models were labeled according to the covariates included and the RG cut-off used. Individuals were included based on one of two RG cut-offs: <20 mmol/l (20), to limit the effect of extreme RG values, and <11.1 mmol/l (11), an established threshold for T2D diagnosis. Hence, model 1 – AS20 refers to adjustment for age and sex using a cut-off of <20 mmol/l, and so forth. b-e, Sensitivity plots comparing pairs of RG models. For c, 4,138 individuals were excluded based on HbA1c ≥ 6.5%, in addition to the self-reported or diagnosed T2D cases. Variants with a heterogeneity P value ≤ 0.05 (beta-coefficient plots) or a Z-score difference >3 between the two models compared (Z-score plots) are annotated. f, An enrichment plot showing the effects of RG signals (AS20 + AST20 model) on T2D. RG and T2D effect sizes are plotted along the y- and x-axes, respectively. Point size is proportional to the statistical significance of the variant for T2D; red indicates previously established signals and blue indicates novel signals. The dashed line represents the line of best fit. Variants with T2D P values in the lowest decile are labeled. g, An enrichment plot showing the effects of RG signals (AS20 + AST20 model) on HOMA-B and HOMA-IR. The effect sizes on HOMA-B and HOMA-IR are plotted along the y- and x-axes, respectively. Point size is proportional to the significance of the variant for either HOMA-B or HOMA-IR, whichever trait has the smaller P value. Red indicates previously established signals and blue indicates novel signals. Variants with suggestive significance (P < 5.0 × 10⁻⁶) are labeled.
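The two annotation rules used in the sensitivity plots can be sketched as below. This is a minimal illustration, not the authors' code: it assumes the heterogeneity P value comes from a two-estimate Cochran's Q test, which the legend does not specify, and the function `compare_models` and the example numbers are hypothetical.

```python
import numpy as np
from scipy import stats


def compare_models(beta1, se1, beta2, se2, het_alpha=0.05, z_diff_cutoff=3.0):
    """Flag variants whose RG effect differs between two models.

    Assumption: heterogeneity is tested with a two-estimate Cochran's Q
    (chi-square, 1 df) that treats the estimates as independent. Both models
    share most individuals, so ignoring their positive covariance makes this
    test conservative.
    """
    beta1, se1, beta2, se2 = (np.asarray(a, dtype=float) for a in (beta1, se1, beta2, se2))
    q = (beta1 - beta2) ** 2 / (se1 ** 2 + se2 ** 2)
    het_p = stats.chi2.sf(q, df=1)
    # Absolute difference between the two models' Z-scores for each variant.
    z_diff = np.abs(beta1 / se1 - beta2 / se2)
    return {
        "het_p": het_p,
        "flag_beta": het_p <= het_alpha,   # annotated in beta-coefficient plots
        "flag_z": z_diff > z_diff_cutoff,  # annotated in Z-score plots
        "z_diff": z_diff,
    }


# Hypothetical effects for three variants under two models.
res = compare_models([0.05, 0.02, 0.10], [0.01] * 3, [0.05, 0.045, 0.06], [0.01] * 3)
print(res["flag_beta"], res["flag_z"])
```

Only the third hypothetical variant differs enough between models to be annotated under either rule.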
Extended Data Fig. 2 Enrichment plots showing the effect of RG signals (AS20 + AST20 model) on glycemic and respiratory-related phenotypes.
a-i, Effects were looked up in previously published genome-wide association studies for HbA1c (a), fasting glucose (b), fasting insulin (c), type 2 diabetes (d), forced expiratory volume in one second (FEV1) (e), forced vital capacity (FVC) (f), FEV1/FVC (g), lung cancer (h) and squamous cell lung cancer (i). RG and other phenotype effect sizes are plotted along the y- and x-axes, respectively. Point size and color are proportional to the significance of the variant for each phenotype, with red indicating higher and blue lower significance. The dashed line represents the line of best fit. P < 5.0 × 10⁻⁸ was considered statistically significant after adjusting for multiple testing. Two-tailed P values are reported. Variants with P values in the lowest decile are labeled.
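A minimal sketch of the quantities behind these enrichment plots follows. It assumes the line of best fit is an ordinary least-squares fit (the legend does not name the fitting method) and that the lowest decile is taken over the plotted variants' P values for the other phenotype; the helper name `enrichment_plot_data` and the inputs are illustrative.

```python
import numpy as np

GENOME_WIDE_P = 5e-8  # significance threshold stated in the legend


def enrichment_plot_data(beta_other, beta_rg, p_other, label_quantile=0.10):
    """Summaries behind an RG-vs-other-phenotype enrichment plot.

    beta_other goes on the x-axis and beta_rg on the y-axis; p_other is the
    two-tailed P value of each variant for the other phenotype.
    """
    x = np.asarray(beta_other, dtype=float)
    y = np.asarray(beta_rg, dtype=float)
    p = np.asarray(p_other, dtype=float)
    # Line of best fit, assumed here to be ordinary least squares.
    slope, intercept = np.polyfit(x, y, deg=1)
    # Variants whose P value falls in the lowest decile receive a text label.
    label = p <= np.quantile(p, label_quantile)
    return {
        "slope": slope,
        "intercept": intercept,
        "label": label,
        "significant": p < GENOME_WIDE_P,
        "neg_log10_p": -np.log10(p),  # drives point size and color
    }
```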
a-h, Plots are shown for the GCKR (a), TET2 (b), RREB1 (c), NMT1 (d) and WIPI1 (e) loci and for low-frequency coding variants at the EDEM3 (f), NEUROD1 (g) and GLP1R (h) loci. The x-axis shows the chromosomal position, and the y-axis shows the uncorrected two-sided −log₁₀ P values from the UKBB GWAS conducted using linear mixed modeling in BOLT-LMM. The horizontal line corresponds to P = 5 × 10⁻⁸, and blue peaks show the recombination rate.
Extended Data Fig. 4 Association analysis of GLP-1R function and random glucose effects of coding variants.
Minor allele frequency-weighted linear regression was used to test whether the mini-Gs response to GLP-1 stimulation significantly predicted the point estimates of GLP1R variant effects on RG levels (AST20 βRG, as estimated from whole-exome sequencing data in the UKBB study). The mini-Gs response to GLP-1 stimulation was corrected for variant surface expression (nmax = 22; the exact n for each variant is provided in Supplementary Table 11). Error bars extend one standard error above and below the point estimate. Dot size is proportional to the weight applied in the regression model (Methods). The regression results (coefficient of determination R² = 0.56, F(1, 14) = 20.1, P = 5.2 × 10⁻⁴) suggest that mini-Gs coupling in response to GLP-1 stimulation predicts the effect of these coding variants on RG levels (AST20 βRG = −0.028; 95% CI = −0.042 to −0.015; P = 5.2 × 10⁻⁴). The gray shaded area around the regression line corresponds to the 95% confidence interval of the model predictions. Variants in red showed no detectable surface expression (NDE) and were not included in the regression analysis.
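The regression described above can be sketched as weighted least squares. The weights are passed in directly because their exact minor allele frequency-based definition is given in the Methods rather than here; `weighted_regression` and the synthetic data are hypothetical, and variants without detectable surface expression are assumed to have been removed before the call.

```python
import numpy as np
from scipy import stats


def weighted_regression(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x.

    Returns the slope with its standard error and 95% CI, the weighted R^2,
    and the model F statistic with its degrees of freedom and P value.
    """
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    X = np.column_stack([np.ones_like(x), x])
    XtW = X.T * w                      # same as X.T @ diag(w)
    xtwx = XtW @ X
    coef = np.linalg.solve(xtwx, XtW @ y)
    resid = y - X @ coef
    df_resid = len(x) - 2
    ss_res = np.sum(w * resid ** 2)
    y_bar = np.sum(w * y) / np.sum(w)  # weighted mean of the outcome
    ss_tot = np.sum(w * (y - y_bar) ** 2)
    f_stat = (ss_tot - ss_res) / (ss_res / df_resid)
    se_slope = np.sqrt((ss_res / df_resid) * np.linalg.inv(xtwx)[1, 1])
    t_crit = stats.t.ppf(0.975, df_resid)
    slope = coef[1]
    return {
        "slope": slope,
        "se_slope": se_slope,
        "ci95": (slope - t_crit * se_slope, slope + t_crit * se_slope),
        "r2": 1.0 - ss_res / ss_tot,
        "F": f_stat,
        "df": (1, df_resid),
        "p": stats.f.sf(f_stat, 1, df_resid),
    }


# Synthetic example with 16 variants, giving F(1, 14) as in the legend.
rng = np.random.default_rng(1)
x = rng.normal(size=16)
y = 0.1 - 0.03 * x + rng.normal(scale=0.01, size=16)
print(weighted_regression(x, y, rng.uniform(0.2, 1.0, size=16)))
```

With a single predictor, the model F statistic equals the squared t statistic of the slope, so the F test and the slope test yield the same P value. This is consistent with the legend reporting P = 5.2 × 10⁻⁴ for both.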
The analyses were performed using generalized linear modeling in GARFIELD software. Enrichment was considered statistically significant if, at an RG GWAS P-value threshold of 1 × 10⁻⁸, the enrichment analysis P value was < 2.5 × 10⁻⁵ (Bonferroni correction for 2,040 annotations).
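The enrichment cut-off is a Bonferroni correction: the family-wise α of 0.05 divided by the 2,040 annotations tested. The sketch below shows the arithmetic and a filter over enrichment P values; the annotation names in any input are hypothetical, and the filter assumes P values already computed at the RG GWAS threshold stated above.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P-value threshold that keeps the family-wise error rate at alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


def significant_annotations(enrichment_p: dict[str, float], n_tests: int,
                            alpha: float = 0.05) -> list[str]:
    """Names of annotations whose enrichment P value passes the Bonferroni cut-off."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return sorted(name for name, p in enrichment_p.items() if p < threshold)


# 0.05 / 2,040 annotations is about 2.45e-05, which the legend rounds to 2.5e-05.
print(f"{bonferroni_threshold(0.05, 2040):.3g}")
```

`n_tests` is passed explicitly so that it reflects all 2,040 annotations tested, not just the subset supplied to the filter.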
Extended Data Fig. 6 Cluster analysis of effects (as Z-scores) of the 143 distinct RG signals on 45 relevant phenotypes.
All variant effects were aligned to the RG risk allele. HapMap2-based summary statistics were imputed using SS-Imp v0.5.5 (ref. 65) to minimize missingness, and any remaining missing values were filled in by mean imputation. The heatmap was produced using the pheatmap R package. For visualization, the Z-scores were truncated at the value corresponding to genome-wide significance (|Z| = 5.45), and the 11 phenotypes with the lowest median absolute Z-scores were excluded.
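The preprocessing steps listed above can be sketched in Python as follows. This is a hypothetical analogue of the R workflow, not the authors' code. It assumes that the RG risk allele is the glucose-raising allele (so rows are sign-flipped by the RG Z-score), that mean imputation is done per phenotype, and that phenotypes are dropped before truncation. The SS-Imp step is not reproduced. Clustering uses Euclidean distance with complete linkage, which are pheatmap's default settings.

```python
import numpy as np
from scipy import stats
from scipy.cluster.hierarchy import leaves_list, linkage


def prepare_heatmap(z, rg_z, n_drop=11, p_genome_wide=5e-8):
    """Return a clustered, truncated variants x phenotypes Z-score matrix.

    z: variants x phenotypes Z-scores (NaN where still missing after SS-Imp).
    rg_z: RG Z-score of each variant for the same effect allele.
    """
    z = np.array(z, dtype=float)
    # Align effects to the glucose-raising RG allele (assumed meaning of
    # "RG risk allele") by flipping rows with a negative RG Z-score.
    z *= np.sign(np.asarray(rg_z, dtype=float))[:, None]
    # Fill remaining gaps with the mean Z-score of the same phenotype.
    col_means = np.nanmean(z, axis=0)
    rows, cols = np.where(np.isnan(z))
    z[rows, cols] = col_means[cols]
    # Drop the n_drop phenotypes with the lowest median |Z|.
    keep = np.sort(np.argsort(np.median(np.abs(z), axis=0))[n_drop:])
    z = z[:, keep]
    # Truncate at the |Z| for two-sided P = 5e-8 (about 5.45).
    z_max = stats.norm.isf(p_genome_wide / 2)
    z = np.clip(z, -z_max, z_max)
    # Order rows and columns by hierarchical clustering.
    row_order = leaves_list(linkage(z, method="complete", metric="euclidean"))
    col_order = leaves_list(linkage(z.T, method="complete", metric="euclidean"))
    return z[np.ix_(row_order, col_order)], keep[col_order], z_max
```

With 45 phenotypes and the default `n_drop=11`, 34 phenotype columns remain, matching the exclusion described in the legend.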
Extended Data Fig. 7 Scatter plots of the standardized allelic effect estimates for selected trait pairs.
In each scatter plot, loci were assigned to the groups defined from the cluster analysis and highlighted by different colors. a, Corrected insulin response (CIR) vs. type 2 diabetes (T2D) (clusters 1a/b related to metabolic syndrome). b, Glycated hemoglobin (HbA1c) vs. inflammatory bowel disease (IBD) (cluster 2a) highlights the effects of loci with a protective role in IBD. c, Plasminogen activator inhibitor-1 (PAI-1) vs. CIR (cluster 3) highlights loci linked to insulin secretion defects.
Superpositions between WT (blue) and A316T (red) GLP-1R during MD simulations. GLP-1R residue position 5.46: in WT (blue), Y242^3.45 persistently forms a hydrogen bond with the backbone of P312^5.42; in A316T (red), this interaction is replaced by a hydrogen bond with the side chain of T316^5.46. Red dotted lines indicate hydrogen bonds.
Superpositions between WT (blue) and G168S (gray) GLP-1R during MD simulations. GLP-1R residue position 1.63 and ICL1: G168S forms a persistent hydrogen bond between the S168^1.63 side chain and the backbone of A164^1.59 (red dotted line), which is not present in the WT (G168^ICL1).
Supplementary Tables 1–23.
About this article
Cite this article
Lagou, V., Jiang, L., Ulrich, A. et al. GWAS of random glucose in 476,326 individuals provide insights into diabetes pathophysiology, complications and treatment stratification. Nat Genet 55, 1448–1461 (2023). https://doi.org/10.1038/s41588-023-01462-3