Main

Diabetes mellitus is a major public-health burden, with an estimated 537 million adults affected worldwide in 2021, more than 90% of whom have T2D6. The biological processes through which T2D develops are diverse and include impaired insulin secretion and insulin resistance. This aetiological heterogeneity leads to substantial variability in patient phenotypes, including age of disease onset, manifestation of disease complications and response to management strategies1,2. Although environment and lifestyle are well-established risk factors for T2D, heritability has been estimated at 69% among individuals aged 35–60 years7. Previous genome-wide association studies (GWASs) of T2D have identified more than 500 risk loci8,9, which show variable patterns of association with clinical features, mediated by effector genes acting through distinct molecular mechanisms that are often cell-type specific3,4. Through the newly established Type 2 Diabetes Global Genomics Initiative, we present findings from a very large meta-analysis of T2D GWAS data, comprising more than 2.5 million individuals of diverse ancestry and representing a nearly threefold increase in effective sample size compared with previous efforts8,9. We take advantage of the power afforded by this increased sample size and combine the GWAS data with emerging single-cell functional genomics data derived from disease-relevant tissues to uncover the aetiological heterogeneity of T2D. Furthermore, we construct partitioned polygenic scores (PSs)5 across multiple ancestry groups, and assess their association with T2D-related macrovascular outcomes and progression to microvascular complications.

Study overview

We assembled GWAS data, including 428,452 cases of T2D and 2,107,149 controls (Supplementary Fig. 1 and Supplementary Tables 1 and 2). We organized these GWASs into six subsets of genetically similar studies, which we refer to as ‘ancestry groups’ (Extended Data Fig. 1). Specifically, we considered: a European ancestry group (EUR, 60.3% of the effective sample size); an East Asian ancestry group (EAS, 19.8%); an admixed African American group with ancestry predominantly from West Africa and Europe (AFA, 10.5%); an admixed Hispanic group with ancestry predominantly from the Americas, West Africa and Europe (HIS, 5.9%); a South Asian ancestry group (SAS, 3.3%); and a South African ancestry group (SAF, 0.2%). Association analyses accounted for study-level population structure and relatedness, and were adjusted for age and sex, where appropriate, and for additional study-specific covariates (Supplementary Table 3 and Methods).
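The percentages above are shares of the effective sample size rather than of raw case and control counts. As a minimal sketch, assuming the common case-control convention N_eff = 4 / (1/N_cases + 1/N_controls) (the excerpt does not state the exact definition used), per-study values can be summed within each ancestry group. The study counts below are hypothetical. Summing per study matters: applying the formula to pooled totals of cases and controls would not, in general, give the same answer.

```python
from collections import defaultdict

def effective_sample_size(n_cases, n_controls):
    """Effective sample size of one case-control study: 4 / (1/cases + 1/controls)."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)

def share_by_group(studies):
    """studies: iterable of (ancestry_group, n_cases, n_controls).
    Returns each group's percentage of the total effective sample size."""
    totals = defaultdict(float)
    for group, n_cases, n_controls in studies:
        totals[group] += effective_sample_size(n_cases, n_controls)
    grand_total = sum(totals.values())
    return {group: 100.0 * n / grand_total for group, n in totals.items()}

# Hypothetical study counts, for illustration only
studies = [("EUR", 10_000, 40_000), ("EUR", 5_000, 5_000), ("EAS", 8_000, 8_000)]
print(share_by_group(studies))
```

A balanced study (equal cases and controls) has an effective sample size equal to its total size; unbalanced studies contribute less than their headcount.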

Discovery of T2D loci

We aggregated association summary statistics across GWASs through multi-ancestry meta-regression, implemented in MR-MEGA (ref. 10), which allows for heterogeneity in allelic effects that is correlated with ancestry. We included as covariates in the meta-regression model three axes of genetic variation that separated GWASs from different ancestry groups (Extended Data Fig. 1 and Methods). This model resulted in lower genomic control inflation than a fixed-effects meta-analysis (λGC = 1.120 versus λGC = 1.396).
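The meta-regression idea can be sketched as an inverse-variance weighted linear regression of each study's allelic effect (log-OR) on the axes of genetic variation. This is a simplified illustration, not the MR-MEGA software: the function names are hypothetical, the axes are taken as given, and the split into an ancestry-correlated term (improvement over the intercept-only fit, T degrees of freedom for T axes) and a residual term (K − T − 1 degrees of freedom for K studies) follows the standard weighted-regression decomposition. The λGC helper converts two-sided P values to 1-degree-of-freedom chi-square statistics and divides their median by the null median (about 0.455).

```python
import statistics
import numpy as np

def meta_regression(betas, ses, axes):
    """Inverse-variance weighted regression of study-level allelic effects
    (log-OR) on axes of genetic variation; `axes` has one row per study."""
    b = np.asarray(betas, dtype=float)
    w = 1.0 / np.asarray(ses, dtype=float) ** 2
    X = np.column_stack([np.ones(len(b)), np.asarray(axes, dtype=float)])
    XtW = X.T * w                                      # X'W with W = diag(w)
    coef = np.linalg.solve(XtW @ X, XtW @ b)
    q_resid = float(np.sum(w * (b - X @ coef) ** 2))   # residual: K - T - 1 df
    fixed_mean = np.sum(w * b) / np.sum(w)             # intercept-only fit
    q_intercept = float(np.sum(w * (b - fixed_mean) ** 2))
    q_null = float(np.sum(w * b ** 2))                 # all effects zero
    return {
        "coef": coef,
        "q_association": q_null - q_resid,             # T + 1 df
        "q_ancestry": q_intercept - q_resid,           # T df
        "q_residual": q_resid,
    }

def lambda_gc(p_values):
    """Genomic control inflation: median 1-df chi-square over its null median."""
    nd = statistics.NormalDist()
    chisq = [nd.inv_cdf(1 - p / 2) ** 2 for p in p_values]
    return statistics.median(chisq) / nd.inv_cdf(0.75) ** 2
```

The model needs more studies than parameters (K > T + 1); with three axes that means at least five GWASs.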

The DIAMANTE Consortium previously advocated the use of a multi-ancestry genome-wide significance threshold (P < 5 × 10⁻⁹) to define loci, which takes account of the weaker linkage disequilibrium (LD) between single-nucleotide variants (SNVs) expected after multi-ancestry meta-analysis9. To gain insight into true positive signals meeting conventional genome-wide significance (P < 5 × 10⁻⁸) that would be overlooked at this more stringent threshold, we considered loci reported by the DIAMANTE Consortium, which contributed 39.5% of the effective sample size of the current study. Of 39 loci with association signals meeting 5 × 10⁻⁹ ≤ P < 5 × 10⁻⁸ in the DIAMANTE Consortium analysis, 36 (92.3%) attained multi-ancestry genome-wide significance with the larger sample size available to us in the current study (Supplementary Text). We therefore focused our downstream analyses on SNVs that met the conventional genome-wide significance threshold.

We identified a total of 1,289 distinct T2D association signals (P < 5 × 10⁻⁸) that were represented by independent (r² < 0.05) index SNVs (Supplementary Fig. 2, Supplementary Table 4 and Methods). The 1,289 association signals mapped to 611 loci, of which 145 (23.7%) loci have not to our knowledge been previously reported in GWASs of T2D. At association signals that mapped to loci not previously reported for T2D, index SNVs were predominantly common (minor allele frequency (MAF) higher than 5% in at least one ancestry group) with odds ratios (ORs) lower than 1.05 (Supplementary Fig. 3).
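The excerpt leaves the selection of distinct signals to the Methods. One simple way to illustrate the r² < 0.05 criterion, shown here as an assumption rather than the study's procedure (which may rely on conditional analysis), is greedy LD clumping: repeatedly keep the most significant remaining SNV and discard SNVs correlated with it. The r² matrix is assumed to come from a single reference panel, itself a simplification in a multi-ancestry setting, and the function name is hypothetical.

```python
def greedy_clump(p_values, r2, threshold=5e-8, r2_max=0.05):
    """Return indices of index SNVs chosen by greedy LD clumping.

    p_values: list of association P values, one per SNV
    r2: square matrix (list of lists) of pairwise r2, with r2[i][i] == 1
    """
    # Candidate SNVs that reach significance, most significant first
    candidates = sorted((p, i) for i, p in enumerate(p_values) if p < threshold)
    index_snvs = []
    excluded = set()
    for _, i in candidates:
        if i in excluded:
            continue
        index_snvs.append(i)
        # Drop every SNV (including i itself) in LD with the new index SNV
        for j in range(len(p_values)):
            if r2[i][j] >= r2_max:
                excluded.add(j)
    return index_snvs
```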

Mechanistic clusters of T2D index SNVs

To understand the genetic contribution to phenotypic heterogeneity in T2D, we classified the 1,289 index SNVs according to their profile of associations (aligned to the T2D risk allele) with 37 cardiometabolic phenotypes. These included glycaemic traits, anthropometric measures, body fat and adipose tissue volume, blood pressure, levels of circulating plasma lipids, and biomarkers of liver function and lipid metabolism11,12,13,14,15,16,17,18,19 (Supplementary Table 5). We applied an unsupervised ‘hard clustering’ approach with imputation of missing phenotype associations, which identified eight non-overlapping but exhaustive subsets of index SNVs with similar cardiometabolic profiles (Fig. 1, Table 1, Extended Data Fig. 2, Supplementary Fig. 4, Supplementary Tables 6 and 7 and Methods).
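The hard-clustering step can be sketched as k-means on a matrix of z-scores (index SNVs by phenotypes) in which missing associations are re-imputed from the assigned cluster's centroid at every iteration. This is a hypothetical single-imputation simplification, not the authors' implementation, which the Discussion describes as k-means combined with iterative multiple imputation. The number of clusters is an input here (eight in the study) and its selection is not shown.

```python
import numpy as np

def kmeans_with_imputation(Z, k, n_iter=50, seed=0):
    """Hard-cluster rows of Z (SNVs x phenotypes, NaN = missing z-score).

    Missing entries start at column means and are replaced, after each
    centroid update, by the centroid of the SNV's current cluster.
    Returns one cluster label per SNV and the centroid matrix.
    """
    rng = np.random.default_rng(seed)
    Z = np.asarray(Z, dtype=float)
    missing = np.isnan(Z)
    X = np.where(missing, np.nanmean(Z, axis=0), Z)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each SNV to its nearest centroid (squared Euclidean distance)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = X[labels == c].mean(axis=0)
        # Re-impute missing associations from the assigned centroid
        X[missing] = centroids[labels][missing]
    return labels, centroids
```

Every SNV receives exactly one label, which is what makes the resulting subsets non-overlapping and exhaustive.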

Fig. 1: Heat map of associations of 37 cardiometabolic phenotypes with 8 mechanistic clusters of index SNVs for T2D association signals.

Each column corresponds to a cluster. Each row corresponds to a cardiometabolic phenotype. The ‘temperature’ of each cell represents the z-score (aligned to the T2D risk allele) of association of the phenotype with index SNVs assigned to the cluster. *Phenotype is adjusted for body mass index.

Table 1 Cardiometabolic profile, example loci and physiological effect of index SNVs at T2D association signals allocated to eight mechanistic clusters

We observed that the cardiometabolic features and loci of five of our identified clusters overlapped with those reported in previous efforts3,4,20,21, representing beta-cell dysfunction with a positive or negative association with proinsulin (PI), and insulin resistance mediated through obesity, lipodystrophy, and liver and lipid metabolism (Supplementary Table 8). T2D risk alleles at index SNVs in the two beta-cell-dysfunction clusters are associated with increased fasting glucose, two-hour glucose and glycated haemoglobin, and with decreased fasting insulin. Index SNVs in both clusters are also associated with PI, but with opposite directions of effect for the T2D risk allele. The clusters reflecting mechanisms of insulin resistance mediated through obesity, lipodystrophy, and liver and lipid metabolism include index SNVs that are associated with anthropometric measures and levels of circulating plasma lipids. T2D risk alleles at index SNVs in the obesity cluster are associated with increased body mass index (BMI), waist–hip ratio (WHR), body fat percentage and basal metabolic rate, and with decreased high-density lipoprotein (HDL) cholesterol. The lipodystrophy cluster comprises index SNVs for which T2D risk alleles are associated with increased fasting insulin, WHR, blood pressure and triglycerides, and with decreased body fat percentage, gluteofemoral adipose tissue (GFAT) volume and HDL cholesterol. T2D risk alleles at index SNVs assigned to the liver and lipid metabolism cluster are associated with increased liver fat and liver-related biomarkers, and with decreased low-density lipoprotein (LDL) cholesterol and total cholesterol.

By increasing the number of index SNVs in the clustering by nearly fourfold relative to previous efforts, we provide a more granular view of the biological processes through which T2D associations affect disease, and highlight three previously unreported clusters of signals with cardiometabolic profiles that are representative of metabolic syndrome, body fat and residual glycaemic effects. We observed significantly weaker allelic effects on T2D in these three clusters than in those previously reported (mean OR of 1.028 versus 1.033, P = 2.2 × 10⁻⁷), but there was no noticeable difference between clusters in dispersion around the centroid (Extended Data Fig. 3, Supplementary Table 9 and Supplementary Fig. 5). T2D risk alleles at index SNVs assigned to the metabolic syndrome cluster are associated with increased fasting glucose, WHR, triglycerides and blood pressure, and with decreased HDL cholesterol, which together are used to define metabolic syndrome. T2D risk alleles in this cluster are also associated with increased fasting insulin, with accumulations of unhealthy fat depots (increased visceral adipose tissue (VAT) volume and liver fat) and with decreased GFAT volume. Previous investigations have shown that individuals with metabolic syndrome are at increased risk of T2D22, although Mendelian randomization studies indicate that the causal effect is driven by increased waist circumference and increased fasting glucose23. T2D risk alleles at index SNVs assigned to the body fat cluster are associated with increased abdominal subcutaneous adipose tissue volume, VAT volume and body fat percentage. Although the body fat cluster's profile of associations with cardiometabolic phenotypes shares these features with that of the obesity cluster, index SNVs in the body fat cluster are not strongly associated with BMI, lipid levels or basal metabolic rate. Previous investigations have highlighted that body fat percentage is predictive of abnormal blood glucose in individuals with a healthy BMI24. Finally, T2D risk alleles at index SNVs assigned to the residual glycaemic cluster are most strongly associated with increased fasting glucose and glycated haemoglobin but, unlike the two beta-cell-dysfunction clusters, are not associated with PI or decreased fasting insulin.
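Dispersion around the centroid is not defined in this excerpt. A natural assumption is the mean Euclidean distance of each cluster's index SNVs from the cluster centroid in the (imputed) z-score space, as in this hypothetical helper.

```python
import numpy as np

def mean_dispersion(X, labels):
    """Mean Euclidean distance of each cluster's members from its centroid.

    X: SNVs x phenotypes matrix of (imputed) z-scores; labels: cluster per SNV.
    Returns {cluster label: mean distance}.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    out = {}
    for c in np.unique(labels):
        members = X[labels == c]
        centroid = members.mean(axis=0)
        out[int(c)] = float(np.linalg.norm(members - centroid, axis=1).mean())
    return out
```

Comparing these values between previously reported and newly identified clusters gives one way to check whether forcing every SNV into a cluster makes the new clusters looser.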

Clustering provides a framework to better understand the diverse physiological processes through which T2D develops and the shared biological pathways that drive genetic correlations with other insulin-resistance-related disorders, including gestational diabetes mellitus (GDM) and polycystic ovary syndrome (PCOS). T2D risk alleles at index SNVs showed a gradient of effects on insulin-related endophenotypes across clusters (Supplementary Text, Extended Data Fig. 4 and Supplementary Tables 10 and 11), representing a cline from insulin production and processing in the two beta-cell-dysfunction clusters through to insulin resistance that was most extreme in the lipodystrophy cluster. Index SNVs in the beta cell +PI cluster showed the strongest associations with GDM, whereas those in the obesity cluster were most strongly associated with PCOS (Supplementary Text, Extended Data Fig. 5 and Supplementary Table 12).

Regulatory processes underlying clusters

To gain insight into tissue-specific regulatory processes underpinning mechanistic clusters, we integrated T2D association signals with assay for transposase-accessible chromatin using sequencing (ATAC-seq) peaks from single-cell atlases of chromatin accessibility (CATLAS and DESCARTES) for 222 cell types derived from 30 human adult and 15 human fetal tissues25,26 and an additional 106 cell types from the human brain27 (Fig. 2, Supplementary Tables 13 and 14 and Methods).

Fig. 2: Heat map of cluster-specific enrichments of T2D associations for cell-type-specific regions of open chromatin derived from single-cell ATAC-seq peaks in adult and fetal tissue.

a, Cell types (222 types) from 30 human adult tissues and 15 human fetal tissues. b, Cell types (106 types) from the human brain. In each panel, columns represent mechanistic clusters. Each row represents a cell type that was significantly enriched (Bonferroni correction for the number of cell types) for T2D associations in at least one cluster (indicated by an asterisk). The ‘temperature’ of each cell defines the magnitude of the log fold enrichment. The liver and lipid metabolism cluster is not presented because it includes only three T2D association signals and the model parameter estimates were unstable.
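As a deliberately simplified sketch of what a log fold enrichment measures (the study fitted a model whose parameter estimates are summarized in Fig. 2; that model is not reproduced here), one can compare the share of a cluster's fine-mapping posterior probability that falls in a cell type's ATAC-seq peaks with the share of the genome those peaks cover. The posterior probabilities, the natural-log scale and the function name are assumptions. The Bonferroni threshold for panel a follows from the 222 cell types.

```python
import math

def log_fold_enrichment(posteriors, in_peak, peak_fraction):
    """Natural-log fold enrichment of association signal in open chromatin.

    posteriors: per-SNV posterior probabilities of causality for a cluster's signals
    in_peak: per-SNV booleans, True if the SNV lies in the cell type's peaks
    peak_fraction: fraction of the genome covered by those peaks
    Undefined (log of zero) if no posterior mass falls in peaks.
    """
    total = sum(posteriors)
    in_peaks = sum(pp for pp, hit in zip(posteriors, in_peak) if hit)
    return math.log((in_peaks / total) / peak_fraction)

n_cell_types = 222
bonferroni_alpha = 0.05 / n_cell_types   # per-cell-type threshold for panel a
```

For example, half of the posterior mass falling in peaks that cover 5% of the genome is a tenfold enrichment, log(10) ≈ 2.3.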

We observed significant enrichment for regions of open chromatin in fetal islets and adult neuroendocrine cells in pancreatic islets (alpha, beta, gamma and delta) in the beta cell +PI, beta cell −PI and residual glycaemic clusters. In addition, the residual glycaemic cluster was enriched in fetal and adult pancreatic ductal cells, whereas the beta cell −PI cluster was enriched in adult enterochromaffin cells—a type of enteroendocrine cell that has an essential role in regulating intestinal motility and secretion in the gastrointestinal tract28. Enterochromaffin cells are a major target for glucagon-like peptide 1 (GLP-1) and highly express the GLP-1 receptor, agonists of which are widely used as medications for T2D29 (Supplementary Text).

The obesity cluster was also significantly enriched for regions of open chromatin in adult pancreatic islets, although not as strongly as were the beta-cell-dysfunction clusters. Enrichment was observed only for alpha, gamma and delta cells, suggesting that there are alternative pathways through which islets affect the development of T2D, other than through the secretion of insulin from beta cells. The obesity cluster was further enriched in fetal adrenal gland cells (chromaffin cells and adrenal neurons), fetal heart cells (ventricular cardiomyocytes) and fetal kidney cells (metanephric cells). Previous studies have reported an enrichment of BMI loci or heritability for epigenomic annotations in pancreatic islets and adrenal gland30,31, consistent with our findings. In the human brain, the obesity cluster was significantly enriched for regions of open chromatin in cell types including intratelencephalic (IT) projecting neurons, somatostatin-positive (SST+) GABAergic inhibitory neurons and D1 medium spiny neurons. SST+ GABAergic neurons exist in the hypothalamus and regulate food intake32. D1 medium spiny neurons are a type of GABAergic neuron in the human striatum that expresses D1-type dopamine receptors; these neurons have been implicated in food motivation and the development of diet-induced obesity in mice33.

The remaining four clusters (lipodystrophy; metabolic syndrome; body fat; and liver and lipid metabolism) were not significantly enriched for regions of open chromatin in pancreatic islets. The lipodystrophy cluster was enriched only in adult adipocytes, which confirms previous reports in bulk adipose tissue4,20. Consistent with these results, association signals for WHR, triglycerides and HDL cholesterol, which are strongly affected by index SNVs in the lipodystrophy cluster, have been shown to be enriched in candidate cis-regulatory elements in adipocytes26. The metabolic syndrome cluster was enriched in cells that reside in the walls of blood vessels (adult pericytes and fetal endothelial cells), fetal kidney cells (mesangial cells) and fetal fibroblasts. Association signals for systolic and diastolic blood pressure, a key component of metabolic syndrome, have been shown to be enriched in candidate cis-regulatory elements in these cell types26. Endothelial dysfunction is not only a consequence of insulin resistance, but also impairs insulin signalling to further reduce insulin sensitivity, thereby providing a pathophysiological mechanism that links the metabolic and cardiovascular components of metabolic syndrome34. In human brain, the metabolic syndrome cluster was significantly enriched for regions of open chromatin in cell types including IT projecting neurons and SST+ GABAergic inhibitory neurons. IT projecting neurons are a type of glutamatergic excitatory pyramidal neuron in the cerebral cortex, and metabolic syndrome was previously associated with pyramidal neurons and GABAergic neurons in cell-type specificity analyses in a GWAS that examined genetic factors in metabolic syndrome35. We observed no significant enrichments in the body fat cluster or in the liver and lipid metabolism cluster.

Ancestry-correlated heterogeneity

Previous multi-ancestry GWASs have shown widespread heterogeneity in allelic effects at T2D association signals across ancestry groups9,36. We took advantage of the meta-regression model to partition heterogeneity into an ancestry-correlated component, explained by the three axes of genetic variation, and a residual component, reflecting differences in environmental exposures (that are not correlated with ancestry) and/or study design (Supplementary Table 15). We observed 127 (9.9%) independent T2D association signals with significant evidence of ancestry-correlated heterogeneity (PHET < 3.9 × 10⁻⁵, Bonferroni correction for 1,289 signals). By chance, we would expect fewer than one signal to meet this threshold of significance, highlighting that ancestry-correlated heterogeneity is strongly enriched at T2D associations (one-sided binomial test P < 2.2 × 10⁻¹⁶). By contrast, we observed significant evidence of residual heterogeneity at only four (0.3%) association signals (one-sided binomial test P = 0.031). These results therefore suggest that differences in allelic effects at index SNVs are more strongly correlated with genetic ancestry than with other factors that vary between GWASs.
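The Bonferroni threshold and the enrichment test for ancestry-correlated heterogeneity can be checked with a few lines of arithmetic. This sketch assumes the binomial test compares the number of significant signals with a Binomial(n = 1289, p = 0.05/1289) null; the helper names are hypothetical. Summing in log space keeps the individual terms finite, although the final result may still round to 0.0 when it is below the smallest representable double.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold controlling the family-wise error rate."""
    return alpha / n_tests

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k <= 0:
        return 1.0
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    top = max(log_terms)
    return math.exp(top) * sum(math.exp(t - top) for t in log_terms)

n_signals = 1289
threshold = bonferroni_threshold(0.05, n_signals)       # about 3.9e-5
expected = n_signals * threshold                         # 0.05 signals by chance
p_enrichment = binomial_upper_tail(127, n_signals, threshold)
```

The expected count of 0.05 is why even a handful of significant signals would already be surprising under the null.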

We next sought to better understand the impact of genetic diversity on differences in allelic effects between GWASs at the 127 association signals with significant evidence of ancestry-correlated heterogeneity (Methods). For 118 (92.9%) signals, allelic effect sizes were most strongly associated with the first two axes of genetic variation, which reflect differences between AFA/EUR and EAS GWASs (AFA–EAS axis), and between AFA/EAS and EUR GWASs (AFA–EUR axis), respectively (Supplementary Text, Extended Data Figs. 1 and 6 and Supplementary Table 16).

We observed significant differences between clusters in mean z-scores of association for both the AFA–EAS axis (P = 4.1 × 10⁻⁶) and the AFA–EUR axis (P = 1.5 × 10⁻⁶). Index SNVs in the two beta-cell-dysfunction clusters were most positively associated with the AFA–EAS axis, indicating allelic effects on T2D that were greater in EAS GWASs than in AFA and EUR GWASs (Extended Data Fig. 7 and Supplementary Table 17). By contrast, index SNVs in the lipodystrophy and obesity clusters were most positively associated with the AFA–EUR axis, indicating allelic effects on T2D that were greater in EUR GWASs than in EAS and AFA GWASs. These results indicate that ancestry-correlated heterogeneity varies between mechanistic clusters, with allelic effects greatest for EAS GWASs at association signals assigned to clusters acting through beta-cell dysfunction and greatest for EUR GWASs at those assigned to clusters operating through insulin resistance.
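The excerpt does not name the test used to compare mean z-scores between clusters. A permutation test on the between-cluster sum of squares, sketched below as a hypothetical alternative, makes the null hypothesis explicit: cluster labels are exchangeable with respect to each index SNV's z-score for an axis of genetic variation.

```python
import random

def between_cluster_ss(values, labels):
    """Between-cluster sum of squares of the per-cluster mean z-scores."""
    grand = sum(values) / len(values)
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(v)
    return sum(len(vs) * (sum(vs) / len(vs) - grand) ** 2 for vs in groups.values())

def permutation_p(values, labels, n_perm=999, seed=1):
    """One-sided permutation P value for differences in cluster means."""
    rng = random.Random(seed)
    observed = between_cluster_ss(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so ties with the observed statistic count as hits
        if between_cluster_ss(values, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

With 999 permutations the smallest attainable P value is 0.001, so testing at the strength reported in the text would need far more permutations or an analytical test.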

Ancestry-correlated heterogeneity in allelic effects between GWASs is not driven by differences in allele frequency between ancestry groups, but can occur because of interaction between index SNVs and environmental and lifestyle factors, if not accounted for in the association analysis37. We observed substantial variation in the distribution of study-level mean BMI in T2D cases and controls across ancestry groups (Supplementary Fig. 6). Such variation could affect ancestry-correlated heterogeneity because, when cases and controls are selected from the extremes of the BMI distribution, the magnitude of allelic effect estimates at T2D signals acting through beta-cell dysfunction can be inflated38. We therefore extended the MR-MEGA meta-regression model to allow for allelic effect heterogeneity at index SNVs due to mean BMI in T2D cases and controls, in addition to axes of genetic variation (Methods).
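One way to express this extension, assuming study-level mean BMI in cases and in controls enters the meta-regression as two additional covariates, is a nested-model comparison: the ancestry-correlated heterogeneity remaining after BMI adjustment is the drop in weighted residual sum of squares when the axes of genetic variation are added to a BMI-only model (T degrees of freedom for T axes). The functions are hypothetical, and the model needs more studies than parameters.

```python
import numpy as np

def weighted_rss(b, w, X):
    """Weighted residual sum of squares of an inverse-variance weighted regression."""
    XtW = X.T * w
    coef = np.linalg.solve(XtW @ X, XtW @ b)
    return float(np.sum(w * (b - X @ coef) ** 2))

def ancestry_het_given_bmi(betas, ses, axes, bmi_cases, bmi_controls):
    """Heterogeneity attributable to the axes of genetic variation after allowing
    for study-level mean BMI in cases and in controls (T df)."""
    b = np.asarray(betas, dtype=float)
    w = 1.0 / np.asarray(ses, dtype=float) ** 2
    base = np.column_stack([np.ones(len(b)), bmi_cases, bmi_controls])
    full = np.column_stack([base, np.asarray(axes, dtype=float)])
    return weighted_rss(b, w, base) - weighted_rss(b, w, full)
```

If allelic effects vary only with BMI, the axes add nothing once BMI is in the model and the statistic is close to zero, mirroring the drop from 127 to 24 heterogeneous signals reported below.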

After adjustment for study-level mean BMI in cases of T2D and in controls, only 24 association signals retained significant evidence of ancestry-correlated heterogeneity (P < 3.9 × 10⁻⁵), compared with 127 signals without adjustment (Supplementary Text and Supplementary Table 18). After adjustment for BMI, significant differences between clusters in mean z-scores of association were maintained for the AFA–EUR axis (P = 3.2 × 10⁻⁵ versus P = 1.5 × 10⁻⁶ without adjustment), but not for the AFA–EAS axis (P = 0.18 versus P = 4.1 × 10⁻⁶ without adjustment). Furthermore, after adjustment for BMI, the two beta-cell-dysfunction clusters were no longer strongly positively associated with the AFA–EAS axis (Extended Data Fig. 7 and Supplementary Table 19). Together, these results suggest that heterogeneity in allelic effects between EAS GWASs and EUR/AFA GWASs, which occurs most often at association signals assigned to the beta-cell-dysfunction clusters, can be mostly accounted for by differences in the distributions of mean BMI in T2D cases and in controls between these ancestry groups.

Associations of partitioned PS with outcomes

The major complications in individuals with T2D are macrovascular outcomes, including coronary artery disease (CAD), ischaemic stroke and peripheral artery disease, and microvascular outcomes, including end-stage diabetic nephropathy (ESDN) and proliferative diabetic retinopathy. We tested for association of a cluster-specific partitioned PS with these vascular outcomes in up to 279,552 individuals (including 30,288 cases of T2D) across five ancestry groups (AFA, EAS, EUR, HIS and SAS) from the All of Us Research Program, Biobank Japan and the Genes & Health study (Methods). These individuals were not included in the multi-ancestry meta-analysis, thus avoiding potentially inflated type I error rates owing to overlap between the discovery and testing datasets. To maximize sample size, we tested macrovascular outcomes in all individuals, adjusted for T2D status, and microvascular complications only in individuals with T2D (Methods and Supplementary Table 20). To assess the additional information afforded by the partitioned PS over an overall T2D PS that is agnostic to cluster membership, we tested for association of each cluster-specific component of the partitioned PS after adjustment for the overall PS. Figure 3 provides an overview of the associations of each cluster-specific component of the partitioned PS with the five vascular outcomes across ancestry groups.
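A partitioned PS can be sketched as the overall weighted sum of T2D risk-allele dosages split by cluster membership. Because every index SNV belongs to exactly one cluster, the cluster-specific components add up to the overall score. The weights (assumed here to be per-SNV log-ORs for T2D) and the helper names are assumptions; the excerpt defers the weighting scheme to the Methods. Standardizing each score makes the resulting ORs interpretable per standard deviation.

```python
import numpy as np

def partitioned_scores(dosages, weights, clusters, n_clusters):
    """Overall and cluster-specific polygenic scores.

    dosages: individuals x SNVs matrix of T2D risk-allele counts
    weights: per-SNV weights (assumed log-OR for T2D)
    clusters: per-SNV cluster index in 0..n_clusters-1
    Returns (overall score, individuals x clusters matrix of components).
    """
    G = np.asarray(dosages, dtype=float)
    w = np.asarray(weights, dtype=float)
    cl = np.asarray(clusters)
    overall = G @ w
    parts = np.column_stack([G[:, cl == c] @ w[cl == c] for c in range(n_clusters)])
    return overall, parts

def standardize(x):
    """Centre and scale each column to unit standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)
```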

Fig. 3: Associations of cluster-specific components of the partitioned PS with five T2D-related vascular outcomes in up to 279,552 individuals from multiple ancestry groups.

Summaries of the associations of each cluster-specific component of the partitioned PS with CAD, ischaemic stroke (IS), peripheral artery disease (PAD), ESDN and proliferative diabetic retinopathy (PDR). The height of each bar corresponds to the log-odds ratio (beta) per standard deviation of the PS, and the grey bar shows the 95% confidence interval. Analyses of T2D-related macrovascular complications (CAD, PAD and IS) were undertaken in all individuals, with adjustment for T2D status. Analyses of microvascular complications were undertaken in individuals with T2D only. *P < 0.05, nominal association; **P < 0.0063, Bonferroni correction for eight clusters. Exact P values are provided in Supplementary Table 21.

We observed a significant association (P < 0.0063, Bonferroni correction for eight clusters) of two components of the partitioned PS with CAD: a negative association with the beta cell +PI cluster (OR = 0.96 per standard deviation of the PS, P = 1.3 × 10⁻⁶) and a positive association with the obesity cluster (OR = 1.04, P = 0.00019). There was no evidence of heterogeneity in the effects of these two clusters on CAD across ancestry groups (Supplementary Fig. 7 and Supplementary Table 21). Notably, after adjustment for a CAD PS derived from a previously published multi-ancestry meta-analysis of CAD GWASs39, the associations of both components of the partitioned PS with CAD remained significant (Extended Data Fig. 8 and Supplementary Table 22): beta cell +PI cluster (OR = 0.96, P = 4.4 × 10⁻⁵) and obesity cluster (OR = 1.04, P = 0.00065). We also observed a significant positive association of the obesity cluster component of the partitioned PS with peripheral artery disease (OR = 1.05, P = 0.00045), with no evidence of heterogeneity in effects across ancestry groups (Supplementary Fig. 8 and Supplementary Table 21). Across all three macrovascular outcomes, there was a general trend of negative association with the beta cell +PI cluster and positive association with the obesity cluster, although no cluster-specific components of the partitioned PS attained significance for ischaemic stroke (Supplementary Fig. 9 and Supplementary Table 21). There was no strong association of the overall T2D PS with CAD (P = 0.17), ischaemic stroke (P = 0.022) or peripheral artery disease (P = 0.77) after meta-analysis across ancestry groups. Together, these results highlight the advantages of the partitioned PS over an overall T2D PS for detecting associations with macrovascular outcomes, and provide insight into the biological processes that lead to their development.
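The ORs above are per standard deviation of the PS, from regressions that include the overall PS. A minimal logistic-regression sketch by Newton's method (iteratively reweighted least squares) is shown below, assuming a design matrix with an intercept, the standardized cluster-specific component, the standardized overall PS and, for macrovascular outcomes, T2D status; other covariates used in the study are not shown, and the function name is hypothetical. exp(beta) gives the OR per standard deviation, and 0.05/8 reproduces the Bonferroni threshold for eight clusters.

```python
import numpy as np

def logistic_irls(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by iteratively reweighted least squares.

    X must include an intercept column. Returns (coefficients, standard errors).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        H = (X.T * (mu * (1.0 - mu))) @ X          # Fisher information
        beta = beta + np.linalg.solve(H, X.T @ (y - mu))
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    H = (X.T * (mu * (1.0 - mu))) @ X
    se = np.sqrt(np.diag(np.linalg.inv(H)))
    return beta, se

BONFERRONI_8 = 0.05 / 8   # 0.00625, reported in the text as P < 0.0063
```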

We observed significant associations of two components of the partitioned PS with ESDN: a negative association with the beta cell +PI cluster (OR = 0.83, P = 0.00024) and a positive association with the obesity cluster (OR = 1.19, P = 0.00050). There was no evidence of heterogeneity in the effects of these two clusters across ancestry groups (Supplementary Fig. 10 and Supplementary Table 21), and the overall PS was not strongly associated with ESDN (P = 0.048). By contrast, none of the cluster-specific components of the partitioned PS was associated with proliferative diabetic retinopathy. However, there was a strong positive association of the overall PS with this microvascular outcome (OR = 1.32, P = 1.1 × 10⁻⁹), with no evidence of heterogeneity in effects across ancestry groups (Supplementary Fig. 11 and Supplementary Table 21). Together, these results suggest that ESDN is associated with obesity and beta-cell dysfunction, with opposite directions of effect, and confirm previous reports that proliferative diabetic retinopathy is driven by hyperglycaemia40 and is therefore strongly associated with the overall burden of T2D risk variants.
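Reported ORs and P values can be converted back to an approximate standard error and confidence interval, assuming a two-sided Wald test on the log-OR scale. Because the ORs in the text are rounded to two decimals, such back-calculated intervals are approximate and are no substitute for the values in Supplementary Table 21. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def se_and_ci_from_or(odds_ratio, p_value, level=0.95):
    """Approximate SE of log(OR) and a confidence interval for the OR,
    assuming the P value comes from a two-sided Wald test."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - p_value / 2)          # |log(OR)| / SE under the Wald test
    log_or = math.log(odds_ratio)
    se = abs(log_or) / z
    z_crit = nd.inv_cdf(0.5 + level / 2)     # 1.96 for a 95% interval
    return se, (math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se))
```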

Finally, we tested for associations of the cluster-specific components of the partitioned PS and the overall T2D PS with age of onset of T2D (Extended Data Fig. 9 and Methods). The overall PS was strongly associated with an earlier age of onset (1.15 years per standard deviation of the PS, P = 5.1 × 10⁻⁸), although the effects were highly heterogeneous across ancestry groups (Supplementary Fig. 12 and Supplementary Table 23). However, even after adjustment for the overall PS, the obesity cluster was significantly associated with an earlier age of onset (0.38 years, P = 1.4 × 10⁻⁷), with no evidence of heterogeneity across ancestry groups. These findings highlight the importance of obesity-related processes for the onset of T2D, in addition to the development of vascular complications.

Associations with vascular outcomes in clinical trials

To gain insight into the associations of the obesity and beta cell +PI clusters with a broader range of vascular outcomes, we assessed the performance of the partitioned PS (after adjustment for the overall PS) in prospective GWASs in up to 29,827 EUR individuals with T2D from six clinical trials from the Thrombolysis in Myocardial Infarction (TIMI) Study Group (Methods and Supplementary Table 24). We observed the strongest associations of cluster-specific components of the partitioned PS with risk of hospitalization for heart failure: positive with the obesity cluster (hazard ratio (HR) = 1.15 per standard deviation of the PS, P = 4.8 × 10⁻⁶) and negative with the beta cell +PI cluster (HR = 0.90, P = 0.00092). Amongst macrovascular outcomes, the beta cell +PI cluster was also negatively associated with cardiovascular death (HR = 0.90, P = 0.0020), major cardiovascular events (HR = 0.94, P = 0.0050) and myocardial infarction (HR = 0.94, P = 0.027). For microvascular outcomes, the two clusters showed associations with opposite directions of effect for albuminuria: obesity cluster (HR = 1.06, P = 0.012) and beta cell +PI cluster (HR = 0.95, P = 0.047). Across all outcomes, there was a general trend of positive association with the obesity cluster and negative association with the beta cell +PI cluster (Extended Data Fig. 10), consistent with the associations observed from our analyses of retrospective GWASs across ancestry groups.

Discussion

To better understand the aetiological heterogeneity of T2D across diverse populations, we assembled a large collection of T2D GWASs for six ancestry groups through the Type 2 Diabetes Global Genomics Initiative. By increasing the effective sample size by almost threefold compared with previous efforts, we identified a total of 611 loci attaining the conventional threshold of genome-wide significance (P < 5 × 10⁻⁸), 145 (23.7%) of which have not to our knowledge been previously reported. This conventional threshold is equivalent to a Bonferroni correction for the effective number of independent SNVs in EUR reference data41. Using empirical data from the 1000 Genomes Project, the DIAMANTE Consortium and others have advocated more stringent thresholds for multi-ancestry meta-analysis because LD structure differs between ancestry groups, which increases the effective number of independent SNVs9,42. However, our analyses suggest that loci meeting conventional genome-wide significance are unlikely to be false positive association signals; instead, they are driven by index SNVs with modest effects that require larger sample sizes to meet more stringent thresholds. We therefore recommend the use of this conventional threshold, but advocate careful review of reported signals to protect against false positives, ensuring that associations are not driven by single studies or poorly imputed variants.
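Both thresholds follow from a Bonferroni correction, α/M_eff, where M_eff is the effective number of independent SNVs. The conventional threshold corresponds to roughly one million independent common variants in EUR reference data; by the same arithmetic, the stricter multi-ancestry threshold corresponds to about ten million.

```latex
\frac{\alpha}{M_{\mathrm{eff}}} = \frac{0.05}{10^{6}} = 5 \times 10^{-8},
\qquad
\frac{0.05}{10^{7}} = 5 \times 10^{-9}
```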

Multi-ancestry meta-regression maximizes power to detect associations that are shared across ancestry groups by allowing for heterogeneity in allelic effects at index SNVs. MR-MEGA is not restricted to broad continental ancestry labels, which can be used to reinforce the concept of fundamental genetic differences between groups43, but instead represents ancestry as continuous axes of genetic variation, which better reflect the continuum of human genetic diversity and demographic history44. Still, it is important to emphasize that our meta-analysis does not fully capture global genetic diversity, in particular that of underrepresented populations across Africa, South and Central America, the Middle East and Oceania. For example, African Americans account for 98.2% of the total effective sample size of individuals with the highest proportion of ancestry from Africa. The ancestry of these individuals represents a cline of admixture that is predominantly from West Africa and is therefore not representative of other regions in Africa, where the level of genetic variation is equivalent to the differences observed between other continental groups43. Bolstering GWAS collections in these underrepresented populations remains an urgent priority for the human genetics research community. Until then, results require careful interpretation, and findings that are sensitive to biased representation should not be generalized across ancestry groups.

Within the landscape of the genetic architecture of T2D, we identified eight clusters of index SNVs with distinct profiles of associations with 37 cardiometabolic phenotypes, which defined pathophysiology-relevant groupings. The addition of previously unreported T2D signals identified through the multi-ancestry meta-analysis helped to define three clusters that were not detected in previous clustering efforts3,4,20,21, with cardiometabolic profiles that are consistent with residual glycaemic effects, accumulations of body fat and metabolic syndrome. These previous efforts implemented ‘soft clustering’ approaches, such as Bayesian non-negative matrix factorization, that generate weights for cluster membership for each index SNV4. The assignment of index SNVs to clusters is then determined by a threshold weight for cluster membership, allowing for the possibility that a T2D association signal affects disease through multiple pathophysiological pathways. However, depending on the threshold for cluster membership, some index SNVs will be unassigned. Bayesian non-negative matrix factorization also treats positive and negative associations with the same phenotype as independent variables, and most clustering methods cannot directly accommodate missing phenotype associations. To address these potential limitations, we implemented a method that jointly conducts k-means clustering of index SNVs with iterative multiple imputation of missing phenotype associations. In this ‘hard clustering’ approach, each index SNV is assigned to exactly one cluster. A potential disadvantage is therefore that index SNVs with outlying or intermediate profiles of trait associations are ‘forced’ into a cluster that does not fit them well. However, the previously unreported clusters that we identified were not noticeably more dispersed than the clusters reported previously, suggesting that we have not introduced substantial noise by forcing each SNV into exactly one cluster. Ultimately, the choice of clustering approach may depend on the objectives of downstream investigations.

Our analyses highlighted a significant excess of T2D association signals with ancestry-correlated heterogeneity, which is driven mainly by differences in allelic effects between AFA, EAS and EUR GWASs. The two beta-cell-dysfunction clusters are most strongly associated with the AFA–EAS axis, in which effects are typically larger in EAS GWASs than in those for other ancestry groups. These two clusters are also most strongly associated with reduced insulin secretion and lower insulin resistance. By contrast, the lipodystrophy and obesity clusters, which are characterized by reduced insulin sensitivity and higher insulin resistance, are most strongly associated with the AFA–EUR axis, in which effects are typically larger in EUR than in other ancestry groups. These observations are consistent with studies reporting differences in the pathogenesis of T2D between ancestry groups, whereby T2D is initiated mainly through increased insulin resistance in EUR individuals, but is characterized by reduced insulin secretion with lower insulin resistance in EAS individuals45,46. We have shown that most signals with ancestry-correlated heterogeneity can be explained by differences in the distribution of BMI in T2D cases and controls between ancestry groups. Furthermore, after adjustment for study-level mean BMI, we observe no difference in allelic effects between clusters along the AFA–EAS axis. This is consistent with previous studies that reported that body composition is the main determinant of variation in T2D pathogenesis between EAS and EUR individuals, because insulin sensitivity and beta-cell response are similar in the two ancestry groups after accounting for differences in BMI45,47.

We reveal, across multiple ancestry groups, significant associations of vascular outcomes with cluster-specific components of the partitioned PS after adjustment for the overall PS. This suggests that disease trajectories are associated with genetic burden in certain biological pathways that are consistent across diverse populations. Although the effect sizes of the cluster-specific components of the partitioned PS were small, they motivate future work to strengthen these effects by identifying further T2D associations in larger sample sizes. When integrated with single-cell chromatin accessibility data across diverse cell types, these cluster-specific associations also improve understanding of key biological processes driving heterogeneity in the clinical features of T2D. For example, the obesity-cluster-specific component of the PS was positively associated with CAD and ESDN. It also included index SNVs that were enriched for regions of open chromatin in fetal ventricular cardiomyocytes, fetal adrenal neurons, adult chromaffin cells in the adrenal gland and fetal metanephric cells. These findings are in line with the reported enrichments of CAD association signals for transcriptomic and epigenomic annotations in bulk tissues including the aorta and arteries, the heart and the adrenal gland39,48,49, and of renal function association signals in kidney-tissue-specific regulatory annotations50. Together, these findings provide a clear link to shared biological mechanisms that drive the development of T2D and other vascular diseases.

In conclusion, our findings show the value of integrating multi-ancestry GWASs of T2D and cardiometabolic traits with single-cell epigenomics across diverse tissues to disentangle the aetiological heterogeneity driving the development and progression of T2D across population groups. Improved understanding of the varied pathophysiological processes that link T2D to vascular outcomes could offer a route to genetically informed diabetes care and global opportunities for the clinical translation of findings from T2D GWASs.

Methods

Study-level analyses

Within each study, we assigned individuals to ancestry groups using self-report and genetic background (Supplementary Tables 1 and 2). Any individuals not assigned to an ancestry group were excluded as population outliers. Within each ancestry group-specific GWAS, we conducted quality control of genotype data and imputed up to reference panels from the Trans-Omics for Precision Medicine Program51, Haplotype Reference Consortium52, 1000 Genomes Project (phase 1, March 2012 release; phase 3, October 2014 release)53,54, or population-specific whole-genome sequencing55,56,57,58,59,60,61 (Supplementary Table 3). Studies imputed to reference panels mapped to GRCh38 (hg38) were lifted back to hg19 using the UCSC LiftOver tool (https://genome.ucsc.edu/cgi-bin/hgLiftOver). We excluded SNVs with poor imputation quality and/or minor allele count (MAC) < 5 (Supplementary Table 3).

Within each ancestry group-specific GWAS, we tested for association of each SNV with T2D through generalized linear (mixed) modelling, under an additive dosage of the minor allele. Models were adjusted for age and sex (where appropriate) and for additional study-specific covariates (Supplementary Table 3). We used one of two strategies to account for population stratification and/or kinship: (i) excluding closely related individuals and adjusting for principal components derived from a genetic relatedness matrix (GRM) as additional covariates; or (ii) incorporating a random effect for the GRM (Supplementary Table 3). Allelic effects and corresponding standard errors that were estimated from a linear mixed model were converted to the log-odds scale62. We corrected study-level association summary statistics for residual structure using the LD-score regression intercept63 (Supplementary Table 3). For this correction, we used an LD reference derived from ancestry-matched haplotypes from continental groups in the 1000 Genomes Project (phase 3, October 2014 release)54. We matched AFA GWASs to the ‘African’ continental group and HIS GWASs to the ‘American’ continental group.
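The two conversions in the preceding paragraph can be sketched as follows. This is a minimal sketch under stated assumptions, not the exact procedure of refs 62 and 63. It assumes the widely used approximation that divides a linear-model effect on a 0/1 trait by μ(1 − μ), where μ is the case proportion. It also assumes the intercept correction inflates standard errors by the square root of the intercept when the intercept exceeds 1. Both function names are hypothetical.

```python
import math


def linear_to_log_odds(beta_lin, se_lin, case_fraction):
    """Approximate log-odds-scale effect and SE from a linear (mixed) model of a 0/1 trait.

    Assumption: beta_logit ~= beta_linear / (mu * (1 - mu)), where mu is the
    proportion of cases. This may differ from the exact conversion in ref. 62.
    """
    scale = case_fraction * (1.0 - case_fraction)
    return beta_lin / scale, se_lin / scale


def ldsc_intercept_correct(se, intercept):
    """Inflate a standard error by sqrt(intercept) when the LD-score intercept exceeds 1.

    Dividing a chi-square statistic by the intercept is equivalent to multiplying
    the SE by sqrt(intercept). Intercepts below 1 are left uncorrected (assumption).
    """
    return se * math.sqrt(max(intercept, 1.0))


beta, se = linear_to_log_odds(0.01, 0.002, case_fraction=0.2)
se = ldsc_intercept_correct(se, intercept=1.05)
```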

Multi-ancestry meta-analyses

We analysed autosomal bi-allelic SNVs that overlap reference panels from the 1000 Genomes Project (phase 3, October 2014 release)54 and the Haplotype Reference Consortium52. We considered SNVs with MAF > 0.5% in at least one of the five continental groups in the 1000 Genomes Project (phase 3, October 2014 release)54. We excluded SNVs that differed in allele frequency by more than 20% when comparing reference panels in the same subsets of haplotypes.
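The SNV filters described above can be sketched as a boolean mask. This is an illustrative sketch only. It assumes allele frequencies have already been matched between the two reference panels on the same haplotype subset. It also reads "more than 20%" as an absolute frequency difference of 0.20, which is an assumption. `select_snvs` is a hypothetical name.

```python
import numpy as np


def select_snvs(maf_by_group, af_panel_a, af_panel_b, min_maf=0.005, max_af_diff=0.20):
    """Boolean mask of SNVs passing the reference-panel filters.

    maf_by_group: (n_snvs, 5) MAF in the five 1000 Genomes continental groups.
    af_panel_a, af_panel_b: allele frequency of each SNV in the two reference panels,
    computed on the same subset of haplotypes (hypothetical, pre-matched inputs).
    """
    maf_by_group = np.asarray(maf_by_group, float)
    common_somewhere = (maf_by_group > min_maf).any(axis=1)   # MAF > 0.5% in >= 1 group
    af_diff = np.abs(np.asarray(af_panel_a, float) - np.asarray(af_panel_b, float))
    concordant = af_diff <= max_af_diff                        # panels agree within 0.20
    return common_somewhere & concordant
```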

We used meta-regression, implemented in MR-MEGA10, to aggregate association summary statistics across GWASs. MR-MEGA models allelic effect heterogeneity that is correlated with genetic ancestry by including axes of genetic variation as covariates in the meta-regression model to capture diversity between GWASs. We used SNVs reported in all studies to construct a distance matrix of differences in mean effect allele frequency between each pair of GWASs. We implemented multi-dimensional scaling of the distance matrix to obtain three principal components that represent axes of genetic variation to separate GWASs across ancestry groups (Extended Data Fig. 1).
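The construction of the axes of genetic variation can be sketched with classical multidimensional scaling. This is a sketch only. The pairwise distance is taken here as the mean absolute difference in effect allele frequency across shared SNVs. That metric is an assumption, and MR-MEGA's own definition may differ. `ancestry_axes` is a hypothetical helper.

```python
import numpy as np


def ancestry_axes(eaf, n_axes=3):
    """Axes of genetic variation from a GWAS-by-GWAS allele frequency distance matrix.

    eaf: (n_gwas, n_snvs) effect allele frequencies for SNVs reported in all GWASs.
    Returns (n_gwas, n_axes) coordinates from classical multidimensional scaling.
    """
    eaf = np.asarray(eaf, float)
    n = eaf.shape[0]
    # Pairwise distance: mean absolute EAF difference (assumed metric).
    d = np.abs(eaf[:, None, :] - eaf[None, :, :]).mean(axis=2)
    # Classical MDS: double-centre the squared distances.
    centring = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centring @ (d ** 2) @ centring
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:n_axes]        # largest eigenvalues first
    vals, vecs = vals[order], vecs[:, order]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    return vecs * np.sqrt(np.clip(vals, 0.0, None))
```

Because the squared distances are double-centred, each returned axis has mean zero across GWASs. This makes the axes directly usable as meta-regression covariates.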

For each SNV, we aggregated inverse-variance weighted allelic effects across GWASs through linear regression, including three axes of genetic variation as covariates. We tested for: (i) association with T2D allowing for ancestry-correlated allelic effect heterogeneity between GWASs; (ii) ancestry-correlated allelic effect heterogeneity between GWASs (defined by the axes of genetic variation); and (iii) residual allelic effect heterogeneity between GWASs. Because MR-MEGA allows the allelic effect to vary along the axes of genetic variation, it does not produce a single allelic effect estimate. Consequently, we also aggregated association summary statistics across GWASs through fixed-effects meta-analysis (inverse-variance weighting of allelic effects) using METAL64. To assess the extent of residual structure between GWASs, we calculated the genomic control inflation factor65 for the multi-ancestry meta-regression and the fixed-effects meta-analysis. We considered only those SNVs reported in at least five GWASs for downstream interrogation.
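The three tests can be sketched as comparisons of weighted residual sums of squares from nested inverse-variance weighted regressions. This sketch assumes the following structure:

* The association test compares the full model with a model of no effect (t + 1 degrees of freedom, for t axes).
* The ancestry-heterogeneity test compares the full model with an intercept-only model (t degrees of freedom).
* The residual test uses the full-model residual sum of squares (K − t − 1 degrees of freedom, for K GWASs).

It is illustrative, not MR-MEGA's code, and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def meta_regression(beta, se, axes):
    """Inverse-variance weighted meta-regression of one SNV's allelic effects on ancestry axes.

    beta, se: per-GWAS allelic effects and SEs; axes: (n_gwas, t) axes of genetic variation.
    Returns {test: (chi-square statistic, P value)}.
    """
    beta, w = np.asarray(beta, float), 1.0 / np.asarray(se, float) ** 2
    x = np.column_stack([np.ones_like(beta), axes])
    xtw = x.T * w
    coef = np.linalg.solve(xtw @ x, xtw @ beta)           # weighted least squares
    rss_full = np.sum(w * (beta - x @ coef) ** 2)
    mean_ivw = np.sum(w * beta) / np.sum(w)
    rss_intercept = np.sum(w * (beta - mean_ivw) ** 2)     # no ancestry-correlated effects
    rss_null = np.sum(w * beta ** 2)                       # no effect in any GWAS
    t, k = axes.shape[1], len(beta)
    stats = {
        "association": (rss_null - rss_full, t + 1),
        "ancestry_het": (rss_intercept - rss_full, t),
        "residual_het": (rss_full, k - t - 1),
    }
    return {name: (s, chi2.sf(s, df)) for name, (s, df) in stats.items()}


def genomic_control_lambda(chisq, df=1):
    """Genomic control inflation factor: median observed chi-square over its null median."""
    return np.median(chisq) / chi2.ppf(0.5, df)
```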

Defining T2D signals and loci

We identified all SNVs attaining genome-wide significance (P < 5 × 10−8) for association with T2D from the multi-ancestry meta-regression. Clumps were formed around index SNVs, which were selected using a greedy algorithm in PLINK v.1.9 (ref. 66) after ranking SNVs by ascending P value. SNVs less than 5 Mb from an index SNV were assigned to its clump if r2 > 0.05 in at least one of the five continental groups from the 1000 Genomes Project (phase 3, October 2014 release)54. Index SNVs separated by less than 1 Mb were assigned to the same locus. Each locus was defined as the region extending 500 kb up- and downstream of the index SNVs it contains. We considered a locus to have been previously reported if it contained variants discovered at genome-wide significance in published large-scale T2D GWASs.
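The locus definition can be sketched as single-linkage merging of sorted index SNVs. This is a sketch under the assumption that "separated by less than 1 Mb" applies between consecutive index SNVs on a chromosome, so chains of nearby signals merge into one locus. `define_loci` is a hypothetical helper and does not reproduce PLINK's clumping step.

```python
def define_loci(index_snvs, merge_distance=1_000_000, flank=500_000):
    """Group index SNVs into loci and return (chrom, start, end, member positions).

    index_snvs: iterable of (chromosome, position) for index SNVs.
    Consecutive index SNVs less than merge_distance apart join the same locus, and
    each locus spans flank bp up- and downstream of its outermost index SNVs.
    """
    groups = []  # each entry: (chrom, [positions of index SNVs in the locus])
    for chrom, pos in sorted(index_snvs):
        same_locus = bool(groups) and groups[-1][0] == chrom and pos - groups[-1][1][-1] < merge_distance
        if same_locus:
            groups[-1][1].append(pos)
        else:
            groups.append((chrom, [pos]))
    return [(chrom, max(0, members[0] - flank), members[-1] + flank, members)
            for chrom, members in groups]
```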

Ancestry-group-specific meta-analyses

We aggregated association summary statistics across GWASs from the same ancestry group through fixed-effects meta-analysis (inverse-variance weighting of allelic effects) using METAL64. We estimated the mean effect allele frequency across GWASs from each ancestry group, weighted by the effective sample size of the study. We generated forest plots of association summary statistics of index SNVs across ancestry groups using the R package meta (https://cran.r-project.org/package=meta/).
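Fixed-effects inverse-variance weighting and the effective-sample-size-weighted allele frequency are short calculations. The sketch below assumes the common case-control definition of effective sample size, 4 / (1/N_cases + 1/N_controls), because the text does not state the formula used. The function names are hypothetical and do not reflect METAL's interface.

```python
import numpy as np


def ivw_fixed_effects(beta, se):
    """Fixed-effects inverse-variance weighted effect and SE across GWASs."""
    beta, w = np.asarray(beta, float), 1.0 / np.asarray(se, float) ** 2
    return np.sum(w * beta) / np.sum(w), np.sqrt(1.0 / np.sum(w))


def effective_n(n_cases, n_controls):
    """Case-control effective sample size (assumed definition)."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


def weighted_eaf(eaf, n_cases, n_controls):
    """Mean effect allele frequency across GWASs, weighted by effective sample size."""
    neff = effective_n(np.asarray(n_cases, float), np.asarray(n_controls, float))
    return np.sum(neff * np.asarray(eaf, float)) / np.sum(neff)
```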

Defining clusters of T2D index SNVs with distinct cardiometabolic profiles

We considered cardiometabolic-related quantitative phenotypes that are used to define T2D status and/or are associated with risk of T2D or complications. We excluded phenotypes for which GWAS summary statistics were available only after imputation to reference panels from the International HapMap Project67 because they did not provide sufficient coverage of SNVs included in the multi-ancestry meta-analysis. We considered the largest available GWAS meta-analysis (ancestry-specific or multi-ancestry) that provided the following association summary statistics for each SNV: effect allele, other allele, allelic effect and corresponding standard error (Supplementary Table 5). We re-aligned the effect estimate to the T2D risk allele from the fixed-effects multi-ancestry meta-analysis, denoted βij for the jth index SNV and the ith phenotype. We then calculated a sample size corrected z-score, given by \({Z}_{ij}={\beta }_{ij}/\left(\sqrt{{N}_{i}}{s}_{ij}\right)\), where sij is the standard error of the effect estimate of the jth index SNV and the ith phenotype, and Ni is the maximum sample size reported for the ith phenotype. Where association summary statistics were not reported, the z-score was set as ‘missing’.
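The alignment and sample-size correction for a single index SNV and phenotype can be sketched as follows. The sketch assumes bi-allelic, strand-matched SNVs. Under that assumption, an effect allele that differs from the T2D risk allele is the other allele, and alignment flips the sign. `aligned_corrected_z` is a hypothetical name.

```python
import numpy as np


def aligned_corrected_z(beta, se, effect_allele, t2d_risk_allele, n_max):
    """Sample-size-corrected z-score Z_ij = beta_ij / (sqrt(N_i) * s_ij), aligned to the T2D risk allele.

    Missing summary statistics (None) are returned as np.nan ('missing').
    """
    if beta is None or se is None:
        return np.nan
    if effect_allele != t2d_risk_allele:
        beta = -beta  # re-align so that the effect refers to the T2D risk allele
    return beta / (np.sqrt(n_max) * se)
```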

We conducted k-means clustering of index SNVs with imputation of missing z-scores using the R package ClustImpute (https://cran.r-project.org/package=ClustImpute). For a pre-defined number of clusters, ClustImpute replaces missing z-scores at random from the marginal distribution for the phenotype in the first iteration and performs k-means clustering. In subsequent iterations, missing z-scores are updated, conditional on the current cluster assignment, so that correlations between phenotypes are considered. At each iteration, penalizing weights are imposed on imputed values and successively decreased (to zero) as the missing data imputation improves. Finally, we determined the ‘optimal’ number of clusters according to the majority rule across 27 indices of cluster performance68, implemented in the R package NbClust (https://cran.r-project.org/package=NbClust).
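The idea of clustering with iterative imputation can be illustrated with a simplified stand-in. This is not ClustImpute's implementation. In particular, it refills missing entries from the assigned cluster centroid rather than by ClustImpute's conditional random draws. Imputed entries receive a weight in the distance computation that decreases linearly to zero. The NbClust majority rule is reduced to a vote count over the indices' preferred numbers of clusters. All names are hypothetical.

```python
from collections import Counter

import numpy as np


def kmeans_with_imputation(z, k, n_iter=20, seed=0):
    """Simplified k-means with iterative imputation of missing values (np.nan).

    z: (n_snvs, n_phenotypes) z-scores. Assumes every phenotype has some observed values.
    Returns cluster labels, centroids and the final imputed matrix.
    """
    rng = np.random.default_rng(seed)
    z = np.asarray(z, float)
    miss = np.isnan(z)
    filled = z.copy()
    # First iteration: random draws from each phenotype's observed (marginal) values.
    for i in range(z.shape[1]):
        obs = z[~miss[:, i], i]
        filled[miss[:, i], i] = rng.choice(obs, size=miss[:, i].sum())
    centroids = filled[rng.choice(len(z), size=k, replace=False)]
    for it in range(n_iter):
        weight = 1.0 - it / max(n_iter - 1, 1)        # imputed-value weight: 1 -> 0
        wmat = np.where(miss, weight, 1.0)
        d = (wmat[:, None, :] * (filled[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if (labels == c).any():
                centroids[c] = filled[labels == c].mean(axis=0)
        # Update missing values conditional on the current cluster assignment.
        filled = np.where(miss, centroids[labels], z)
    return labels, centroids, filled


def majority_k(votes):
    """Number of clusters chosen by the largest number of cluster-performance indices."""
    return Counter(votes).most_common(1)[0][0]
```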

We tested for association of the ith phenotype with index SNVs across clusters in a linear regression model, given by \(E\left({Z}_{ij}\right)={\sum }_{k}{\gamma }_{ik}{C}_{jk}\), where Cjk is an indicator variable that takes the value 1 if the jth index SNV was assigned to the kth cluster and 0 otherwise. The strength and direction of the association of each phenotype with each cluster were then presented in a heat map, in which the ‘temperature’ was defined by the direction of the regression coefficient γik and the corresponding −log10 P value. Regression models were fitted using the glm function in R.
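With cluster indicators and no intercept, each coefficient γik is simply the mean z-score of phenotype i in cluster k. The sketch below fits that model by least squares and returns a signed −log10 P value as one possible heat-map "temperature". It assumes a Gaussian model with t-tests on n − K degrees of freedom, mirroring a Gaussian glm. `cluster_heat` is a hypothetical name.

```python
import numpy as np
from scipy import stats


def cluster_heat(z_i, labels, k):
    """Regress z-scores of one phenotype on cluster indicators C_jk (no intercept).

    Returns gamma_ik and sign(gamma_ik) * -log10(P) for each cluster.
    Assumes every cluster has at least one index SNV.
    """
    z_i, labels = np.asarray(z_i, float), np.asarray(labels)
    c = np.eye(k)[labels]                               # indicator matrix C_jk
    gamma, *_ = np.linalg.lstsq(c, z_i, rcond=None)     # equals per-cluster means
    resid = z_i - c @ gamma
    dof = len(z_i) - k
    sigma2 = resid @ resid / dof
    se = np.sqrt(sigma2 / c.sum(axis=0))
    p = 2 * stats.t.sf(np.abs(gamma / se), dof)
    return gamma, np.sign(gamma) * -np.log10(p)
```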

We extracted cardiometabolic phenotype z-scores from the final imputed dataset from ClustImpute. We calculated the Euclidean distance between the jth SNV and kth cluster centroid as

$${\delta }_{jk}=\sqrt{{\sum }_{i}{\left({Z}_{ij}-{\mu }_{ik}\right)}^{2}},$$

where Zij and μik are the z-score of the jth SNV and the location of the kth cluster centroid for the ith cardiometabolic phenotype. To assess cluster disparity, we also performed principal components analysis of cardiometabolic phenotype z-scores from the final imputed dataset using the R package factoextra (https://cran.r-project.org/package=factoextra).
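The distance δjk and a principal components summary can be computed directly from the imputed z-score matrix. In this sketch, rows are SNVs and columns are phenotypes, the transpose of the Zij indexing in the text. The PCA centres but does not scale the z-scores, which is an assumption because factoextra's input settings are not stated. The function names are hypothetical.

```python
import numpy as np


def distances_to_centroids(z, centroids):
    """delta_jk = sqrt(sum_i (Z_ij - mu_ik)^2); z: (n_snvs, n_pheno), centroids: (k, n_pheno)."""
    return np.sqrt(((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2))


def pca_scores(z, n_components=2):
    """PCA of the imputed z-score matrix via SVD (centred, not scaled)."""
    centred = z - z.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)                 # proportion of variance per PC
    return u[:, :n_components] * s[:n_components], explained[:n_components]
```

Comparing the distribution of δjk within previously reported and previously unreported clusters gives one simple way to assess whether some clusters are more disparate than others.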

Cluster-specific associations of index SNVs with T2D

We tested for association of T2D with index SNVs across clusters in a linear regression model, given by \(E\left({\beta }_{j}\right)={\sum }_{k}{\gamma }_{k}{C}_{jk}\), where Cjk is an indicator variable that takes the value 1 if the jth index SNV was assigned to the kth cluster and 0 otherwise, and weighted by the inverse of the variance of the allelic effect. We tested for heterogeneity in cluster effects on T2D by comparing the deviance of this model with that of \(E\left({\beta }_{j}\right)={\gamma }_{0}\), again weighted by the inverse of the variance of the allelic effect. To compare associations between previously reported clusters and previously unreported clusters, we replaced Cjk with an indicator variable that takes the value 1 if the jth index SNV was assigned to a previously reported cluster and 0 otherwise. Regression models were fitted using the glm function in R.
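The deviance comparison can be sketched with closed-form weighted least squares. With weights equal to the inverse variances, each γk is the inverse-variance weighted mean of βj within cluster k. This sketch treats the deviance difference as χ² with K − 1 degrees of freedom, assuming the variances are known. That choice is an assumption: R's Gaussian glm would estimate a dispersion parameter and may use an F-test instead. `cluster_heterogeneity_test` is hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def cluster_heterogeneity_test(beta, se, labels, k):
    """Test for heterogeneity in cluster effects on T2D (weighted regression, deviance difference).

    Assumes every cluster contains at least one index SNV.
    """
    beta, w = np.asarray(beta, float), 1.0 / np.asarray(se, float) ** 2
    c = np.eye(k)[labels]
    gamma = (c.T @ (w * beta)) / (c.T @ w)          # per-cluster weighted means
    dev_full = np.sum(w * (beta - c @ gamma) ** 2)
    gamma0 = np.sum(w * beta) / np.sum(w)           # E(beta_j) = gamma_0
    dev_null = np.sum(w * (beta - gamma0) ** 2)
    stat = dev_null - dev_full
    return gamma, stat, chi2.sf(stat, k - 1)
```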

Enrichment of T2D associations for cell-type-specific regions of open chromatin within clusters

For each T2D association signal, we defined ‘null’ SNVs as those that mapped within 50 kb of the index SNV and were not in LD with it (r2 ≤ 0.05 in each of the five continental groups from the 1000 Genomes Project (phase 3, October 2014 release)54). We defined an indicator variable, Yj, taking the value 1 if the jth SNV is an index SNV and 0 if the jth SNV is a null SNV. We mapped index SNVs and null SNVs to genic regions defined by the Ensembl Project (release 104)69, specifically protein-coding exons, 3′ UTRs and 5′ UTRs. We defined indicator variables, \({G}_{j}^{{\rm{EXON}}}\), \({G}_{j}^{3{\rm{UTR}}}\) and \({G}_{j}^{5{\rm{UTR}}}\), which each take the value 1 if the jth SNV mapped to the respective genic annotation and 0 otherwise. We also mapped index SNVs and null SNVs to ATAC-seq peaks from single-cell atlases of chromatin accessibility (CATLAS and DESCARTES) for: 222 cell types derived from 30 human adult and 15 human fetal tissues25,26; and 106 cell types derived from human brain27. We defined an indicator variable, Xij, that takes the value 1 if the jth SNV mapped to an ATAC-seq peak for the ith cell type and 0 otherwise.
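The null-SNV rule and the annotation indicators (Gj, Xij) can be sketched with an interval lookup. The sketch assumes the following:

* Positions and peaks share one coordinate convention on a single chromosome.
* Peaks are sorted, non-overlapping, half-open [start, end) intervals.

Both helper names are hypothetical.

```python
import numpy as np


def overlaps(positions, intervals):
    """Indicator (1/0) that each position lies in any [start, end) interval.

    intervals: (m, 2) array of sorted, non-overlapping peaks on the same chromosome.
    """
    positions = np.asarray(positions)
    starts, ends = intervals[:, 0], intervals[:, 1]
    idx = np.searchsorted(starts, positions, side="right") - 1   # last peak starting at or before pos
    inside = (idx >= 0) & (positions < ends[np.clip(idx, 0, None)])
    return inside.astype(int)


def is_null_snv(distance_bp, r2_by_group, window=50_000, max_r2=0.05):
    """Null SNV: within the window of the index SNV and r2 <= 0.05 in every continental group."""
    return abs(distance_bp) <= window and all(r2 <= max_r2 for r2 in r2_by_group)
```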

Within each cluster, we modelled enrichment of T2D associations for ATAC-seq peaks in the ith cell type, after accounting for genic annotations, in a Firth bias-reduced logistic regression, given by

$${f}^{-1}\left({Y}_{j}\right)={\alpha }_{0}+{\alpha }_{{\rm{EXON}}}{G}_{j}^{{\rm{EXON}}}+{\alpha }_{3{\rm{UTR}}}{G}_{j}^{3{\rm{UTR}}}+{\alpha }_{5{\rm{UTR}}}{G}_{j}^{5{\rm{UTR}}}+{\theta }_{i}{X}_{ij},$$

where f is the logit link function. In this expression, α0 is an intercept, αEXON, α3UTR and α5UTR are log fold enrichments of genic annotations, and θi is the log fold enrichment of ATAC-seq peaks in the ith cell type. We tested for enrichment of the ith cell type by comparing the deviances of models in which θi = 0 and in which θi is unconstrained. We identified cell types with significant evidence of enrichment (P < 0.00023, Bonferroni correction for 222 cell types in adult and fetal tissues; P < 0.00047, Bonferroni correction for 106 cell types in the brain). All models were fitted using the R package logistf (https://cran.r-project.org/package=logistf).
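A compact sketch of Firth bias-reduced logistic regression follows, using Newton steps on the Firth-modified score, together with the Bonferroni thresholds quoted above. It is not logistf. The step cap is a simplification of logistf's step-halving. The likelihood ratio test here compares separately fitted full and reduced penalized likelihoods, which may differ from logistf's profile penalized likelihood test. All function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def firth_logistic(x, y, max_iter=100, tol=1e-8):
    """Firth logistic regression; x includes an intercept column, y is 0/1.

    Returns coefficients and the penalized log-likelihood (loglik + 0.5 log|X'WX|).
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    beta = np.zeros(x.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-x @ beta))
        w = p * (1.0 - p)
        info_inv = np.linalg.inv(x.T @ (w[:, None] * x))         # inverse Fisher information
        h = w * np.einsum("ij,jk,ik->i", x, info_inv, x)         # leverages (hat-matrix diagonal)
        score = x.T @ (y - p + h * (0.5 - p))                    # Firth-modified score
        step = info_inv @ score
        step = step / max(1.0, np.max(np.abs(step)) / 5.0)       # cap step size for stability
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-x @ beta))
    w = p * (1.0 - p)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, loglik + 0.5 * np.linalg.slogdet(x.T @ (w[:, None] * x))[1]


def penalized_lrt(x_full, x_reduced, y):
    """Approximate 1-df test of theta_i = 0 from two separately fitted Firth models."""
    stat = 2.0 * (firth_logistic(x_full, y)[1] - firth_logistic(x_reduced, y)[1])
    return stat, chi2.sf(max(stat, 0.0), 1)


alpha_adult_fetal = 0.05 / 222   # about 0.00023
alpha_brain = 0.05 / 106         # about 0.00047
```

Under complete separation the ordinary maximum-likelihood estimate diverges, whereas Firth's estimate stays finite. With a single binary covariate, it reproduces the familiar "add 0.5 to each cell" log odds ratio. This makes the approach suitable for sparse overlaps between SNVs and cell-type-specific peaks.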

Contribution of each axis of genetic variation to ancestry-correlated heterogeneity

For each index SNV, we calculated a z-score (beta/SE) for association with each axis of genetic variation by aligning the effect from the meta-regression model to the T2D risk allele. We then identified the axis of genetic variation with the strongest association (greatest magnitude z-score).
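The per-axis z-scores and the strongest axis for each index SNV can be computed in a few lines. This is a sketch that assumes a hypothetical boolean input marking SNVs whose meta-regression effect allele is not the T2D risk allele. For those SNVs, alignment flips the sign of every axis coefficient.

```python
import numpy as np


def strongest_axis(beta_axes, se_axes, flip_to_risk):
    """Aligned z-scores for each axis and the index of the axis with the largest |z|.

    beta_axes, se_axes: (n_snvs, n_axes) meta-regression coefficients for the axes.
    flip_to_risk: True where the effect allele is not the T2D risk allele (hypothetical input).
    """
    z = np.asarray(beta_axes, float) / np.asarray(se_axes, float)
    z = np.where(np.asarray(flip_to_risk)[:, None], -z, z)
    return z, np.abs(z).argmax(axis=1)
```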

Differences in ancestry-correlated heterogeneity between mechanistic clusters

We tested for differences in z-scores (beta/SE) for association of index SNVs in each cluster with the ith axis of genetic variation by comparing two linear models by ANOVA: (i) \({f}^{-1}\left({Z}_{ij}\right)={\tau }_{0i}\); and (ii) \({f}^{-1}\left({Z}_{ij}\right)={\sum }_{k}{\tau }_{ki}{C}_{jk}\). In these expressions: f is the identity link function; Zij is the z-score for the jth index SNV; Cjk is an indicator variable that takes the value 1 if the jth index SNV was assigned to the kth cluster and 0 otherwise; and τ0i and τki are regression coefficients. Regression models were fitted using the glm function in R.
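Because model (ii) spans model (i), the ANOVA comparison of the two identity-link models is the classical one-way ANOVA F-test. The sketch below computes it from the residual sums of squares of the two models, and the accompanying check confirms agreement with `scipy.stats.f_oneway`. `axis_cluster_anova` is a hypothetical name.

```python
import numpy as np
from scipy import stats


def axis_cluster_anova(z_axis, labels):
    """F-test comparing a common-mean model with a cluster-means model for one axis."""
    z_axis, labels = np.asarray(z_axis, float), np.asarray(labels)
    groups = [z_axis[labels == c] for c in np.unique(labels)]
    rss_null = np.sum((z_axis - z_axis.mean()) ** 2)               # model (i): tau_0i
    rss_full = sum(np.sum((g - g.mean()) ** 2) for g in groups)    # model (ii): tau_ki
    k, n = len(groups), len(z_axis)
    f = ((rss_null - rss_full) / (k - 1)) / (rss_full / (n - k))
    return f, stats.f.sf(f, k - 1, n - k)
```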

Effect of BMI on ancestry-correlated and residual heterogeneity in allelic effects between GWASs

For each index SNV, we aggregated inverse-variance weighted allelic effects across GWASs by linear regression, implemented in MR-MEGA10, including as covariates: (i) three axes of genetic variation; (ii) mean BMI in controls; and (iii) mean BMI in T2D cases. After adjustment for BMI, we tested for: (i) ancestry-correlated allelic effect heterogeneity between GWASs; and (ii) residual allelic effect heterogeneity between GWASs. After adjustment, as outlined above, we re-assessed: (i) the contribution of each axis of genetic variation to ancestry-correlated heterogeneity; and (ii) the difference in ancestry-correlated heterogeneity between mechanistic clusters.
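The BMI-adjusted tests can be sketched by adding study-level mean BMI in controls and cases to the weighted meta-regression. This sketch assumes the following reading of the two tests:

* The ancestry-correlated heterogeneity test compares the full model with a model containing only an intercept and the two BMI covariates (t degrees of freedom).
* The residual test uses the full-model residual sum of squares.

This reading is an interpretation, not MR-MEGA's documented behaviour, and it needs more GWASs than model parameters. The function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def _weighted_rss(y, w, x):
    """Weighted least-squares fit; returns the weighted residual sum of squares and coefficients."""
    xtw = x.T * w
    coef = np.linalg.solve(xtw @ x, xtw @ y)
    return np.sum(w * (y - x @ coef) ** 2), coef


def bmi_adjusted_heterogeneity(beta, se, axes, bmi_controls, bmi_cases):
    """Ancestry-correlated and residual heterogeneity after adjusting for study-level mean BMI."""
    beta, w = np.asarray(beta, float), 1.0 / np.asarray(se, float) ** 2
    base = np.column_stack([np.ones(len(beta)), bmi_controls, bmi_cases])
    full = np.column_stack([base, axes])
    rss_base, _ = _weighted_rss(beta, w, base)
    rss_full, coef = _weighted_rss(beta, w, full)
    ancestry = rss_base - rss_full
    df_resid = len(beta) - full.shape[1]
    return {"ancestry_het": (ancestry, chi2.sf(ancestry, axes.shape[1])),
            "residual_het": (rss_full, chi2.sf(rss_full, df_resid)),
            "coef": coef}
```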

Cluster-specific partitioned PS analyses of vascular outcomes and age of T2D onset

We tested for association of cluster-specific components of the partitioned PS and an overall PS with T2D-related macrovascular outcomes (CAD, ischaemic stroke and peripheral artery disease), microvascular complications (ESDN and proliferative diabetic retinopathy) and age of T2D onset in participants from the All of Us Research Program (AoURP; AFA, EUR and HIS ancestry groups), Biobank Japan (BBJ; EAS ancestry group), and Genes & Health (G&H; SAS ancestry group). Cohort descriptions and details of sequencing and genotyping, quality control and phenotype derivation are provided in the Supplementary Methods.

We conducted analyses separately for each ancestry group in AoURP, BBJ and G&H. For each ancestry group, we analysed macrovascular outcomes in all individuals, irrespective of T2D status, and microvascular complications in individuals with T2D only. For each analysis, we calculated the overall PS and cluster-specific partitioned PS for each individual, with each index SNV weighted by the allelic log-OR from the ancestry-specific meta-analyses. We did not include index SNVs with MAF < 1% in the PS. We also excluded index SNVs with poor imputation quality (r2 < 0.7) in BBJ and G&H, and those with extreme deviation from Hardy–Weinberg equilibrium (P < 10−6) in AoURP. We standardized the overall PS and each cluster-specific component of the partitioned PS to have mean zero and unit variance. We tested for association with each vascular outcome through generalized linear regression and with age of T2D onset through linear regression. For each outcome, we first fitted a model including the overall PS. We then fitted models including each cluster-specific component of the partitioned PS, adjusted for the overall PS. All association analyses were conducted using the glm function in R.
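The construction of the overall and partitioned scores can be sketched as weighted sums of risk-allele dosages, followed by standardization. The sketch assumes dosages are already coded for the T2D risk allele that the log-OR weights refer to. Only the MAF filter is shown; the imputation-quality and Hardy–Weinberg filters are omitted. `partitioned_ps` is a hypothetical helper.

```python
import numpy as np


def partitioned_ps(dosage, log_or, cluster, maf, min_maf=0.01):
    """Overall PS and cluster-specific PS components, each standardized to mean 0 and variance 1.

    dosage: (n_individuals, n_snvs) T2D risk-allele dosages (0-2); log_or: per-SNV weights;
    cluster: per-SNV cluster label; maf: per-SNV MAF. SNVs with MAF < min_maf are dropped.
    """
    keep = np.asarray(maf) >= min_maf
    dosage = np.asarray(dosage, float)[:, keep]
    log_or, cluster = np.asarray(log_or, float)[keep], np.asarray(cluster)[keep]

    def standardize(score):
        return (score - score.mean()) / score.std()

    overall = standardize(dosage @ log_or)
    by_cluster = {c: standardize(dosage[:, cluster == c] @ log_or[cluster == c])
                  for c in np.unique(cluster)}
    return overall, by_cluster
```

Each standardized cluster component would then enter a regression alongside the standardized overall PS and the covariates, so that its coefficient reflects cluster-specific burden beyond overall genetic risk.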

We adjusted association analyses with vascular outcomes for age, sex and the first 20 principal components. In BBJ, we also adjusted for recruitment phase and status of the registered common diseases (other than T2D) to account for ascertainment. We further adjusted analyses of macrovascular outcomes for T2D status. We also further adjusted analyses of microvascular complications for duration of T2D. In AoURP, we defined age as age at last hospital visit. In BBJ, we defined age as age at first record. In G&H, we defined age as age at diagnosis for T2D cases and age at last follow-up for controls. For CAD, we also conducted sensitivity analyses by including, as an additional covariate, a CAD PS from the largest published multi-ancestry CAD GWAS39. The PS was constructed from index SNVs for 241 conditionally independent CAD associations, weighted by the multi-ancestry allelic log-OR (ancestry-specific effects were not available), and standardized to have mean zero and unit variance. We adjusted association analyses with age of T2D onset for sex and the first 20 principal components. In BBJ, we also adjusted for recruitment phase and status of the registered common diseases (other than T2D) to account for ascertainment.

For each outcome, we aggregated association summary statistics from each cluster-specific component of the partitioned PS and the overall PS across ancestries through random-effects meta-analyses. All meta-analyses were conducted using the R package meta (https://cran.r-project.org/package=meta).
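A random-effects meta-analysis across ancestry groups can be sketched with the DerSimonian–Laird estimator of between-group variance τ². The text does not state which τ² estimator was used, and recent versions of the meta package may default to a different one (for example, REML), so this choice is an assumption. `dersimonian_laird` is a hypothetical name.

```python
import numpy as np


def dersimonian_laird(beta, se):
    """Random-effects pooled estimate, its SE and tau^2 (DerSimonian-Laird)."""
    beta, se = np.asarray(beta, float), np.asarray(se, float)
    w = 1.0 / se ** 2
    fixed = np.sum(w * beta) / np.sum(w)
    q = np.sum(w * (beta - fixed) ** 2)                  # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(beta) - 1)) / c)           # truncated at zero
    w_re = 1.0 / (se ** 2 + tau2)
    est = np.sum(w_re * beta) / np.sum(w_re)
    return est, np.sqrt(1.0 / np.sum(w_re)), tau2
```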

Cluster-specific partitioned PS analyses of clinical outcomes

We tested for association of cluster-specific components of the partitioned PS with cardiovascular and kidney-related clinical outcomes in EUR individuals with T2D in prospective GWASs from six clinical trials from the Thrombolysis in Myocardial Infarction (TIMI) Study Group (https://timi.org/). Trial descriptions and details of genotyping and quality control are provided in the Supplementary Methods.

Within each trial, we calculated the overall PS and cluster-specific components of the partitioned PS for each individual, with each index SNV weighted by the allelic log-OR from the European ancestry-specific meta-analysis. We standardized the overall PS and each cluster-specific component of the partitioned PS to have mean zero and unit variance. Data from the six trials were subsequently pooled, and we considered the following clinical outcomes in patients with T2D only: myocardial infarction, ischaemic stroke, cardiovascular death, hospitalization for heart failure, atrial fibrillation, acute limb ischaemia, peripheral revascularization, end-stage renal disease or renal death and albuminuria. We tested for association of each cluster-specific component of the partitioned PS with each clinical outcome under a Cox proportional hazards model, including age, sex, the first ten principal components and the overall PS as covariates. All association analyses were conducted using the coxph function with Efron ties handling from the R package survival (https://cran.r-project.org/package=survival).
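For readers without R, the Cox model with Efron handling of tied event times can be sketched by maximizing the Efron partial log-likelihood directly. This is an illustrative sketch, not the survival package. It uses a generic BFGS optimizer with numerical gradients, returns only point estimates, and assumes the covariate matrix (for example, cluster PS component, overall PS, age, sex and principal components) is already assembled. Both function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def efron_neg_loglik(coef, x, time, event):
    """Negative Cox partial log-likelihood with Efron's approximation for tied event times."""
    eta = x @ coef
    risk = np.exp(eta)
    ll = 0.0
    for t in np.unique(time[event == 1]):
        tied = (time == t) & (event == 1)        # events at this time
        d = tied.sum()
        risk_set_sum = risk[time >= t].sum()     # individuals still at risk
        tied_sum = risk[tied].sum()
        ll += eta[tied].sum()
        for l in range(d):
            ll -= np.log(risk_set_sum - (l / d) * tied_sum)
    return -ll


def fit_cox_efron(x, time, event):
    """Maximum partial-likelihood estimates (log hazard ratios) for the columns of x."""
    x = np.asarray(x, float).reshape(len(x), -1)
    res = minimize(efron_neg_loglik, np.zeros(x.shape[1]),
                   args=(x, np.asarray(time, float), np.asarray(event)), method="BFGS")
    return res.x
```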

Ethics statement

Study-level ethics statements are provided in the Supplementary Note.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.