Introduction

A large number of genome-wide association studies have been conducted in various populations (mostly on Europeans, Americans, and East Asians), resulting in the identification of more than 100 loci conferring susceptibility to type 2 diabetes mellitus1,2,3,4. Meta-analysis and genotype imputations from diverse ethnic populations help identify novel markers and causal loci. However, despite the observed high prevalence of type 2 diabetes in Arab countries5,6, their populations were not included in global studies.

The Arabian Peninsula is at the nexus of Africa, Europe, and Asia; and has been assumed to be an early human migration route out of Africa. Consanguineous marriage (especially among first or second cousins) is an established practice among the Arabian Peninsula population. Consanguinity results in increased homozygosity, and accumulation of deleterious recessive alleles in the gene pool, creating the potential for certain variants to become more common in these endogamous population groups; these features can influence the etiology of complex disorders7. Therefore, elucidating novel risk variants is realistically possible in this population.

The Kuwaiti population consists of settlers from Saudi Arabia, Iran, and other neighboring countries within the Peninsula. Such settlement and subsequent admixture shaped the genetics of the Kuwaiti population. Our earlier work showed that the Kuwaiti population is heterogeneous, but structured, and carries a large burden of homozygosity8. Kuwaiti population groups practice consanguineous marriage; a survey in Kuwait reported that the rate of consanguineous marriages was as high as 54% and the average inbreeding coefficient was 0.02199. These practices indicate that groups live in isolation by community leading to genetic isolates in extended families and Bedouin tribes10. Using these small population isolates can reduce the complexity of polygenic disorders by reducing the number of loci involved in disorder etiology11. In the present study, we performed a genome-wide association study (GWAS) on native Arab individuals from Kuwait to delineate novel risk variants for fasting plasma glucose (FPG). We further examined associations between glucose-related traits and insulin resistance traits in individuals with genotypes, heterozygous or homozygous, for the risk allele at the identified risk variants.

Results

Marker and sample sets

Quality control analyses resulted in a marker set of 632,375 SNPs (reduced down from 730,525), discovery cohort of 1,353 samples (reduced down from 1913), and replication cohort of 1,176 samples. The discovery cohort was estimated to have 80% power to detect associations (under additive and recessive models) with a genetic effect that explained 0.6% of the variance in the trait. The acceptable effect sizes at different allele frequencies for associations with FPG (in discovery phase) are presented in Supplementary Table S1.

Characteristics of study participants

The study cohorts were described in our previous reports12,13. Participants (comprising almost equal proportions of men and women) were largely middle-aged (mean age in discovery cohort, 46.8 ± 13.8 years) (Table 1) and were largely obese (mean body mass index, 32.4 ± 7.4 kg/m2) with high waist circumference (102.21 ± 16.35 cm). The proportions of participants afflicted with type 2 diabetes from the discovery and replication cohorts were 45% and 39%, respectively. A total of 216 of the participants from the discovery cohort were being administered glucose-lowering medication. Mean FPG values in the discovery and replication cohorts were 7.3 ± 3.57 and 5.86 ± 2.27 mmol/L, respectively, and were in the range of the ADA-defined threshold of 5.5–6.9 mmol/L for diagnosing impaired fasting glucose. Mean HbA1c values in the discovery and replication cohorts were 7.1 ± 2.1%, and 6.00 ± 1.4%, respectively. While FPG measurements were available for all participants of the discovery cohort, HbA1c values were available for only 750; hence, markers associated with only HbA1c were excluded from further analyses.

Table 1 Demographic characteristics of the study participants.

Scatterplots presenting the first three principal components derived from a merged data set of the discovery cohort and representative populations from the Human Genome Diversity Project (HGDP) are presented in Supplementary Figure S1; the scatterplots depict three genetic substructures and agree with the PCA plot (reproduced in Supplementary Figure S2) that we derived earlier using a set of native Kuwaiti individuals whose Arab ethnicity was confirmed through surname lineage analysis8.

Associations observed in discovery and replication phases

Upon examining the association test results from discovery phase for at least nominal p-values of <1.0E-05 and acceptable beta values, we short-listed 22 markers (21 associated with FPG and 1 with both FPG and HbA1c) to carry forward to the replication phase; Table 2 presents their quality assessment values in the replication phase. Intensity maps displaying the quality of the three called genotypes at these markers are presented in Supplementary Figure S3. Quantile–quantile plots depicting the expected and observed −log10(p-values) for association of the markers with FPG are presented in Fig. 1. Genomic-control inflation factors for FPG were (λ = 1.047, recessive model; λ = 1.077, additive model) in tests with regular corrections and (λ = 1.031, recessive model; λ = 1.069, additive model) in tests corrected further for glucose-lowering medication. Similar values were obtained for HbA1c. These values at close to 1.0 and differing only over a small range of 1.03–1.08 do not necessitate correcting association statistics for genomic-control inflation. Manhattan plots depicting the −log10(p-values) from the GWAS for the FPG trait are presented in Supplementary Figure S4. Four markers (i.e., rs12488539, rs6762914, rs1199028, rs7329697) failed the SNP quality assessment tests for Hardy–Weinberg equilibrium quality control (HWE >10−6); and none failed the test for allele frequency consistency (between discovery and replication phases). Table S2 lists, for all 22 markers, results of association tests (with regular corrections and additionally corrected for diabetes medication) from the discovery and replication phases as well as meta-analysis of the combined results from both phases. The analysis produced a short-list of four associations for FPG that showed significant p-values in discovery phase (one at a genome-wide significant p-value of <1.8E-08 and three at nominal p-values of <1.0E-05) and that passed the p-value threshold in the replication phase; three of them reached genome-wide significance in the meta-analysis that combines and jointly analyze the data from both the discovery and replication phases (Table 3). Such markers were rs1002487/[intronic from RPS6KA1] (p-discovery = 1.64E-08, p-replication = 3.71E-04, p-combined = 5.72E-11), rs487321/[intronic from CADPS] (p-discovery = 1.53E-07, p-replication = 2.25E-06, p-combined = 1.83E-12), rs707927/[intronic from VARS and 2 Kb upstream of VWA7] (p-discovery = 8.24E-06, p-replication = 8.25E-05, p-combined = 3.07E-09), and rs12600570/[intronic from DHX58] (p-discovery = 7.49E-06, p-replication = 4.67E-03, p-combined = 2.72E-07); the former two were recessive and the latter were additive markers. Further corrections for glucose-lowering medication retained significant p-values and effect sizes. Upon performing inverse normal transformation on the FPG traits, p-values for the association of rs707927 improved to 1.26E-07 (effect size = 0.33). The RPS6KA1 marker was also associated with HbA1c at close to the p-value threshold for genome-wide significance (p-discovery = 4.91E-08; p-replication = 2.71E-03; p-combined = 7.27E-09).

Table 2 SNP quality assessment statistics for the 22 markers assessed in the replication phase.
Figure 1
figure 1

Quantile–quantile plots of the expected and observed −log10(p-values) for the association of markers with FPG under additive (λ = 1.077) and recessive (λ = 1.047) models upon regular correction.

Table 3 List of the four identified risk variants associated with FPG either at genome-wide significant p-values (<1.8E-08) or at nominal p-values of 1.0 < E-06.

Considering the diabetes and obesity status of the participants as covariates for adjustments on the association models

45% of participants in the discovery phase, and 38% of participants in the replication phase, respectively, were diagnosed for diabetes (see Table 1). It is further the case that obesity seems to be a major driver of diabetes in the whole sample – 59% of participants in the discovery phase and 46% of participants in the replication phase, respectively, were obese. Thus, it is important to perform corrections for the association tests for diabetes and obesity status along with corrections for diabetes and lipid lowering medications (lipid lowering medications can influence FPG levels). Upon performing the corrections for these 4 covariates along with the regular corrections, it was found that the p-values remained significant at p-combined = 1.38E-10 (for RPS6KA1 marker), 1.88E-13 (for CADPS marker), 1.23E-08 (for VARS marker) and 2.78E-05 (for DHX58 marker) (Table 4).

Table 4 Results from the analysis of correcting the observed associations for the additional covariates of obesity and diabetes status of the participants.

Sensitivity analysis

A concern arises as to whether the FPG values measured in individuals receiving glucose-lowering medication represent “naturally” observed values in the population. We addressed this concern by way of performing a sensitivity analysis to add a value of 2.5 mmol/L to the FPG values of the participants taking diabetes medication and then performing the association tests; the value of 2.5 mmol/L is an average effect size (p-value < 0.001) that we observed in an in-house clinical database of diabetic patients visiting clinics in our institute. The results of association tests with the preadjusted FPG values for the four identified associations (with corrections for regular confounders and BMI) are presented in Table 5. The associations retained the p-values.

Table 5 Results from sensitivity analysis of preadjusting the FPG measurements by a fixed value (2.5 mmol/L) per diabetes medication status.

Assessing the identified associations in sub-cohorts of entirely diabetic or of entirely non-diabetic participants

The discovery and replication cohorts used in this study included both diabetic patients and healthy participants; as mentioned above, the identified associations retained significance when the models were adjusted for the covariate of diabetes status. It is often the case that quantitative trait associations are done on entirely non-diabetic participants or on entirely diabetic patients (which gives a higher chance of translating the findings to clinical utility). We distributed the discovery cohort (n = 1353) and replication cohort (n = 1176) onto four sub-cohorts: (i) Discovery_diabetic (n = 605); (ii) Discovery_non-diabetic (n = 748); (iii) Replication_diabetic (n = 452); and (iv) Replication_non-diabetic (n = 724). We performed association tests with each of the four sub-cohorts followed by three meta-analysis (Meta_diabetic: combining results from Discovery_diabetic and Replication_diabetic), (Meta_non-diabetic: combining results from Discovery_non-diabetic and Replication_non-diabetic) and (Meta_all: combining results from all the four sub-cohorts). With regular corrections performed on the association tests, the effect sizes and p-values remained significant in the Meta_diabetic analysis (Table 6) for the markers from the RPS6KA1 (β = 6.01; p = 1.84E-09), CADPS (β = 5.13; p = 2.86E-07) and VARS (β = 4.68; p = 2.83E-06) genes and in the Meta_non-diabetic analysis for the marker from the DHX58 gene (β = 3.81; p = 1.30E-04); considering that the sizes of the sub-cohorts reduced considerably, these values can be considered significant. In addition, the p-values for Meta_all analysis (β = 5.46; p = 4.82E-08) remained significant for the VARS marker.

Table 6 Results from the analysis of examining the identified associations in sub-cohorts of entirely diabetic patients or of entirely healthy participants.

Examining the NHGRI-EBI GWAS catalog for previous association reports on the identified risk variants

While none of the identified risk variants was associated with any trait in previous GWAS, the gene loci were often associated with traits related to diabetes: RPS6KA1 with glucose homeostasis traits14, sporadic amyotrophic lateral sclerosis15, and the symptom of rosacea16; DHX58 with coronary artery disease (CAD)17; VARS with blood plasma proteome18, autism spectrum disorder (ASD)19, and inflammatory bowel disease (IBD)20; VWA7 with blood protein levels18, ASD19, and IBD20; and CADPS with treatment interaction of sulfonylurea (a glucose-lowering drug)21, heart failure-related metabolite levels22, and obsessive-compulsive symptoms23.

LD markers and regional associations

Figure 2 presents regional association plots for regions of 500 Kb centered at the identified four risk variants; these regions (other than for the CADPS marker) were gene-dense. The (VARS, VWA7) and DHX58 markers had 21 and 7 LD partners (r2 > 0.59), respectively. Several LD partners were associated with FPG at suggestive p-values of <1E-04 (Supplementary Table S3). Examination of the NHGRI-EBI GWAS catalog listed the following two LD partners (that associated in our study population at a p-value of E-05): (i) rs2074158-T (missense) (LD [r2 = 0.56] partner of DHX58 risk variant) associated with CAD (p-value = 2.0E-10) in UK BioBank populations17; and (ii) rs9469054-A (intronic) (LD [r2 = 0.85] partner of [VARS, VWA7] risk variant) associated with monocyte count (p-value = 1.0E-20)24; shared genetic pathways linking blood cell counts with complex pathologies (including CAD) have been reported24.

Figure 2
figure 2

Regional association plots showing the 4 identified risk variants (A) rs1002487, (B) rs487321, (C) rs707927, (D) rs12600570) and the markers in LD (from a 500 Kb genome region centered at the risk variants) with the risk variants in their respective gene regions and their association with FPG. The SNPs are color-coded as per the r2 value for the SNP with the identified risk variant (Blue dots: r2 ≤ 0.2; Purple dots: r2 > 0.2 & ≤ 0.4; Green dots: r2 > 0.4 & ≤ 0.6; Orange dots: r2 > 0.6 & ≤ 0.8; Red dots: r2 > 0.8 & ≤ 1.0). The X-axis represents the gene region in physical order; the Y-axis represents −log10 P-value of the associations with FPG for all the SNPs. The dashed horizontal line represents a p-value of 3.60E-08. To generate regional association plot for a SNP-trait association, all the genotyped SNPs (passing the quality control analyses) from a region of around 500 Kb centered on the SNP were tested for association with the trait; the resultant statistics and the SNPs were displayed in the regional association plot. Region-plot tool (https://github.com/pgxcentre/region-plot) was used to produce regional plots.

ROH segments overlaying the identified risk variants

All of the four reported risk variants were in ROH (Table 7). The observed maximum values for of the ROH region lengths (mean ± SD of the ROH groups) were 8 Mb (RPS6KA1 marker), 15.5 Mb (CADPS), 9.9 Mb (VARS, VWA7), and 6.96 Mb (DHX58). The two recessive risk variants from RPS6KA1 and CADPS were in “known” ROH segments, while the two additive markers from (VARS, VWA7) and DHX58 were in “novel” segments. However, LD partners of the additive risk variants lay in “known” ROH segments – one such marker (i.e., rs2074158/DHX58) in LD with the DHX58 risk variant is listed in the GWAS catalog as being associated with CAD (see Table 7). Presence of the identified ROH segments (to which the associated variant overlaps) is found more often in sub-cohort of diabetic participants than in sub-cohort of non-diabetic participants, though the size of the former sub-cohort (n = 605) is smaller than that of the latter sub-cohort (n = 748); however, the differences are not seen statistically significant.

Table 7 ROH regions overlaying the identified risk variants.

Gene expression regulation by the identified risk variants

Examination of Genotype-Tissue Expression (GTeX) data (https://www.gtexportal.org) revealed that all four risk variants regulate the expression of their own or other genes. The RPS6KA1 marker regulates the DHDDS gene in the heart’s left ventricle; the CADPS marker regulates itself in the artery-tibial and adipose-subcutaneous tissues; the (VARS, VWA7) marker regulates a number of genes [LY6G5B (artery-tibial, testis, muscle-skeletal, thyroid); GPANK1 (esophagus-mucosa, skin); AIF1 (whole blood), C6orf25 (skin), SAPCD1-AS1 (skin); and TNXA (skin)]; the DHX58 marker regulates itself (in artery-tibial, adipose-subcutaneous, adipose-visceral, pancreas, and heart) and other genes such as KCNH4 (esophagus-muscularis), HSPB9 (testis), and RAB5C (adipose-subcutaneous, pancreas, muscle-skeletal).

Associations between glucose-related traits and insulin resistance traits at the risk variants

Allelic association test statistics (Supplementary Table S4) for the identified risk variants in the third cohort of 283 samples considered for insulin resistance analysis indicated that the RPS6KA1, (VARS, VWA7), and CADPS markers passed the p-value threshold (<0.05) for associations with insulin resistance traits of HOMA-IR and HOMA-β and with the glucose-related traits of FPG and HbA1c; in addition, the association of the RPS6KA1 marker with TG was replicated.

Results from multivariate analysis to examine relationships between glucose-related (FPG, HbA1C, TG) and insulin resistance (HOMA-IR, HOMA-β, C-peptide, HOMA-S) traits in the context of observed genotypes at risk variants (Table 8) indicated possible associations of the identified risk variants with insulin resistance:

  1. (I)

    RPS6KA1 marker: With genotypes homozygous for the risk allele, interactions between (TG, FPG and HbA1c) and insulin resistance traits (HOMA-β, C-peptide, HOMA-S) were observed at the multiple testing significance threshold of <0.003. With the heterozygous genotype, TG was associated with HOMA-S at a p-value < 0.05.

  2. (II)

    [VARS, VWA7] marker: With genotypes that are heterozygous or homozygous for the risk allele, interactions between FPG and insulin resistance traits (HOMA-β and HOMA-S) were observed at the multiple testing significance threshold of <0.003. With a heterozygous genotype, interactions between HbA1c and insulin resistance traits (HOMA-IR and HOMA-S) were observed at the multiple testing significance threshold of <0.003. TG also interacted with HOMA-S at a p-value = 0.007 with a heterozygous genotype.

  3. (III)

    CADPS marker: With a heterozygous genotype, associations between TG and HOMA-IR were observed at the multiple testing significance threshold of <0.003. Interaction between FPG and HOMA-β with a heterozygous genotype could be seen at a p-value < 0.003.

  4. (IV)

    DHX58 marker: With genotypes homozygous for the risk allele, TG and FPG were seen to be associated with (HOMA-IR and C-peptide levels) and HOMA-β, respectively, at the multiple testing significance threshold of <0.003. With heterozygous genotypes, FPG was associated with both HOMA-IR and HOMA-β at p-values < 0.05.

Table 8 Interactions between (TG, FPG, HbA1c) and Insulin Resistance traits (HOMA-IR, HOMA-β, C-peptide, HOMA-S) with respect to genotypes at the risk variants.

Discussion

This study identified a novel recessive marker (rs1002487) from RPS6KA1 (encoding Ribosomal Protein S6 Kinase A1) associated with high FPG (and HbA1c) at genome-wide significance in native Kuwaiti people of Arab descent. S6K1 signaling has distinct roles in regulating glucose homeostasis in pro-opiomelanocortin and agouti-related protein neurons, key regulators of energy homeostasis25; and can potentially regulate insulin resistance through phosphorylating insulin receptor substrate 1 (IRS-1)26. It participates in the NOTCH pathway, an effector of mTOR, and is sensitive to both insulin and certain nutrients. Our previous GWAS, using the same cohort12, demonstrated that the marker was also recessively associated with high TG at genome-wide significance. FPG was directly correlated with TG and inversely correlated with HDL. Adiposity, high FPG, and TG are hallmarks of insulin resistance27 and high FPG within the normoglycemic range can increase the risk for type 2 diabetes28. The presented results indicate interactions between (TG, FPG, and HbA1c) and insulin resistance traits (HOMA-β, HOMA-S, C-peptide) at multiple testing significance with genotypes homozygous for the risk allele at the risk variant; even for the heterozygous genotype, TG was associated with HOMA-S (at p-value < 0.05). Thus, the present study, reporting for the first time that the RPS6KA1 marker is a risk variant for TG and glucose-related traits, is of considerable interest. Furthermore, in the GWAS catalog, the RPS6KA1 gene is associated with glucose homeostasis traits, sclerosis, and the symptom of rosacea. Reports have suggested that the rare homozygous (CC) state at the marker is involved in schizophrenia29. The GTeX resource annotates this marker as having the potential to regulate expression of the DHDDS gene, a locus associated with developmental delay and seizures (with or without movement abnormalities); patients with schizophrenia are also more prone to seizures. Patients with mental disorders, especially schizophrenia, are often afflicted by diabetes. Glucose homeostasis is altered upon the onset of schizophrenia, indicating that patients are at increased risk of diabetes30.

This study identified three further risk variants associated with FPG at nominal p-values of < 8.20E-06. These are rs487321 (recessive, intronic, CADPS), rs707927 (additive, intronic in VARS, and 2 Kb upstream of VWA7), and rs12600570 (additive, intronic, DHX58). Of these three suggestive markers, the CADPS and [VARS, VWA7] markers reached genome-wide significance (p-combined = 1.83E-12 and 3.07E-09, respectively) in meta-analysis that jointly analyzes the data from both the phases.

(i) CADPS encodes a calcium-dependent secretion activator involved in the exocytosis of vesicles filled with neurotransmitters and neuropeptides. Interestingly, the activator regulates the recruitment of insulin granules and beta-cell function31,32; previous global GWAS associated CADPS loci with treatment interaction of sulfonylurea (a glucose-lowering drug) and heart failure-related metabolite levels21,22; and GTeX annotates the marker as regulating the expression of its own gene (CADPS) in adipose-subcutaneous and tibial artery tissues. Furthermore, as indicated in our results, with a heterozygous genotype at the risk variant, TG was significantly associated with HOMA-IR (p < 0.003) and FPG with HOMA-β (p < 0.003).

(ii) VARS encodes valyl-tRNA synthetase and is associated with diabetic cataract, neurodevelopmental disorder, microcephaly, seizures, and cortical atrophy. VWA7 encodes Von Willebrand Factor A Domain-Containing Protein 7; previous global GWAS associated the VWA7 locus with IBD, blood plasma proteome, blood protein levels, and schizophrenia. Furthermore, the risk variant and its 26 strong LD partners are from a gene-dense region, commonly referred to as the HLA “class III” region33, containing a large number of genes (i.e., TNF, AIF1, PRRC2A, APOM, BAG6, C6orf47, CSNK2B, GPANK1, LY6G5B, LY6G5C, ABHD16A, LOC105375018, LY6G6F-LY6G6D, LY6G6F, LY6G6E, LY6G6D, C6orf25, LY6G6C, MSH5-SAPCD1, MSH5, VARS, VWA7, C6orf48, NEU1, HSPA1A, EHMT2, and C2) (Fig. 2 and Supplementary Table S3). Markers and genes from the HLA region are associated with risk for type 1 diabetes34 and type 2 diabetes35: TNF mediates obesity-related insulin resistance36; the HSPA1A gene (encoding HSP70) gets upregulated and correlates with HbA1c levels in pregnant women with gestational diabetes37; people with type 2 diabetes have higher HSP70 levels in serum correlating with diabetes duration38; and an upstream variant of HSPA1A (i.e., rs17201192, an LD partner (r2 = 0.83) of the reported [VARS, VWA7] marker) showed an association with FPG, albeit at a nominal p-value of 3.3E-05, in our analysis (see Supplementary Table S3). Our results imply, with genotypes of heterozygosity or homozygosity for the risk allele, significant interactions between FPG and HOMA-β and HOMA-S; and with a heterozygous genotype, interactions between HbA1c and HOMA-IR and HOMA-S. TG was also seen to interact with HOMA-S at p = 0.007 with a heterozygous genotype. The [VARS, VWA7] variant appeared to regulate the expression of LY6G5B, GPANK1, AIF1, C6orf25, SAPCD1-AS1, and TNXA; previous global GWAS associated these genes with ASD and IBD, which are known to co-occur with type 2 diabetes39.

(iii) The DHX58 gene encodes DExH-box helicase 58. Previous global GWA studies associated a missense variant (i.e., rs2074158-T/DHX58), which is in LD (r2 = 0.56) with the reported DHX58 risk variant, with CAD (p-value = 2.0E-10) in UK BioBank populations17. We further noticed that the identified ROH region (17:36532212–43490282) (see Table 7) covering the DHX58 marker overlaps with the ROH (17: 36839131–38938944) (see Table 4 from our previous publication12) covering a marker (rs9972882 from PGAP3) that is associated with high triglyceride levels12.

The presented results indicate that the DHX58 risk variant regulates DHX58, RAB5C, KCNH4, and HSPB9; interestingly, previous global GWAS implicated these four genes in CAD17. Furthermore, markers from RAB5C were associated with fibrinogen levels, which are known to be elevated in diabetic patients, especially those with foot ulcers40. Our results pointed to significant (p-value < 0.003) interactions between TG and (HOMA-IR and C-peptide levels) and between FPG and HOMA-β at genotypes homozygous for a risk allele.

All the four identified risk variants are intronic; however, as discussed above, genotype-tissue expression data revealed that each of the four variants can regulate genes that are associated with diabetes-related or comorbid disorders. Given that a large burden of homozygosity and excess of recessive alleles are attributed to Arab population from Kuwait8, the observations that two of the four identified risk variants appeared when genetic model based on the recessive mode of inheritance was used and that all four variants were in ROH segments are not surprising.

Association tests were examined with both raw and inverse normal transformed FPG values. The reported four associations remained significant when co-variate adjustments were done for diabetes medication, obesity and diagnosis for diabetes. The four associations remained significant when FPG values were preadjusted by a fixed amount per diabetes medication status. Further examination of the identified associations in the sub-cohorts of entirely diabetic patients or of entirely healthy participants revealed that the RPS6KA1, CADPS and VARS markers performed better in terms of retaining significance in cohorts of diabetic patients and the DHX58 marker in the cohort of participants free of diabetes.

Consideration of ethnic populations in association studies is supposed to help in enlarging the global catalog of risk loci by way of indicating novel risk loci (not seen in major continental populations). Previous studies from the region on Arab cohorts demonstrated this aspect by way of identifying novel risk loci for type 2 diabetes (T2DM) at either genome-wide significant or suggestive p-values for associations – such loci include KIF12, DVL1, EPB41L3, DTNB, DLL1, CTNNB1, JAG1, MLXIP, CDKLAL1, TCF7L2, KCTD8, GABRG1, GABRA2, COX7B2, GABRA4, ZNF106 and OTX2-AS1 (Supplementary Table S5)41,42,43,44,45. Our study now adds RPS6KA1, CADPS, (VARS, VWA7), and DHX58 to this list of novel T2DM risk loci in Arab population.

Because of the nature of the study design that uses HumanOmniExpress BeadChip, the study does not consider genetic variants that are seen only in the Arab population. However, we find that there are statistically significant differences in genotype distributions at the risk variants between the Arab population and continental populations (Supplementary Table S6). The risk allele frequencies also differ substantially across the populations (Supplementary Figure S5). In order to identify Arab-population-specific risk variants (that are not polymorphic in continental population), we need to perform large-scale genome-wide surveys (a combination of GWAS, exome, and genome sequencing and imputation) of the Arab population with diabetes46.

Our earlier studies identified three population subgroups in Kuwait8. the first group (Kuwait P) is largely of West Asian ancestry, representing Persians with European admixture; the second group (Kuwait S) is predominantly of city-dwelling Saudi Arabian tribe ancestry, and the third group (Kuwait B) includes most of the tent-dwelling Bedouin and is characterized by the presence of 17% African ancestry. Allele frequency assessment of the identified 4 risk variants among these substructures (Fig. 3) suggests that the variant rs1002487/RPS6KA1 is enriched in Persian ancestry, rs12600570/DHX58 in nomadic Bedouin ancestry, rs707927/(VARS, VWA7) in Saudi Arabian ancestry followed by nomadic Bedouin ancestry while the frequency of rs487321/CADPS is almost equal among the three population substructures of Kuwait.

Figure 3
figure 3

Assessment of allele frequencies at the identified 4 risk variants among the three population substructures of Kuwait. Saudi: Kuwait S subgroup that is predominantly of city-dwelling Saudi Arabian tribe ancestry; Persian: Kuwait P subgroup that is largely of West Asian ancestry, representing Persians; Bedouin: Kuwait B subgroup that is of tent-dwelling Bedouin ancestry46.

Limitations of the study include the following: (i) Among study cohorts, there are many subjects assuming hypoglycemic therapy – which we took care by way of adjusting the association tests for medication and by performing sensitivity analysis; however, the such individuals at risk for hyperglycemia might have introduced corrective actions (such as exercise, hypocaloric diet and food supplements) affecting FPG; unfortunately, data relating to these corrective measures were not available and hence we were unable to consider them in association test models or in sensitivity analysis. (ii) The study cohort is relatively small, a limitation which might have hindered the ability to identify more than just the four reported risk variants and to observe any of the established risk variants for glucose-related traits. There is an urgent need to carry out much larger studies on the genetics of diabetes in Arab populations which are notorious for high prevalence of obesity and diabetes46.

Conclusions

This study identified novel risk variants for high FPG in the Arab population of Kuwait. The RPS6KA1 gene (associated with FPG at genome-wide significance) is known to be involved in glucose homeostasis. Gene loci of CADPS, (VARS, VWA7), and DHX58 exhibiting nominal associations with FPG were often found to be associated with CAD in previous global GWAS. The identified four associations remained significant when the regression models were adjusted for various confounders (such as medication, obesity and diabetes status) and when the FPG levels were preadjusted by a fixed value per diabetes medication status. The RPS6KA1, CADPS and VARS markers performed better in terms of retaining significance in cohorts of entirely diabetic patients and the DHX58 marker in the cohort of participants free of diabetes. With heterozygous or homozygous risk allele genotypes at these risk variants, significant interactions appear to occur between glucose-related and insulin resistance traits. The identified gene loci were previously associated with various other disorders (including IBD, schizophrenia, and autism) that appear to share risk factors with diabetes. This study presents, for the first time, potential associations between the RPS6KA1 gene loci and high TG, FPG, and HbA1c.

Methods

Ethics approval and consent to participate

This study was reviewed and approved by the institutional Ethical Review Committee at Dasman Diabetes Institute, Kuwait. Participant recruitment and blood sample collection were conducted under protocols adopted by the Ethical Review Committee. Signed informed consent was obtained from each participant.

Study participants

Details on participant recruitment and a description of the study cohorts are presented in our previous paper12 (for details, see Supplementary Material: Methods section on Study participants). Briefly, 3,145 participants were recruited from two cohorts in Kuwait. A representative sample of Kuwaiti native adults randomly selected from each of the six governorates of Kuwait formed the first group. Native Kuwaitis visiting our institutional clinics for tertiary medical care or our campaigns formed the second group; such visitors interested in participating were invited later to give blood samples after overnight fasting. We confirmed ethnicity through detailed questioning on parental lineage up to three generations. Data on age, sex, medical history, and medication were also recorded, as were baseline characteristics and vital signs. The discovery cohort was drawn largely from the second group and the replication cohort from the first group. 1,913 of the recruited participants were used for the discovery phase and 1,176 for the replication phase.

Power calculation

We adopted the “gene only” hypothesis and performed two types of power calculation (for details, see Supplementary Material: Methods section on Power calculation): (Type i): Quanto47 was implemented to evaluate sample size and the potential to detect FPG trait variance with 80% power and p-value < 5.0E-08. Marginal genetic effect estimates (RG2) were made to increment from 0.001 to 0.04 in steps of 0.001 in order to detect genetic effects explaining at least 0.1%–4% of trait variance could be detected. (Type ii): QPowR (https://msu.edu/~steibelj/JP_files/QpowR.html) was used to determine the sample size for achieving 80% power for the study design of two phases (discovery and replication) with total sample size of 2,529, total heritability of 0.05, samples genotyped each of the two phases as ~50% of 2,529, markers typed in the second phase as ~0.2% of the markers typed in the first phase, and type I error rate of 5.0E-08.

Genotyping in the discovery and replication phases

Genome-wide genotyping was performed on an Illumina HumanOmniExpress Array. Top associating markers in the discovery phase were genotyped in replication phase using TaqMan® SNP Genotyping Assays (Applied Biosystems, Foster City, CA, USA) and ABI 7500 Real-Time PCR System (Applied Biosystems) (for details, see Supplementary Material: Methods section on Sample processing: Discovery phase and Replication phase).

Quality control analyses

Raw intensity data from all samples were pooled and genotype calling was performed using GenomeStudio software. A series of quality metric thresholds was applied to derive a high-quality set of SNPs and samples (for details, see Supplementary Material: Methods section on Quality control analysis). Samples with a call rate >95% were retained. SNPs with inappropriate call quality were removed. Sex was estimated using GenomeStudio and removed mismatched samples. Strand designations were corrected to the forward strand, and REF/ALT designations were corrected using the design files for HumanOmniExpress BeadChip. Markers with allele frequency (–maf 0.01), and deviation from Hardy–Weinberg equilibrium (HWE <10−6) were removed. We derived a set of LD-pruned markers (n = 340,299) by removing markers in LD (r2 > 0.5) with others in a sliding window of 50-SNP and the LD-pruned marker set was used to measure relatedness among participants to the extent of third-degree relatives, to perform ancestry estimation (using ADMIXTURE48), and principal component analysis (using EIGENSTRAT49). One sample per pair of related participants was randomly removed. Samples with abnormal deviations, in the extents of component ancestry elements, from what we had established for the three Kuwaiti population subgroups8 were removed as samples of ethnicity mismatch. Outliers in PCA were identified and the corresponding samples were removed.

Quantitative trait association tests

In discovery phase, all the 632,375 SNPs that passed quality control were used in association tests. Selected markers from discovery phase were tested in the replication phase. Both the additive and recessive genetic models were used in tests for associations with FPG and HbA1c. Two types of corrections were made to the associations tests – “Regular Corrections” involved adjustments for age, sex, and the first 10 principal components; and “Additional Corrections” involved further adjustment for glucose-lowering medication.

Joint analysis with results from discovery and replication phases

The METAL tool50 was used to perform meta-analysis with association test statistics from both the discovery and replication phases. Combined analysis of data from both the phases is believed to enable detecting genetic associations with increased power51.

P-value thresholds to assess significance of associations

Threshold for genome-wide significant p-values were calibrated for the counts of LD-pruned markers (n = 340,299), quantitative traits (n = 2, FPG and HbA1c), genetic models (n = 2, additive or recessive), and correction models for the association tests (n = 2, regular correction and further correction for glucose-lowering medication). The “stringent” p-value threshold to keep the type I error rate at 5% got calibrated to 1.84E-08. We further defined a “nominal” p-value threshold of (>1.84E-08 and <E-05) to identify “suggestive” associations. P-value threshold for significant associations in replication phase was set at 0.05.

Identifying runs of homozygosity (ROH)

Runs of Homozygosity (ROH) were identified, using PLINK-1.9, through two approaches: (Approach-1): Markers that passed quality control were pruned for LD (r2 > 0.9) (n = 568,670) and employed to detect ROH segments using parameters recommended by Howrigan et al.52 (Approach-2): The unpruned marker set (n = 632,375) was employed and parameters deployed by Christofidou et al.53 were used. Consensus ROH regions were derived for the identified groups of overlapping ROH segments and mean ± SD was calculated (by considering the midpoint of each individual ROH falling in the group). Delineated ROH segments were classified as “known” or “novel” by comparison with ROH signatures discovered in global populations54.

Derivation of insulin resistance traits and association with glucose-related traits

We considered a subset of 283 samples, randomly selected from the replication cohort, and measured C-peptide levels in plasma (for details, see Supplementary Material: Methods section on Derivation of insulin resistance traits). Insulin resistance traits (i.e., HOMA-IR, HOMA-β, and HOMA-S) were calculated using the FPG (mmol/l) and C-peptide (nmol/l) values with the HOMA2 calculator (https://www.dtu.ox.ac.uk/homacalculator/). Multivariate linear regression, corrected for age and sex, was performed to assess interactions between (TG, FPG, HbA1c) and insulin resistance traits with respect to the genotypes at risk variants; standardized beta-coefficients (β1) and test significance (p-values) were derived using the R Project for Statistical Computing software (https://www.r-project.org/). The p-value threshold calibrated for multiple testing was 0.003 (=0.05/16); the denominator corresponds to four interaction models on each of the four risk variants.