Genome-wide association study identifies novel risk variants from RPS6KA1, CADPS, VARS, and DHX58 for fasting plasma glucose in Arab population

Consanguineous populations of the Arabian Peninsula, which has seen an uncontrolled rise in type 2 diabetes incidence, are underrepresented in global studies on diabetes genetics. We performed a genome-wide association study on the quantitative trait of fasting plasma glucose (FPG) in unrelated Arab individuals from Kuwait (discovery-cohort:n = 1,353; replication-cohort:n = 1,196). Genome-wide genotyping in discovery phase was performed for 632,375 markers from Illumina HumanOmniExpress Beadchip; and top-associating markers were replicated using candidate genotyping. Genetic models based on additive and recessive transmission modes were used in statistical tests for associations in discovery phase, replication phase, and meta-analysis that combines data from both the phases. A genome-wide significant association with high FPG was found at rs1002487 (RPS6KA1) (p-discovery = 1.64E-08, p-replication = 3.71E-04, p-combined = 5.72E-11; β-discovery = 8.315; β-replication = 3.442; β-combined = 6.551). Further, three suggestive associations (p-values < 8.2E-06) with high FPG were observed at rs487321 (CADPS), rs707927 (VARS and 2Kb upstream of VWA7), and rs12600570 (DHX58); the first two markers reached genome-wide significance in the combined analysis (p-combined = 1.83E-12 and 3.07E-09, respectively). Significant interactions of diabetes traits (serum triglycerides, FPG, and glycated hemoglobin) with homeostatic model assessment of insulin resistance were identified for genotypes heterozygous or homozygous for the risk allele. Literature reports support the involvement of these gene loci in type 2 diabetes etiology.

Associations observed in discovery and replication phases. Upon examining the association test results from discovery phase for at least nominal p-values of <1.0E-05 and acceptable beta values, we short-listed 22 markers (21 associated with FPG and 1 with both FPG and HbA1c) to carry forward to the replication phase; Table 2 presents their quality assessment values in the replication phase. Intensity maps displaying the quality of the three called genotypes at these markers are presented in Supplementary Figure S3. Quantile-quantile plots depicting the expected and observed −log 10 (p-values) for association of the markers with FPG are presented in Fig. 1. Genomic-control inflation factors for FPG were (λ = 1.047, recessive model; λ = 1.077, additive model) in tests with regular corrections and (λ = 1.031, recessive model; λ = 1.069, additive model) in tests corrected further for glucose-lowering medication. Similar values were obtained for HbA1c. These values at close to 1.0 and differing only over a small range of 1.03-1.08 do not necessitate correcting association statistics for genomic-control inflation. Manhattan plots depicting the −log 10 (p-values) from the GWAS for the FPG trait are presented in Supplementary Figure S4. Four markers (i.e., rs12488539, rs6762914, rs1199028, rs7329697) failed the SNP quality assessment tests for Hardy-Weinberg equilibrium quality control (HWE >10 −6 ); and none failed the test for allele frequency consistency (between discovery and replication phases). Table S2 lists, for all 22 markers, results of association tests (with regular corrections and additionally corrected for diabetes medication) from the discovery and replication phases as well as meta-analysis of the combined results from both phases. The analysis produced a short-list of four associations for FPG that showed significant p-values in discovery phase (one at a genome-wide significant p-value of <1.8E-08 and three at nominal p-values of <1.0E-05) and that passed the p-value threshold in the replication phase; three of them reached genome-wide significance in the meta-analysis that combines and jointly analyze the data from both the discovery and replication phases (Table 3). Such markers were rs1002487/[intronic from RPS6KA1] (p-discovery = 1.64E-08, p-replication = 3.71E-04, p-combined = 5.72E-11), rs487321/[intronic from CADPS] (p-discovery = 1.53E-07, p-replication = 2.25E-06, p-combined = 1.83E-12), rs707927/[intronic from VARS and 2 Kb upstream of VWA7] (p-discovery = 8.24E-06, p-replication = 8.25E-05, p-combined = 3.07E-09), and rs12600570/[intronic from DHX58] (p-discovery = 7.49E-06, p-replication = 4.67E-03, p-combined = 2.72E-07); the former two were recessive and the latter were additive markers. Further corrections for glucose-lowering medication retained significant p-values and effect sizes. Upon performing inverse normal transformation on the FPG traits, p-values for the association of rs707927 improved to 1.26E-07 (effect size = 0.33). The RPS6KA1 marker was also associated with HbA1c at close to the p-value threshold for genome-wide significance (p-discovery = 4.91E-08; p-replication = 2.71E-03; p-combined = 7.27E-09).
Considering the diabetes and obesity status of the participants as covariates for adjustments on the association models. 45% of participants in the discovery phase, and 38% of participants in the replication phase, respectively, were diagnosed for diabetes (see Table 1). It is further the case that obesity seems to be a major driver of diabetes in the whole sample -59% of participants in the discovery phase and 46% of participants in the replication phase, respectively, were obese. Thus, it is important to perform corrections for the association tests for diabetes and obesity status along with corrections for diabetes and lipid lowering medications (lipid lowering medications can influence FPG levels). Upon performing the corrections for these 4 covariates along with the regular corrections, it was found that the p-values remained significant at p-combined = 1.38E-10 (for RPS6KA1 marker), 1.88E-13 (for CADPS marker), 1.23E-08 (for VARS marker) and 2.78E-05 (for DHX58 marker) ( Table 4).

Sensitivity analysis.
A concern arises as to whether the FPG values measured in individuals receiving glucose-lowering medication represent "naturally" observed values in the population. We addressed this concern by way of performing a sensitivity analysis to add a value of 2.5 mmol/L to the FPG values of the participants taking diabetes medication and then performing the association tests; the value of 2.5 mmol/L is an average effect size (p-value < 0.001) that we observed in an in-house clinical database of diabetic patients visiting clinics in our institute. The results of association tests with the preadjusted FPG values for the four identified associations (with corrections for regular confounders and BMI) are presented in Table 5. The associations retained the p-values.
Assessing the identified associations in sub-cohorts of entirely diabetic or of entirely non-diabetic participants. The discovery and replication cohorts used in this study included both diabetic patients and healthy participants; as mentioned above, the identified associations retained significance when the models were adjusted for the covariate of diabetes status. It is often the case that quantitative trait associations are done on entirely non-diabetic participants or on entirely diabetic patients (which gives a higher chance of translating the findings to clinical utility). We distributed the discovery cohort (n = 1353) and replication cohort (n = 1176) onto four sub-cohorts: (i) Discovery_diabetic (n = 605); (ii) Discovery_non-diabetic (n = 748); (iii) Replication_diabetic (n = 452); and (iv) Replication_non-diabetic (n = 724). We performed association tests with each of the four sub-cohorts followed by three meta-analysis (Meta_diabetic: combining results from Discovery_ diabetic and Replication_diabetic), (Meta_non-diabetic: combining results from Discovery_non-diabetic and Replication_non-diabetic) and (Meta_all: combining results from all the four sub-cohorts). With regular corrections performed on the association tests, the effect sizes and p-values remained significant in the Meta_diabetic Examining the NHGRI-EBI GWAS catalog for previous association reports on the identified risk variants. While none of the identified risk variants was associated with any trait in previous GWAS, the gene loci were often associated with traits related to diabetes: RPS6KA1 with glucose homeostasis traits 14 , sporadic amyotrophic lateral sclerosis 15 , and the symptom of rosacea 16 ; DHX58 with coronary artery disease (CAD) 17 ; VARS with blood plasma proteome 18 , autism spectrum disorder (ASD) 19 , and inflammatory bowel disease (IBD) 20 ; VWA7 with blood protein levels 18 , ASD 19 , and IBD 20 ; and CADPS with treatment interaction of sulfonylurea (a glucose-lowering drug) 21 , heart failure-related metabolite levels 22 , and obsessive-compulsive symptoms 23 .
LD markers and regional associations. Figure 2 presents regional association plots for regions of 500 Kb centered at the identified four risk variants; these regions (other than for the CADPS marker) were gene-dense. The (VARS, VWA7) and DHX58 markers had 21 and 7 LD partners (r 2 > 0.59), respectively. Several LD partners were associated with FPG at suggestive p-values of <1E-04 (Supplementary ROH segments overlaying the identified risk variants. All of the four reported risk variants were in ROH ( Table 7). The observed maximum values for of the ROH region lengths (mean ± SD of the ROH groups) were 8 Mb (RPS6KA1 marker), 15.5 Mb (CADPS), 9.9 Mb (VARS, VWA7), and 6.96 Mb (DHX58). The two recessive risk variants from RPS6KA1 and CADPS were in "known" ROH segments, while the two additive markers from (VARS, VWA7) and DHX58 were in "novel" segments. However, LD partners of the additive risk variants lay in "known" ROH segments -one such marker (i.e., rs2074158/DHX58) in LD with the DHX58 risk variant is listed in the GWAS catalog as being associated with CAD (see Table 7). Presence of the identified ROH segments (to which the associated variant overlaps) is found more often in sub-cohort of diabetic participants than in www.nature.com/scientificreports www.nature.com/scientificreports/ sub-cohort of non-diabetic participants, though the size of the former sub-cohort (n = 605) is smaller than that of the latter sub-cohort (n = 748); however, the differences are not seen statistically significant.

Gene expression regulation by the identified risk variants. Examination of Genotype-Tissue
Expression (GTeX) data (https://www.gtexportal.org) revealed that all four risk variants regulate the expression of their own or other genes. The RPS6KA1 marker regulates the DHDDS gene in the heart's left ventricle; the CADPS marker regulates itself in the artery-tibial and adipose-subcutaneous tissues; the (VARS, VWA7) marker regulates a number of genes [LY6G5B (artery-tibial, testis, muscle-skeletal, thyroid); GPANK1 (esophagus-mucosa, skin); www.nature.com/scientificreports www.nature.com/scientificreports/ AIF1 (whole blood), C6orf25 (skin), SAPCD1-AS1 (skin); and TNXA (skin)]; the DHX58 marker regulates itself (in artery-tibial, adipose-subcutaneous, adipose-visceral, pancreas, and heart) and other genes such as KCNH4 (esophagus-muscularis), HSPB9 (testis), and RAB5C (adipose-subcutaneous, pancreas, muscle-skeletal). Table S4) for the identified risk variants in the third cohort of 283 samples considered for insulin resistance analysis indicated that the RPS6KA1, (VARS, VWA7), and CADPS markers passed the p-value threshold (<0.05) for associations with insulin resistance traits of HOMA-IR and HOMA-β and with the glucose-related traits of FPG and HbA1c; in addition, the association of the RPS6KA1 marker with TG was replicated.

Associations between glucose-related traits and insulin resistance traits at the risk variants. Allelic association test statistics (Supplementary
Results from multivariate analysis to examine relationships between glucose-related (FPG, HbA1C, TG) and insulin resistance (HOMA-IR, HOMA-β, C-peptide, HOMA-S) traits in the context of observed genotypes at risk variants (Table 8) indicated possible associations of the identified risk variants with insulin resistance: (I) RPS6KA1 marker: With genotypes homozygous for the risk allele, interactions between (TG, FPG and HbA1c) and insulin resistance traits (HOMA-β, C-peptide, HOMA-S) were observed at the multiple testing significance threshold of <0.003. With the heterozygous genotype, TG was associated with HOMA-S at a p-value < 0.05. (II) [VARS, VWA7] marker: With genotypes that are heterozygous or homozygous for the risk allele, interactions between FPG and insulin resistance traits (HOMA-β and HOMA-S) were observed at the multiple testing significance threshold of <0.003. With a heterozygous genotype, interactions between HbA1c and insulin resistance traits (HOMA-IR and HOMA-S) were observed at the multiple testing significance threshold of <0.003. TG also interacted with HOMA-S at a p-value = 0.007 with a heterozygous genotype.  www.nature.com/scientificreports www.nature.com/scientificreports/ (III) CADPS marker: With a heterozygous genotype, associations between TG and HOMA-IR were observed at the multiple testing significance threshold of <0.003. Interaction between FPG and HOMA-β with a heterozygous genotype could be seen at a p-value < 0.003. (IV) DHX58 marker: With genotypes homozygous for the risk allele, TG and FPG were seen to be associated with (HOMA-IR and C-peptide levels) and HOMA-β, respectively, at the multiple testing significance threshold of <0.003. With heterozygous genotypes, FPG was associated with both HOMA-IR and HO-MA-β at p-values < 0.05.

Discussion
This study identified a novel recessive marker (rs1002487) from RPS6KA1 (encoding Ribosomal Protein S6 Kinase A1) associated with high FPG (and HbA1c) at genome-wide significance in native Kuwaiti people of Arab descent. S6K1 signaling has distinct roles in regulating glucose homeostasis in pro-opiomelanocortin and agouti-related protein neurons, key regulators of energy homeostasis 25 ; and can potentially regulate insulin resistance through phosphorylating insulin receptor substrate 1 (IRS-1) 26 . It participates in the NOTCH pathway, an effector of mTOR, and is sensitive to both insulin and certain nutrients. Our previous GWAS, using the same cohort 12 , demonstrated that the marker was also recessively associated with high TG at genome-wide significance. FPG was directly correlated with TG and inversely correlated with HDL. Adiposity, high FPG, and TG are   www.nature.com/scientificreports www.nature.com/scientificreports/ hallmarks of insulin resistance 27 and high FPG within the normoglycemic range can increase the risk for type 2 diabetes 28 . The presented results indicate interactions between (TG, FPG, and HbA1c) and insulin resistance traits (HOMA-β, HOMA-S, C-peptide) at multiple testing significance with genotypes homozygous for the risk allele at the risk variant; even for the heterozygous genotype, TG was associated with HOMA-S (at p-value < 0.05). Thus, the present study, reporting for the first time that the RPS6KA1 marker is a risk variant for TG and glucose-related traits, is of considerable interest. Furthermore, in the GWAS catalog, the RPS6KA1 gene is associated with glucose homeostasis traits, sclerosis, and the symptom of rosacea. Reports have suggested that the rare homozygous (CC) state at the marker is involved in schizophrenia 29 . The GTeX resource annotates this marker as having the potential to regulate expression of the DHDDS gene, a locus associated with developmental delay and seizures (with or without movement abnormalities); patients with schizophrenia are also more prone to seizures. Patients with mental disorders, especially schizophrenia, are often afflicted by diabetes. Glucose homeostasis is altered upon the onset of schizophrenia, indicating that patients are at increased risk of diabetes 30 .
(i) CADPS encodes a calcium-dependent secretion activator involved in the exocytosis of vesicles filled with neurotransmitters and neuropeptides. Interestingly, the activator regulates the recruitment of insulin granules and  Table 6. Results from the analysis of examining the identified associations in sub-cohorts of entirely diabetic patients or of entirely healthy participants. & In the case of the RPS6KA1 marker, all the individuals with genotype homozygous for risk allele were seen with the sub-cohort of Discovery_diabetic) and hence results for Discovery_ non-diabetic sub-cohort were unavailable. # Association with the trait was observed under the genetic model based on recessive mode of inheritance; @ association with the trait was observed under the genetic model based on additive mode of inheritance. EffectSize Effect size represents beta value for discovery and replication phases, and Z-score for meta-analysis. R Regular correction -Corrected for age, sex and the top 10 principal components that resulted from the Principal Components Analysis of the genotype data;

BMI+LM
Corrected for BMI and lipid medication in addition to the regular correction; BMI+LM+DM Corrected for BMI and lipid & diabetes medications in addition to the regular correction. (2020) 10:152 | https://doi.org/10.1038/s41598-019-57072-9 www.nature.com/scientificreports www.nature.com/scientificreports/  ≤ 1.0). The X-axis represents the gene region in physical order; the Y-axis represents −log 10 P-value of the associations with FPG for all the SNPs. The dashed horizontal line represents a p-value of 3.60E-08. To generate regional association plot for a SNP-trait association, all the genotyped SNPs (passing the quality control analyses) from a region of around 500 Kb centered on the SNP were tested for association with the trait; the resultant statistics and the SNPs were displayed in the regional association plot. Region-plot tool (https://github.com/pgxcentre/region-plot) was used to produce regional plots. (2020) 10:152 | https://doi.org/10.1038/s41598-019-57072-9 www.nature.com/scientificreports www.nature.com/scientificreports/ beta-cell function 31,32 ; previous global GWAS associated CADPS loci with treatment interaction of sulfonylurea (a glucose-lowering drug) and heart failure-related metabolite levels 21,22 ; and GTeX annotates the marker as regulating the expression of its own gene (CADPS) in adipose-subcutaneous and tibial artery tissues. Furthermore, as indicated in our results, with a heterozygous genotype at the risk variant, TG was significantly associated with HOMA-IR (p < 0.003) and FPG with HOMA-β (p < 0.003).
(ii) VARS encodes valyl-tRNA synthetase and is associated with diabetic cataract, neurodevelopmental disorder, microcephaly, seizures, and cortical atrophy. VWA7 encodes Von Willebrand Factor A Domain-Containing Protein 7; previous global GWAS associated the VWA7 locus with IBD, blood plasma proteome, blood protein levels, and schizophrenia. Furthermore, the risk variant and its 26 strong LD partners are from a gene-dense region, commonly referred to as the HLA "class III" region 33 Table S3). Markers and genes from the HLA region are associated with risk for type 1 diabetes 34 and type 2 diabetes 35 : TNF mediates obesity-related insulin resistance 36 ; the HSPA1A gene (encoding HSP70) gets upregulated and correlates with HbA1c levels in pregnant women with gestational diabetes 37 ; people with type 2 diabetes have higher HSP70 levels in serum correlating with diabetes duration 38 ; and an upstream variant of HSPA1A (i.e., rs17201192, an LD partner (r 2 = 0.83) of the reported [VARS, VWA7] marker) showed an association with FPG, albeit at a nominal p-value of 3.3E-05, in our analysis (see Supplementary Table S3). Our results imply, with genotypes of heterozygosity or homozygosity for the risk allele, significant interactions between FPG and HOMA-β and HOMA-S; and with a heterozygous genotype, interactions between HbA1c and HOMA-IR and HOMA-S. TG was also seen to interact with HOMA-S at p = 0.007 with a heterozygous genotype. The [VARS, VWA7] variant appeared to regulate the expression of LY6G5B, GPANK1, AIF1, C6orf25, SAPCD1-AS1, and TNXA; previous global GWAS associated these genes with ASD and IBD, which are known to co-occur with type 2 diabetes 39 .  Table 7. ROH regions overlaying the identified risk variants. @ Two approaches were used to identify ROH segments (see Methods for details). Method 1: Markers that passed quality control were pruned for LD (r 2 > 0.9) (n = 568,670) and employed to detect ROH segments using parameters suggested by Howrigan et al. 52 ; Method 2: Un-pruned marker set (n = 632,375) was employed to detect ROH using parameters deployed in Christofidou et al. 53 .   Table 8. Interactions between (TG, FPG, HbA1c) and Insulin Resistance traits (HOMA-IR, HOMA-β, C-peptide, HOMA-S) with respect to genotypes at the risk variants. @ Multiple testing significance threshold for p-value is 0.003. All the interaction models were corrected for age and gender.
The presented results indicate that the DHX58 risk variant regulates DHX58, RAB5C, KCNH4, and HSPB9; interestingly, previous global GWAS implicated these four genes in CAD 17 . Furthermore, markers from RAB5C were associated with fibrinogen levels, which are known to be elevated in diabetic patients, especially those with foot ulcers 40 . Our results pointed to significant (p-value < 0.003) interactions between TG and (HOMA-IR and C-peptide levels) and between FPG and HOMA-β at genotypes homozygous for a risk allele.
All the four identified risk variants are intronic; however, as discussed above, genotype-tissue expression data revealed that each of the four variants can regulate genes that are associated with diabetes-related or comorbid disorders. Given that a large burden of homozygosity and excess of recessive alleles are attributed to Arab population from Kuwait 8 , the observations that two of the four identified risk variants appeared when genetic model based on the recessive mode of inheritance was used and that all four variants were in ROH segments are not surprising.
Association tests were examined with both raw and inverse normal transformed FPG values. The reported four associations remained significant when co-variate adjustments were done for diabetes medication, obesity and diagnosis for diabetes. The four associations remained significant when FPG values were preadjusted by a fixed amount per diabetes medication status. Further examination of the identified associations in the sub-cohorts of entirely diabetic patients or of entirely healthy participants revealed that the RPS6KA1, CADPS and VARS markers performed better in terms of retaining significance in cohorts of diabetic patients and the DHX58 marker in the cohort of participants free of diabetes.
Because of the nature of the study design that uses HumanOmniExpress BeadChip, the study does not consider genetic variants that are seen only in the Arab population. However, we find that there are statistically significant differences in genotype distributions at the risk variants between the Arab population and continental populations (Supplementary Table S6). The risk allele frequencies also differ substantially across the populations (Supplementary Figure S5). In order to identify Arab-population-specific risk variants (that are not polymorphic in continental population), we need to perform large-scale genome-wide surveys (a combination of GWAS, exome, and genome sequencing and imputation) of the Arab population with diabetes 46 .
Our earlier studies identified three population subgroups in Kuwait 8 . the first group (Kuwait P) is largely of West Asian ancestry, representing Persians with European admixture; the second group (Kuwait S) is predominantly of city-dwelling Saudi Arabian tribe ancestry, and the third group (Kuwait B) includes most of the tent-dwelling Bedouin and is characterized by the presence of 17% African ancestry. Allele frequency assessment of the identified 4 risk variants among these substructures (Fig. 3) suggests that the variant rs1002487/RPS6KA1 is enriched in Persian ancestry, rs12600570/DHX58 in nomadic Bedouin ancestry, rs707927/(VARS, VWA7) in Saudi Arabian ancestry followed by nomadic Bedouin ancestry while the frequency of rs487321/CADPS is almost equal among the three population substructures of Kuwait.
Limitations of the study include the following: (i) Among study cohorts, there are many subjects assuming hypoglycemic therapy -which we took care by way of adjusting the association tests for medication and by performing sensitivity analysis; however, the such individuals at risk for hyperglycemia might have introduced corrective actions (such as exercise, hypocaloric diet and food supplements) affecting FPG; unfortunately, data relating to these corrective measures were not available and hence we were unable to consider them in association test models or in sensitivity analysis. (ii) The study cohort is relatively small, a limitation which might have hindered the ability to identify more than just the four reported risk variants and to observe any of the established risk variants for glucose-related traits. There is an urgent need to carry out much larger studies on the genetics of diabetes in Arab populations which are notorious for high prevalence of obesity and diabetes 46 .

Conclusions
This study identified novel risk variants for high FPG in the Arab population of Kuwait. The RPS6KA1 gene (associated with FPG at genome-wide significance) is known to be involved in glucose homeostasis. Gene loci of CADPS, (VARS, VWA7), and DHX58 exhibiting nominal associations with FPG were often found to be associated with CAD in previous global GWAS. The identified four associations remained significant when the regression models were adjusted for various confounders (such as medication, obesity and diabetes status) and when the FPG levels were preadjusted by a fixed value per diabetes medication status. The RPS6KA1, CADPS and VARS markers performed better in terms of retaining significance in cohorts of entirely diabetic patients and the DHX58 marker in the cohort of participants free of diabetes. With heterozygous or homozygous risk allele genotypes at these risk variants, significant interactions appear to occur between glucose-related and insulin resistance traits. The identified gene loci were previously associated with various other disorders (including IBD, schizophrenia, and autism) that appear to share risk factors with diabetes. This study presents, for the first time, potential associations between the RPS6KA1 gene loci and high TG, FPG, and HbA1c. (2020) 10 Study participants. Details on participant recruitment and a description of the study cohorts are presented in our previous paper 12 (for details, see Supplementary Material: Methods section on Study participants). Briefly, 3,145 participants were recruited from two cohorts in Kuwait. A representative sample of Kuwaiti native adults randomly selected from each of the six governorates of Kuwait formed the first group. Native Kuwaitis visiting our institutional clinics for tertiary medical care or our campaigns formed the second group; such visitors interested in participating were invited later to give blood samples after overnight fasting. We confirmed ethnicity through detailed questioning on parental lineage up to three generations. Data on age, sex, medical history, and medication were also recorded, as were baseline characteristics and vital signs. The discovery cohort was drawn largely from the second group and the replication cohort from the first group. 1,913 of the recruited participants were used for the discovery phase and 1,176 for the replication phase.
Power calculation. We adopted the "gene only" hypothesis and performed two types of power calculation (for details, see Supplementary Material: Methods section on Power calculation): (Type i): Quanto 47 was implemented to evaluate sample size and the potential to detect FPG trait variance with 80% power and p-value < 5.0E-08. Marginal genetic effect estimates (R G 2 ) were made to increment from 0.001 to 0.04 in steps of 0.001 in order to detect genetic effects explaining at least 0.1%-4% of trait variance could be detected. (Type ii): QPowR (https:// msu.edu/~steibelj/JP_files/QpowR.html) was used to determine the sample size for achieving 80% power for the study design of two phases (discovery and replication) with total sample size of 2,529, total heritability of 0.05, samples genotyped each of the two phases as ~50% of 2,529, markers typed in the second phase as ~0.2% of the markers typed in the first phase, and type I error rate of 5.0E-08. Quality control analyses. Raw intensity data from all samples were pooled and genotype calling was performed using GenomeStudio software. A series of quality metric thresholds was applied to derive a high-quality set of SNPs and samples (for details, see Supplementary Material: Methods section on Quality control analysis). Samples with a call rate >95% were retained. SNPs with inappropriate call quality were removed. Sex was estimated using GenomeStudio and removed mismatched samples. Strand designations were corrected to the forward strand, and REF/ALT designations were corrected using the design files for HumanOmniExpress BeadChip. Markers with allele frequency (-maf 0.01), and deviation from Hardy-Weinberg equilibrium (HWE <10 −6 ) were removed. We derived a set of LD-pruned markers (n = 340,299) by removing markers in LD (r 2 > 0.5) with others in a sliding window of 50-SNP and the LD-pruned marker set was used to measure relatedness among participants to the extent of third-degree relatives, to perform ancestry estimation (using ADMIXTURE 48 ), and www.nature.com/scientificreports www.nature.com/scientificreports/ principal component analysis (using EIGENSTRAT 49 ). One sample per pair of related participants was randomly removed. Samples with abnormal deviations, in the extents of component ancestry elements, from what we had established for the three Kuwaiti population subgroups 8 were removed as samples of ethnicity mismatch. Outliers in PCA were identified and the corresponding samples were removed.
Quantitative trait association tests. In discovery phase, all the 632,375 SNPs that passed quality control were used in association tests. Selected markers from discovery phase were tested in the replication phase. Both the additive and recessive genetic models were used in tests for associations with FPG and HbA1c. Two types of corrections were made to the associations tests -"Regular Corrections" involved adjustments for age, sex, and the first 10 principal components; and "Additional Corrections" involved further adjustment for glucose-lowering medication.
Joint analysis with results from discovery and replication phases. The METAL tool 50 was used to perform meta-analysis with association test statistics from both the discovery and replication phases. Combined analysis of data from both the phases is believed to enable detecting genetic associations with increased power 51 . P-value thresholds to assess significance of associations. Threshold for genome-wide significant p-values were calibrated for the counts of LD-pruned markers (n = 340,299), quantitative traits (n = 2, FPG and HbA1c), genetic models (n = 2, additive or recessive), and correction models for the association tests (n = 2, regular correction and further correction for glucose-lowering medication). The "stringent" p-value threshold to keep the type I error rate at 5% got calibrated to 1.84E-08. We further defined a "nominal" p-value threshold of (>1.84E-08 and <E-05) to identify "suggestive" associations. P-value threshold for significant associations in replication phase was set at 0.05.

Identifying runs of homozygosity (ROH).
Runs of Homozygosity (ROH) were identified, using PLINK-1.9, through two approaches: (Approach-1): Markers that passed quality control were pruned for LD (r 2 > 0.9) (n = 568,670) and employed to detect ROH segments using parameters recommended by Howrigan et al. 52 (Approach-2): The unpruned marker set (n = 632,375) was employed and parameters deployed by Christofidou et al. 53 were used. Consensus ROH regions were derived for the identified groups of overlapping ROH segments and mean ± SD was calculated (by considering the midpoint of each individual ROH falling in the group). Delineated ROH segments were classified as "known" or "novel" by comparison with ROH signatures discovered in global populations 54 .

Derivation of insulin resistance traits and association with glucose-related traits.
We considered a subset of 283 samples, randomly selected from the replication cohort, and measured C-peptide levels in plasma (for details, see Supplementary Material: Methods section on Derivation of insulin resistance traits). Insulin resistance traits (i.e., HOMA-IR, HOMA-β, and HOMA-S) were calculated using the FPG (mmol/l) and C-peptide (nmol/l) values with the HOMA2 calculator (https://www.dtu.ox.ac.uk/homacalculator/). Multivariate linear regression, corrected for age and sex, was performed to assess interactions between (TG, FPG, HbA1c) and insulin resistance traits with respect to the genotypes at risk variants; standardized beta-coefficients (β 1 ) and test significance (p-values) were derived using the R Project for Statistical Computing software (https:// www.r-project.org/). The p-value threshold calibrated for multiple testing was 0.003 (=0.05/16); the denominator corresponds to four interaction models on each of the four risk variants.