Introduction

Type 1 diabetes (T1D) is the most prevalent type of diabetes in children and adolescents, accounting for 5–10% of total diabetes cases globally1. The International Diabetes Federation Diabetes Atlas, 10th edition reports a global estimate of 2.61 billion cases of T1D among children and adolescents under the age of 19 years2. T1D is a chronic condition characterized by the progressive destruction of pancreatic β-cells, leading to an absolute deficiency of insulin and, subsequently, a lifelong dependency on treatment with exogenous insulin. Inflammation is believed to be a critical hallmark of the disease, as evidenced by the presence of multiple autoantibodies against islet cell antigens3.

The inherent risk of developing T1D involves genetic and environmental factors that vary across different geographical locations. Most of the patients with T1D lack a positive family history of the disease. However, an estimated heritability rate of > 50% in monozygotic twins, 6% in dizygotic twins, and 6–7% in sib-pairs indicate that genetic predisposition plays a role in the pathogenesis of this disease4. More than 60 genetic loci have been associated with T1D in multiple ethnic populations5. The heritable risk of T1D has been primarily attributed to the human leucocyte antigen (HLA) region that maps to the short arm of chromosome 65. The HLA region, with extreme polymorphisms at its multiple genes, remains by far the greatest contributor to the genetic susceptibility to T1D. Several high-risk HLA DR–DQ haplotypes are useful in the diagnosis of the disease, as around 90% of patients with T1D either carry the HLA DR4-DQ8 or DR3-DQ2 haplotype, while 30% tend to carry both haplotypes6. An increased risk of islet autoimmunity has been reported among siblings who share the DR3/4-DQ8 haplotype with their diabetic proband7.

Global studies on genome-wide association and linkage analyses have revealed around 50 independent non-HLA markers associated with T1D8,9,10,11. A recent trans-ancestral fine mapping study implicated 78 genome wide regions including 36 novel regions linked to T1D11. To date, the most strongly associated T1D marker is the polymorphism of a variable number of tandem repeats in the 5’ UTR of the insulin (INS) gene that maps to chromosome 11p15.512. Polymorphisms in other candidate genes, such as cytotoxic T-lymphocyte antigen (CTLA4), protein tyrosine phosphatase non-receptor type 22 (PTPN22), interleukin 2 receptor subunit alpha (IL2RA), and interferon-induced helicase (IF1H1), have also been widely associated with T1D9,13,14,15,16. Experimental studies have also suggested environmental and lifestyle-related factors such as low vitamin D, early introduction to cow’s milk, exposure to toxins, and deregulation of gut microbiota, in the pathogenesis of T1D17, but their causal role has not been proven. Environmental factors tend to modulate epigenetic events, including DNA methylation and histone modifications, influencing the transcriptomic profiles of key genes involved in T1D18,19.

Childhood diabetes broadly comprises of T1D, early onset type 2 diabetes (T2D) and other monogenic forms such as maturity onset diabetes of the young (MODY), neonatal and syndromic diabetes20. T1D is the most frequent form of diabetes in children and young adolescents. Most genetic association studies in T1D have been performed on Caucasians21. Currently, no such data have been published from Kuwait, even though the incidence and prevalence of T1D are alarmingly high in this region. As per the IDF (2021) ranking, Kuwait stands third globally, with a high incidence rate of 41.7 per 100,000 population per annum of T1D in children (IDF Atlas 10th Edition 2021)2. There was a surge in the incidence of T1D in children under the age of 14 years, from 17.7 per 100,000 population per year in 1992–1994 to 40.9 in 2011–201322. An annual increase of 4.1% is reported in the incidence of T1D among young Kuwaiti children22. Nearly one third of the children diagnosed with T1D from the region presents diabetic ketoacidosis23.

Incidence of T2D was reported to be 2.56 per 100,000 Kuwaiti children under 14 years of age, between 2011 and 201324. Prevalence of obesity in children was observed to be exceedingly high in Kuwait (30.5%) which could be the driving factor for diabetes epidemic and its associated complications25. A recent study by Al-Kandari et al. revealed that pathogenic mutations from known MODY genes accounted for 21.8% of the Kuwaiti MODY families investigated26. Diagnosis of MODY is challenging and it is often misdiagnosed as T1D due to its early onset. Globally, mutations in 14 known MODY genes account for 1–6% of diabetes in children27. Neonatal and syndromic diabetes are rare forms of monogenic diabetes. Neonatal diabetes is diagnosed in children under 6 months of age; mutations in 22 distinct genes have been associated with the disease28. Syndromic forms of diabetes are characterized by young onset age and additional non-autoimmune extra pancreatic features. Data on prevalence of MODY, neonatal and other syndromic forms of diabetes are limited from the Gulf region. Consanguineous marriages are common in Kuwait, leading to an excessive burden on the enrichment of recessive alleles and clustering of familial disorders29. The unique genetic profiles shaped by inbreeding and extreme climatic conditions of Kuwait further contribute to the disease development30.

Very limited studies have explored the link between genetic polymorphisms and susceptibility to T1D in Kuwaiti population. A significantly increased frequency of HLA DQA1*0301/DQB1*0201 haplotype is reported in Kuwaiti T1D cases compared to controls31. Increased occurrence of non-aspartic acid residue at position 57 of HLA DQB1 has been reported in Kuwaiti T1D children. A previous study on vitamin D receptor polymorphisms has shown significant association of FokI and TaqI with susceptibility to T1D in Kuwaiti children32. Variations in interleukin 4 and interleukin 13 genes are additionally associated with susceptibility to T1D and their co-inheritance with high-risk HLA haplotypes are reported in Kuwaiti children33. Our recent study has revealed differential methylation profile of islet cell autoantigen 1, a prominent marker for beta cell autoimmunity in Kuwaiti T1D families19. Additionally, the study revealed 84 markers differentially methylated CpG sites in T1D and validated 18 CpG using publicly available gene expression data.

To further address this knowledge gap in the increased incidence and genetic etiology of T1D among Kuwaitis, we aimed at identifying recurrent genetic variants segregating in high-risk Kuwaiti T1D families by performing linkage analyses. We hypothesized that the presented family-based study will enable us to detect population-specific genetic markers that may positively impact evidence-based national risk assessment strategies.

Methods

Clinical recruitment of T1D families

Families with one or more T1D cases indicating either a vertical or a horizontal transmission of the disease were selected for the study (Supplementary Fig. S1). Data and samples used in this study were obtained from the registry of the Childhood-Onset Diabetes eRegistry (CODeR), which is a comprehensive and prospective pediatric population-based diabetes registry22 maintained by Dasman Diabetes Institute, in collaboration with the Ministry of Health (MOH) of Kuwait. The CODeR was established in 2011 for the surveillance of children and adolescents diagnosed with diabetes in Kuwait. Newly diagnosed cases of T1D are registered electronically or manually in the national registry from hospitals across the country. Physicians report patients from a primary care (local government and private hospitals) and secondary source (Kuwait Diabetes Society, MOH primary care centers, and the clinics at Dasman Diabetes Institute). Data were extracted using standard registry forms and by individually reviewing the medical records of patients registered as having T1D. Patients referred by the registry or clinicians were contacted by the study team for enrollment in the study. After providing counselling for the proband and their first-degree relatives, blood samples were collected at the outpatient department of Dasman Diabetes Institute. This was done under the supervision of a pediatric endocrinologist who examined the participants and extracted data from their clinical records. A total of 18 Kuwaiti T1D families consisting of 37 affected probands and 49 unaffected first-degree relatives available to us were considered for inclusion in the study cohort.

Research protocols were approved by the Ethical Review Committee of Dasman Diabetes Institute and the study was performed in accordance with the principles of the Declaration of Helsinki, as revised in 2008. Written informed consent was obtained from all study participants. In the case of children, informed consent was obtained from the parents/legal guardians, and an assent was obtained from children aged seven years or more.

The diagnosis of T1D was established according to the World Health Organization (WHO) criteria, which included fasting hyperglycemia and absolute insulin deficiency, as measured by a deficient C-peptide concentration (< 0.3 nmol/l)34. The date of the first insulin injection was taken as the date of T1D onset. The diagnosis of T1D in children was further confirmed by measuring the GAD-antibodies. The collected data included age, gender, body mass index (BMI), nationality, date of birth, date of diagnosis, family history of diabetes in first-degree relatives, and measurements of glycated hemoglobin A1c (HbA1C), plasma glucose, blood pressure, uric acid, blood urea nitrogen, and creatinine concentrations. Inclusion criteria included (i) Families with more than one T1D cases exhibiting either a vertical or a horizontal transmission of the disease, (ii) Diagnosis of T1D established based on World Health Organization (WHO) criteria, (iii) Diagnosis of T1D was confirmed by the presence of one or more autoantibodies against pancreatic islet cells, and (iv) T1D cases of Kuwaiti-Arab origin/Ethnicity. Exclusion criteria included presence of other chronic systemic, genetic or metabolic diseases.

Blood sample collection and processing

Blood samples were collected in ethylenediaminetetraacetic acid (EDTA)-treated tubes. Genomic DNA was extracted using a QIAamp Blood DNA kit (Qiagen, Germany) and was quantified using a Qubit Fluorometer (Thermofisher, USA), following the manufacturer’s protocol.

Whole exome sequencing and variant calling

Exome libraries were generated at 40–50 × coverage using Nextera Rapid Capture Exome kit (Illumina Inc. USA) following the manufacturer’s protocol (Supplementary Fig. S1). A total of 50 ng of purified genomic DNA was simultaneously fragmented and tagged enzymatically. The purified fragments were ligated to adapters specific for the Illumina platform using a 10-cycle polymerase chain reaction (PCR). Libraries were denatured into single-stranded DNA and biotin labeled probes specific to the targeted region were used for the rapid capture hybridization. The pool was enriched for the targeted regions by using streptavidin beads. The enriched DNA fragments were hybridized for a second rapid capture. The captured libraries were amplified by polymerase chain reaction and quantified using a high sensitivity kit on a qubit fluorometer (Thermofisher Scientific, Massachusetts, United States). The quality of the prepared libraries was tested using Bioanalyzer (Agilent, California, United States). Enriched libraries were clustered using TruSeq Paired Cluster Kit V3 (Illumina Inc. USA) and paired end sequencing was carried out in HiSeq 2500 using Illumina’s Sequence by Synthesis technology as 100 paired end read. Pedigrees of the sequenced individuals are given in Supplementary Fig. S2.

Raw sequencing data in a BCL format was converted to Fastq data using bcl2fastq v.2.20. Quality control of Fastq files was performed using FastQC (v0.11.9). Sequence reads were aligned to the reference human genome build hg19 using BWA v.0.7.17. Marking the read pairs that are likely to have originated from duplicates of the same original DNA fragments and recalibrating base quality scores were performed on the initial BAM file. Variants were called from the processed alignment files adopting GATK’s best practice guidelines and using standard hard filtering parameters for variant discovery. GATK v.3.7 HaplotypeCaller was executed for each sample to generate an intermediate genomic variant call file (gVCF). Joint variant calling was performed on all samples using the GenotypeGVCFs tool.

Linkage analysis of the T1D pedigrees

Prior to performing the linkage analysis, we filtered the merged VCF files for only biallelic single nucleotide polymorphisms (SNPs), and we removed low-quality SNPs using a series of filters such as percentage genotyped (> 10%), missingness per individual (> 10%), variants departing from Hardy–Weinberg Equilibrium (HWE) (> 1e-06), and Mendelian errors (families with > 10% errors and SNPs with more than 10% errors) using PLINK1.9. Variations that passed through quality control checks, including tests for Hardy–Weinberg Equilibrium and Mendelian errors were used for subsequent analyses. Analysis was restricted to variants with a minor allele frequency of ≥ 1% and a call rate of ≥ 90%. Pedigrees were built based on individuals’ pairwise genetic distances in the SNP data. KINGS software was used to test pairwise genetic relatedness between the families35. We retained 18 families in the final analysis based on the criteria that there was no genetic relatedness between any 2 families. Further, we converted the VCF files to the PED format and set centimorgan positions for all variants in every autosome using recombination map files of GRCh37 build using PLINK 1.9.

Parametric two-point linkage analysis between a disease trait and a set of genotyped markers, based on a recessive inheritance model, was performed using PSEUDOMARKER 2.036,37 based on the haplotype-based haplotype relative risk (HHRR) method and the classical linkage analysis. An additional linkage analysis using a dominant inheritance model was performed to confirm the overlapping peaks seen with the recessive inheritance model. In the second stage, a computationally intensive Pseudomarker analysis was carried out on these candidate markers followed by linkage and association tests. The two-stage method of twostage.py was adopted for the analysis. In line with the previously proposed criteria by Lander and Kruglyak38 for linkage signals, we considered evidence of linkage as either significant (logarithm of the odds (LOD) ≥ 3.6; P ≤ 2.2 × 10–5) or suggestive (LOD ≥ 3.0; P ≤ 7.4 × 10–4). We tested all four models entailing combinations of with or without linkage versus with or without linkage disequilibrium. The SNPs passing this criterion was annotated for genes, functional consequences, effect prediction and allele frequencies from public databases using Variant Effect Prediction (VEP) tool and interrogated allele frequencies of the SNPs with the Genome Aggregation Database (gnomAD).

The flowchart of the performed procedures, from exome sequencing to identification of linkage signals, is presented in Fig. 1.

Figure 1
figure 1

Flowchart of the procedures performed to identify type 1 diabetes-associated linkage signals.

Building evidence for the linkage signals

The variants with LOD scores of ≥ 3 were investigated for their role in disease etiology by analyzing data on tissue-specific expression of the quantitative trait locus (eQTL) from resources such as the NEPTUNE patient characteristics for subjects in Expression quantitative trait loci (NephQTL)39 and genotype-tissue expression (GTEx)-eQTL40. Further, by way of utilizing the mQTLdb41 (http://www.mqtldb.org/), which is a public resource on lead mQTL loci influencing methylation variation in pregnancy–birth–childhood life courses, we examined the genetic influence of the study variants on differential methylation of CpG sites at different life stages. We examined genome-wide association studies (GWAS) catalog42 and literature reports to evaluate the evidence of associations between the lead variants and disease phenotypes. In addition, we investigated the genetic and functional annotation data pertaining to the shortlisted SNPs using Type 1 diabetes Knowledge Portal (T1D-KP)43.

Tag SNP selection and linkage disequilibrium analysis

We shortlisted all Linkage disequilibrium (LD) tag SNPs within SLC17A1, SLC17A3, TATDN2 and TMEM131L genes and their 1000 bp flanking regions on both 3’ and 5’ end using TagSNP tool, a modified version of TAGster software from the SNPinfo Web Server (National Institute for Environmental Health Sciences; http://snpinfo.niehs.nih.gov/)44. LD was measured by squared correlation (r2) method and the analysis was performed based on dbSNP data. The minimum number of genotype pairs to calculate LD was adopted as 5 and the maximum allowed physical distance between two SNPs for LD calculation was set for 250 KB, and minor allele frequencies were set 0.05–0.5.

Statistical analysis

Differences in the mean values of plasma glucose and uric acid among individuals with T1D, were calculated using the Mann Whitney test. A p-value of < 0.05 is considered as significant.

Ethics approval and consent to participate

The study was approved by the Ethical Review Committee of Dasman Diabetes Institute and was performed in accordance with the principles of the Declaration of Helsinki, as revised in 2008. Written informed consent was obtained from all study participants. In the case of children, informed consent was obtained from the parents/legal guardians, and an assent was obtained from children aged seven years or older.

Results

The clinical characteristics of the study participants are detailed in Table 1. Most of the affected people (76%) were children under the age of 18 years; the age at onset of T1D varied from 6 months to 28 years. Approximately 30% of the patients with T1D were overweight. No significant differences in sex distribution were observed among the affected people.

Table 1 Clinical characteristics of patients with type 1 diabetes in the study cohort.

Whole exome based linkage analysis of T1D families

A total of 18 Kuwaiti T1D families consisting of 37 T1D affected individuals and 49 unaffected first-degree relatives were considered. This includes 11 families with sib-pairs concordant for T1D (61%), four families with either a father or mother with T1D (22%) and two families had second degree relatives with T1D (17%). We analyzed exomes of 37 T1D individuals and detected a total of 217,354 single nucleotide variants, of which 46,565 were non-synonymous and 44,513 were synonymous variants. Up to 5% of the detected single nucleotide variants were observed to be novel including personal mutations. A total of 206,203 variants with known rsID were classified as frameshift deletion (n = 382), frameshift insertion (n = 202), non-frameshift deletion (n = 829) and non-frameshift insertion (n = 326). We further detected 118 start loss, 555 stop gain, 60 stop loss and 1 undefined mutation in our cohort.

Five types of linkage analysis tests were performed: tests for linkage with or without allowing for linkage disequilibrium, tests for linkage disequilibrium with or without allowing for linkage, and joint test of linkage and linkage disequilibrium. The analysis pointed to three sites that mapped to chromosomes 3, 4, and 6 (Fig. 2), at which the evidence of linkage under a recessive mode of inheritance was either significant (LOD ≥ 3.6; P ≤ 2.2 × 10–5) or suggestive (LOD ≥ 3.0; P ≤ 7.4 × 10–4) (Table 2). The significant linkage signal pointed to the chromosomal region 6p22.2, spanning two genes, namely, SLC17A1 and SLC17A3. The candidate SNPs from this region were rs1165196 (SLC17A1:p.T269I) and rs942379 (SLC17A3:p.S370S). The two suggestive linkage signals were seen at chromosomal regions 3p25.3 and 4q31.3, harboring TATDN2 and TMEM131L, respectively. The candidate SNPs from these two regions were rs394558 (TATDN2:p.V256I) and rs6848033 (TMEM131L:p.R190R). These three linkage signals were also seen when the analysis was performed under a dominant mode of inheritance, albeit with no statistical significance.

Figure 2
figure 2

Distribution of the logarithm of the odds (LOD) score across chromosomes 3, 4 and 6.

Table 2 Depicts genome-wide linkage signals identified by pseudomarker using recessive and dominant inheritance model.

We further investigated additional coding region variants from TATDN2, TMEM131L, SLCA17A1 and SLC17A3 in our cohort of 18 T1D families consisting of 37 T1D cases and 49 unaffected first-degree relatives (Table 3) based on their enrichment in T1D cohort, their segregation pattern in T1D affected versus unaffected individuals and furthermore based on the supportive evidence from the GTEx, NephQTL and T1D knowledge portal. We tended to prioritize coding region missense variants, since clinical interpretation of the effect of the missense variants is more amenable with missense variants than with non-coding variants. A missense variant from SLC17A3, rs56027330, was identified in 6 index cases from 6 T1D families, and in one unaffected relative. These 6 families showed no relatedness as confirmed by examination of kinship using the KING tool. The homozygous alternate allele genotype (TT) at this variant was detected in 6 of 37 T1D children at a frequency of 16.2% (as compared to 1 in 49 unaffected family members), while the 1000 genome phase 3 project reports it as a rare genotype (0.9%). The frequency of this genotype differed considerably across the different ethnicities 0% in AFR, 1.7% in AMR, 0.4% in EAS, 2.4% in EUR, 0.6% in SAS. We additionally detected the homozygous alternate allele genotype (TT) in 3 sporadic T1D cases from another dataset available to us with no family information. When we examined a population-based exome dataset of 291 individuals, with none diagnosed for T1D from Kuwait, only 3.4% (10 individuals) had homozygous alternate TT genotype and 5 of these 10 were diagnosed with T2D.

Table 3 Additional missense variants from the candidate genes as seen in 18 T1D families consisting of 37 T1D cases and 49 unaffected first-degree relatives.

Association of lead variants with clinical phenotypes of T1D

We further investigated the influence of the lead risk variants on the phenotypic traits relating to the study participants. A statistically significant increase in plasma glucose measurements was observed among individuals with T1D carrying the SLC17A1 rs1165196 GA genotype, as compared to those with the GG genotype (p = 0.043) (Fig. 3a). Irrespective of their diabetic status, individuals carrying the GA genotype showed a significantly higher plasma glucose measurement, as compared to those carrying the GG genotype (p = 0.044). Similarly, patients with T1D carrying the SLC17A3 rs942379 AG genotype showed significantly higher uric acid concentration compared with those with the GG genotype (P = 0.03) (Fig. 3b). As there were only 10 non-diabetic participants with available data on uric acid measurements, analysis with the combined cohort was not performed. The other two linked variants did not show any statistically significant impact on the clinical profiles of the patients with T1D.

Figure 3
figure 3

Plasma glucose and uric acid measurements among individuals carrying risk genotypes compared to non-risk genotypes at rs1165196 (risk allele = A; non-risk allele = G) and rs942379 (risk allele = G; non-risk allele = A). (a) Differences in the mean values for plasma glucose measurements among the cohort (left panel) or among individuals with type 1 diabetes (right panel) carrying the risk GA genotypes, as compared to those with the non-risk GG genotype at SLC17A1:rs1165196. (b) Differences in the mean values for uric acid measurements among the type 1 diabetes individuals carrying the risk AG genotype, as compared to those with the non-risk AA genotype at SLC17A3:rs942379. There were only 10 non-diabetic participants with available uric acid measurements.

Lead variants and quantitative trait loci (QTL) analysis

Further, examination of the mQTLdb revealed that the lead linkage variants SLC17A1 rs1165196, SLC17A3 rs942379, and TATDN2 rs394558 had genetic influence on DNA methylation at various CpG sites in different stages of life (Supplementary Table S1). Depending on the genotypes at the above-mentioned three SNPs, we found 26 differentially methylated CpG sites at the life stage of pregnancy, 18 sites at birth, 26 sites at childhood, 25 sites at adolescence, and 21 sites at middle-age (Fig. 4). A cross-comparison of these CpG sites across all the life stages found that 16 of them were common across all life stages, three sites were common between pregnancy–childhood–adolescence–middle age, one between pregnancy–birth–childhood–adolescence, two between pregnancy–childhood–adolescence, and one each between childhood–adolescence, adolescence–pregnancy, pregnancy–middle age, and pregnancy–birth. Furthermore, one unique CpG site in pregnancy and three unique CpG sites in childhood were seen. The direction of methylation among the 16 shared CpG sites across the different life stages showed no significant variation among the tested individuals (Fig. 5). The three CpG sites (Table 4) that were seen as unique to the childhood stage could be the key signatures mediating phenotypic changes in children. Two of these three CpG sites, namely cg17978425 (Beta = − 0.30; P = 1.40E-12) and cg10346111 (Beta = 0.25; P = 1.95E-09), are both regulated by TATDN2 rs394558. However, they differed in the direction of methylation with the former CpG site being hypo-methylated and the latter hyper-methylated (see Table 4). Further, the prioritized missense variant SLC17A3 rs56027330 also had genetic influence on DNA methylation at five CpG sites in different stages of life (see Supplementary Table S1); all these five sites are also influenced by the linked lead variants. One of these five CpG sites, namely cg12310025, is common across the five life stages of adolescence, birth, childhood, middle age, and pregnancy; two of these five sites, namely cg03264133 and cg03517284, are common across four of the above five life stages; two other CpG, namely cg23140839 and cg07061783, are seen in only one of the life stages.

Figure 4
figure 4

An UpSet plot illustrating cross-comparison of CpG sites across various life stages. The line joining the black dots indicates shared CpGs in the stages of life, while the vertical bar shows number of CpGs shared between the stages of life. The horizontal bar indicates the number of CpGs in each life stage.

Figure 5
figure 5

Shared CpGs across different life stages. Consistent methylation direction in 16 shared CpG sites across the life stages, suggesting that these shared CpGs have no contribution to childhood type 1 diabetes.

Table 4 mQTL statistics of CpGs regulated by rs394558 and rs942379 found to be unique in childhood.

Examination of the NephQTL database to determine which of the lead variants act as eQTL in kidney revealed that the SLC17A1 rs1165196, SLC17A3 rs942379, rs56027330, and TATDN2 rs394558 exhibited genotype-based differential expression of genes in either glomerular or tubular tissues (Table 5). Similarly, the GTEx database depicted rs942379 and rs394558 as eQTL variants that influence the expression of certain key genes in the pancreas and testes (Table 6). The prioritized SLC17A3 rs56027330 variant also tends to significantly impact the expression of several key genes such as SLC17A1, SLC17A3, and TRIM38 in multiple tissues indicating its pleiotropic effect (Table 7).

Table 5 Variants identified as eQTL with kidney tissue-specific expression from the NephQTL database.
Table 6 Variants identified as QTL from the GTEx portal.
Table 7 Identified eQTL related to rs56027330 variant from the GTEx portal.

Functional annotations of lead variants based on T1D knowledge portal

Supportive evidence from the T1D Knowledge portal indicates significant association of the lead variant from SLC17A1 (rs1165196) and SLC17A3 (rs942379) with T1D, HbA1C, serum urate, and uric acid. TATDN2 rs394558 variant also showed significant association with HbA1C, Matsuda insulin sensitivity index and several other insulin related parameters as shown in (Table 8). Similarly, TMEM131L rs6848033 variant presented evidence for significant association with Matsuda insulin sensitivity index (ISI), fasting insulin, HOMAIR, and HOMAB. Additionally, the missense variant rs56027330 shortlisted by our study consistently showed association with T1D, HbA1C, serum urate and diabetic nephropathy alongside several other nephropathy-related traits such as creatinine, sodium excretion, and mean corpuscular hemoglobin.

Table 8 Phenome wide associations related to lead variants from the Type 1 diabetes knowledge portal.

Tag SNP and QTL associations

As an indirect approach, we captured the extended list of variants that are in linkage disequilibrium (LD) with the lead variants by haplotype-tagging method. Notably, the lead variants from SLC17A1 rs1165196 and SLC17A3 rs942379 are completely in LD with each other. The schematic diagram of all tag SNPs and their pattern of Linkage disequilibrium (LD) is shown in Supplementary Fig. S3. SLC17A1 rs1165196 represented a haplotype block consisting of 23 SNPS, while SLC17A3 rs942379 and TATDN2 rs394558 captured 4 variants each in LD, while TMEM131L rs6848033 variant was observed in singleton LD.

We investigated the allelic effect of the captured tag SNPs using mQTLdb and NephQTL databases. Tag SNPs captured by SLC17A1 rs1165196 variant tends to profoundly impact 1752 CpG sites, while those captured by SLC17A3 rs942379 variant are likely to influence 337 CpG site. Likewise, the tag SNPs captured by TATDN2 rs394558 variant significantly associates with 126 CpG sites across different life stages.

Evidence from the NephQTL database indicates that tag SNPs associated with SLC17A1 rs1165196 and SLC17A3 rs942379 variants collectively tend to impact the expression of HIST1H2BE in glomerulus. Tag-SNPs captured by SLC17A3 rs942379 variant additionally impacts the expression of BTN3A2 in glomerulus and those captured by TATDN2 rs394558 variant combinedly affects the expression of ATP2B2 in glomerulus further elucidating its role in nephropathy related traits.

Discussion

Variation in the risk of developing T1D can largely be attributed to genetic predisposition, gene–environment interactions, and other prevailing factors such as ethnic disparities and rapid changes in lifestyle and dietary practices. It is challenging to identify the unique risk variants that segregate in the affected members of multiple families. In the present study, we aimed at minimizing heterogeneity and confounding factors by adopting a family-based linkage and fine-mapping approach to pinpoint genetic markers underlying T1D in the Kuwaiti population. We identified three potential loci with a LOD score of ≥ 3, spanning four candidate genes, namely SLC17A1, SLC17A3, TATDN2, and TMEM131L in Kuwaiti families diagnosed with T1D. The identified lead linkage variants (SLC17A1-rs1165196: pT269I, SLC17A3-rs942379: p.S370S, TATDN2-rs394558:p.V256I, and TMEM131L-rs6848033:p.R190R and the missense variant with the genotype homozygous for the alternate allele (SLC17A3 rs56027330_p.G279R) enrichening the affected members of Kuwaiti T1D families have not been previously linked with T1D and could be novel contributions to the existing literature.

Though the SLC17A3 rs56027330 missense variant is a common variant in the 1000 Genomes Phase3 Project samples at 7% MAF, the genotype homozygous for the alternate allele (TT) is rare at 0.9% in the 1000 Genomes Phase 3 Project samples; the frequency differed considerably among the continental populations—2.4% in EUR, 1.7% in AMR, 0.6% in SAS, 0.4% in EAS, and 0% in AFR. It is interesting to note that this genotype is seen in 16.2% of the T1D individuals in our study cohort. Furthermore, we observe this genotype in 3 T1D individuals from another dataset available to us with no family information. In the population-based exome dataset of 291 individuals, with none diagnosed for T1D from Kuwait, only 10 individuals (3.4%) had the homozygous alternate genotype and 5 of these 10 were diagnosed with T2D. The SIFT score (0.0 (deleterious) to 1.0 (tolerated) for this variant is 0.16 much lower than the minimum passing score of 0.4. Existing evidence from T1D Knowledge Portal, linking rs56027330 variant to T1D on GWAS meta-analysis of European ancestry, further substantiates the significance of the shortlisted variant in T1D etiology43. It is also the case that the T1D Knowledge Portal provides evidence for association of the variant with HbA1C trait in data sets including the UK Biobank on European ancestry, and the AMP T2DKP on European, East Asian and trans-ancestry. The observed recessive model of association for the rs56027330 variant with T1D in our study cohort is important given the accumulation of recessive alleles in the Arab population, whose genetic profile has evolved through practice of consanguinity and the resultant inbreeding45,46. Given the small size of the study cohort, it is recommended that further replication studies are performed in future to unravel the homozygous effect of rs56027330 variant on T1D and glycemic traits.

The top priority lead linkage signals from chromosome 6p22.2 revealed two candidate genes—SLC17A1 and SLC17A3. Both genes are primarily involved in urate metabolism and transport47,48. Given that 50% of patients with T1D are prone to developing diabetic kidney disease and that serum uric acid may be a modifiable risk factor of nephropathy in T1D49,50,51, the identification of a genetic variant that affects urate concentration is of high clinical significance, especially that high uric acid level has been implicated in β-cell dysfunction52 Our study can potentially contribute to the predictive roles of SCL17A1 rs1165196 and SLC17A3 rs942379 variants in the early diagnosis and prediction of T1D and its complications, particularly diabetic kidney disease. In agreement with this, large-scale GWAS have specifically associated the SCL17A1 rs1165196 variant with uric acid concentration in both European53,54,55 and Japanese individuals56, and with gout in those with a European background57. Other SNPs from SLC17A3 are also known to be associated with uric acid concentrations in European58 and Korean individuals59, and with urinary metabolite measurements in Europeans with chronic kidney disorder60. Moreover, SLC17A3 has been linked to gout susceptibility61. A recent study has shown a significant effect of the SLC17A3 rs942379 variant on the longitudinal measurement of uric acid in women58. The NephQTL database additionally indicates the influential role of SCL17A1 rs1165196 and SLC17A3 rs942379 variants in regulating allele-specific expression of specific histone and solute carrier genes in kidney tissues, and this observation further indicates these variants as potential markers for progression to renal disease in T1D. Likewise, SLC17A3 rs942379, which is a synonymous variant, seems to regulate HIST1H2BB via the mQTL for cg12588279 CpG site.

The prioritized missense variant SLC17A3 rs56027330 also had genetic influence on DNA methylation at five CpG sites in different stages of life. The variant also tends to significantly impact the expression of several key genes such as SLC17A1, SLC17A3, and TRIM38 in multiple tissues. Currently, there is no functional evidence linking SCL17A1 and SLC17A3 to T1D etiology. Evidence obtained from single tissue QTL analysis indicates the allelic effect of SLC17A3 rs942379 on the expression of TRIM38 in colon and pancreatic tissues, while SLC17A1 rs1165196 tends to be associated only with the expression TRIM38 in colon tissues. Given the fact that TRIM38 clusters within major histocompatibility region-1 and caters multiple innate immune and inflammatory responses62,63 its direct association with the linked variants may suggest a common pathogenesis that needs to be further validated.

The two additional loci detected in our study include chromosome 3p25.3 and 4q31.3 regions encoding KIAA genes, namely TATDN2 (KIAA0218) and TMEM131L (KIAA0922), respectively. TATDN2 is predominantly involved in metal ion binding and has been largely related to the unfolded protein response pathway, which in turn plays a crucial role in alleviating endoplasmic reticulum stress by facilitating the degradation of misfolded proteins64. A defective unfolded protein response has been proposed to contribute to β-cell apoptosis, leading to T1D64,65. TMEM13L is negatively involved in regulating the Wnt signaling and thymocyte proliferation pathways. The canonical Wnt signaling pathway has been widely implicated in various renal diseases, such as diabetic nephropathy, fibrosis, renal cancer, and renal failure66,67,68.

The functional roles of TATDN2 rs394558 and TMEM131L rs6848033 are yet to be well-characterized. The NephQTL database shows that the TATDN2 rs394558 variant tends to significantly impact the expression of ATP2B2 in glomerulus tissue, indicating its potential role in renal function. TATDN2 rs394558 variant also significantly influences the expression profile of several key genes such as IRAK2 in subcutaneous adipose tissue and TATDN2, GHRLOS, and LINC00852 in pancreatic tissues. IRAK2 is a member of serine-threonine kinase family that plays a critical role in regulating inflammatory responses69,70, by mediating interleukin-1 induced activation of NF-Kappa-B signaling (PMID 21495968). TATDN2 rs394558 is also predicted to be involved in regulating genes such as TATDN2, GHRL, GHRLOS, and LINC00852, via mQTL for cg10346111 CpG sites. The variant may engage in crosstalk with DNA methylation, and thereby impact the transcription of these genes or their nearby genes.

Interestingly, TATDN2 rs394558 variant tends to impact the expression of both GHRL and its antisense gene GHRLOS. Functionally, GHRL encodes the ghrelin–obestatin preproprotein that is cleaved to yield ghrelin and obestatin. Ghrelin is a powerful appetite stimulant that plays a vital role in energy homeostasis and pancreatic glucose-stimulated insulin secretion, while obestatin is involved in multiple metabolic events, including adipocyte function and glucose metabolism71. GHRLOS (ghrelin opposite strand/antisense RNA) is an antisense transcript for the GHRL gene that may potentially regulate the expression of ghrelin–obestatin preproprotein. The circulating levels of GHRLOS and LINC00852 are reported to be elevated in type 2 diabetes mellitus72, while its precise role in T1D remains to be explored.

Our study highlights genes linked to T1D and their prospective role in disease etiology by regulating the expression of genes implicated in inflammatory and metabolic events, as evidenced by publicly available QTL data. Further replicative and longitudinal studies are needed in the future to validate the associations for these loci obtained in the present study.

Conclusion

Very few studies have explored the genetic epidemiology of T1D in Arab populations19,32. The present study suggests SLC17A1, SLC17A3, TATDN2, and TMEM131L as candidate genes linked to T1D in Kuwaiti families. The supportive evidence obtained from mQTLdb, NephQTL, GTEx and T1D KP databases provides a suggestive role for SCL17A1 and SLC17A3 in T1D susceptibility. The study further highlights the SLC17A3 rs56027330_p.G279R variant as a potential marker for T1D; the variant associates with T1D under the genetic model for recessive mode of inheritance. This observation is interesting given the accumulation of recessive alleles in the Arab population, whose genetic profile has evolved through practices of consanguinity and the resultant inbreeding. Thus, this variant deserves further follow-up. A major limitation is the small size of the study cohort and the lack of an independent cohort for replicating the findings. It is hoped that future studies from the region aim at replicating these findings in a larger independent T1D cohort and aim at delineating the mechanistic details of the impact of these novel candidate genes on the pathophysiology of T1D.