Introduction

Type 1 diabetes (T1D) is caused by T-cell mediated autoimmune destruction of pancreatic β-cells1. There is no cure for T1D to date. The molecular mechanisms underlying T1D are complex and not completely understood. Human genetic studies have uncovered multiple T1D genes that contribute to our understanding of the pathogenesis of T1D2,3,4,5,6,7. With the rapid advances in human genomics technology in recent years, over 70 T1D loci have been identified8 (https://www.ebi.ac.uk/gwas/). While these discoveries of T1D-associated genes have greatly increased our knowledge of T1D, our current genetic knowledge of T1D is far from complete, and a large number of T1D genes remain undiscovered9. A key bottleneck for the GWAS approach is limited sample size, even with collaborative international consortia10. The T1D phenotype is regarded as heterogeneous. While the majority of T1D patients have autoimmune disease, 5–10% of Caucasian diabetic subjects with recent-onset T1D do not have islet cell antibodies; this form is often referred to as T1bD11. Because of its different pathogenesis, T1bD may be associated with genetic loci different from those of autoimmune T1D (T1aD). However, the small proportion of T1bD cases suggests that T1bD-related genetic effects were diluted in previous studies that analyzed T1D cases in general. Besides T1bD, cases of maturity-onset diabetes of the young (MODY), a non-autoimmune and monogenic form of pediatric diabetes, may be misdiagnosed as T1D12, which further contributes to the heterogeneity of the T1D phenotype.

With numerous genetic loci identified for many human complex diseases to date, polygenic risk scores (PRS) aggregate the effects of many genetic variants across the human genome into a single score, an approach that has been shown to improve disease prediction and differential diagnosis13. The T1D loci identified by GWAS to date are mainly associated with the genetic susceptibility of the major component of the heterogeneous T1D phenotype, i.e. T1aD, while the genetic susceptibility of the minor non-autoimmune components (e.g. T1bD and misdiagnosed MODY) is under-represented in those results, likely as a result of dilution. In this study, we propose that a high T1D PRS predicts or suggests a T1aD case, whereas a low T1D PRS in a T1D case suggests the opposite and represents our major interest. Our aim is to identify low PRS T1D cases and to run a separate GWAS in an attempt to uncover genetic loci associated with T1bD. By excluding high PRS T1D cases, our approach enriches the sample for non-mainstream T1D and counteracts the dilution of its genetic effects, so that novel loci associated with non-mainstream T1D can be uncovered. Therefore, the dilution of low PRS T1D by misdiagnosed MODY is not a concern. On the other hand, although the low PRS cases may include MODY patients, no MODY mutation was identified with genome-wide significance in this GWAS, which is as expected because next generation sequencing, e.g. whole exome sequencing, is the more appropriate approach for detecting such mutations.

Methods

Subjects

A total of 6599 European T1D cases and 12,350 European controls were included in this study. The T1D cases were from the Children's Hospital of Philadelphia (CHOP)14, The Montreal Children's Hospital14, the Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications (DCCT-EDIC) cohort (http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000086.v2.p1), the Type 1 Diabetes Genetics Consortium (T1DGC, http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000180.v1.p1), and subjects recruited later at CHOP. The T1D cases were mainly recruited by clinical diagnosis, i.e. insulin dependent for at least 6 months, and diagnosed under the age of 18 for the subjects recruited at CHOP and Montreal. The non-mainstream T1D cases in this study were defined by low T1D PRS, with the cut-off value of PRS determined by Receiver Operating Characteristic (ROC) curve analysis. All included cases were confirmed to be of European ancestry by principal component analysis (PCA) of genome-wide SNP markers, and individuals from other populations or with admixed ancestries were excluded. Genotyping was done with the Illumina Human Hap550 Genotyping BeadChip or a newer version of the Illumina Genotyping BeadChip. Other demographic, phenotypic and genotypic details about these individuals were described in our previous publication15. Imputation of single nucleotide polymorphisms (SNP) on autosomes was done using the TOPMed Imputation Server (https://imputation.biodatacatalyst.nhlbi.nih.gov) with the TOPMed (Version R2 on GRCh38) Reference Panel, with a quality filter of R2 ≥ 0.3. Altogether, 104,689,647 autosomal single nucleotide variants (SNV) with quality R2 ≥ 0.3 were included in this study. Population stratification was assessed by PCA, and genetic association tests were conditioned on sex and corrected for the first 10 principal components (PC). The association test was done using PLINK1.9 software16.

Polygenic risk scores (PRS)

To avoid overfitting in PRS scoring, the subjects were randomly split into two independent, non-overlapping cohorts: the PRS training cohort (Cohort A), including 3302 T1D cases (1739 males, 1560 females, and 3 cases with undetermined sex) and 6181 controls (3326 males, 2840 females, and 15 controls with undetermined sex), and the PRS testing cohort (Cohort B), including 3297 T1D cases (1744 males, 1549 females, and 4 cases with undetermined sex) and 6169 controls (3339 males, 2818 females, and 12 controls with undetermined sex). PRSs of the test cohort were calculated using the Polygenic Risk Score software (PRSice-2)17, based on the association statistics of the training cohort. The performance of a series of cutoffs of T1D association P values (1E−10, 1E−09, 1E−08, 1E−07, 1E−06, 1E−05, 1E−04, 0.001, 0.01, 0.05, 0.1, 0.2, and 1) for selection of SNP markers was assessed by the Area Under the ROC Curve (AUC). The P value cutoff with the largest AUC was adopted.
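The threshold selection described above can be illustrated with a minimal sketch. It is a simplified stand-in for the procedure, not PRSice-2 itself: clumping and other PRSice-2 steps are omitted, the score is assumed to be the average of effect size times allele dosage over the selected SNPs, and all SNP names, effect sizes and genotypes are made-up toy values.

```python
from typing import Dict, List, Sequence, Tuple

def prs(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Average of (effect size x allele dosage) over the selected SNPs."""
    if not weights:
        return 0.0
    total = sum(beta * dosages.get(snp, 0.0) for snp, beta in weights.items())
    return total / len(weights)

def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Probability that a random case outscores a random control (ties count 1/2)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

def best_threshold(
    sumstats: Dict[str, Tuple[float, float]],  # SNP -> (beta, P) from the training cohort
    cases: List[Dict[str, float]],             # test-cohort case dosages
    controls: List[Dict[str, float]],          # test-cohort control dosages
    thresholds: Sequence[float],
) -> Tuple[float, float]:
    """Return the (P value cutoff, AUC) pair with the largest AUC."""
    best = None
    for t in thresholds:
        weights = {snp: beta for snp, (beta, p) in sumstats.items() if p <= t}
        a = auc([prs(d, weights) for d in cases],
                [prs(d, weights) for d in controls])
        if best is None or a > best[1]:
            best = (t, a)
    return best

# Toy training statistics (hypothetical SNP names and values).
sumstats = {"rs_a": (0.8, 1e-9), "rs_b": (0.5, 1e-6), "rs_noise": (-0.9, 0.5)}
cases = [{"rs_a": 2, "rs_b": 1, "rs_noise": 2}, {"rs_a": 1, "rs_b": 2, "rs_noise": 1}]
controls = [{"rs_a": 0, "rs_b": 1, "rs_noise": 0}, {"rs_a": 1, "rs_b": 0, "rs_noise": 0}]

best = best_threshold(sumstats, cases, controls, [1e-8, 1e-5, 1.0])
print(best)
```

In this toy data the strictest cutoff drops an informative SNP and the loosest cutoff admits a spurious one, so the intermediate cutoff gives the largest AUC.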

GWAS of T1D patients with low PRS

The flow chart of the study approach is shown in Fig. 1. According to the PRS values, the T1D patients were separated into two groups, i.e. a low PRS group and a high PRS group. The PRS cutoff was determined by the maximum Matthews correlation coefficient (MCC). Using the same PRS cutoff, healthy controls with low T1D PRS were identified. The GWAS of T1D patients with low PRS was performed by comparing them to healthy controls with low T1D PRS. The Manhattan plots were generated using the SNPEVG software18. Genetic association signals within each locus were plotted by LocusZoom19.
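Choosing a cutoff by maximum MCC can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: a PRS above the cutoff is treated as a predicted case, every observed score is tried as a candidate cutoff, and the scores and labels are toy values.

```python
import math
from typing import List, Sequence, Tuple

def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; defined as 0 when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def best_mcc_cutoff(scores: Sequence[float], is_case: Sequence[bool]) -> Tuple[float, float]:
    """Try each observed score as a cutoff; PRS > cutoff means predicted case."""
    best_cut, best_val = None, -2.0
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, is_case) if y and s > cut)
        fn = sum(1 for s, y in zip(scores, is_case) if y and s <= cut)
        fp = sum(1 for s, y in zip(scores, is_case) if not y and s > cut)
        tn = sum(1 for s, y in zip(scores, is_case) if not y and s <= cut)
        val = mcc(tp, fp, tn, fn)
        if val > best_val:
            best_cut, best_val = cut, val
    return best_cut, best_val

# Toy PRS values and case/control labels (illustrative only).
scores: List[float] = [0.1, 0.2, 0.3, 0.5, 0.6, 0.8, 0.9]
is_case: List[bool] = [False, False, True, False, True, True, True]

cutoff, value = best_mcc_cutoff(scores, is_case)
print(cutoff, value)

# Subjects at or below the cutoff form the low PRS group.
low_prs_cases = [s for s, y in zip(scores, is_case) if y and s <= cutoff]
low_prs_controls = [s for s, y in zip(scores, is_case) if not y and s <= cutoff]
```

The same cutoff is applied to cases and controls, so the low PRS GWAS compares like with like.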

Figure 1

The flow chart of the study approach.

Cohort switch

Subsequently, we switched the two cohorts, i.e. we used Cohort B to derive the statistics for PRS modelling and then tested the PRS model in Cohort A. The GWAS of T1D patients with low PRS was done using the same approach as described above.

Data and resource availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author upon reasonable request.

All methods were carried out in accordance with relevant guidelines and regulations. The study was approved by the Institutional Review Boards of Children’s Hospital of Philadelphia (CHOP). Written informed consent was obtained from each participating subject or, if subjects are under 18, their parent/guardian.

Significance statement

Type 1 diabetes (T1D) is a highly heterogeneous genetic disease. Human genetic and genomic studies of T1D have yielded significant knowledge of the molecular basis of autoimmunity in T1D. However, it has long been recognized that a small number of T1D cases present without autoantibodies and are considered non-autoimmune. Human genetic approaches have not been helpful for the study of these patients, as the genetic effects of non-mainstream (or non-autoimmune) T1D have been diluted in previous studies of T1D cases in general. For the first time, we identified non-mainstream T1D cases represented by a low T1D polygenic risk score (PRS), and identified 13 novel loci represented by rare SNVs. This study presents a new genomic landscape of pediatric T1D.

Results

AUC of different cutoffs of T1D association P values for SNP selection and PRS

The AUCs of different cutoffs of T1D association P values for selection of SNP sets are shown in Table 1a. The best AUC (0.8607) is seen at the cutoff of P value ≤ 1E−05, which suggests that a stricter cutoff may miss informative SNPs, while a looser cutoff may introduce noise by including SNPs with spurious T1D association. Based on the SNP markers with T1D association P value ≤ 1E−05, a PRS was acquired for each individual in the independent test cohort. A PRS cutoff of 1.11E−03 gave the maximum MCC (0.6294; Supplementary Table 1). A PRS ≤ 1.11E−03 was defined as low risk, and a PRS > 1.11E−03 was defined as high risk. With this threshold, the sensitivity (true positive rate, TPR) for T1D prediction is 75.9%, and the specificity (true negative rate, TNR) for T1D prediction is 86.4%. By PRS ≤ 1.11E−03, 805 (24.4%, including 407 males, 396 females, and 2 cases with undetermined sex) out of 3297 T1D cases had low PRS; and 5330 (86.4%, including 2882 males, 2436 females, and 12 controls with undetermined sex) out of 6169 controls had low PRS.
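As a quick arithmetic check of the counts reported above, the short sketch below recomputes the low PRS proportions from the raw numbers. It uses only figures stated in the text; the variable names are illustrative, and the reading that low PRS controls are the true negatives (so their share equals the specificity) is an interpretation of the text.

```python
# Counts reported for the test cohort (Cohort B) at PRS <= 1.11E-03.
low_prs_cases, total_cases = 805, 3297
low_prs_controls, total_controls = 5330, 6169

# Sex breakdowns should add up to the reported group sizes.
assert 407 + 396 + 2 == low_prs_cases
assert 2882 + 2436 + 12 == low_prs_controls

# Share of cases classified as low PRS.
frac_low_cases = low_prs_cases / total_cases
# Low PRS controls are true negatives, so their share is the specificity.
specificity = low_prs_controls / total_controls

print(f"low PRS cases: {frac_low_cases:.1%}")        # 24.4%
print(f"low PRS controls: {specificity:.1%}")        # 86.4%
```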

Table 1 The AUCs of different cutoffs of T1D association P values.

GWAS of T1D patients with low PRS

The GWAS of T1D patients with low T1D PRS compared to controls with low T1D PRS identified a large number of SNPs associated with T1D with genome-wide significance (P ≤ 5.0E−08), from 10 genetic loci (Supplementary Table 2, Fig. 2). Among these 10 genetic loci, 3 loci have established T1D association from previous studies, namely HLA, INS, and PTPN22 (Table 2a). At the established leading T1D signal of each locus, the frequencies of the predisposing alleles of HLA and PTPN22 were lower in the low T1D PRS cohort, while the frequency of the protective allele of INS was higher in the low T1D PRS cohort. The effect sizes of HLA (P = 6.67E−08) and PTPN22 (P = 0.052) were smaller in the low PRS cases. Besides these 3 established T1D loci, 7 loci associated with low PRS T1D were identified (Table 3a). LocusZoom plots for genetic association signals within each locus are shown in Supplementary Figures 1–7. The association signals of these loci are seen only in low PRS T1D cases, not in the T1D cases overall, and were missed previously due to diluted genetic effects. Among the 7 loci, 6 loci are novel, while the ankyrin 3 (ANK3) locus, related to neural control of the endocrine pancreas20, has been identified with genome-wide significance in our study on patients with low T1D genetic risk scores (GRS)21,22.

Figure 2

The Manhattan plots of cohort B. (a) The plot of the GWAS of T1D patients with low T1D PRS compared to controls with low T1D PRS (805 cases vs. 5330 controls); (b) the plot of the GWAS of all T1D patients compared to all controls (3297 cases vs. 6169 controls).

Table 2 Leading SNPs at three loci with established T1D association.
Table 3 Novel loci associated with low PRS T1D.

Replication of the PRS model and additional novel loci

Subsequently, we switched the two cohorts, i.e. we used the second cohort to derive the statistics for PRS modelling and then tested the PRS model in the first cohort. The AUCs of different cutoffs of T1D association P values for selection of SNP sets are shown in Table 1b. The best AUC (0.8654) is seen at the cutoff of P value ≤ 1E−05, which replicated the PRS model in the above step. Based on the SNP markers with T1D association P value ≤ 1E−05, a PRS was acquired for each individual in the independent test cohort. A PRS cutoff of 7.18E−04 gave the maximum MCC (0.6294; Supplementary Table 3). A PRS ≤ 7.18E−04 was defined as low risk, and a PRS > 7.18E−04 was defined as high risk. With this threshold, the sensitivity (true positive rate, TPR) for T1D prediction is 66.0%, and the specificity (true negative rate, TNR) for T1D prediction is 93.6%. By PRS ≤ 7.18E−04, 907 (27.5%, including 433 males, 472 females, and 2 cases with undetermined sex) out of 3302 T1D cases had low PRS; and 5567 (90.1%, including 2997 males, 2558 females, and 12 controls with undetermined sex) out of 6181 controls had low PRS.

As expected from the above results, in the switched cohort, the GWAS of T1D patients with low T1D PRS compared to controls with low T1D PRS also identified a large number of SNPs associated with T1D with genome-wide significance (P ≤ 5.0E−08) (Supplementary Table 4, Fig. 3). Among these loci, 4 loci have established T1D association from previous studies, namely HLA, INS, PTPN22, and the IKZF4/RPS26/ERBB3 locus (Table 2b). Consistent with the first GWAS results listed above, at the established leading T1D signal of each locus, the frequencies of the predisposing alleles of HLA, PTPN22 and IKZF4 were lower in the low T1D PRS cohort, while the frequency of the protective allele of INS was higher in the low T1D PRS cohort. The effect size of the leading HLA SNP was significantly smaller in the low PRS cases (P = 5.49E−11). Besides these established T1D loci, 18 loci associated with low PRS T1D were identified in this cohort (Table 3b). LocusZoom plots for genetic association signals within each locus are shown in Supplementary Figures 8–25. Among the 18 loci, 16 loci are novel, while the Notch ligand Delta-like 1 (DLL1) locus, whose gene function is essential for pancreatic islet homeostasis23, has been identified by our gene-based association study on low PRS T1D24. The other locus, containing the UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 2 gene (B3GNT2) and transmembrane protein 17 gene (TMEM17), is ~ 200 kb from the Eps15 homology domain binding protein 1 locus (EHBP1) that has been identified with genome-wide significance in our study on low T1D GRS patients21.

Figure 3

The Manhattan plots of cohort A. (a) The plot of the GWAS of T1D patients with low T1D PRS compared to controls with low T1D PRS (907 cases vs. 5567 controls); (b) the plot of the GWAS of all T1D patients compared to all controls (3302 cases vs. 6181 controls).

Discussion

Altogether, rare variants (MAF < 5%) from 22 novel loci were identified in the low PRS T1D cases with genome-wide significance (P < 5.00E−08), in addition to the 4 established T1D loci, 2 reported loci in the low GRS patients, and 1 locus by our gene-based study. The genome-wide significant association signals of these loci are only seen in low PRS T1D cases, but not in the T1D cases overall, thus were missed previously due to rare allele frequencies and diluted genetic effects in the general T1D cohort. A number of genetic associations with body mass index (BMI), obesity, and autoimmunity, have been reported in the flanking regions of 300 kb on each side of these new loci according to the GWAS Catalog (https://www.ebi.ac.uk/gwas/, Supplementary materials for review). Further discussion on these novel loci is focused on 13 loci with high imputation quality (i.e. Quality Score r2 > 0.9). Among these loci, 9 loci are related to obesity traits (T1bD mechanism), 2 loci are related to glucose homeostasis (T1bD mechanism), and 2 loci are related to autoimmunity (T1aD mechanism).
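The flanking-region lookup described above (300 kb on each side of a lead SNP) can be sketched as a simple window filter. This is a hypothetical illustration: the `Hit` record, the function name and all coordinates are made up and are not taken from the GWAS Catalog or from this study's results.

```python
from typing import List, NamedTuple

class Hit(NamedTuple):
    trait: str
    chrom: str
    pos: int  # base-pair position on the same genome build as the lead SNP

FLANK_BP = 300_000  # 300 kb on each side of the lead SNP

def hits_near(lead_chrom: str, lead_pos: int, hits: List[Hit],
              flank: int = FLANK_BP) -> List[Hit]:
    """Keep previously reported associations within +/- flank bp of the lead SNP."""
    return [h for h in hits
            if h.chrom == lead_chrom and abs(h.pos - lead_pos) <= flank]

# Made-up catalog entries for illustration only.
catalog = [
    Hit("body mass index", "9", 14_250_000),      # 150 kb away: kept
    Hit("waist circumference", "9", 14_650_000),  # 550 kb away: dropped
    Hit("body mass index", "7", 14_100_000),      # different chromosome: dropped
]

nearby = hits_near("9", 14_100_000, catalog)
print([h.trait for h in nearby])
```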

Obesity-related/ T1bD-related loci

FAM49A/RAD51AP2 tagged by rs56806432

Two coding genes in this locus are the CYFIP related Rac1 interactor A gene (CYRIA, also known as FAM49A) and the RAD51 associated protein 2 gene (RAD51AP2). CYRIA is highly expressed in brain and thyroid, while RAD51AP2 has restricted expression toward testis25. A previous GWAS has identified association of this locus with subcutaneous adipose tissue26.

NFIB tagged by rs10961435

The nuclear factor I B gene (NFIB) encodes a transcription factor in the FOXA1 transcription factor network. NFIB has been shown to play critical roles in lung and brain development. A previous study has shown that NFIB can bind to FoxA1 and modulate the transcriptional activity of FoxA127, while the latter has been suggested to play a role in pancreatic and ß-cell function and non-autoimmune diabetes, as discussed for the LOC730100 locus. NFIB has ubiquitous expression in fat, brain, and other tissues25. This locus has been reported to be associated with BMI by several GWA studies28,29,30.

LINC00841/C10orf142 tagged by rs746298

The two genes at this locus, LINC00841 and C10orf142, encode two long intergenic non-protein coding RNAs (lincRNA). While the function of these two genes remains unknown, this locus has been reported to be associated with obesity-related traits31.

FAM136A/TGFA tagged by rs77418738

The family with sequence similarity 136 member A gene (FAM136A) encodes a mitochondrially localized protein32. The transforming growth factor alpha gene (TGFA) mediates cell–cell adhesion and activates cell proliferation, differentiation and development33. This region has been reported to be associated with obesity-related traits31.

CALN1 tagged by rs118182411

The calneuron 1 gene (CALN1), encoding a protein with high similarity to the calcium-binding proteins of the calmodulin family, is highly expressed in brain and adrenal25. Previous studies have established the association of this genetic region with BMI29,30.

EPHB4 tagged by rs3890144

The EPH receptor B4 gene (EPHB4) has ubiquitous expression in multiple tissues, and is involved in numerous developmental processes34. EPHB4 plays critical roles in vascular development35 and lymphatic valve development36. Previous GWAS has identified association of this locus with BMI37 and waist circumference adjusted for BMI30.

TXN/TXNDC8 tagged by rs10816957

The thioredoxin gene (TXN) has ubiquitous expression in multiple tissues, while the thioredoxin domain containing 8 gene (TXNDC8) has restricted expression toward testis25. Thioredoxin plays a protective role against oxidative stress38. Thioredoxin interacting protein (TXNIP) has been implicated in β-cell death in diabetes and is a novel potential therapeutic target for diabetes39. A previous GWAS has identified association of this locus with BMI40 and waist-to-hip ratio adjusted for BMI30.

SYT10/ALG10 tagged by rs4142676

The synaptotagmin 10 gene (SYT10) encodes a membrane protein of secretory vesicles expressed in pancreas, lung and kidney41. The ALG10 alpha-1,2-glucosyltransferase gene (ALG10) encodes a membrane-associated protein that adds the third glucose residue to the lipid-linked oligosaccharide precursor for N-glycosylation in the endoplasmic reticulum (ER)42. As discussed for the ZNF804B locus, N-glycosylation of IgG, cytokines and proteases is also a regulatory mechanism in inflammation and autoimmunity43,44 and is associated with different autoimmune diseases. Several previous GWASs have identified association of this locus with waist-to-hip ratio and waist-to-hip ratio adjusted for BMI30,45.

CHFR/LOC101928530/ZNF605 tagged by rs12230138

The checkpoint with forkhead and ring finger domains gene (CHFR) encodes an E3 ubiquitin-protein ligase and is involved in the DNA damage response and checkpoint regulation46. The structure and function of the gene LOC101928530 are still uncharacterized. The function of the zinc finger protein 605 gene (ZNF605) may be related to Herpes Simplex Virus 1 infection (https://pathcards.genecards.org/card/herpes_simplex_virus_1_infection). This region has been reported to be associated with BMI by a previous study28.

Genetic loci related to glucose homeostasis (T1bD-related)

LOC730100 tagged by rs28957087

LOC730100 encodes a long non-coding RNA (lncRNA) that acts as a competing endogenous RNA for human microRNA 760 (miR-760)47. The latter inhibits the expression of the Forkhead Box A1 gene (FOXA1). As a hepatocyte nuclear factor, FOXA1, also known as HNF3A or TCF3A, regulates tissue-specific gene expression in liver and many other tissues48. FoxA1 is essential for normal pancreatic and ß-cell function and is a negative regulator of the hepatocyte nuclear factor-1 (HNF1) homeobox A gene (HNF1A) and the hepatocyte nuclear factor 4, alpha gene (HNF4A)49,50. HNF1A and HNF4A are established genes causing maturity-onset diabetes of the young (MODY). The FOXA1 mutation Ser448Asn has been suggested to be associated with impaired glucose homeostasis50.

LINC01695/LINC00161 tagged by rs2831597

Function of the long intergenic non-protein coding RNA 1695 gene (LINC01695) is still uncharacterized. The long intergenic non-protein coding RNA 161 gene (LINC00161) encodes a functional RNA that regulates Mitogen-activated protein kinase 1 (MAPK1) expression51. The MAPK1/STAT3 pathway has been proposed as a novel diabetes target for its critical role in glucose homeostasis52.

Autoimmune-related loci

In addition to the above ALG10 locus associated with both autoimmune diseases and obesity-related traits, two other loci were identified in the low PRS T1D cases. The rare variants in these loci may represent rare forms of autoimmune diabetes with low T1D PRS53.

LINC02432/IL15 tagged by rs9790756

The long intergenic non-protein coding RNA 2432 gene (LINC02432) has higher expression in kidney and pancreas25. Interleukin 15 (IL-15), encoded by the gene IL15, is essential for regulating activation and proliferation of T and natural killer cells, and for supporting lymphoid homeostasis54. IL-15 shares many biological activities and receptor components with interleukin 2 (IL-2)55. IL-2 is a powerful growth factor for both T and B lymphocytes56. Both IL2 and the gene encoding the α chain of the IL2 receptor complex (IL2RA) have established genetic association with T1D from previous studies57,58,59.

ZNF804B tagged by rs76060515

The zinc finger protein 804B gene (ZNF804B) has been reported to be associated with N-linked glycosylation of human immunoglobulin G (IgG), which modulates its binding to Fc receptors43. N-glycosylation of cytokines and proteases is also a regulatory mechanism in inflammation and autoimmunity44. Changes in N-glycosylation have been associated with different autoimmune diseases, including rheumatoid arthritis60, type 1 diabetes61, and Crohn's disease62.

In summary, of the genetic regions containing the 13 novel loci with high imputation quality disclosed by this study, 9 have been reported to be associated with obesity-related traits, BMI, or waist circumference. The correlation with obesity-related traits or impaired glucose homeostasis is in keeping with non-autoimmune mechanisms in the diabetes patients with low T1D PRS. Interestingly, the genes ZNF804B and ALG10, related to N-linked glycosylation, are highlighted in this study, which may suggest a role of N-glycosylation in impaired glucose homeostasis and pediatric diabetes, given that N-glycosylation is commonly altered in diabetes63. In addition, 3 loci encoding long intergenic non-protein coding RNAs (lncRNA) identified in this study emphasize the importance of lncRNAs in these diabetes patients. However, we acknowledge that this study has limitations related to the bottleneck of sample size and data resources. The novel loci reported in this study still need replication in independent samples. In addition, the functional mechanisms of these genetic loci in diabetes warrant experimental investigation. Because data on T1D autoantibodies are lacking for the subjects, rare forms of autoimmune diabetes (e.g. monogenic autoimmunity53) may be mixed in with non-autoimmune diabetes, as suggested by the identification of rare variants in autoimmune-related genes.