Introduction

Type 1 diabetes mellitus (T1DM) is a chronic autoimmune disease characterized by immune-mediated destruction of pancreatic beta cells1. T1DM is most commonly diagnosed in children and adolescents2. Epidemiological studies have shown that the incidence of T1DM has been increasing by 2–5% globally3. T1DM is a complex disease affected by numerous environmental factors, genetic factors and their interactions4,5. T1DM-associated complications include cardiovascular disease6, hypertension7, diabetic retinopathy8, diabetic nephropathy9, diabetic neuropathy10, obesity11 and cognitive impairment12. Therefore, it is crucial to understand the precise molecular mechanisms involved in the progression of T1DM and thus to establish effective diagnostic, prognostic and therapeutic strategies.

Although insulin therapy13 has brought remarkable improvement in the treatment of T1DM, long-term survival rates of patients with T1DM remain low worldwide. One of the major reasons is that most patients with T1DM are diagnosed at advanced stages. It is therefore crucial to find novel diagnostic biomarkers, prognostic biomarkers and therapeutic targets for the early diagnosis, prognosis and timely treatment of T1DM, which requires further exploration of the exact molecular mechanisms underlying its development. At present, several genes and signaling pathways have been identified, for example vitamin D receptor (VDR)14, HLA-B and HLA-A15, HLA-DQ16, HLA‐DQB1, HLA‐DQA1 and HLA‐DRB117, IDDM218, CaMKII/NF-κB/TGF-β1 and PPAR-γ signaling pathway19, Keap1/Nrf2 signaling pathway20, HIF-1/VEGF pathway21, NLRP3 and NLRP1 inflammasomes signaling pathway22 and NO/cGMP signaling pathway23. Identifying the precise molecular targets involved in the occurrence and advancement of T1DM would contribute to its diagnosis and treatment.

Next generation sequencing (NGS) platforms for gene expression analysis have been increasingly recognized as approaches with significant clinical value in areas such as molecular diagnosis, prognostic prediction and identification of novel therapeutic targets24. In recent years, NGS data analysis has been effective in tracking the advancement of T1DM, and in screening biomarkers for T1DM prognosis, diagnosis and therapy. We therefore used an NGS dataset to investigate the molecular pathogenesis of T1DM.

In the present investigation, we selected an NGS dataset (GSE162689)25 from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/)26 and used the DESeq2 package in R software to screen for differentially expressed genes (DEGs). We then performed bioinformatics analyses, including gene ontology (GO) enrichment analysis, REACTOME pathway enrichment analysis, construction and analysis of a protein–protein interaction (PPI) network, module analysis, and construction and analysis of a miRNA-hub gene regulatory network and a TF-hub gene regulatory network. The hub genes were validated by receiver operating characteristic (ROC) curve analysis. This investigation might offer better insight into the molecular mechanisms of T1DM and thereby inform preventive and therapeutic strategies.

Materials and methods

Data resources

The NGS dataset of T1DM (GSE162689)25 was downloaded from the GEO database. The dataset comprises 27 T1DM samples and 32 normal control samples and was generated on the GPL24014 platform, Ion Torrent S5 XL (Homo sapiens).

Identification of DEGs

Differentially expressed genes (DEGs) between T1DM and normal control samples were identified using the DESeq2 package in R language software27. Genes were considered DEGs when the adjusted P was < 0.05 and the log2 fold change was > 0.63 (up regulated genes) or < − 1.3 (down regulated genes). P values were adjusted with the Benjamini and Hochberg false discovery rate procedure28 to limit the occurrence of false positive results. The DEGs were presented as a volcano plot and a heat map drawn with the R packages ggplot2 and gplots.
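The selection rule can be illustrated with a minimal sketch. The Python code below is not the DESeq2 workflow used in this study; it only shows, under stated assumptions, how Benjamini and Hochberg adjustment combines with the asymmetric fold-change cut-offs. The gene names, P values and fold changes are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def classify_degs(genes, log2fc, pvalues, up_cut=0.63, down_cut=-1.3, alpha=0.05):
    """Label genes 'up' or 'down' using the cut-offs stated in the text."""
    padj = benjamini_hochberg(pvalues)
    calls = {}
    for gene, lfc, q in zip(genes, log2fc, padj):
        if q < alpha and lfc > up_cut:
            calls[gene] = "up"
        elif q < alpha and lfc < down_cut:
            calls[gene] = "down"
    return calls


# Hypothetical inputs, for illustration only.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
log2fc = [1.2, -2.0, 0.4, -0.9]
pvalues = [0.001, 0.004, 0.0005, 0.6]
deg_calls = classify_degs(genes, log2fc, pvalues)
print(deg_calls)
```

In this toy input, GENE_C has a small adjusted P value but is not called because its fold change lies between the two cut-offs.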

GO and REACTOME pathway enrichment analysis of DEGs

The online tool g:Profiler (http://biit.cs.ut.ee/gprofiler/)29 was used to carry out functional annotation of the DEGs. Gene Ontology (GO) (http://geneontology.org/)30 provides a structured vocabulary for annotating gene functions; GO terms are grouped into biological processes (BP), cellular components (CC) and molecular functions (MF). REACTOME (https://reactome.org/)31 is a curated database of biological pathways and reactions. GO and pathway enrichment analyses were used to identify the significant GO terms and pathways. P < 0.05 was set as the cutoff criterion.
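As an illustration of how an enrichment P value can be obtained, the following Python sketch assumes a one-sided hypergeometric (over-representation) test, which is the kind of test commonly used by tools such as g:Profiler. It is not the tool's implementation, and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe, term_size, query_size, overlap):
    """P(X >= overlap) when query_size genes are drawn without replacement
    from a universe in which term_size genes carry the annotation."""
    upper = min(term_size, query_size)
    total = comb(universe, query_size)
    tail = sum(
        comb(term_size, k) * comb(universe - term_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical counts: 10 background genes, 4 annotated to the term,
# 3 genes in the DEG list, 2 of which carry the annotation.
p_example = hypergeom_enrichment_p(universe=10, term_size=4, query_size=3, overlap=2)
print(round(p_example, 4))
```

A term would then be reported as enriched if its P value, usually after multiple-testing correction, falls below the chosen cutoff.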

Construction of the PPI network and module analysis

The PPI network was established using the IntAct Molecular Interaction Database (https://www.ebi.ac.uk/intact/)32. To assess possible PPI correlations, the previously identified DEGs were mapped to the IntAct database, and PPI pairs with a combined score > 0.4 were extracted. Cytoscape 3.8.2 software (www.cytoscape.org/)33 was then used to visualize the PPI network, and the Cytoscape plugin Network Analyzer was used to calculate the node degree34, betweenness centrality35, stress centrality36 and closeness centrality37 of each node in the PPI network. Nodes with higher node degree, betweenness centrality, stress centrality and closeness centrality are likely to play a more vital role in maintaining the stability of the entire network. The PEWCC1 (http://apps.cytoscape.org/apps/PEWCC1)38 plug-in was applied to analyze the modules in the PPI network, with the default parameters (node score = 0.2, K-core = 2, and max depth = 100).
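The topological measures can be made concrete with a small sketch. The Python code below is not the Network Analyzer implementation; it assumes an unweighted, undirected, connected network and computes node degree, closeness centrality (reciprocal of the mean shortest-path length) and unnormalized betweenness centrality (Brandes' algorithm). Stress centrality is omitted for brevity, and the node names and edges are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def closeness(adj, source):
    """Reachable nodes divided by the sum of shortest-path lengths (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


# Hypothetical toy network, for illustration only.
edges = [("HUB", "A"), ("HUB", "B"), ("HUB", "C"), ("C", "D")]
adj = build_adjacency(edges)
degree = {v: len(neigh) for v, neigh in adj.items()}
bc = betweenness(adj)
print(degree["HUB"], closeness(adj, "HUB"), bc["HUB"])
```

In the toy network the node named HUB ranks first on all three measures, which is the pattern used in the text to nominate hub genes.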

MiRNA-hub gene regulatory network construction

The miRNAs targeting the T1DM-related hub genes were predicted using the miRNet database (https://www.mirnet.ca/)39, and those predicted by at least 14 databases (TarBase, miRTarBase, miRecords, miRanda, miR2Disease, HMDD, PhenomiR, SM2miR, PharmacomiR, EpimiR, starBase, TransmiR, ADmiRE, and TAM 2.0) were selected for constructing the miRNA-hub gene regulatory network in Cytoscape 3.8.2 software33.
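Once regulator-target pairs have been exported, ranking hub genes by the number of distinct regulators is a simple counting step. The Python sketch below assumes a plain list of (regulator, hub gene) pairs; it does not reproduce the miRNet query or its export format, and all identifiers are hypothetical.

```python
from collections import defaultdict


def regulators_per_gene(edges):
    """Count distinct regulators per target gene, highest count first."""
    by_gene = defaultdict(set)
    for regulator, gene in edges:
        by_gene[gene].add(regulator)  # a set ignores duplicated pairs
    return sorted(
        ((gene, len(regs)) for gene, regs in by_gene.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical regulator-target pairs, for illustration only.
edges = [
    ("mir-x1", "GENE_A"),
    ("mir-x2", "GENE_A"),
    ("mir-x1", "GENE_B"),
    ("mir-x2", "GENE_A"),  # duplicate pair, counted once
]
ranking = regulators_per_gene(edges)
print(ranking)
```

The same function applies unchanged to TF-hub gene pairs.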

TF-hub gene regulatory network construction

The TFs targeting the T1DM-related hub genes were predicted using the NetworkAnalyst database (https://www.networkanalyst.ca/)40, and those predicted by the RegNetwork database were selected for constructing the TF-hub gene regulatory network in Cytoscape 3.8.2 software33.

Validation of hub genes by receiver operating characteristic curve (ROC) analysis

ROC curve analysis is an approach for visualizing, organizing and selecting classifiers based on their performance. It was used here to estimate the diagnostic value of the hub genes in T1DM. ROC curves were obtained by plotting sensitivity against specificity using the R package “pROC”41. The area under the curve (AUC) was used to measure the diagnostic accuracy of each hub gene. An AUC > 0.9 was taken to indicate that the model had a favorable fitting effect.
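The meaning of the AUC can be shown with a short sketch. The Python code below is not the pROC implementation; it computes the AUC through its rank interpretation, namely the probability that a randomly chosen T1DM sample has a higher expression value than a randomly chosen control, with ties counted as one half. It assumes that higher values indicate disease (for a down regulated gene the scores would be negated), and the expression values are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of case-control pairs ranked correctly.

    labels: 1 for cases (T1DM), 0 for controls. Ties count as 0.5.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical expression values for one gene, for illustration only.
expression = [5.1, 4.8, 4.9, 3.0, 3.2, 5.0]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(expression, labels)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to a gene with no discriminating power, and 1.0 to perfect separation of the two groups.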

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Informed consent was not required because this study does not involve human or animal participants.

Results

Identification of DEGs

On the basis of the cut‐off criteria, DEGs between T1DM and normal control samples were identified in the GEO dataset (Supplementary Table S1). A total of 952 DEGs were found in GSE162689, comprising 477 up regulated and 475 down regulated genes, at the thresholds of adjusted P < 0.05 and log2 fold change > 0.63 for up regulated genes or < − 1.3 for down regulated genes. The volcano plot (Fig. 1) shows the distribution of the DEGs in the NGS data. The heat map of the up regulated and down regulated genes is shown in Fig. 2.

Figure 1

Volcano plot of differentially expressed genes. Genes with a significant change of more than two-fold were selected. Green dots represent significantly up regulated genes and red dots represent significantly down regulated genes.

Figure 2

Heat map of differentially expressed genes. The legend on the top left indicates the log fold change of genes. (A1–A32 = normal control samples; B1–B27 = T1DM samples).

GO and REACTOME pathway enrichment analysis of DEGs

To characterize the functional roles of the above DEGs, we performed GO (Table 1) and REACTOME pathway (Table 2) enrichment analyses. In the BP category of the GO analysis, up regulated genes were significantly enriched in multicellular organism development and nitrogen compound metabolic process. In the CC category, up regulated genes were enriched in membrane-enclosed lumen and nuclear lumen. In the MF category, up regulated genes were significantly enriched in protein binding and transcription regulator activity. The most significantly enriched GO terms for down regulated genes were detection of stimulus and multicellular organismal process (BP), cell periphery and plasma membrane (CC), and transmembrane signaling receptor activity and molecular transducer activity (MF). According to REACTOME pathway enrichment analysis, up regulated genes were significantly enriched in diseases of signal transduction by growth factor receptors and second messengers and in formation of the cornified envelope. Down regulated genes were enriched in olfactory signaling pathway and sensory perception.

Table 1 The enriched GO terms of the up and down regulated differentially expressed genes.
Table 2 The enriched pathway terms of the up and down regulated differentially expressed genes.

Construction of the PPI network and module analysis

The PPI network of the DEGs, constructed using the IntAct database, contained 5111 nodes and 9392 edges (Fig. 3). Nodes with high node degree, betweenness centrality, stress centrality and closeness centrality were considered hub genes and are listed in Table 3. The hub genes included MYC, EGFR, LNX1, YBX1, HSP90AA1, ESR1, FN1, TK1, ANLN and SMAD9. To detect significant modules in the PPI network, the PEWCC1 plug‐in was used, and two modules with the highest degree stood out. Module 1 contained 28 nodes and 63 edges (Fig. 4A); GO and pathway enrichment analysis showed that its genes were associated with diseases of signal transduction by growth factor receptors and second messengers, disease, nitrogen compound metabolic process and membrane-enclosed lumen. Module 2 contained 14 nodes and 30 edges (Fig. 4B), and its genes were mainly associated with signal transduction, multicellular organismal process and detection of stimulus.

Figure 3

PPI network of DEGs. The PPI network of DEGs was constructed using Cytoscape. Up regulated genes are marked in green; down regulated genes are marked in red.

Table 3 Topology table for up and down regulated genes.
Figure 4

Modules isolated from the PPI network of DEGs. (A) The most significant module for up regulated genes, with 28 nodes and 63 edges. (B) The most significant module for down regulated genes, with 14 nodes and 30 edges. Up regulated genes are marked in green; down regulated genes are marked in red.

MiRNA-hub gene regulatory network construction

The miRNAs and their predicted targets (hub genes) are presented in Table 4. Based on these miRNAs, a miRNA-hub gene regulatory network was constructed with 2568 nodes (miRNA: 2259; hub gene: 309) and 16,618 interaction pairs (Fig. 5). Notably, MYC was targeted by 194 miRNAs, including hsa-mir-4677-3p; HSP90AA1 by 188 miRNAs, including hsa-mir-3125; FKBP5 by 116 miRNAs, including hsa-mir-4779; RNPS1 by 109 miRNAs, including hsa-mir-548az-3p; SQSTM1 by 108 miRNAs, including hsa-mir-106a-5p; ANLN by 127 miRNAs, including hsa-mir-664a-3p; CDK1 by 109 miRNAs, including hsa-mir-5688; FN1 by 105 miRNAs, including hsa-mir-199b-3p; ESR1 by 98 miRNAs, including hsa-mir-206; and TK1 by 80 miRNAs, including hsa-mir-6512-3p.

Table 4 miRNA-target gene and TF-target gene interaction.
Figure 5

MiRNA-hub gene regulatory network. The chocolate color diamond nodes represent the key miRNAs; up regulated genes are marked in green; down regulated genes are marked in red.

TF-hub gene regulatory network construction

The TFs and their predicted targets (hub genes) are presented in Table 4. Based on these TFs, a TF-hub gene regulatory network was constructed with 899 nodes (TF: 604; hub gene: 295) and 3542 interaction pairs (Fig. 6). Notably, MAPK3 was targeted by 48 TFs, including JUND; HSP90AA1 by 35 TFs, including HSF2; SQSTM1 by 34 TFs, including SMAD4; STUB1 by 31 TFs, including ATF6; EGFR by 27 TFs, including ELF3; ESR1 by 126 TFs, including ELF3; SMAD9 by 38 TFs, including ELF3; CDK1 by 36 TFs, including ELF3; FN1 by 25 TFs, including ELF3; and NEK6 by 16 TFs, including ELF3.

Figure 6

TF-hub gene regulatory network. The blue color triangle nodes represent the key TFs; up regulated genes are marked in green; down regulated genes are marked in red.

Validation of hub genes by receiver operating characteristic curve (ROC) analysis

As these 10 hub genes are prominently expressed in T1DM, we performed ROC curve analysis to evaluate their sensitivity and specificity for the diagnosis of T1DM. As shown in Fig. 7, MYC, EGFR, LNX1, YBX1, HSP90AA1, ESR1, FN1, TK1, ANLN and SMAD9 each achieved an AUC > 0.9, indicating high sensitivity and specificity for T1DM diagnosis. These results suggest that the 10 hub genes could serve as biomarkers for the diagnosis of T1DM.

Figure 7

ROC curves validating the sensitivity and specificity of hub genes as predictive biomarkers for T1DM diagnosis. (A) MYC (B) EGFR (C) LNX1 (D) YBX1 (E) HSP90AA1 (F) ESR1 (G) FN1 (H) TK1 (I) ANLN (J) SMAD9.

Discussion

T1DM is a common form of chronic autoimmune diabetes that affects quality of life in childhood42. However, the causes of T1DM remain uncertain. Understanding the underlying molecular pathogenesis of T1DM is of key importance for diagnosis, prognosis and identifying drug targets. As NGS data can provide information regarding the expression levels of thousands of genes in the human genome simultaneously, this methodology has been widely used to predict potential diagnostic and therapeutic targets for T1DM. In the present investigation, we analyzed the NGS dataset GSE162689, which includes 27 T1DM samples and 32 normal control samples. We identified 477 up regulated and 475 down regulated genes between T1DM samples and normal control samples using the DESeq2 package in R language software. FGA (fibrinogen alpha chain)43 and FGB (fibrinogen beta chain)44 levels are correlated with disease severity in patients with cardiovascular disease, and these genes might also provide new targets for the development of drugs to treat T1DM. IGF245, IAPP (islet amyloid polypeptide)46, INS (insulin)47 and MAFA (MAF bZIP transcription factor A)48 have been shown to be involved in T1DM. Altered expression of ADCYAP1 was observed to be associated with the progression of type 2 diabetes mellitus49. Gold et al.50 reported that CSNK1G1 might be essential for cognitive impairment. Therefore, these genes might be essential in the advancement of T1DM and its complications.

Furthermore, we investigated the biological functions of these DEGs through GO and pathway enrichment analysis using an online tool. Husemoen et al.51, Zhang et al.52, Hartz et al.53, Słomiński et al.54, Johansson et al.55, Pan et al.56, Lopez-Sanz et al.57, Grant58, Słomiński et al.59, Galán et al.60, Jordan et al.61, Winkler et al.62, Yip et al.63, Crookshank et al.64, Lempainen et al.65, Qu and Polychronakos66, Morrison et al.67, Zhang et al.68, Gerlinger-Romero et al.69, Belanger et al.70, Dieter et al.71, Wanic et al.72, Ushijima Wanic et al.73, Guo et al.74, Davis et al.75, Elbarbary et al.76, Villasenor et al.77, Zhang et al.78, Lee et al.79, Zhi et al.80, Li Calzi et al.81, Sebastiani et al.82, Cherney et al.83, Doggrell84 and Yanagihara et al.85 studied the clinical and prognostic values of FLG (filaggrin), FGF21, PEMT (phosphatidylethanolamine N-methyltransferase), KL (klotho), CEL (carboxyl ester lipase), FOSL2, STAT1, TCF7L2, TP53, EGFR (epidermal growth factor receptor), ETS1, KCNJ8, DEAF1, GCG (glucagon), IKZF4, OAS1, IRS1, ABCG2, FBXO32, PTBP1, BACH2, CNDP2, KLF11, MT1E, DPP4, SLC29A3, RGS16, MAS1, GCGR (glucagon receptor), HLA-C, VASP (vasodilator stimulated phosphoprotein), CCR2, PTGS2, GLP1R and JMJD6 in patients with T1DM.
Vassilev et al.86, Qin et al.87, Ma et al.88, West et al.89, Hoffmann et al.90, Deary et al.91, Belangero et al.92, Jung et al.93, Tang et al.94, Goodier et al.95, Petyuk et al.96, Roux et al.97, Castrogiovanni et al.98, Suleiman et al.99, Haack et al.100, Kwiatkowski et al.101, Pinacho et al.102, Luo et al.103, He et al.104, Moudi et al.105, Thevenon et al.106, Li et al.107, Reitz et al.108, Jenkins and Escayg109, Letronne et al.110, Ma et al.111, Chabbert et al.112, Abramsson et al.113, Aeby et al.114 and Roll et al.115 showed the diagnostic values of genes including DCC (DCC netrin 1 receptor), PLP1, SNX19, SH3RF1, TNFRSF1A, NCSTN (nicastrin), DGCR2, NPAS2, CDNF (cerebral dopamine neurotrophic factor), SMCR8, HSPA2, STUB1, CHID1, ATP13A2, SQSTM1, LIG3, SP4, ACSL6, ERN1, ATF6B, LRFN2, NRG3, LRRTM3, GABRA2, ADAM30, GABRR2, TSHZ3, LOXL1, SCN1B and SRPX2 in patients with cognitive impairment. Previous studies have shown that genes including KCP (kielin cysteine rich BMP regulator)116, NOG (noggin)117, COL6A3118, BTG2119, RPS6120, KLF15121, KLF3122, ZFP36123, ETV5124, TLE3125, NNMT (nicotinamide N-methyltransferase)126, WDTC1127, ZFHX3128, SIAH2129, MBOAT7130, RUNX1T1131, MAPK4132, KLF9133, SELENBP1134, HELZ2135, ELK1136, SERTAD2137, CRTC3138, ABCB11139, TACR1140, SLC22A11141, PER3142, P2RX5143, MFAP5144, FGL1145, OLFM4146, NTN1147, ESR1148, ABCB1149, VAV3150 and LAMB3151 can be used as clinical prognostic biomarkers for obesity.
Genes including STAR (steroidogenic acute regulatory protein)152, IL1RN153, AQP5154, EGR1155, SFTPD (surfactant protein D)156, KLF10157, PODXL (podocalyxin like)158, FOXN3159, IL6R160, PBX1161, APOD (apolipoprotein D)162, ACVR2B163, CD34164, INSR (insulin receptor)165, APOA5166, STAR (steroidogenic acute regulatory protein)167, PDK4168, GLS (glutaminase)169, FKBP5170, SLC6A15171, MT2A172, SLC38A4173, AQP7174, ABHD15175, ABCA1176, ZNRF1177, PPP1R3B178, MAOA (monoamine oxidase A)179, UBE2E2180, RNASEK (ribonuclease K)181, PREX1182, DGKG (diacylglycerol kinase gamma)183, POSTN (periostin)184, COMP (cartilage oligomeric matrix protein)185, GAP43186, P2RY12187, SELL (selectin L)188 and DLG2189 have been related to type 2 diabetes mellitus. Expression of ERRFI1190, ALOX12191, SOCS5192, DDIT4193, DUSP4194, IL6ST195, DUSP1196, SMAD1197, NCL (nucleolin)198, METTL14199, FMOD (fibromodulin)200, CYGB (cytoglobin)201, UNC5A202 and TAAR9203 is believed to be associated with diabetic nephropathy. Genes including FAP (fibroblast activation protein alpha)204, EYA4205, BCL9206, IRF2BP2207, EGR3208, GADD45B209, DMD (dystrophin)210, LSR (lipolysis stimulated lipoprotein receptor)211, DLL4212, SUN2213, SOS1214, PIK3CA215, GAMT (guanidinoacetate N-methyltransferase)216, RBM47217, HSP90AA1218, GAB1219, S1PR1220, EDNRB (endothelin receptor type B)221, NFKBIA (NFKB inhibitor alpha)222, GJA1223, GADD45G224, PHLDA1225, CMPK2226, FIGN (fidgetin, microtubule severing factor)227, KCNJ2228, ABCC9229, DIRAS3230, EPHX1231, RAB4A232, UBIAD1233, CASQ2234, TTN (titin)235, KCNH1236, JPH2237, OXGR1238, UCHL1239, SERPINA3240, MMP28241, ADAMTS2242, P2RY1243, CSF2RA244, MYO1F245, SELPLG (selectin P ligand)246 and SAMHD1247 have been reported to be associated with cardiovascular disease.
Previous studies have shown that altered expression of genes including MAOB (monoamine oxidase B)248, VEGFC (vascular endothelial growth factor C)249, DBP (D-box binding PAR bZIP transcription factor)250, MYADM (myeloid associated differentiation marker)251, NES (nestin)252, SMURF1253, EDNRB (endothelin receptor type B)254, MUC6255, TOR2A256, TNKS (tankyrase)257, NEDD9258, ASIC1259, ADAMTS8260, DYSF (dysferlin)261, SLC26A9262, SLC45A3263 and KCNQ2264 is closely related to the occurrence of hypertension. Yang et al.265, Zhang et al.266 and Wang et al.267 revealed that genes including SYVN1, BTG1 and CFB (complement factor B) might be potential targets for diabetic retinopathy diagnosis and treatment. Together, these studies indicate that the enriched genes might play important roles in the progression of T1DM.

Construction of the PPI network of DEGs may help in understanding the relationships among genes involved in advancing T1DM. The results of the present investigation might provide potential biomarkers for the diagnosis of T1DM. SMAD9 plays an important role in the development of hypertension268. Our results indicate that this hub gene might be involved in the occurrence and development of T1DM. MYC (MYC proto-oncogene, bHLH transcription factor), LNX1, YBX1, FN1, TK1 and ANLN (anillin actin binding protein) are likely to provide new potential biomarkers for clinical practice or treatment of T1DM, pending further research.

In this investigation, the miRNA-hub gene regulatory network and TF-hub gene regulatory network associated with T1DM were constructed. CDK1269, hsa-mir-199b-3p270, JUND271 and FOXF2272 are promising biomarkers for obesity detection and diagnosis. Hsa-mir-106a-5p273, hsa-mir-206274, SMAD4275 and ATF6276 were confirmed as biomarkers in type 2 diabetes mellitus progression. Hsa-mir-106a-5p277 and HSF2278 have been shown to promote cardiovascular disease. Mendes-Silva et al.279 reported that hsa-mir-664a-3p promotes cognitive impairment. ELF3 has been reported to be involved in the pathogenesis of diabetic nephropathy280. Previous studies have shown that SRY is involved in the development of hypertension281. Our results suggest that these hub genes, miRNAs and TFs might be involved in the progression of T1DM. Together, RNPS1, MAPK3, NEK6, hsa-mir-4677-3p, hsa-mir-3125, hsa-mir-4779, hsa-mir-548az-3p, hsa-mir-5688, hsa-mir-6512-3p, XAB2, KHDRBS1 and RELA might be effective targets in T1DM, but more experimental investigations and clinical trials are needed.

In conclusion, this study used comprehensive bioinformatics analysis methods to identify DEGs, as well as the biological functions and pathways associated with T1DM, thereby enhancing the current understanding of the molecular pathogenesis of T1DM. Moreover, these results might provide potential biomarkers for the early and accurate diagnosis of T1DM, as well as potential therapeutic targets for the development of novel T1DM treatments.