Elucidate multidimensionality of type 1 diabetes mellitus heterogeneity by multifaceted information

Type 1 diabetes (T1D) is an autoimmune disease. Different factors, including genetics and viruses may contribute to T1D, but the causes of T1D are not fully known, and there is currently no cure. The advent of high-throughput technologies has revolutionized the field of medicine and biology, and analysis of multi-source data along with clinical information has brought a better understanding of the mechanisms behind disease pathogenesis. The aim of this work was the development of a data repository linking clinical information and interactome studies in T1D. To address this goal, we analyzed the electronic health records and online databases of genes, proteins, miRNAs, and pathways to have a global view of T1D. There were common comorbid diseases such as anemia, hypertension, vitreous diseases, renal diseases, and atherosclerosis in the phenotypic disease networks. In the protein–protein interaction network, CASP3 and TNF were date-hub proteins involved in several pathways. Moreover, CTNNB1, IGF1R, and STAT3 were hub proteins, whereas miR-155-5p, miR-34a-5p, miR-23-3p, and miR-20a-5p were hub miRNAs in the gene-miRNA interaction network. Multiple levels of information including genetic, protein, miRNA and clinical data resulted in multiple results, which suggests the complementarity of multiple sources. With the integration of multifaceted information, it will shed light on the mechanisms underlying T1D; the provided data and repository has utility in understanding phenotypic disease networks for the potential development of comorbidities in T1D patients as well as the clues for further research on T1D comorbidities.


Material and methods
Electronic health records. We used the EHRs from Taiwan National Health Insurance Research Database (NHIRD) in the study. The data gives a nation-wide picture of the medical condition of 99% of 23.78 million Taiwaneses. We analyzed the 2002-2008 hospitalization data and there were a total of 20,603,462 inpatient claims, pertaining to 8,044,512 persons (data approval number: NHIRD-104-293). Each record consists of the birthdate, gender, the date of visit, a main diagnosis and up to 4 side diagnoses, all specified by the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes of up to 5 digits. Our hospitalization data consists of 759,683 diabetic inpatients who were diagnosed with 250.×× (T1D by 250.×1 or 250.×3, and T2D by 250.×0 or 250.×2) during 2002-2008 and stayed one or more nights in hospitals for DM treatment. The non-DM inpatients were considered as control inpatients. The numbers and proportions of these DM inpatients across ages and types are listed in Table 1. These data are consistent with the fact that T1D is usually diagnosed before 20 years old, and T2D is usually diagnosed after 40 years old 24 .
Phenotypic disease network. The phenotypic disease network (PDN) is constructed in the form of a network with diseases as the nodes and pairwise comorbidity correlations between diseases as the links, and it can be viewed as a map of the progression of diseases 10 . The comorbidity correlation represents the "distance" between a pair of diseases. We employed the ϕ-correlation, a contingency coefficient, to quantify the comorbidity correlations of T1D inpatients 10 where N is the total number of patients in the population, N i and N j are the prevalence of diseases i and j, and N ij is the number of patients affected by both diseases. The top 0.6% of comorbidity links and their related nodes were selected in each PDN. In the PDN, we divided disease codes into 13 categories (Table S1)  Protein-protein interactions network. Protein-protein interactions (PPIs) are important because biological processes in human bodies are directly controlled in the level of protein proceedings 12 . Connecting the topological properties with biological knowledge will provide us more comprehensive information to understand the biological mechanisms. We aimed to analyze the contribution of the proteins encoding by T1D related genes to the pathogenesis of T1D and discover other key proteins cooperating with them by topological analyses. The candidate genes were converted into the seed proteins to obtain the PPI network from the STRING v.11.0 database 26 , a precomputed database for the exploration of PPIs. Given a list of the proteins as inputs, STRING will search for their neighbor interactors, and generate the PPI network consisting of these proteins and the interactions between them. We constructed the PPI network of T1D under the setting with 0.93 confidence based on active interaction sources from high-throughput lab experiments and previous knowledge in curated databases.
Topological analysis of the PPI network. The topological analyses have been applied extensively in recent years. Degree centrality (DC), betweenness centrality (BC), and closeness centrality (CC) are widely adopted to evaluate nodes in a network 8 . The degree is the count of the direct links of a node in the network. The betweenness is the proportion of the number of shortest paths passing through a node to the number of all shortest paths in the network. The closeness is defined as the reciprocal average length of the shortest paths between a node and all other nodes. In a PPI network a node with high degree is considered as a hub protein, whereas a node with high betweenness is viewed as a bottleneck protein 8,13,27 . A node with high closeness is close to the other nodes in the network. Furthermore, average degree (<k>), diameter (D), mean shortest path length (mspl), and the average clustering coefficient (acc) are viewed as global topological measurements of networks. Average degree (<k>) is the mean of all degree measurements of nodes in a network. Diameter (D) is the largest among all shortest paths in a network. The mean shortest path length (mspl) is calculated by averaging over all shortest paths between all pairs of nodes. The clustering coefficient is a measure of the local interconnectedness of the network. A network is considered as a small-world if it has a low mspl and a high acc 28 . In this study, the PPI network included a giant component and several small separate components derived from seed proteins. The structure of the PPI network was analyzed by Gephi v.0.9.2 29 , a program for analysis and visualization of very large networks.
Retrieval of backbone from the PPI network. We aimed to identify the important proteins and the molecular connectivity between regulatory pathways related to T1D. In the PPI network the nodes with high BC are similar to heavily used intersections which have great influence on flow in the network 13,27 . Because most of the shortest paths in a network go through the high BC nodes, these nodes play as the bottlenecks which have more control over the network. The nodes with top 20% BC were considered as important and these proteins and the links between them were extracted from the giant component to constitute the backbone of the network.
MiRNAs related to T1D. MiRNAs are related to immune system functions and β-cell metabolism, proliferation involving in T1D pathogenesis, and modulate mRNA expressions of the major T1D autoantigens 18,19 . A number of studies show that miRNAs have a vital role in the etiology and pathogenesis of DM and its complications 19,30,31 . Changes in miRNA expression levels in T1D are noticed because the dysregulation of miRNA expression might play important functions in moderating the development of T1D. We searched for miRNA of T1D from the HMDD v3.2 database 32 , a database collecting experiment-supported evidence for human miRNA and disease associations, and connected these miRNAs with our T1D backbone proteins to construct an interaction network.
Ethical approval and consent for participation and publication. This study was approved and granted for exempt review by the Institutional Review Board (IRB) of Tzu Chi Hospital, Hualien, Taiwan, certificated by the Ministry of Health & Welfare, Taiwan (IRB approval number: IRB 104-114-C). All methods were carried out in accordance with relevant guidelines and regulations, and the informed consent was waived.

Results
EHR data and the PDN of T1D. We analyzed the hospitalization data in Taiwan from 2002 to 2008 to draw the PDNs of T1D consisting of 566 nodes with 644 links for male inpatients and 577 nodes with 667 links for female inpatients (Fig. 1). The PDNs of T1D were complex, mainly covering neoplasms (140-239), endocrine, nutritional and metabolic diseases, and immunity disorders (240-279), diseases of the nervous system and the sense organs (320-389), diseases of the circulatory system (390-459), and disease of the digestive system www.nature.com/scientificreports/ (520-579) for both male and female T1D inpatients (Table S4). Additionally, female T1D inpatients tended to have significantly more comorbidities than male T1D inpatients in diseases of the genitourinary system (580-629) by two-sample proportion test. The PDNs of control group consisted of 722 nodes with 2358 links for male inpatients and 769 nodes with 2445 links for female inpatients ( Figure S1). The PDNs of control group mainly covered neoplasms (140-239), diseases of the nervous system and the sense organs (320-389), diseases of the circulatory system (390-459), and disease of the digestive system (520-579) for both male and female control inpatients (Table S4). There was no different between male control inpatients and female control inpatients. When comparing the PDNs of TID inpatients with the PDNs of control inpatients, we found that the male TID inpatients had significantly more comorbidities than male control inpatients in diseases of the skin and subcutaneous tissue (680-709) and endocrine, nutritional and metabolic diseases (240-279). The male T1D inpatients had less comorbidities than male control inpatients in infectious and parasitic diseases (001-139).
Moreover, because T1D is usually diagnosed before the age of 20, we selected 1-20 T1D inpatients to construct the PDN ( Figure S2(A), (B)). For male (or female) inpatients with T1D aged 1-20 years, the PDN has 323 (or 366) nodes and 741 (or 829) links. Both PDNs of male and female inpatients with T1D aged 1-20 years had more comorbid nodes in the same disease categories of 240-279, 320-389, 390-459, and 520-579 (Table S5). There was no different between young male T1D inpatients and young female T1D inpatients. For comparison, we also selected 40-60 T1D inpatients to construct the PDNs ( Figure S2(C), (D)). For male (or female) inpatients with T1D aged 40-60 years, the PDN has 594 (or 544) nodes and 918 (or 953) links. There were more comorbid nodes and links in the 40-60 PDNs than the 1-20 PDNs. Based on the proportion test, both 1-20 PDNs of male and female had significantly less comorbidities in neoplasm category (140-239) than the 40-60 PDNs. The 1-20 PDN of male also had significantly less comorbidities in diseases of the musculoskeletal system and connective tissue (710-739) than the 40-60 PDN of male. The 1-20 PDN of male had significantly more comorbidities in endocrine, nutritional and metabolic diseases, and immunity disorders (240-279) than the 40-60 PDN of male. The details of T1D comorbid diseases for males and females were listed in the  www.nature.com/scientificreports/ PPI network of T1D. We constructed the PPI network under the STRING setting with 0.93 confidence based on active interaction sources from high-throughput lab experiments and previous knowledge in curated databases. The PPI network was generated from 48 genes related to T1D through the STRING database, covering 278 nodes and 919 edges ( Figure S3). The number of edges is significantly larger than the expected for random network of the same size (p-value ≤ 10 −16 ). It contained one giant component and several small separate components derived from seed proteins. The giant network consisted of 230 nodes and 865 links. The characteristics of the giant network, number of nodes (N), average degree (<k>), diameter (D), mean shortest path length (mspl), and average clustering coefficient (acc) were listed in Table S7. Furthermore, nodes with top 20% highest BC, DC, and CC were selected and listed in Table 2, Tables S8 and S9, respectively. Several KEGG pathways 33 are involved in T1D. The nodes of the PPI network demonstrated 39 nodes for the pathway of Th17 cell differentiation, 26 nodes for type 1 diabetes mellitus, 30 nodes for Th1 and Th2 cell differentiation, 30 nodes for NF-kappa B signaling pathway, 31 nodes for Apoptosis, and 28 nodes for TNF signaling pathway (Table S10). The nodes of the PPI network involved in these KEGG pathways were marked in different colors (Fig. 2) and listed in Table S11.
Backbone in the PPI network. In the PPI network, nodes with a BC value of the top 10% (BC ≥ 0.0344) were viewed as bottleneck nodes; nodes with a DC value of the top 10%t (DC ≥ 12) were considered as hubs. According to the result of the topological analysis, CASP3 was a hub (with the largest degree k = 45) and a bottleneck (with the highest BC = 0.219802) in the giant component. CASP3 also had the 6th highest CC value (CC = 0.372358), indicating that CASP3 was located at the center of the giant component. Because proteins with high BC are the heavily used intersections 13 , the top 20% highest BC nodes were then selected as key proteins (Table 2) to constitute the backbone of the giant component. The nodes in the backbone had much control over the nodes in the giant component and were extensively connected with their neighbors. The backbone network consisted of 46 nodes and 147 links ( Figure S4). Among them, 13 proteins, CASP3, TGFB1, SRC, CASP8, UBC, EGFR, SHC1, IGF1R, CBL, PIK3R1, LCK, TNF, and FYN were date-hubs (hub-bottlenecks) which preferentially connect functional modules to each other 34 , whereas CTNNB1, CD4, AGT, APP, TGFBR2, SNCA, AGTR1, CREBBP, PTPN11, and CASP1 were bottlenecks but not hubs. In addition, UBC, UBB, TRAF2, CBLB, UBA52, IKBKG, RPS27A, BIRC2, MAP3K7, and BIRC3 were party-hubs (hub-nonbottlenecks) which preferentially act inside functional modules 34 . All of them worth further investigating the signaling pathways involved in T1D development. The nodes of the backbone network involved in the KEGG pathways were listed in the Table 3.

Discussion
To address a global view of T1D, we analyzed medical claims of Taiwan NHIRD, gene data of the OMIM, protein interactions of the STRING, pathway information of the KEGG, and miRNA data of the HMDD. We developed a data repository linking medical claims and interactome studies in T1D. The main limitation of our study was not knowing where the T1D inpatients were in their disease course, e.g. at initial diagnosis or the number of years having suffered from T1D. Studies like TEDDY 40 are designed to study disease initiation and progression to the development of clinical T1D. The inferred results of our work using medical claims can be regarded as relevant but not causal. By multifaceted information including genes, proteins, miRNAs, pathways, and medical claims, we elucidated multidimensionality of T1D heterogeneity and provided some hints for further research.
Both the PDNs of T1D males and females had more nodes in neoplasms (140-239), endocrine, nutritional and metabolic diseases, and immunity disorders (240-279), diseases of the nervous system and the sense organs (320-389), diseases of the circulatory system (390-459), and disease of the digestive system (520-579). Additionally, female inpatients have more comorbidities in the diseases of the genitourinary system (580-629). Furthermore, anemia caused by chronic renal disease, hypertensive renal disease, peripheral arterial disease with ulceration, nephritis and nephropathy, and chronic renal failure were correlated to T1D in men and women. Women with T1D aged 1-20 years old had comorbidity of peripheral angiopathy, whereas men with T1D aged www.nature.com/scientificreports/ 1-20 years old had comorbidities of cholesteatoma, orthostatic hypotension, decubitus ulcer, and osteomyelitis. Comorbid diseases of T1D such as anemia, atherosclerosis in peripheral vessels, cellulitis and abscess, decubitus ulcer, osteomyelitis, renal diseases, ureteral diseases, and vitreous diseases were reported in previous researches 1, 10,11 . Nevertheless, cholesteatoma, orthostatic hypotension, and nonunion of fracture were novel comorbid diseases of T1D.  Figures S1, S2), CASP3 is a hub and bottleneck protein. CASP3, one downstream effector caspase, and CASP8, one upstream initiator caspase, are members of caspases family and are both in the T1D backbone network. Caspases as cysteine-aspartyl specific proteases play key roles in β-cell apoptosis which is a fundamental process involved in the pathogenesis of T1D 41 . CASP3-mediated β-cell apoptosis is a necessary condition for T-cell priming which is a key initiating event to T1D 42 . CASP8 is critical for β-cell apoptosis in T1D and T2D and in maintaining β-cell mass and insulin secretion under physiological conditions 43 .
Transforming growth factor beta (TGF-β) is one of the factors involved in the cellular growth, differentiation and migration, and apoptosis 44 . TGFB1 could be a response to immuno-inflammatory activation present at the onset of T1D 45 . It displays potent strong immunomodulatory activity and is involved in the control of autoimmune diseases, therefore it may aid in development of therapeutics to prevent the onset of autoimmunity, including T1D 44 .
CD4 belongs to the family of CD (cluster of differentiation) antigens which are cell surface-expressed antigens defined by monoclonal antibodies providing targets for immunophenotyping of cells 46 . It has an impact on immune function and carcinogenesis and is called T helper cell. The antigen is associated with a number of autoimmune diseases such as vitiligo and T1D 46 . In an animal study, TGF-β1 derived from T cells acts on diabetogenic CD4 + T cells, but not Foxp3 + Treg cells, to control Th1 cell differentiation and spontaneous T1D development 47 . Figure 2. The KEGG pathways in the PPI network of T1D. The nodes involved in Th17 cell differentiation (hsa04659), type 1 diabetes mellitus (hsa04940), Th1 and Th2 cell differentiation (hsa04658), NF-kappa B signaling pathway (hsa04064), apoptosis (hsa04210), and TNF signaling pathway (hsa04668) were colored in blue, red, green, yellow, pink, and orange, respectively. The KEGG information were from KEGG database 22,23 . The PPI network was generated using STRING v11.0 26 (https:// string-db. org/). www.nature.com/scientificreports/ TNF is the cytokine which is mainly secreted by macrophages. Tumor necrosis factor-alpha (TNF-α) as a proinflammatory cytokine, participates in the regulation of several biological processes including cell apoptosis. It is suspected to relate to T1D pathogenesis 48 . Serum TNF-α level in T1D patients has significantly elevated among all age, disease duration and ethnicity groups. In addition, TNFRSF1A, a ubiquitous membrane receptor binding TNF-α, is associated with chronic renal failure 49 , which had comorbidity correlation with T1D in the PDNs. Circulating level of TNFRSF1A is considered as a biomarker of glomerular filtration rate decline in T1D patients.
Recently, studies on the renin angiotensin system (RAS) have shed light on the contribution of the RAS to the complications of T1D. The RAS includes circulating renin, acting on angiotensinogen (AGT) to produce angiotensin I (Ang I) and peptide angiotensin II (Ang II) 50 . An increase in the expression of AGT mRNA and in the Ang II synthesis may contribute to the glomerular sclerosis observed in diabetic nephropathy 51 . The IGF1R is generally considered as a growth factor receptor, and has important metabolic effects in many organisms. Once activated, the IGF1R will lead to glucose and lipid metabolism 52 . The IGF1R is involved in several signaling pathways (MAPK signaling pathway, PI3-kinase/PKB pathway, etc.), therefore it is related to T1D.
In the PPI network, six pathways related to T1D pathogenesis were marked in different colors. Three of them were also related to chronic renal failure, which had comorbidity correlations with T1D in the PDNs, including Th17 cell differentiation 53 , NF-kappa B signaling pathway 54 , and TNF signaling pathway 55 . Moreover, plenty of diseases pathways, such as pathways in cancer, toxoplasmosis, HTLV-1 infection, herpes simplex infection, Kaposi's sarcoma-associated herpesvirus infection, Chagas disease (American trypanosomiasis), inflammatory bowel disease (IBD), Epstein-Barr virus infection, tuberculosis, allograft rejection, measles, Leishmaniasis, viral myocarditis, hepatitis B, influenza A, chronic myeloid leukemia, and autoimmune thyroid disease were involved in T1D PPI network (Table S10). Most of them have been confirmed by previous studies to be related to T1D [56][57][58][59] .
Several recent studies have identified some key genes and pathways for T1D using documented genes in literature 60 , selecting differentially expressed genes from microarray data 4,61 , combining GWAS statistics with gene expression profiles 62 , or mining RNA-seq datasets 63 . All these studies indicated that the genes and pathways involved in the immune system were key to the progression of T1D, which was consistent with our findings. Immune-linked biological pathways such as apoptosis, cytokine-cytokine receptor interaction, regulation of immune response, etc. were commonly highlighted, although the identified genes were somewhat different for different research purposes. Some solely focused on T1D-assocated genes and pathways 60,62 , another aimed at the interaction between peripheral blood mononuclear cells and pancreatic β-cells 61 , and the others were interested in T1D, multiple sclerosis, and other autoimmune diseases 4,63 . Compared to them, we investigated the genes and pathways related to T1D as well as chronic renal failure that had comorbidity correlations with T1D in our PDNs.

Conclusion
Advances in science and technology have enabled the study of biological systems in their integrity. By understanding the multifaceted complexity of biological systems, predicting the risk of disease development and controlling disease progression to prevent complications will be realized. We used the medical records of Taiwan NHRID and online databases of genes, proteins, and miRNAs to have a systemic view of T1D. There were common comorbid diseases such as anemia, hypertension, vitreous diseases, renal diseases, and atherosclerosis; whereas there were novel comorbid diseases such as cholesteatoma, orthostatic hypotension, and nonunion of fracture in the PDN. We constructed the PPI network derived from T1D related genes. In the PPI network, CASP3 is a date-hub protein involved in apoptosis, TNF signaling, and MAPK pathways. The protein TNF is also a date-hub protein involving T1DM, NFKB, apoptosis, cytokine-cytokine, TCR, TLR, TNF signaling, and MAPK pathways. Moreover, we connected T1D related miRNAs with backbone proteins to understand the relationships between miRNAs and backbone proteins. CTNNB1, IGF1R, and STAT3 were hub proteins, whereas miR-155-5p, miR-34a-5p, miR-23-3p, and miR-20a-5p were hub miRNAs in the gene-miRNA network. Multiple levels of information including genetic, protein, and clinical data resulted in multiple results, which suggests the complementarity of multiple sources. With the integration of multifaceted information, it will shed light on the mechanisms underlying T1D; the provided data and repository has utility in understanding phenotypic disease networks for the potential development of comorbidities in T1D patients as well as the clues for further research on T1D comorbidities. Figure 3. The interaction network between T1D-related miRNAs (blue) and backbone-proteins (red). The interaction network was generated using Gephi v0.9.2 29 (https:// gephi. org/).