Exploring celiac disease candidate pathways by global gene expression profiling and gene network cluster analysis

Celiac disease (CeD) is a gastrointestinal autoimmune disorder, whose specific molecular basis is not yet fully interpreted. Therefore, in this study, we compared the global gene expression profile of duodenum tissues from CeD patients, both at the time of disease diagnosis and after two years of the gluten-free diet. A series of advanced systems biology approaches like differential gene expression, protein–protein interactions, gene network-cluster analysis were deployed to annotate the candidate pathways relevant to CeD pathogenesis. The duodenum tissues from CeD patients revealed the differential expression of 106 up- and 193 down-regulated genes. The pathway enrichment of differentially expressed genes (DEGs) highlights the involvement of biological pathways related to loss of cell division regulation (cell cycle, p53 signalling pathway), immune system processes (NOD-like receptor signalling pathway, Th1, and Th2 cell differentiation, IL-17 signalling pathway) and impaired metabolism and absorption (mineral and vitamin absorptions and drug metabolism) in celiac disease. The molecular dysfunctions of these 3 biological events tend to increase the number of intraepithelial lymphocytes (IELs) and villous atrophy of the duodenal mucosa promoting the development of CeD. For the first time, this study highlights the involvement of aberrant cell division, immune system, absorption, and metabolism pathways in CeD pathophysiology and presents potential novel therapeutic opportunities.

The genetic liability of CeD is supported by the involvement of both HLA (40%) and non-HLA genes (60%) in its etiology 9 . The HLA variants (DQA1 and DQB1), encode two antigens related to CeD, of which HLA-DQ2 antigen is found in 90% of CeD patients and is associated with stronger gluten-specific T helper cell response 10 . The second antigen HLA-DQ8 is found in the remaining patients. Interestingly, 30-40% of the general population carry these risk alleles but do not present any CeD symptoms when exposed to dietary gluten. This suggests that HLA-DQ2 or HLA-DQ8 alleles act as a prerequisite but not determine the development of CeD in individuals. Hence, non-HLA genes are assumed to play a critical role in the disease pathogenesis 11 . Early genome-wide association studies (GWAS) conducted on CeD have discovered that non-HLA genes like IL2 and IL21, which are involved in T cell maturation, can modulate the risk of disease development in genetically susceptible individuals [12][13][14] . Since then, several follow-up population genetics and in-vitro functional studies have also underlined the potential molecular crosstalk between HLA and non-HLA risk alleles, genetic expression and epigenetic changes, which subsequently triggers the cascade of autoimmune reactions critical to the development of CeD [15][16][17][18] .
The genetic etiology of CeD is so far widely studied by different genetic approaches like candidate gene sequencing, exome sequencing, SNP genotyping and epigenetic screening 16,[19][20][21][22] . However, compared to the above-mentioned genotyping approaches, there are very few gene expression studies which have assessed the contribution of genes to the pathophysiology of CeD. Moreover, those gene expression studies have only used basic statistical methods to explore the up or down expressed genes. The noise and bias of gene expression measurements and regulation of gene expression at post transcription level pose an additional challenge to interpret the actual role of individual or group of genes in celiac disease. Therefore, combining the gene expression measurements with protein-protein interactions (PPIs) and pathway analysis will provide a deeper insight into gene expression induced CeD development.
Thus, we conducted this first systems biology study to compare the gene expression profile of duodenum tissue samples of celiac patients at diagnosis and after restricted gluten-free diet. This study characterized the protein interactions and molecular pathways involving several differentially expressed genes (DEGs) and provided a global view of gene expression changes critical to CeD pathogenesis, which presents potential therapeutic avenues for future research.

Materials and methods
Gene datasets sources. Gene expression changes in CeD patients were compared in different conditions; at disease diagnosis, post-gluten-free dietary management as well as after in-vitro gliadin challenge. The gene expression profiles from the above mentioned three conditions were downloaded from the public domain Array Express-functional genomics data (https ://www.ebi.ac.uk/array expre ss/). These gene expression profiles were generated on Affymetrix Human Genome U133 Plus 2.0 Array, GPL570 platform (Affymetrix, Santa Clara, CA USA). The full details about tissue processing, RNA isolation, hybridization of arrays can be found in the original research article 23 .
The gene expression profile of duodenum tissue biopsies after two years of gluten-free diet (n = 9, control samples) was compared to two different gluten exposure conditions. The first one is at disease diagnosis (chronic exposure, test samples, n = 9), and the corresponding dataset Array Express ID is E-MEXP-1828. The diagnosis was based on positive CeD-associated antibodies and a histological classification of intestinal villi was done according to Marsh staging grade 3b or c changes (villous atrophy). The second condition is in-vitro gliadin challenge (acute exposure, test samples, n = 9), and its corresponding dataset Array Express ID is E-1823 24 .
Data processing. Preprocessing of gene expression data sets was performed using R package (https :// www.r-proje ct.org) 25 . To standardize and reduce the technical noise in the sample data, raw intensity signals in the CEL file format were loaded into the Bioconductor-Affy package and the raw signal values of each sample set were standardized to a median of all samples using the Robust Multiarray Average (RMA) algorithm by baseline 25,26 . This algorithm normalizes the raw signals by generating a matrix of expression from the data with context correction and log 2 conversion followed by quantile normalization. DEGs screening. Limma package (https ://bioco nduct or.org/packa ges/relea se/bioc/html/limma .html) was used to obtain the required tools to analyze DEGs with t-test 27 . False discovery rate (FDR) was calculated using Benjamini & Hochberg method 28 . The logFC cut off value for DEGs was |log FC|> 1.5, and the FDR was < 0.01 while p-value was < 0.05 29 . Heatmap was generated for each dataset using Heatmap online software (https :// www.heatm apper .ca) to represent significant DEGs. PPI construction, cluster networks and hub genes identification. The DEGs were classified into up-and down-regulated genes and then analyzed in STRING database (https ://strin g-db.org) for detecting differences in the PPI network 30 . The STRING selection is based on different parameters of direct and indirect interactions. Statistical information about each PPI network was obtained using STRING. The maximum PPI enrichment p-value was < 1.0 × 10 -16 and the minimum average local clustering coefficient was > 0.4. Both Upand down-regulated PPI networks were visualized using Cytoscape 3.7.1 software 31 . Molecular Complex Detection (MCODE) tool was used to screen out clusters of PPI networks with the following parameters, degree cutoff of 2, node score cutoff of 0.2, k-core = 2, and max depth of 100 32 . Genes with the highest MCODE scores were identified as hub genes by Cytoscape plug-in cytoHubba. www.nature.com/scientificreports/ using functional analysis modules of ClueGo and Cluepedia tools. GO annotations interpret the association of gene products to biological process (BP), molecular function (MF), cellular component (CC), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways 33,34 (https ://www.kegg.jp/kegg/kegg1 .html) and immune system processes (ISP) [35][36][37] . The selection criteria included a minimum of 3 genes in the cluster with GO tree interval range in between 3 and 8 and a kappa score of 0.4 for pathway network connectivity 38,39 . The Bonferroni stepdown (pV correction) method with two-sided hypergeometric test option was selected for statistical assessment.
With the aforementioned parameters we have chosen GO term fusion and restriction for creating ClueGO category network based on network overlapping at a statistical significance of P < 0.05.

Results
Data processing and DEGs screening. The comparison of expression profiles between CeD at the time of diagnosis and after two years of gluten-free diet condition revealed the differential expression of 299 genes (corresponding to 425 probes), including 106 upregulated and 193 downregulated genes. Top five DEGs are presented in Table 1. Among the 106 upregulated genes, LPL has the highest LogFC of 4.36. Similarly, APOA1 has the lowest LogFC value of -4.34 among 193 downregulated genes. The volcano plot represents the log2FC and the heatmap shows the DEGs in all the samples (Fig. 1). On the contrary, gluten-free diet versus in-vitro gliadin challenge analysis showed that global gene expression changes were less than 1.5 folds and insignificant, hence they were omitted from further analysis. The significant DEGs (more than 1.5 folds) from at the time of diagnosis versus gluten-free diet groups were selected for further analysis (Supplementary data Figure S1). The gene ontology analysis of upregulated DEGs showed their significant enrichment in two broad groups namely cell cycle regulation and immune system function, under biological processes ontology source (Figures S2, S3). Gene expression changes in cell components were mainly enriched in the spindle, midbody, condensed chromosome kinetochore, and centromeric region, which are involved in cytokinesis processes at the end of cell division (Supplementary data Figure S4). In molecular function annotation, gene expression alterations were associated with regulation of enzyme activities of endopeptidase, peptidase, and cysteine-type endopeptidase, which are mainly involved in activating cell-mediated immunity, autoimmune and inflammatory responses (Supplementary data Figure S5). The KEGG analysis revealed that DEGs were connected to cell cycle, p53 signalling pathway and apoptosis, where dysfunction of p53 and apoptosis are known for their association with autoimmunity 33,34 (Supplementary data Figure S6). Further classification of all upregulated DEGs under GO ontology source revealed their significant enrichment in immune system processes. Their pathway enrichment analysis showed that response to interferon-gamma, regulation of T-cell proliferation, antigen processing, presentation of exogenous peptide antigens, NOD-like receptor signalling, Th1 and Th2 cell differentiation, IL-17 signalling pathway were branch end terms ( Fig. 2 and Supplementary data Tables S1, S2).

PPI networks of up
GO analysis of down-regulated DEGs showed their relation to metabolic and transport processes of a variety of molecules (Fig. 3). Some BP annotations include cellular lipid catabolism processes involved in lipid breakdown, and detoxification of inorganic compounds (Supplementary data Figure S7). MF annotations include symporter activity, which enables active transporting across the membrane and secondary active transmembrane transporter activity, which is a wider term involving solute transportation across the membrane (Supplementary data table S3 and Figure S8). The CC annotations included apical plasma membrane which is the microvilli surface of the lumen and cluster of actin-based cell projections, which form the microvilli of the small intestine. www.nature.com/scientificreports/ www.nature.com/scientificreports/ www.nature.com/scientificreports/ KEGG pathways highlighted mineral absorption, drug metabolism, vitamin digestion and absorption ( Fig. 4 and Supplementary data S9, S10).

Cluster networks and hub genes identification using MCODE scores. Protein interaction network
clusters are a group of proteins with great functional similarity than proteins in different clusters, whereas hub genes are functionally significant interconnected nodes in a cluster. MCODE is a Cytoscape plugin that searches for clusters (highly interconnected regions) in a protein interaction network. The PPI network analysis of up and down-regulated DEGs revealed two significant cluster networks from each category (MCODE score of > 5 GO annotations of network clusters. The top cluster networks from MCODE were used as input for analyzing the PPI functional enrichment maps using ClueGo and CluePedia plugins. Tables 2 and 3 shows, highly significant GO annotation clusters with an p-value of < 1.35 × 10 -2 . The cluster 1 from upregulated DEGs network in BP ontology source has projected mitotic nuclear division and sister chromatid segregation as top GO terms. In MF ontology source, the top GO term was cyclin-dependent protein serine/threonine kinase regulator activity. For CC ontology source, the significant GO terms were related to kinetochore and spindle microtubule. KEGG pathway ontology source included cell cycle and p53 signalling pathway as significant GO terms, whereas cluster-2 was related to immune system processes. From BP ontology source, the top GO terms were cellular response to interferon-gamma and its interferon-gamma signalling pathway. These two GO terms were also seen to be significant under ISP ontology source. MF ontology source highlighted CXCR chemokine receptor binding especially CXCR3 as top GO terms. Cluster-1 of downregulated DEGs showed that the genes in this cluster were particularly related to mineral absorption and detoxification. The BP ontology source highlighted the detoxification of inorganic compound and stress response to metal ions as top GO terms. While the KEGG ontology source identified mineral absorption pathway as the significant GO term. The cluster-2 (from downregulated DEGs) was related to metabolism and absorption of diverse sets of molecules. BP highlighted GO terms like terpenoid metabolic process which is an organic compound and intestinal absorption. MF ontology source showed modified amino acid transmembrane transporter activity and dicarboxylic acid transmembrane transporter activity as top GO terms. CC ontology source has highlighted lipid absorption and metabolism-related GO annotations including chylomicron which are responsible for lipid transport and very-low-density lipoprotein particle. KEGG underlined GO terms like vitamin digestion and absorption as well as cholesterol metabolism.

Discussion
CeD is a complex multifactorial enteropathy where transglutaminase-deamidated gliadin peptides act as just initial event, but the actual anatomical and histological presentation of the disease is determined by multiple genomic and proteomic alterations taking place in a complex biological network 24 . Thus, global gene expression, which involves studying expression changes in both immune response genes as well as non-immune response genes controlling the gliadin peptide recognition is an attractive strategy to identify the potential molecular pathological networks involved in CeD development. Several gene expression studies have investigated biological pathways essential for the development of CeD in intestinal tissues 40,41 and specific cell types 42 . By integrating gene expression data with protein interaction network concepts, this study has identified the contribution of dysregulated immune system genes in the intestinal mucosa of CeD. Furthermore, this study reports that gene expression alterations in pathways connected to cell division regulation may have a compensatory role to contain the intestinal mucosal injury due to prolonged autoimmune responses. The additional noteworthy finding is related to impeded absorption, metabolism, and transportation of mineral and vitamins in the intestinal tissues, which eventually increases the likelihood of malnutrition alongside the role of villus atrophy in CeD 43 .
GO annotations interpret the association of gene products to certain pathways from published works on disease etiology and development 44 . Majority of the annotations are enriched in the up-and down-regulated PPI clusters represent the most interacting group of genes amongst the whole PPI networks; especially, hub genes, which showed highest connectivity and correlation to their modules. Diverse pathways of hub genes connected to dysregulation of the immune system in intestinal duodenum tissues were enriched in the overexpressed genes and subsequently in PPI networks and its functional clusters. In the upregulated DEGs, KEGG pathway (https ://www.kegg.jp/kegg/kegg1 .html) identified the significant enrichment of signalling pathways like NOD-like receptors (NLRs) and Toll-like receptors (TLRs). Both NOD-like and Toll-like receptors take part in mediating immune recognition by initiating innate immunity and activating adaptive immunity. Specifically, NLRP3 inflammasome (a member of NLRs family) is associated with innate immunity in response to the wheat protein in CeD knockout mice 45 . Other enriched pathways included genes controlling TNF and IL-17 signalling responses, as well as Th1, Th2 and Th17 differentiation. CD4 + T cells differentiation is directly correlated to autoimmunity, and it is induced by IFNγ and other cytokines including IL-17 and TNF protein 46 . This differentiation is essential for cytotoxic T lymphocyte activation, leading to intestinal epithelial cell destruction and villus atrophy 47  www.nature.com/scientificreports/ GO annotations of the immune-related module included signalling pathway of interferon-gamma (IFNγ), a major proinflammatory cytokine implicated in CeD, is well known for its role in regulating immune responses to infections and autoimmune diseases. IFNγ is also known to be very essential for the development of histopathological changes like villus atrophy, crypt hyperplasia in intestinal mucosa and production of CeD-associated antibodies, which mounts a strong adaptive immune response to develop CeD 47 . The additional key pathway enriched is chemokine signalling PPI cluster, which consists of CXCL9, CXCL10 and CXCL11 as hub genes. Another important hub gene from the immune-related module is STAT1, which is a direct activator of IFNstimulated cells 48 . STAT1 has been previously associated with type 1 diabetes, which is caused by pancreatic β-cells destruction via cytokine-mediated apoptosis. Moreover, JAK2 gene, one of the gene from our upregulated PPI network, was previously reported to be overexpressed in intestinal tissues of adults and children CeD patients 49 . JAK2 is also critical for interleukin 12 (IL-12) signalling, whose production is attributed to several hub genes of this module such as interferon regulatory factors genes (IRF1, IRF8 and IFNG). Both IFNG and IL-12 contribute to T-helper1 cell differentiation and pathogenesis of systemic lupus erythematosus 50 . This suggests www.nature.com/scientificreports/ that dysregulated JAK-STAT cytokine signaling pathway mediates cascade of autoimmune reactions in CeD and other co-autoimmune conditions 51 . Another major finding from upregulated cluster through KEGG pathway enrichment analysis includes cell cycle and p53 signalling pathways 33,34 , both of which are known play key role in the activation of intestinal mucosal cellular division and apoptosis 52 . The hub gene GTSE1 negatively regulates the p53 activity, hence it controls the downstream effects of p53 signalling pathway mediated cell cycle 53 . The Cyclin B2 (CCNA2) hub gene is directly involved in G2/M transition phase during the cell cycle and delays the cellular senescence and apoptosis by p53 54 . Other upregulated pathways reported in dietary gluten restricted mouse model of CeD are apoptosis and DNA repair in lamina propria and epithelium of the small intestine 47 . Upregulation of cell division related processes is thought to be a compensatory mechanism to the continuous apoptosis. The persistent apoptosis without sufficient cellular regeneration, causes villus atrophy of intestinal tissues, subsequently leading to malabsorption, a known complication in CeD patients 55 . The increased cellular division and abnormal activation of the immune system findings derived from the annotations of the upregulated PPI network and its clusters are consistent with the results of previous gene expression studies on CeD 24,56 .
The downregulated PPI network cluster results highlights the contribution of impaired homeostasis, digestion, metabolism and absorption pathways in CeD. Of these network clusters, mineral absorption pathway alterations including iron, copper, magnesium and zinc deficiencies are common clinical manifestations seen in CeD patients 57 . This is finding is supported by the identification of the metallothionein genes as hub genes in the first downregulated clusters, which are involved in heavy metal homeostasis 58 . Another identified pathway is vitamin digestion and absorption, enriched by the SLC19A1, SLC46A, and other hub genes in the second downregulated module. Downregulation of this pathway could explain a common CeD clinical symptom-the www.nature.com/scientificreports/ multivitamin deficiency 57 . Along with the impaired vitamin absorption, folate (B9) is mainly absorbed in the duodenum, which is affected by villous atrophy, making the CeD patients five times more susceptible to folate deficiency than normal individuals. Lastly, cholesterol metabolism, fat digestion and absorption pathways are enriched in downregulated hub genes like APOA1, APOA4, and CD36. APOA1 is the major component of highdensity lipoprotein (HDL) which is strongly associated with coronary heart disease (CHD) 59,60 . Both low HDL levels and risk of CHD have been reported in CeD patients 61

conclusion
This study highlights the utility of diverse system biology approaches for studying the gene expression profile of duodenum tissues to gain a comprehensive understanding about the underlying molecular mechanisms of CeD. Key pathways connected to potential biological events like (a) dysregulated immune system processes (NOD-like receptor signalling pathway, Th1 and Th2 cell differentiation, IL-17 signalling pathway), (b) loss of regulated cell division (cell cycle, p53 signalling pathway) and (c) impaired absorption (mineral and vitamin digestion and absorption as well as drug metabolism) were identified through protein interaction networks. All those pathways are connected to an increased number of intraepithelial lymphocytes (IELs) and villous atrophy of the duodenal mucosa. Validation of these biological pathways through functional studies could further confirm the present study findings. Furthermore, functional studies can then be utilized to identify the sensitive biomarker panel for diagnosis, prognosis, and novel drug targets for CeD.