Introduction

Celiac disease (CeD) is a gluten-induced autoimmune disease seen in genetically susceptible people1. It is estimated to affect 1% of the world population2,3. CeD patients exhibit severe gastrointestinal symptoms such as diarrhoea, bloating, and abdominal pain following consumption of gluten, which is commonly found in wheat, rye and barley4. Other manifestations of the disease include malabsorption and anaemia, which are consequences of villus atrophy in the small intestine4,5. Adopting a gluten-free diet results in clinical and histological improvement in patients. However, a substantial portion of patients exhibit symptoms and persistent villus atrophy even after dietary management6,7. Patients with CeD develop other autoimmune diseases such as type 1 diabetes, thyroid disease, multiple sclerosis and inflammatory bowel disease more frequently (5%) than healthy individuals8. Several factors, including genetic background, autoimmunity, environment (gluten as the main factor) and the gut microbiome, are implicated in the etiology of CeD.

The genetic liability of CeD is supported by the involvement of both HLA (40%) and non-HLA genes (60%) in its etiology9. The HLA variants (DQA1 and DQB1) encode two antigens related to CeD, of which the HLA-DQ2 antigen is found in 90% of CeD patients and is associated with a stronger gluten-specific T helper cell response10. The second antigen, HLA-DQ8, is found in the remaining patients. Interestingly, 30–40% of the general population carry these risk alleles but do not present any CeD symptoms when exposed to dietary gluten. This suggests that HLA-DQ2 or HLA-DQ8 alleles are a prerequisite for, but do not by themselves determine, the development of CeD. Hence, non-HLA genes are assumed to play a critical role in the disease pathogenesis11. Early genome-wide association studies (GWAS) conducted on CeD discovered that non-HLA genes like IL2 and IL21, which are involved in T cell maturation, can modulate the risk of disease development in genetically susceptible individuals12,13,14. Since then, several follow-up population genetics and in-vitro functional studies have also underlined the potential molecular crosstalk between HLA and non-HLA risk alleles, gene expression and epigenetic changes, which subsequently triggers the cascade of autoimmune reactions critical to the development of CeD15,16,17,18.

The genetic etiology of CeD has so far been widely studied using different genetic approaches such as candidate gene sequencing, exome sequencing, SNP genotyping and epigenetic screening16,19,20,21,22. However, compared to these genotyping approaches, very few gene expression studies have assessed the contribution of genes to the pathophysiology of CeD. Moreover, those gene expression studies used only basic statistical methods to identify up- or down-regulated genes. The noise and bias of gene expression measurements, together with the regulation of gene expression at the post-transcriptional level, make it harder to interpret the actual role of individual genes or groups of genes in celiac disease. Therefore, combining gene expression measurements with protein–protein interactions (PPIs) and pathway analysis can provide deeper insight into how gene expression changes contribute to CeD development.

Thus, we conducted this first systems biology study to compare the gene expression profiles of duodenum tissue samples from celiac patients at diagnosis and after a restricted gluten-free diet. This study characterized the protein interactions and molecular pathways involving several differentially expressed genes (DEGs) and provided a global view of the gene expression changes critical to CeD pathogenesis, which presents potential therapeutic avenues for future research.

Materials and methods

Gene datasets sources

Gene expression changes in CeD patients were compared under three conditions: at disease diagnosis, after gluten-free dietary management, and after in-vitro gliadin challenge. The gene expression profiles for these three conditions were downloaded from the public ArrayExpress functional genomics database (https://www.ebi.ac.uk/arrayexpress/). These profiles were generated on the Affymetrix Human Genome U133 Plus 2.0 Array, GPL570 platform (Affymetrix, Santa Clara, CA, USA). Full details of tissue processing, RNA isolation and array hybridization can be found in the original research article23.

The gene expression profile of duodenum tissue biopsies after two years of gluten-free diet (n = 9, control samples) was compared to two different gluten exposure conditions. The first is at disease diagnosis (chronic exposure, test samples, n = 9); the corresponding ArrayExpress dataset ID is E-MEXP-1828. Diagnosis was based on positive CeD-associated antibodies and on histological classification of the intestinal villi as Marsh grade 3b or 3c (villous atrophy). The second condition is in-vitro gliadin challenge (acute exposure, test samples, n = 9); the corresponding ArrayExpress dataset ID is E-182324.

Data processing

Preprocessing of the gene expression data sets was performed in R (https://www.r-project.org)25. To standardize the data and reduce technical noise, raw intensity signals in CEL file format were loaded with the Bioconductor affy package, and the raw signal values of each sample set were standardized using the Robust Multiarray Average (RMA) algorithm, with the median of all samples as the baseline25,26. This algorithm generates an expression matrix from the raw signals through background correction and log2 transformation followed by quantile normalization.
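The log2 transformation and quantile normalization steps named above can be illustrated with a minimal Python sketch. This is not the affy/RMA implementation used in the study: background correction and probe-set summarization are omitted, ties are not averaged, and the toy intensity matrix and function names are assumptions made purely for illustration.

```python
import math

def log2_transform(matrix):
    """Apply log2 to every raw intensity (rows = probes, columns = samples)."""
    return [[math.log2(v) for v in row] for row in matrix]

def quantile_normalize(matrix):
    """Give every sample (column) the same empirical distribution."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    # For each column, the row indices ordered from smallest to largest value.
    order = [sorted(range(n_rows), key=lambda r: matrix[r][c]) for c in range(n_cols)]
    # Mean of the k-th smallest value across all samples.
    rank_means = [
        sum(matrix[order[c][k]][c] for c in range(n_cols)) / n_cols
        for k in range(n_rows)
    ]
    # Write each rank mean back to the position that held that rank.
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        for k, r in enumerate(order[c]):
            out[r][c] = rank_means[k]
    return out

# Hypothetical raw intensities: 4 probes x 3 samples.
raw = [
    [16.0, 64.0, 32.0],
    [128.0, 256.0, 512.0],
    [64.0, 32.0, 128.0],
    [32.0, 128.0, 64.0],
]
normalized = quantile_normalize(log2_transform(raw))
for row in normalized:
    print([round(v, 3) for v in row])
```

After normalization every column contains the same set of values, so between-sample differences in overall intensity no longer dominate the comparison.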

DEGs screening

The limma package (https://bioconductor.org/packages/release/bioc/html/limma.html) was used to identify DEGs with a t-test27. The false discovery rate (FDR) was calculated using the Benjamini & Hochberg method28. DEGs were required to satisfy |logFC| > 1.5, FDR < 0.01 and p-value < 0.0529. A heatmap of the significant DEGs was generated for each dataset using the Heatmapper online tool (https://www.heatmapper.ca).
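As an illustration of the screening criteria above, the following Python sketch applies the Benjamini & Hochberg adjustment and then the three cutoffs stated in the text. It is a sketch under stated assumptions rather than the limma workflow itself: the gene names, logFC values and p-values are hypothetical, and the function names are not part of any package.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def select_degs(records, logfc_cutoff=1.5, fdr_cutoff=0.01, p_cutoff=0.05):
    """Split (gene, logFC, p-value) records into up- and down-regulated DEGs."""
    fdr = benjamini_hochberg([p for _, _, p in records])
    up, down = [], []
    for (gene, logfc, p), q in zip(records, fdr):
        if abs(logfc) > logfc_cutoff and q < fdr_cutoff and p < p_cutoff:
            (up if logfc > 0 else down).append(gene)
    return up, down

# Hypothetical test statistics for four placeholder genes.
records = [
    ("GENE_A", 2.1, 0.0001),
    ("GENE_B", -1.9, 0.0004),
    ("GENE_C", 1.8, 0.03),   # passes logFC and p-value, fails FDR
    ("GENE_D", 0.4, 0.2),    # fails every cutoff
]
up, down = select_degs(records)
print("up:", up, "down:", down)
```

With cutoffs this strict the FDR threshold is the binding constraint: any gene with FDR < 0.01 necessarily has a raw p-value below 0.05.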

PPI construction, cluster networks and hub genes identification

The DEGs were classified into up- and down-regulated genes and then analyzed in the STRING database (https://string-db.org) to detect differences in the PPI network30. STRING selects interactions based on different parameters of direct and indirect interactions. Statistical information about each PPI network was obtained using STRING. The maximum PPI enrichment p-value was < 1.0 × 10^-16 and the minimum average local clustering coefficient was > 0.4. Both up- and down-regulated PPI networks were visualized using Cytoscape 3.7.1 software31. The Molecular Complex Detection (MCODE) tool was used to screen out clusters of the PPI networks with the following parameters: degree cutoff of 2, node score cutoff of 0.2, k-core of 2, and max depth of 10032. Genes with the highest MCODE scores were identified as hub genes by the Cytoscape plug-in cytoHubba.
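Two of the network quantities mentioned above, the average local clustering coefficient and the k-core, can be sketched in plain Python on a toy graph. This is only an illustration of the definitions, not the STRING or MCODE code; the node labels P1 to P5 and the edges between them are hypothetical.

```python
def build_graph(edges):
    """Build an undirected adjacency map, ignoring self-loops."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
    return 2.0 * links / (k * (k - 1))

def average_local_clustering(adj):
    return sum(local_clustering(adj, n) for n in adj) / len(adj)

def k_core(adj, k):
    """Repeatedly remove nodes with fewer than k neighbours."""
    core = {n: set(nb) for n, nb in adj.items()}
    changed = True
    while changed:
        changed = False
        for n in [n for n, nb in core.items() if len(nb) < k]:
            for nb in core[n]:
                core[nb].discard(n)
            del core[n]
            changed = True
    return core

# Hypothetical toy network: a triangle (P1, P2, P3) with a two-node tail.
edges = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4"), ("P4", "P5")]
graph = build_graph(edges)
print(round(average_local_clustering(graph), 3))
print(sorted(k_core(graph, 2)))
```

In this toy network only the triangle survives in the 2-core, which mirrors how a k-core setting of 2 discards sparsely attached nodes before clusters are scored.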

Functional annotations of cluster networks

Both up- and down-regulated genes (PPI networks and network clusters) were provided as input to Cytoscape 3.7.1 for recognizing GO terms and pathways using the functional analysis modules of the ClueGO and CluePedia tools. GO annotations interpret the association of gene products to biological process (BP), molecular function (MF), cellular component (CC), Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways33,34 (https://www.kegg.jp/kegg/kegg1.html) and immune system processes (ISP)35,36,37. The selection criteria included a minimum of 3 genes in the cluster, a GO tree interval between 3 and 8, and a kappa score of 0.4 for pathway network connectivity38,39. The Bonferroni step-down (pV correction) method with the two-sided hypergeometric test option was selected for statistical assessment. With these parameters, GO term fusion and restriction were chosen to create the ClueGO category network based on network overlap at a statistical significance of P < 0.05.
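To make the kappa score and the step-down correction more concrete, here is a minimal Python sketch. It assumes that the kappa score is Cohen's kappa computed on the gene membership of two terms, and that the Bonferroni step-down option corresponds to Holm's procedure; both are assumptions about the tool, not statements taken from the study. The gene identifiers, term memberships and p-values are hypothetical.

```python
def kappa_score(genes_a, genes_b, universe):
    """Cohen's kappa for the agreement of two terms on gene membership."""
    universe = set(universe)
    a, b = set(genes_a) & universe, set(genes_b) & universe
    n = len(universe)
    both = len(a & b)
    neither = n - len(a | b)
    observed = (both + neither) / n
    expected = (len(a) * len(b) + (n - len(a)) * (n - len(b))) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)

def holm_step_down(pvalues):
    """Holm (step-down Bonferroni) adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running_max
    return adjusted

# Hypothetical universe of ten genes and two overlapping terms.
universe = [f"g{i}" for i in range(10)]
term_a = ["g0", "g1", "g2", "g3"]
term_b = ["g0", "g1", "g2", "g4"]
kappa = kappa_score(term_a, term_b, universe)
print(round(kappa, 3), "linked" if kappa >= 0.4 else "not linked")
print([round(p, 3) for p in holm_step_down([0.01, 0.04, 0.03])])
```

Under these assumptions, two terms sharing three of their four genes exceed the 0.4 threshold and would be joined by an edge in the term network.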

Results

Data processing and DEGs screening

The comparison of expression profiles between CeD at the time of diagnosis and after two years of gluten-free diet revealed the differential expression of 299 genes (corresponding to 425 probes), including 106 upregulated and 193 downregulated genes. The top five DEGs are presented in Table 1. Among the 106 upregulated genes, LPL has the highest logFC of 4.36. Similarly, APOA1 has the lowest logFC of -4.34 among the 193 downregulated genes. The volcano plot represents the log2FC and the heatmap shows the DEGs in all the samples (Fig. 1). In contrast, the gluten-free diet versus in-vitro gliadin challenge analysis showed that global gene expression changes were less than 1.5-fold and not significant; hence they were omitted from further analysis. The significant DEGs (more than 1.5-fold) from the at-diagnosis versus gluten-free diet comparison were selected for further analysis (Supplementary data Figure S1).

Table 1 Top five differentially expressed genes (DEGs) in intestinal duodenum tissues at the time of CeD diagnosis versus post-gluten free dietary management.
Figure 1

The differentially expressed genes (DEGs) analysis of duodenum tissue at the time of CeD diagnosis in comparison to gluten-restricted dietary management. (A) Volcano plots of log fold change of gene expression. (B) Heatmap of the DEGs with a LogFC > 1.5. Red: up-regulation; green: down-regulation. (C) Circos view of the localization of DEGs on chromosomes (first track: chromosome number; second track: DEGs; third track: up (red) and down (blue) genes) (Circos figure generated using: https://marianattestad.com/chordial).

PPI networks of up and down regulated genes

PPI networks highlight the physical contacts among protein partners. They are critical to most basic molecular mechanisms involved in cellular function but are often perturbed in disease states. The PPI network of upregulated DEGs included 103 nodes connected by 664 edges, with a clustering coefficient of 0.531 and a network centralization of 0.221, while the PPI network of downregulated DEGs included 188 nodes connected by 444 edges, with a clustering coefficient of 0.256 and a network centralization of 0.120 (Fig. 2).
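The node and edge counts reported above are enough to derive two further summary figures, average degree and network density, which help compare how tightly connected the two networks are. The short Python sketch below uses only the counts from the text; the function names are illustrative, and the standard definitions for a simple undirected graph are assumed.

```python
def average_degree(n_nodes, n_edges):
    """Mean number of interaction partners per protein."""
    return 2.0 * n_edges / n_nodes

def density(n_nodes, n_edges):
    """Observed edges divided by all possible edges in a simple graph."""
    return 2.0 * n_edges / (n_nodes * (n_nodes - 1))

# Node and edge counts as reported for the two PPI networks.
networks = {"upregulated": (103, 664), "downregulated": (188, 444)}
for name, (n, e) in networks.items():
    print(f"{name}: average degree {average_degree(n, e):.2f}, "
          f"density {density(n, e):.4f}")
```

Under these definitions the upregulated network has an average degree of about 12.89 and a density of about 0.126, against about 4.72 and 0.025 for the downregulated network, in line with its higher clustering coefficient.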

Figure 2

Overview of the PPI networks constructed using the STRING database in Cytoscape. (A) Upregulated and (B) downregulated PPI network; the density of the network nodes is based on a STRING confidence score > 0.7 (network figures generated using https://cytoscape.org/).

The gene ontology analysis of upregulated DEGs showed their significant enrichment in two broad groups namely cell cycle regulation and immune system function, under biological processes ontology source (Figures S2, S3). Gene expression changes in cell components were mainly enriched in the spindle, midbody, condensed chromosome kinetochore, and centromeric region, which are involved in cytokinesis processes at the end of cell division (Supplementary data Figure S4). In molecular function annotation, gene expression alterations were associated with regulation of enzyme activities of endopeptidase, peptidase, and cysteine-type endopeptidase, which are mainly involved in activating cell-mediated immunity, autoimmune and inflammatory responses (Supplementary data Figure S5). The KEGG analysis revealed that DEGs were connected to cell cycle, p53 signalling pathway and apoptosis, where dysfunction of p53 and apoptosis are known for their association with autoimmunity33,34 (Supplementary data Figure S6). Further classification of all upregulated DEGs under GO ontology source revealed their significant enrichment in immune system processes. Their pathway enrichment analysis showed that response to interferon-gamma, regulation of T-cell proliferation, antigen processing, presentation of exogenous peptide antigens, NOD-like receptor signalling, Th1 and Th2 cell differentiation, IL-17 signalling pathway were branch end terms (Fig. 2 and Supplementary data Tables S1, S2).

GO analysis of down-regulated DEGs showed their relation to metabolic and transport processes of a variety of molecules (Fig. 3). Some BP annotations include cellular lipid catabolic processes, which are involved in lipid breakdown, and detoxification of inorganic compounds (Supplementary data Figure S7). MF annotations include symporter activity, which enables active transport across the membrane, and secondary active transmembrane transporter activity, a wider term covering solute transport across the membrane (Supplementary data Table S3 and Figure S8). The CC annotations included apical plasma membrane, which is the microvilli surface facing the lumen, and cluster of actin-based cell projections, which form the microvilli of the small intestine. KEGG pathways highlighted mineral absorption, drug metabolism, and vitamin digestion and absorption (Fig. 4 and Supplementary data S9, S10).

Figure 3

Enriched immune system groups identified using the ClueGO and CluePedia plugins of Cytoscape. (A) GO/immune pathway terms specific for upregulated genes. (B) An overview pie chart with functional groups, including specific terms for the upregulated proteins in the immune pathways. (C) The bars represent the number of genes associated with the immune pathway (A-D Figures generated using https://cytoscape.org/).

Figure 4

ClueGO analysis of the predicted GO annotations. A functionally grouped network of enriched categories was generated for the target genes. GO terms are represented as nodes, and the node size represents the term enrichment significance. (A) Representative biological process, (B) molecular function, (C) KEGG pathways and (D) cellular component interactions among predicted targets (A-D Figures generated using https://cytoscape.org/).

Cluster networks and hub genes identification using MCODE scores

Protein interaction network clusters are groups of proteins with greater functional similarity to each other than to proteins in different clusters, whereas hub genes are functionally significant, highly interconnected nodes in a cluster. MCODE is a Cytoscape plugin that searches for clusters (highly interconnected regions) in a protein interaction network. The PPI network analysis of up- and down-regulated DEGs revealed two significant cluster networks from each category (MCODE score > 5). From the upregulated PPI network, cluster 1 showed 28 nodes linked via 365 edges with an MCODE score of 27.037. The top nodes in this cluster, with MCODE scores > 23 (PTTG1, CDC20, TTK, BIRC5 and DEPDC1), were identified as hub genes for CeD. Cluster 2 showed 15 nodes linked via 80 edges with an MCODE score of 11.429. In cluster 2, the top 4 genes (CXCL9, CXCL10, IRF1 and STAT1), with MCODE scores > 7, were identified as hub genes for CeD. For the downregulated PPI network, cluster 1 showed 9 nodes linked via 32 edges, of which 5 (55.5%) were identified as hubs, with an MCODE score of 5.8. The top 3 hub genes (MT1H, MT1G and MT1E) identified for CeD from this cluster had MCODE scores > 5.2. The second cluster has an MCODE score of 5.4 and is characterized by 31 nodes linked via 81 edges (Fig. 5). The top 2 hub genes from this cluster, with MCODE scores > 6, were IGFBP3 and APOA1.
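The cluster scores above can be related to the reported node and edge counts with a short Python sketch. It assumes that a cluster score is the subgraph density multiplied by the number of nodes, with density computed for a simple graph without self-loops; this is an assumption about how the scores were derived, and the function name is illustrative.

```python
def mcode_style_score(n_nodes, n_edges):
    """Cluster score assumed as density x node count (simple graph, no loops)."""
    density = 2.0 * n_edges / (n_nodes * (n_nodes - 1))
    return density * n_nodes

# (nodes, edges, reported score) for each cluster as given in the text.
clusters = {
    "up cluster 1": (28, 365, 27.037),
    "up cluster 2": (15, 80, 11.429),
    "down cluster 1": (9, 32, 5.8),
    "down cluster 2": (31, 81, 5.4),
}
for name, (nodes, edges, reported) in clusters.items():
    computed = mcode_style_score(nodes, edges)
    print(f"{name}: computed {computed:.3f}, reported {reported}")
```

Under this assumption the sketch reproduces the reported scores for three of the four clusters (27.037, 11.429 and 5.4). For the downregulated cluster 1 it gives 8.0 rather than the reported 5.8, so that score cannot be recovered from the stated counts with this definition alone.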

Figure 5

The MCODE clusters and hub genes identified from DEGs in duodenum tissue of celiac patients. Upregulated (A) cluster 1 and (B) cluster 2, and downregulated (C) cluster 1 and (D) cluster 2, classified based on MCC score > 5 (A-D Figures generated using https://cytoscape.org/).

GO annotations of network clusters

The top cluster networks from MCODE were used as input for analyzing the PPI functional enrichment maps using the ClueGO and CluePedia plugins. Tables 2 and 3 show highly significant GO annotation clusters with a p-value of < 1.35 × 10^-2. For cluster 1 of the upregulated DEGs network, the BP ontology source gave mitotic nuclear division and sister chromatid segregation as top GO terms. In the MF ontology source, the top GO term was cyclin-dependent protein serine/threonine kinase regulator activity. For the CC ontology source, the significant GO terms were related to kinetochore and spindle microtubule. The KEGG pathway ontology source included cell cycle and p53 signalling pathway as significant terms. Cluster 2, in contrast, was related to immune system processes. From the BP ontology source, the top GO terms were cellular response to interferon-gamma and the interferon-gamma signalling pathway. These two GO terms were also significant under the ISP ontology source. The MF ontology source highlighted CXCR chemokine receptor binding, especially CXCR3, as top GO terms.

Table 2 Functional enrichment of MCODE cluster networks of upregulated DEGs highlights GO terms related to cell division and immune system.
Table 3 Functional enrichment of MCODE cluster networks of downregulated DEGs highlights GO terms related to mineral absorption and metabolism.

Cluster 1 of downregulated DEGs showed that the genes in this cluster were particularly related to mineral absorption and detoxification. The BP ontology source highlighted detoxification of inorganic compound and stress response to metal ions as top GO terms, while the KEGG ontology source identified the mineral absorption pathway as the significant term. Cluster 2 (from downregulated DEGs) was related to metabolism and absorption of diverse sets of molecules. BP highlighted GO terms such as terpenoid metabolic process (terpenoids being organic compounds) and intestinal absorption. The MF ontology source showed modified amino acid transmembrane transporter activity and dicarboxylic acid transmembrane transporter activity as top GO terms. The CC ontology source highlighted GO annotations related to lipid absorption and metabolism, including chylomicron, which is responsible for lipid transport, and very-low-density lipoprotein particle. KEGG underlined terms such as vitamin digestion and absorption as well as cholesterol metabolism.

Discussion

CeD is a complex multifactorial enteropathy in which transglutaminase-deamidated gliadin peptides act only as the initial event, while the actual anatomical and histological presentation of the disease is determined by multiple genomic and proteomic alterations taking place in a complex biological network24. Thus, global gene expression analysis, which involves studying expression changes in both immune response genes and non-immune response genes controlling gliadin peptide recognition, is an attractive strategy to identify the potential molecular pathological networks involved in CeD development. Several gene expression studies have investigated biological pathways essential for the development of CeD in intestinal tissues40,41 and specific cell types42. By integrating gene expression data with protein interaction network concepts, this study has identified the contribution of dysregulated immune system genes in the intestinal mucosa of CeD patients. Furthermore, this study reports that gene expression alterations in pathways connected to cell division regulation may have a compensatory role in containing the intestinal mucosal injury caused by prolonged autoimmune responses. An additional noteworthy finding is the impeded absorption, metabolism and transport of minerals and vitamins in the intestinal tissues, which increases the likelihood of malnutrition alongside the role of villus atrophy in CeD43.

GO annotations interpret the association of gene products to certain pathways from published works on disease etiology and development44. The majority of the annotations enriched in the up- and down-regulated PPI clusters represent the most interacting groups of genes in the whole PPI networks, especially hub genes, which showed the highest connectivity and correlation to their modules. Diverse pathways of hub genes connected to dysregulation of the immune system in duodenal tissue were enriched in the overexpressed genes and subsequently in the PPI networks and their functional clusters. In the upregulated DEGs, KEGG pathway analysis (https://www.kegg.jp/kegg/kegg1.html) identified significant enrichment of signalling pathways such as NOD-like receptors (NLRs) and Toll-like receptors (TLRs). Both NOD-like and Toll-like receptors take part in mediating immune recognition by initiating innate immunity and activating adaptive immunity. Specifically, the NLRP3 inflammasome (a member of the NLR family) is associated with innate immunity in response to wheat protein in CeD knockout mice45. Other enriched pathways included genes controlling TNF and IL-17 signalling responses, as well as Th1, Th2 and Th17 differentiation. CD4+ T cell differentiation is directly correlated with autoimmunity, and it is induced by IFNγ and other cytokines including IL-17 and TNF protein46. This differentiation is essential for cytotoxic T lymphocyte activation, leading to intestinal epithelial cell destruction and villus atrophy47.

GO annotations of the immune-related module included the signalling pathway of interferon-gamma (IFNγ), a major proinflammatory cytokine implicated in CeD that is well known for its role in regulating immune responses to infections and autoimmune diseases. IFNγ is also known to be essential for the development of histopathological changes such as villus atrophy and crypt hyperplasia in the intestinal mucosa, and for the production of CeD-associated antibodies, which mounts a strong adaptive immune response leading to CeD47. An additional key enriched pathway is the chemokine signalling PPI cluster, which consists of CXCL9, CXCL10 and CXCL11 as hub genes. Another important hub gene from the immune-related module is STAT1, which is a direct activator of IFN-stimulated cells48. STAT1 has previously been associated with type 1 diabetes, which is caused by pancreatic β-cell destruction via cytokine-mediated apoptosis. Moreover, JAK2, one of the genes in our upregulated PPI network, was previously reported to be overexpressed in intestinal tissues of adult and paediatric CeD patients49. JAK2 is also critical for interleukin 12 (IL-12) signalling, whose production is attributed to several hub genes of this module, such as the interferon regulatory factor genes (IRF1, IRF8 and IFNG). Both IFNG and IL-12 contribute to T-helper 1 cell differentiation and to the pathogenesis of systemic lupus erythematosus50. This suggests that a dysregulated JAK-STAT cytokine signaling pathway mediates a cascade of autoimmune reactions in CeD and other co-occurring autoimmune conditions51.

Another major finding from the upregulated cluster through KEGG pathway enrichment analysis is the involvement of the cell cycle and p53 signalling pathways33,34, both of which are known to play a key role in the activation of intestinal mucosal cell division and apoptosis52. The hub gene GTSE1 negatively regulates p53 activity; hence it controls the downstream effects of the p53 signalling pathway on the cell cycle53. The Cyclin B2 (CCNA2) hub gene is directly involved in the G2/M transition phase of the cell cycle and delays cellular senescence and apoptosis mediated by p5354. Other upregulated pathways reported in the dietary gluten-restricted mouse model of CeD are apoptosis and DNA repair in the lamina propria and epithelium of the small intestine47. Upregulation of cell division related processes is thought to be a compensatory mechanism for the continuous apoptosis. Persistent apoptosis without sufficient cellular regeneration causes villus atrophy of intestinal tissues, subsequently leading to malabsorption, a known complication in CeD patients55. The findings of increased cellular division and abnormal activation of the immune system, derived from the annotations of the upregulated PPI network and its clusters, are consistent with the results of previous gene expression studies on CeD24,56.

The downregulated PPI network cluster results highlight the contribution of impaired homeostasis, digestion, metabolism and absorption pathways in CeD. Among these network clusters, alterations in the mineral absorption pathway, including iron, copper, magnesium and zinc deficiencies, are common clinical manifestations seen in CeD patients57. This finding is supported by the identification of the metallothionein genes, which are involved in heavy metal homeostasis, as hub genes in the first downregulated cluster58. Another identified pathway is vitamin digestion and absorption, enriched by SLC19A1, SLC46A, and other hub genes in the second downregulated module. Downregulation of this pathway could explain a common clinical symptom of CeD, multivitamin deficiency57. In addition to impaired vitamin absorption, folate (B9) is mainly absorbed in the duodenum, which is affected by villous atrophy, making CeD patients five times more susceptible to folate deficiency than normal individuals. Lastly, cholesterol metabolism and fat digestion and absorption pathways are enriched in downregulated hub genes such as APOA1, APOA4, and CD36. APOA1 is the major component of high-density lipoprotein (HDL), which is strongly associated with coronary heart disease (CHD)59,60. Both low HDL levels and risk of CHD have been reported in CeD patients61. GO annotations of the second cluster include drug metabolism, metal ion homeostasis, and transport of lipids and other molecules. Heme, bile acid and xenobiotic metabolism are downregulated in the dietary gluten-restricted mouse model of CeD47.

Conclusion

This study highlights the utility of diverse systems biology approaches for studying the gene expression profile of duodenum tissues to gain a comprehensive understanding of the underlying molecular mechanisms of CeD. Key pathways connected to potential biological events such as (a) dysregulated immune system processes (NOD-like receptor signalling pathway, Th1 and Th2 cell differentiation, IL-17 signalling pathway), (b) loss of regulated cell division (cell cycle, p53 signalling pathway) and (c) impaired absorption (mineral and vitamin digestion and absorption as well as drug metabolism) were identified through protein interaction networks. All of these pathways are connected to an increased number of intraepithelial lymphocytes (IELs) and villous atrophy of the duodenal mucosa. Validation of these biological pathways through functional studies could further confirm the present findings. Furthermore, functional studies could then be used to identify a sensitive biomarker panel for diagnosis and prognosis, as well as novel drug targets for CeD.