Characterization of PANoptosis-related genes in Crohn’s disease by integrated bioinformatics, machine learning and experiments

Currently, the biological understanding of Crohn’s disease (CD) remains limited. PANoptosis is a revolutionary form of cell death reported to participate in numerous diseases, including CD. In our study, we aimed to uncover the roles of PANoptosis in CD. Differentially expressed PANoptosis-related genes (DE-PRGs) were identified by overlapping PANoptosis-related genes and differentially expressed genes between CD and normal samples in a combined microarray dataset. Three machine learning algorithms were adopted to detect hub DE-PRGs. To stratify the heterogeneity within CD patients, nonnegative matrix factorization clustering was conducted. In terms of immune landscape analysis, the “ssGSEA” method was applied. qRT-PCR was performed to examine the expression levels of the hub DE-PRGs in CD patients and colitis model mice. Ten hub DE-PRGs with satisfactory diagnostic performance were identified and validated: CD44, CIDEC, NDRG1, NUMA1, PEA15, RAG1, S100A8, S100A9, TIMP1 and XBP1. These genes displayed significant associations with certain immune cell types and CD-related genes. We also constructed gene‒microRNA, gene‒transcription factor and drug‒gene interaction networks. CD samples were classified into two PANoptosis patterns according to the expression levels of the hub DE-PRGs. Our results suggest that PANoptosis plays a nonnegligible role in CD by modulating the immune system and interacting with CD-related genes.


Nonnegative matrix factorization
The underlying concept of nonnegative matrix factorization (NMF) may be summarized as follows 23 .The data matrix X is represented by R p×q dimensions, where p represents the data dimension and q represents the sample size.The goal of the NMF approach is to uncover the nonnegative matrices U = u ij ∈ R p×r and V = v ij ∈ R q×r .In this context, the symbols u ij and v ij denote the specific element located at the ij th position in the matrices U and V , respectively.The symbol r denotes the intended reduced dimension.The link between the matrices is estimated as X ≈ UV T , where U is the base matrix and V is the coefficient matrix.The calcula- tion of similarity is achieved by determining the distance.The generally used distance metric is the Frobenius norm, which measures the squared Euclidean distance between two matrices 24 .Therefore, the NMF model may be formulated as an optimization problem: The symbol �•� F denotes the Frobenius norm.The requirements U ≥ 0 and V ≥ 0 imply that every element in matrices U and V is nonnegative.The objective function of the problem exhibits convexity in either U or V , rendering the identification of the global minimum unattainable.Hence, the following multiplicative updating rules are suggested to obtain the best solutions: As an efficient dimensionality reduction tool, NMF is frequently utilized to discover molecular patterns based on high-dimensional genomics data.We used the "NMF" R package to classify 279 CD samples into several clusters based upon the hub DE-PRGs, the geometrical distance between which was visualized by PCA.

Human subjects
Ten CD patients whose diagnoses of CD were confirmed by their clinical manifestations, endoscopic examinations, and biopsies were recruited.CD samples from the involved mucosa together with control samples from the normal mucosa were acquired during surgery.This protocol was approved by the Ethics Committees of Xiangya Hospital of Central South University.All research was performed in accordance with relevant guidelines and regulations.

Acute TNBS-induced and DSS-induced mouse models
Twenty-two male C57BL/6J mice aged 6-8 weeks were purchased from Hunan SJA Laboratory Animal Co., Ltd.(Changsha, China) and housed in a specific pathogen-free facility at Central South University.Seven mice were intrarectally administered 100 µL of 2% w/v TNBS (Sigma-Aldrich) while five control mice were intrarectally injected with 100 µL 50% ethanol.All of them were euthanized 7 days later.For the DSS mouse model, to (1)  min

Quantitative real-time PCR
Total RNA was extracted with TransZol Up reagent (TransGen Biotech), followed by cDNA synthesis with TransScript Uni All-in-One First-Strand cDNA Synthesis SuperMix for qPCR (TransGen Biotech).Quantitative real-time PCR (qRT-PCR) was then performed using PerfectStart Green qPCR SuperMix (TransGen Biotech) on an ABI QuantStudio 5 instrument.Gene expression was normalized to that of glyceraldehyde-3-phosphate dehydrogenase.The sequences of the primers used are listed in Supplementary file 2: Table S2.

Statistical analysis
All analyses were conducted with R software 4.3.0 and Cytoscape software 3.10.0 25 .Unpaired Student's t tests and Wilcoxon rank-sum tests were used to compare the differences between two groups.adj.p < 0.05 was considered significant.
The overall workflow of our study is delineated in Fig. 1.

Ethics approval and consent to participate
This study was approved by the Ethics Committees of Xiangya Hospital of Central South University and the Institutional Animal Care and Use Committee of Central South University.Informed consent was obtained from all subjects involved in the study.

GEO dataset integration and immune landscape of CD
We constructed a combined dataset covering 279 CD samples and 224 control samples from mucosa after the removal of batch effects (Fig. 2A,B).A broadly uncoordinated immune response is an indispensable hallmark of CD.With the aim of revealing the immune landscape, we scored the immune cell infiltration of CD patients and controls via the ssGSEA method.As illustrated in Fig. 2C, the infiltration of 20 immune cells in the CD group and control group was significantly different, among which only the scores of T helper 17 (Th17) cells were lower in CD tissues than in control tissues.We then performed a correlation analysis of distinct immune cells, as shown in Fig. 2D.Interestingly, Th17 cells, CD56bright natural killer (NK) cells, CD56dim NK cells and monocytes showed inverse correlations with almost all other immune cells, whereas the other immune cells were generally positively correlated with one another, which deserves special attention.

Identification of DE-PRGs
A total of 1265 DEGs, consisting of 592 upregulated and 673 downregulated genes, were identified through differential expression analysis (Fig. 3A).A list of possible PRGs was produced from previous research (Supplementary file 1: Table S1).Subsequently, we intersected the 1265 DEGs with 930 PRGs via a Venn diagram; thus, 130 DE-PRGs were identified (Fig. 3B), which were further grouped in a heatmap (Fig. 3C).The overall expression of these DE-PRGs in the CD group and control group is shown in Supplementary file 3: Fig. S1.We could conclude that the vast majority of DE-PRGs were expressed at higher levels in CD tissues than in control tissues.

Enrichment and PPI network analyses of DE-PRGs
We then examined the latent functions and signaling pathways of the DE-PRGs.GO analysis revealed that these DE-PRGs were predominantly involved in regulation of apoptotic signaling pathway, leukocyte cell-cell adhesion, regulation of inflammatory response (biological process); membrane raft, membrane microdomain, focal adhesion (cellular component); ubiquitin-like protein ligase binding, ubiquitin protein ligase binding, and phosphatase binding (molecular function) (Supplementary file 4: Fig. S2A).Additionally, DE-PRGs were notably enriched in apoptosis, proteoglycans in cancer, NOD-like receptor signaling pathway, among others, according to the KEGG results (Supplementary file 4: Fig. S2B).Moreover, a PPI network analysis of the DE-PRGs was performed and a complex network of the DE-PRGs was constructed (Supplementary file 5: Fig. S3).

Regulatory networks of the hub DE-PRGs
Subsequently, a gene-miRNA interaction network of the 10 hub DE-PRGs consisting of 226 nodes and 338 edges was constructed (Supplementary file 7: Fig. S5 and Supplementary file 8: Table S3).Apparently, miR-124-3p, miR-34a-5p and miR-27a-3p were most strongly associated with the hub DE-PRGs in CD.After that, we generated a gene-TF regulatory network of the 10 hub DE-PRGs (Supplementary file 9: Fig. S6).The 10 hub DE-PRGs were regulated by 35 total TFs.Among them, FOXC1 was found to regulate as many as 7 hub DE-PRGs and S100A8 was regulated by 13 miRNAs (Supplementary file 10: Table S4).In addition, we looked for available drugs that act on the hub DE-PRGs, and a host of drugs were involved (Supplementary file 11: Fig. S7 and Supplementary file 12: Table S5).Specifically, a total of 19 drugs interacted with XBP1, 8 of which inhibited it.

Recognition of PANclusters
To distinguish different PANoptosis patterns in CD patients, we adopted the NMF method for unsupervised clustering on the basis of the 10 hub DE-PRGs.At k = 2, the most stable and optimal PANclusters were identified (Fig. 7A).There were 101 and 178 CD samples in PANcluster A and PANcluster B, respectively.The geometrical distance between the two clusters is shown in Fig. 7B, validating their gene expression heterogeneity.Thereafter, a boxplot and a heatmap were generated to compare the expression levels of the hub DE-PRGs between PANcluster A and PANcluster B (Fig. 7C,D).Specifically, PANcluster A was distinguished by the considerably high expression levels of CIDEC, NDRG1, NUMA1 and RAG1, while the other hub DE-PRGs, that is, CD44, PEA15, S100A8, S100A9, TIMP1 and XBP1, were expressed at higher levels in PANcluster B.

GSVA of key pathways between the PANclusters
GSVA was performed with the aim of shedding light on the functional diversity patterns of the recognized PANclusters.With regard to Hallmark pathways, increased activity of p53 pathway, androgen response and hypoxia were detected in PANcluster A, whereas mTORC1 signaling, inflammatory response, TNF-α signaling via NF-κB, IL-6/JAK/STAT3 signaling and epithelial mesenchymal transition were increased in PANcluster B (Supplementary file 13: Fig. S8A).In addition, results from the KEGG analysis suggested that PANcluster A had hypoactive ECM-receptor interaction and endocytosis but expressed high levels of genes associated with cytokine-cytokine receptor interaction and numerous signaling pathways, including toll-like receptor signaling pathway and NOD-like receptor signaling pathway (Supplementary file 13: Fig. S8B).Concerning the Reactomebased pathways, PANcluster A showed an increase in the cell cycle pathway, while most pathways, such as cytokine signaling in immune system and extracellular matrix-related pathways, were significantly enriched in PANcluster B (Supplementary file 13: Fig. S8C).

Characterization of different PANclusters
To clarify the disparities in the immune system among the PANclusters, we compared their immune microenvironments, as shown in Fig. 8A.Remarkably, the enrichment scores of 26 immune cells were much greater in PANcluster B than in PANcluster A. Consequently, CD56bright NK cells and monocytes were the only two exceptions with higher infiltration degrees in PANcluster A, the explanations behind which demand further investigation.In addition, differential gene analysis revealed 533 DEGs, including 171 upregulated and 362 downregulated genes (Fig. 8B).To learn more about the biological functions and processes linked to these DEGs, GO and KEGG analyses were performed.The 533 DEGs were markedly enriched in the following terms: positive regulation of cell adhesion, leukocyte cell-cell adhesion, and extracellular matrix organization (biological process); collagen-containing extracellular matrix, secretory granule membrane, and basement membrane (cellular component); and extracellular matrix structural constituent, glycosaminoglycan binding, and integrin binding (molecular function) (Fig. 8C,D).Moreover, the 533 DEGs were principally involved in many pathways, such as cell adhesion molecules, ECM-receptor interaction and PI3K-Akt signaling pathway (Fig. 8E).

Validation of the hub DE-PRGs
CD and control samples were acquired from 10 patients who were diagnosed with CD, and their demographic and clinical information is presented in Table 1.qRT-PCR was subsequently conducted to determine the relative expression levels of the 10 hub DE-PRGs (Fig. 9A).As expected, the levels of CD44, PEA15, S100A8, S100A9, TIMP1 and XBP1 increased in CD samples compared with those in control samples; while the opposite trend was observed for NDRG1.Moreover, there was no significant difference in the mRNA expression levels of CIDEC, NUMA1 or RAG1.Furthermore, we established classic TNBS and DSS mouse models of CD and collected colon tissues to analyze the expression levels of the hub DE-PRGs in murine colon tissues from the TNBS, DSS and control groups (Fig. 9B,C).Generally, the results of the TNBS model were in line with expectations.Specifically, in TNBS-induced colitis, Cd44, Numa1, S100a8, S100a9, Timp1 and Xbp1 were more highly expressed, while Cidec and Rag1 were less expressed.In addition, the levels of Ndrg1 and Pea15a did not significantly differ between the TNBS group and the control group.Consistent with previous work, in the DSS mouse model, the expression levels of Cd44, S100a8, S100a9 and Timp1 were greater in the mice with colitis; while the expression level of Ndrg1 was lower in the mice with colitis.In addition, no significant difference in the expression levels of Cidec, Pea15a or Xbp1 was detected.Unexpectedly, the expression levels of Numa1 and Rag1 in the DSS group were different from those in the CD and TNBS colitis groups.

Discussion
As a fundamental physiological and pathological process, cell death has a variety of functions, ranging from homeostasis maintenance, embryogenesis and aging to immune system modulation 26 .Dysregulated immune responses and intestinal epithelial cell death, which are common in patients suffering from CD, have long been speculated to contribute to perpetuating inflammation 27 .Existing studies have typically focused on individual forms of cell death rather than the complex and intricate crosstalk among disparate forms of cell death in CD.Currently, a novel form of inflammatory cell death called PANoptosis (apoptosis, necroptosis, pyroptosis and ferroptosis) has been proposed.Herein, we provide a comprehensive and thorough investigation of the roles of PRGs in CD.Our findings highlight the associations between PANoptosis, dysregulation of immunity and CD pathogenesis and development.
Utilizing data from four microarray datasets, we identified 130 DE-PRGs, which were enriched mainly in leukocyte adhesion, inflammatory response, membrane raft and focal adhesion, between the CD group and the normal group.Leukocyte adhesion to the intestinal microvascular endothelium is considered to initiate inflammation in CD 28 .Moreover, lipid rafts have been reported to play a unique role in inflammation and are regarded as a promising therapeutic strategy against inflammation 29 , but their function in CD has not yet been elucidated.Focal adhesion kinase is implicated in the antifibrotic effects of milk fat globule-epidermal growth factor 8 in CD 30 .Likewise, the KEGG enrichment results illustrated that a range of pathways were enriched, such as proteoglycans in cancer and NOD-like receptor signaling pathway.This may partly account for the higher incidence of colorectal cancer in patients with CD 31 .Consistent with earlier reports, NOD2 mutations are likely responsible for the pathogenesis and development of CD 32 , and further investigations are needed to validate the hypothetical underlying mechanisms.
Ten hub DE-PRGs with salient predictive values were further selected and validated by integrated machine learning approaches and experiments: CD44, CIDEC, NDRG1, NUMA1, PEA15, RAG1, S100A8, S100A9, TIMP1 and XBP1.It was not surprising that the expression levels of CIDEC, NUMA1 and RAG1 in human        Since immune dysregulation is an indispensable feature of CD, we comprehensively explored its immune landscape.Generally, the CD group exhibited a relatively high level of immune cell infiltration, stressing the impacts of abnormal immune responses.Unlike that of other immune cells, the infiltration score of Th17 cells was lower in the CD group than in the control group.Th17 cells are thought to be key effectors of both protective immunity and autoimmune inflammation 54 .Intriguingly, single-cell analyses revealed augmented Th17 activation in CD tissues 55 .One reasonable explanation for this contradiction lies in Th17 plasticity 56 and in such a context the failure of anti-IL-17 therapy in CD has become understandable and acceptable 54 .There is no denying that additional studies on Th17 cells are necessary.Whether there are significant differences between the CD and normal groups concerning the infiltration of CD56bright NK cells, CD56dim NK cells and monocytes is controversial according to earlier studies [57][58][59] .Our results support the view that the infiltration of CD56bright NK cells, CD56dim NK cells and monocytes does not differ between CD and normal samples, all of which are negatively linked to most other immune cells.This divergence may be attributed to heterogeneity among CD patients and distinct datasets and algorithms.More samples and more advanced algorithms are indispensable to end this controversy.
To the best of our knowledge, our study is the first to comprehensively describe the roles of PANoptosis in CD.Not only have the roles of PRGs been evaluated, but the interactions among PANoptosis, aberrant immunity and CD pathogenesis have also received particular attention.In addition, the combined analysis of three advanced machine learning techniques effectively reduced bias.Another key advantage lies in that we innovatively categorized CD patients based on hub DE-PRGs.Specific or even personalized diagnosis and therapy may be inspired by our findings.Finally, we not only recruited human subjects but also established TNBS-induced and DSS-induced mouse models to validate our results.
It cannot be denied that several limitations still exist in our study.First, limited by the raw data, we were incapable of tracing the specific sampling locations of the human tissues for the generation of the microarray datasets and connecting expression profiles with specific phenotypes.Second, the sample sizes of both human subjects and mice were small and we established only two acute mouse models to confirm our results.Accordingly, we are going to enlarge the sample size next and employ chronic murine models of CD.To further elucidate the mechanisms underlying the impacts of hub DE-PRGs on CD, we also plan to carry out causal inference analysis.

Conclusions
In summary, we screened and validated 10 hub DE-PRGs in CD patients via integrated bioinformatics, machine learning and experiments, all of which performed well in terms of CD diagnosis.Each hub DE-PRG showed significant associations with certain immune cells and CD-related genes, indicating that their roles in CD involved modulating the immune system and interacting with genes involved in the pathogenesis of CD.Gene-miRNA, gene-TF and drug-gene interaction networks were subsequently generated based on the hub DE-PRGs, which could provide directions for subsequent CD therapies.We also identified two PANclusters with distinct immune infiltration and functional patterns.Undoubtedly, further studies are urgently warranted to uncover the involvement of PANoptosis in CD etiology and develop more effective diagnostic and therapeutic strategies for CD.

Figure 3 .
Figure 3. Identification of DE-PRGs.(A) Volcano map of the DEGs with the cutoff threshold set at |log2 (fold change)| > 1 and adj.p < 0.05.The blue dots represent downregulated DEGs, the red dots represent upregulated DEGs, and the gray dots represent genes with no significant difference.(B) Venn diagram of DEGs and PRGs.Pink circle represents DEGs, blue circle represents PRGs, and their overlapping area represents DE-PRGs.(C) Clustered heatmap of the top 40 DE-PRGs.Each row represents one of the top 40 DE-PRGs, and each column represents one sample, either CD or normal.DE-PRGs differentially expressed PANoptosis-related genes, DEGs differentially expressed genes, PRGs PANoptosis-related genes, CD Crohn's disease.

Figure 4 .Figure 5 .
Figure 4. Identification of the hub DE-PRGs.(A) Cross-validations of adjusted parameter selection in the LASSO model.Each curve corresponds to one gene.(B) LASSO coefficient analysis.Vertical dashed lines are plotted at the best lambda.(C) SVM algorithm for hub gene selection.(D) Relationship between the number of random forest trees and error rates.(E) Ranking of the relative importance of genes.(F) Venn diagram showing the 10 hub DE-PRGs identified by LASSO, SVM and RF.Pink circle represents potential hub DE-PRGs identified by RF, blue circle represents potential hub DE-PRGs identified by SVM, green circle represents potential hub DE-PRGs identified by LASSO, and their overlapping area represents the final hub DE-PRGs.(G) Chord diagram showing the correlations between the hub DE-PRGs.Red represents positive correlations between different genes and green represents negative correlations between different genes.(H) ROC curves of the hub DE-PRGs in CD diagnosis.DE-PRGs differentially expressed PANoptosis-related genes, LASSO least absolute shrinkage and selection operator, RF random forest, SVM support vector machine, ROC receiver operating characteristic, AUC area under the curve, CD Crohn's disease.

Figure 6 .
Figure 6.Expression levels of the top 30 CD-related genes and relationships between them and hub DE-PRGs.(A) Boxplot of the top 30 crucial genes in relation to CD.The blue bars represent controls, and the red bars represent CD samples.(B) Pearson correlation analysis between the top 30 CD-related genes and the 10 hub DE-PRGs.*p < 0.05; **p < 0.01; ***p < 0.001.CD Crohn's disease, DE-PRGs differentially expressed PANoptosisrelated genes.

Figure 7 .
Figure 7. Recognition of PANclusters in CD. (A) Unsupervised clustering matrix generated using NMF method when k = 2. (B) PCA plot showing the distribution of PANcluster A and PANcluster B. The red dots represent PANcluster A and the blue dots represent PANcluster B. (C) Boxplot of the expression levels of the hub DE-PRGs in PANcluster A and PANcluster B. The red bars represent PANcluster A, and the blue bars represent PANcluster B. (D) Heatmap of the expression levels of the hub DE-PRGs in PANcluster A and PANcluster B. Each row represents one hub DE-PRG, and each column represents one CD sample.PANclusters PANoptosis patterns, CD Crohn's disease, NMF nonnegative matrix factorization, PCA principal component analysis, DE-PRGs differentially expressed PANoptosis-related genes.

Figure 8 .
Figure 8. Characterization of different PANclusters.(A) Infiltration levels of 28 immune cell subtypes in PANclusters A and B. The red bars represent PANcluster A, and the blue bars represent PANcluster B. (B) Volcano map of DEGs between PANclusters A and B. The blue dots represent downregulated DEGs, the red dots represent upregulated DEGs, and the gray dots represent genes with no significant difference.(C,D) Enriched items in GO analysis based on the DEGs between PANclusters A and B. (E) Enriched items in KEGG analysis based on the DEGs between PANclusters A and B. Node color indicates gene expression level; quadrilateral color indicates z-score.PANclusters PANoptosis patterns, DEGs differentially expressed genes, BP biological process, CC cellular component, MF molecular function, GO Gene Ontology, KEGG Kyoto Encyclopedia of Genes and Genomes.
Figure 1.Schematic representation of the study.PRGs PANoptosis-related genes, CD Crohn's disease, GEO Gene Expression Omnibus, DEGs differentially expressed genes, DE-PRGs differentially expressed PANoptosisrelated genes, GO Gene Ontology, KEGG Kyoto Encyclopedia of Genes and Genomes, PPI protein-protein interaction, LASSO least absolute shrinkage and selection operator, RF random forest, SVM support vector machine, The purple squares represent positive correlations, and the orange squares represent inverse correlations.GEO Gene Expression Omnibus, CD Crohn's disease, PCA principal component analysis.