Molecular markers of type II alveolar epithelial cells in acute lung injury by bioinformatics analysis

In this study, we aimed to identify molecular markers associated with type II alveolar epithelial cell injury in acute lung injury (ALI) models using bioinformatics methods. The objective was to provide new insights for the diagnosis and treatment of ALI/ARDS. We downloaded RNA SEQ datasets (GSE109913, GSE179418, and GSE119123) from the Gene Expression Omnibus (GEO) and used R language package to screen differentially expressed genes (DEGs). DEGs were annotated using Gene Ontology (GO), and their pathways were analyzed using Kyoto Encyclopedia of Genes and Genomes (KEGG). DEGs were imported into the STRING database and analyzed using Cytoscape software to determine the protein network of DEGs and calculate the top 10 nodes for the hub genes. Finally, potential therapeutic drugs for the hub genes were predicted using the DGIdb database. We identified 78 DEGs, including 70 up-regulated genes and 8 down-regulated genes. GO analysis revealed that the DEGs were mainly involved in biological processes such as granulocyte migration, response to bacterial-derived molecules, and cytokine-mediated signaling pathways. Additionally, they had cytokine activity, chemokine activity, and receptor ligand activity, and functioned in related receptor binding, CXCR chemokine receptor binding, G protein-coupled receptor binding, and other molecular functions. KEGG analysis indicated that the DEGs were mainly involved in TNF signaling pathway, IL-17 signaling pathway, NF-κB signal pathway, chemokine signal pathway, cytokine-cytokine receptor interaction signal pathway, and others. We identified eight hub genes, including IRF7, IFIT1, IFIT3, PSMB8, PSMB9, BST2, OASL2, and ZBP1, which were all up-regulated genes. We identified several hub genes of type II alveolar epithelial cells in ALI mouse models using bioinformatics analysis. These results provide new targets for understanding and treating of ALI.

eliminating pathogens 7 .Consequently, the repair of damaged alveolar epithelial cells, especially the proliferation and differentiation of ATII cells, holds significant importance for the prognosis of ALI/ARDS 8 .Despite the identification of several biomarkers that can predict the severity of damage to alveolar epithelial cells and vascular endothelial cells in ARDS patients, these markers lack uniformity and specificity 9 .Hence, it is imperative to investigate related molecular markers specific to ATII cells in order to enhance our understanding of ARDS and explore novel therapeutic options.
Bioinformatics is an interdisciplinary field that combines computer science, information technology, novel mathematical algorithms, and statistical methods.It specializes in the analysis of biological experimental data, uncovering the hidden biological significance within the data.Moreover, it aims to develop novel data analysis tools for the acquisition and management of diverse information 10 .In comparison to other frequently used statistical methods, bioinformatics is known for its comprehensive and efficient approach.Recent advancements in gene sequencing at the mRNA level have opened up new avenues for investigating the mechanisms and treatment of diseases.The combination of chip technology and bioinformatics analysis further enables the exploration of diseases at the genetic level.While this method has been extensively employed for screening tumor targets at the genome level, there has been a limited number of bioinformatics studies focused on ALI/ARDS.Through these bioinformatics investigations, previous researchers have identified a multitude of genes associated with ALI/ARDS, with many of them being linked to inflammatory mechanisms 11,12 .Additionally, other studies have highlighted the significance of the body's immune response in relation to the prognosis of ALI/ARDS 13,14 .In this study, we employed bioinformatics analysis to explore the pivotal genes and molecular markers linked to ATII injury in various ALI models.We retrieved the necessary dataset from GEO, conducted data sorting and analysis using the R language, and identified the shared Differentially Expressed Genes (DEGs) in different ALI models.By subjecting these DEGs to Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses, our aim was to uncover common patterns in gene expression.This approach enables us to potentially identify novel strategies for improving patient outcomes in the context of ARDS.

Data sources
The Gene Expression Omnibus (GEO) is a gene expression database established by the National Center for Biotechnology Information (NCBI) in 2000.This database contains gene expression data submitted by research institutions worldwide, including gene chip and high-throughput sequencing data.We utilized GEO to retrieve the data needed for our study.We used "ATII", "acute lung injury" and "ATII and acute lung injury"as key words to retrieve data sets.The screening criteria for selecting data were as follows: (1) model: acute lung injury mouse models, (2) data type: high-throughput sequencing and single-cell sequencing, (3) cell type: ATII cells, and (4) sample size: each dataset includes at least three samples in both the control and experimental groups.By setting these conditions, we aimed to obtain high-quality and relevant data for our study.The data set is screened again based on processing time.

Research method
Identify differentially expressed genes (DEGs) R language is a statistical programming language known for its capabilities in statistical computation, data mining, and data visualization.RStudio provides a supportive environment for executing code, creating visualizations, and more.In our study, we employed the R software to preprocess the data, which encompassed batch correction and standardization tasks accomplished through the use of the DESeq2 software package.To identify the differentially expressed genes (DEGs), we utilized the limma software package and removed overlapping genes from the datasets to obtain a final list of DEGs.The screening criteria for differential expression were set at an adjusted p-value < 0.05 and an absolute value of logarithmic fold change (|LogFC|) > 1 15 .These criteria ensured that only significant and biologically relevant genes were included in subsequent enrichment and protein network analyses.

Enrichment analysis of DEGs
To gain insights into the biological functions and pathways associated with the differentially expressed genes (DEGs), we performed Gene Ontology (GO) enrichment analysis (biological process, cellular component, and molecular function) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis using the clus-terProfiler package in R software.We used the "ggplot2" package to generate graphical representations of the analysis results.The primary parameters were configured as follows: an enrichment significance threshold of (p-value) < 0.05 and a corrected p-value (q-value) < 0.05.Using these criteria, we were able to identify significantly enriched components within the analysis module.

Protein-protein interaction (PPI) network and identification of hub genes
To construct a protein-protein interaction (PPI) network for the differentially expressed genes (DEGs), we used the Search Tool for the Retrieval of Interacting Genes (STRING) database (http:// string-db.org/).We uploaded the list of Differentially Expressed Genes (DEGs) into the STRING database, specifying the species as "Mouse, " and set the significance threshold at 0.4.Subsequently, the interaction network diagram of DEGs was automatically generated by the database.We then exported the node file for visualization using Cytoscape software 16.The CytoHubba plug-in and the MCODE plug-in are widely employed analytical tools within the software.In our analysis, we opted for the degree scoring algorithm in CytoHubba, as it is the most commonly used algorithm for identifying key genes.We set the criteria to select the top 10 key genes based on the degree ranking.Additionally, the MCODE plug-in was utilized to detect the most densely interacting module within the Protein-Protein

Identify drug candidates
DGIdb database is a valuable resource for predicting potential drugs for identified hub genes.By integrating drug-gene interaction data, it allows users to search for potential drugs that can target specific genes of interest 18 .The database provides information on FDA-approved drugs, experimental drugs, and investigational drugs, as well as their indications, target genes, and drug-gene interaction types 19 .Users can search for drugs based on gene symbols or drug names, and filter the results based on various criteria such as drug type, drug status, and interaction type.The database also provides links to other resources such as PubMed and ClinicalTrials.govfor further information on drugs and their clinical applications.

Acquisition of data sets
According to the 2012 Berlin definition of ARDS, respiratory dysfunction must occur within one week of a known insult 20 .Therefore, we selected mouse models of ALI from datasets with time points within 72 h.Ultimately, we identified three gene chip datasets from the GEO database platform: GSE109913, which contains three samples of lung injury caused by lipopolysaccharide infection and control samples; GSE179418, which contains three samples of lung injury caused by Escherichia coli infection; and GSE119123, which contains five samples of lung injury caused by influenza virus.Each dataset includes an equal number of control samples.The detailed informationis illustrated in Table 1.

Differentially expressed genes of ATII in 3 datasets
The R programming language package was utilized to analyze the aforementioned single cell RNA sequencing (RNA-seq) datasets.Using Venn diagrams, a total of 82 genes were identified, with 70 being up-regulated and

Go enrichment and KEGG pathway enrichment analysis of DEGs
The Go analysis was performed on the biological processes (BP), cellular components (CC), and molecular functions (MF) of the DEGs.The results revealed that the CC of these DEGs were primarily composed of host cell components, proteinome core complexes, and endopeptidase complexes, etc.The BP were significantly enriched in granulocyte migration, response to bacterial-derived molecules, cytokine-mediated signaling pathways, and others.These DEGs were also found to have cytokine activity, chemokine activity, and receptor ligand activity, and they can participate in related receptor binding, CXCR chemokine receptor binding, and G protein-coupled receptor binding, among others.Additionally, KEGG analysis indicated that the DEGs were mainly involved in TNF signaling pathway, IL-17 signaling pathway, NF-κB signaling pathway, chemokine signaling pathway, and cytokine receptor interaction signaling pathway (as shown in Figs. 2 and 3).The present study suggests that these  pathways play a crucial role in the development of human ALI/ARDS.The NF-κB signaling pathway is intricately involved in various processes, including inflammatory responses, immune responses, apoptosis regulation, and stress responses.In the context of acute lung injury inflammation, NF-κB serves as a key transcription factor that, upon activation, promotes the expression of relevant inflammatory mediators.Additionally, it has the capacity to regulate the expression of genes associated with ALI 21 .Recent literature highlights the cytokine storm as a pivotal factor in inducing ARDS, with IL-17, TNF-α, and IL-6 being extensively discussed 22 .IL-17 plays a role in recruiting neutrophils, stimulating the release of various inflammatory cytokines, and amplifying the inflammatory response.Reducing the expression of TNF-α can benefit patients dealing with the disease.Chemokines also play a critical role in these processes, as they are produced by neutrophils and macrophages, and contribute to cell aggregation while maintaining local inflammation homeostasis.Considering these pathways, blocking the activity of relevant factors may be a viable approach for the treatment of ALI/ARDS.

Protein-protein interaction (PPI) network and identification of hub genes
Through the use of the STRING database and Cytoscape software, we identified the top 10 genes with the highest degree and the subnetwork with the highest MCODE score.The intersection of these results yielded eight hub genes: IFIT1, IFIT3, IRF7, PSMB8, PSMB9, BST2, OASL2, and ZBP1 (Table 2).These genes were identified as critical players in the PPI network and could potentially serve as therapeutic targets for ARDS.The visualization of this subnetwork is shown in Fig. 4.

Identify drug candidates
Through the use of the DGIdb online database, we were able to identify drugs that are potentially relevant for the genes PSMB8 and PSMB9.However, we were unable to find any relevant drugs for the other hub genes.Table 3 displays the results of the drug prediction analysis for PSMB8 and PSMB9.

Discussion and conclusions
In recent years, significant progress has been made in the research of ALI/ARDS, including epidemiology, pathogenesis, and pathophysiology.Studies on optimizing mechanical ventilation modes and fluid management have also brought benefits to clinical treatment.However, specific and effective therapeutic drugs for ALI/ARDS have not yet been identified 23 .With the rapid development of modern biotechnology, bioinformatics has gained more attention as researchers explore therapeutic options for ALI/ARDS at the molecular and cellular levels 24 .www.nature.com/scientificreports/Numerous studies have shown that multiple pathogenic factors induce gene alterations during ALI/ARDS 25 .In this study, 78 co-expressed DEGs were screened from three ATII single-cell RNA sequencing datasets of ALI mouse models induced by three different pathogens.Finally, eight hub genes, including IRF7, IFIT1, IFIT3, PSMB8, PSMB9, BST2, OASL2, and ZBP1, were identified using bioinformatics methods.These hub genes are closely related to ATII injury during ALI, as indicated by protein network analysis.
A literature review shows that IRF7, IFIT1, IFIT3, BST2, and OASL2 are related to the immune response of the body.During lung injury, these genes are induced by interferon and participate in the IRF3-IFNAR-STAT1-IFIT1 signal pathway of pulmonary epithelial cells.They work together to regulate the activation of inflammatory cells and induce the death of infected cells 26 .The expression of IRF7 is increased by viral infection, tumor necrosis factor-α (TNF-α), and the inflammatory cytokine IL-1β.IRF7 also induces plasmacytoid dendritic cells and monocytes to produce the inflammatory cytokine IL-6, which participates in the occurrence and development of ALI/ARDS 27 .Proteomic studies of bronchoalveolar lavage fluid and ATII cells in ALI mouse models confirmed the important role of IRF7 26 .Excessive activation of IRF7 promotes the development of ALI/ARDS caused by Influenza A Avirus (IAV), and reducing IRF7 activity at local infection sites may be a new method to treat ALI/ ARDS in IAV 28 .Studies have shown that IFN-induced protein with tetrapeptide repeats 3 (IFIT3) protects against lung injury caused by viral infection 29,30 .IFIT1 and IFIT3 induced by interferon can be used as relevant marker proteins of M1 polarization of pulmonary macrophages, and a useful marker of potential inflammatory diseases 31 .Bone marrow stromal cell antigen 2 (BST2) activates the NF-κB signal pathway, promoting the production of proinflammatory factors such as TNF-α, IL-1β, and IL-6 32 .As a member of the 2′-5′ oligoadenylate synthetase (OAS) family, OASL2 encodes an important antiviral protein and promotes the cleavage of viruses or infected cells 33 .At present, the role of OASL2 in ALI/ARDS has not been reported.Our results show that the expression of OASL2 is increased in different ALI models, suggesting that OASL2 may be a potential target worth further exploration in ALI/ARDS.
Proteasome subunit beta type-8 (PSMB8) and Proteasome subunit beta type-9 (PSMB9) are mainly expressed in monocytes and T lymphocytes, encoding proteasome β subunit.They are responsible for the degradation of proteasome after ubiquitination, promoting abnormal proliferation and anti-apoptosis of cancer cells 34 .The expression of PSMB8 and PSMB9 is significantly down-regulated in tumors 35 , but in inflammatory diseases, PSMB8 is highly expressed.By recognizing and degrading pathway repressor proteins in the NF-κB pathway, PSMB8 can promote the release of inflammatory mediators 36 .Selective inhibitors of PSMB8 can block and reduce the inflammatory reaction 37 .In the study of IAV-induced lung epithelial cell injury, researchers found that PSMB8 gene was up-regulated and inhibition of PSMB8 reduced the replication of influenza virus and attenuated lung epithelial injury 38 .Similarly, when analyzing the lung tissue and single-cell transcriptome results of patients with COVID19 infection, the results also showed that the expression of PSMB8 and PSMB9 was increased and related to the polarization of pulmonary macrophages 39 .Our results, together with others, indicate that PSMB8 and PSMB9 may play an important role in ALI/ARDS, providing a new target for treatment.
Z-DNA binding protein 1 (ZBP1), mostly expressed in CD8 + T lymphocytes, plays an important role in immune defense 40 .Previous reports revealed that ZBP1 could regulate the activation of the Nod-like receptor with pyrin domain-containing 3 inflammasome (NLRP3) through the RIPK3-caspase-8 axis, and promote the secretion of IL-1β and interleukin-18 (IL-18).Moreover, it could stimulate the apoptosis of necrotic cells at the infection site through pyroptosis 41 .Additionally, in an IAV-induced lung injury model of mice, ZBP1 could activate the NF-κB signaling and promote pro-inflammatory cytokines, resulting in the formation of neutrophil extracellular traps 42 .Our results showed that ZBP1 was highly expressed in three different ALI models, suggesting a prominent role in ALI/ARDS.However, the particular role and mechanism of ZBP1 still need further study.
Analyzing the final hub genes, we found that they all play an important role in regulating the immune system, which provides new ideas for the treatment of ALI/ARDS.Through online drug prediction, we obtained drugs primarily targeting genes PSMB8 and PSMB9, which are proteasome inhibitors mainly used in tumors and immune-related diseases 43 .An experiment demonstrated that bortezomib could improve lung function in an acute pancreatitis model of mice and reduce other complications 44 .Studies of other drugs are mainly used for tumor diseases such as hematological malignancies 45 , glioblastoma 46 , etc. Currently, there are no relevant reports on ALI/ARDS, so further research is needed in this field.
In summary, using bioinformatics methods, we screened and analyzed the common characteristics of differentially expressed genes of ATII in ALI/ARDS caused by different pathogens.The results of this study provide a basis for further exploring the pathogenesis, prognosis evaluation, and new drug targets of ALI/ARDS.The predicted related drugs need to be further investigated in animal experiments and clinical studies.

Figure 2 .
Figure 2. Gene ontology (GO) analysis (BP, CC, and MF) of DEGs.(A) Barplot, the abscissa is the number of enriched genes; (B) Bubble, the abscissa GeneRatio represents the proportion of enriched genes to the total number of genes.

Figure 3 .
Figure 3.The KEGG enrichment analysis of DEGs.(A) Barplot, the abscissa is the number of enriched genes; (B) Bubble, the abscissa GeneRatio represents the proportion of enriched genes to the total number of genes.

Table 2 .
Hub genes.Note: logFC (fold change, the ratio of expression between the control group and the experimental group); P value (corrected p value); FDR (error rate, i.e. false positive).(A)

Table 3 .
Drug candidates combined with hub genes.