Artificial neural network identified the significant genes to distinguish Idiopathic pulmonary fibrosis

Li, Zhongzheng; Wang, Shenghui; Zhao, Huabin; Yan, Peishuo; Yuan, Hongmei; Zhao, Mengxia; Wan, Ruyan; Yu, Guoying; Wang, Lan

doi:10.1038/s41598-023-28536-w

Download PDF

Article
Open access
Published: 21 January 2023

Artificial neural network identified the significant genes to distinguish Idiopathic pulmonary fibrosis

Zhongzheng Li¹,
Shenghui Wang¹,
Huabin Zhao¹,
Peishuo Yan¹,
Hongmei Yuan¹,
Mengxia Zhao¹,
Ruyan Wan¹,
Guoying Yu¹ &
…
Lan Wang¹

Scientific Reports volume 13, Article number: 1225 (2023) Cite this article

1678 Accesses
5 Citations
1 Altmetric
Metrics details

Subjects

Abstract

Idiopathic pulmonary fibrosis (IPF) is a progressive interstitial lung disease that causes irreversible damage to lung tissue characterized by excessive deposition of extracellular matrix (ECM) and remodeling of lung parenchyma. The current diagnosis of IPF is complex and usually completed by a multidisciplinary team including clinicians, radiologists and pathologists they work together and make decision for an effective treatment, it is imperative to introduce novel practical methods for IPF diagnosis. This study provided a new diagnostic model of idiopathic pulmonary fibrosis based on machine learning. Six genes including CDH3, DIO2, ADAMTS14, HS6ST2, IL13RA2, and IGFL2 were identified based on the differentially expressed genes in IPF patients compare to healthy subjects through a random forest classifier with the existing gene expression databases. An artificial neural network model was constructed for IPF diagnosis based these genes, and this model was validated by the distinctive public datasets with a satisfactory diagnostic accuracy. These six genes identified were significant correlated with lung function, and among them, CDH3 and DIO2 were further determined to be significantly associated with the survival. Putting together, artificial neural network model identified the significant genes to distinguish idiopathic pulmonary fibrosis from healthy people and it is potential for molecular diagnosis of IPF.

Identifying potential biomarkers of idiopathic pulmonary fibrosis through machine learning analysis

Article Open access 02 October 2023

Identification of diagnostic biomarkers for idiopathic pulmonary hypertension with metabolic syndrome by bioinformatics and machine learning

Article Open access 12 January 2023

A 3D-CNN model with CT-based parametric response mapping for classifying COPD subjects

Article Open access 08 January 2021

Introduction

IPF is a chronic progressive interstitial lung disease characterized by developing lung scarring and deterioration of lung function¹. The abnormal extracellular matrix accumulates with the lung tissue and vascular system are repeatedly micro-damaged, and the alveolar structure is destroyed, resulting in a decrease in the lung tissue's ability to exchange gas with the outside^2,3,4. The clinical manifestations are cough and dyspnea, severe cases can lead to respiratory failure. IPF mainly occurs in people at 50 years of age and older, and the incidence increases with age^5,6,7. IPF is widely believed to result from the interaction of genetics, environmental risk, and ageing⁸. Most IPF patients also have multiple complications, such as heart failure, gastroesophageal reflux disease, obstructive apnea, etc.^9,10

The challenge to clinicians is distinguishing IPF from other idiopathic interstitial pneumonias, high-resolution CT of the chest and lung biopsy are advised now, but the risk of surgical lung biopsy is greater for elder pattens¹¹. With the development of high-throughput sequencing technology, the molecular alteration can be observed and the changes at RNA level can be more accurately determined in disease^12,13,14.

In addition, with the development of artificial intelligence technology, machine learning and deep learning models have recently made significant contributions to the development of predictive medicine and modern pathological practice^15,16,17. These models enable people to better interpret high-throughput data, reduce the dependence of disease diagnosis on subjective consciousness, and provide more precise criteria for disease diagnosis^18,19,20. In this study, Gene Expression Ontology (GEO) databases were used to screen the key genes and construct an artificial neural network model for diagnosis of idiopathic pulmonary fibrosis.

Materials and methods

Statement

Our study is based on open-source databases(GEO), there are no ethical issues and other conflicts of interest. which belongs to public databases. The patients involved in the database have been obtained ethical approval. Users can download relevant data for free for research and publish relevant articles.

Data download and processing

We used keywords “idiopathic pulmonary fibrosis”, “pulmonary fibrosis”, and “lung fibrosis” to search for relevant datasets in the GEO database. Specifically, we included datasets in our analysis if they met the following criteria: (1) Adequate sample size. (2) Included both normal and disease groups. (3) RNA was extracted from lung tissue. (4) Adequate evidence for a diagnosis of IPF, such as HRCT. (5) Clinical information was collected in a standardized manner. In GSE47460, these are 582 total subjects, 254 have interstitial lung disease, 220 have COPD, and 108 are controls²¹. GSE110147 lung samples were obtained from the recipients’ organs of 22 patients with IPF, 10 with NSIP (non-specific interstitial pneumonia) and 5 with mixed IPF-NSIP undergoing lung transplantation²². In GSE53845 RNA was extracted directly from lung tissue samples from 40 IPF patients or 8 healthy controls²³. In GSE70866, BAL cells were harvested from a discovery cohort of 62 patients from Freiburg, Germany, and two independent validation cohorts, Siena, Italy (50 patients) and Leuven, Belgium (64 patients)²⁴. The GEOquery package was used to obtain the expression profiles and clinical phenotype data of the microarray datasets GSE47460, GSE53845, GSE110147, GSE32537, and GSE70866. Only IPF samples and normal samples for subsequent analysis were retained, excluding other lung samples. The annotation information of the chip probes of the corresponding platforms was obtained from the GEO database, respectively. During the conversion of the microarray probe ID and gene symbols, multiple probes corresponding to one gene symbol were found. Considering the reliability of the data, 'many-to-one' probe expression levels were used for gene expression levels. GSE47460 database had the largest sample size among the available databases and therefore provided the most possibility to identify differentially expressed genes.

Differential gene expression and enrichment analysis

Principal Component Analysis (PCA) was performed to identify spatial sample separation in the patient cohort using an R package factoextra²⁵. The R software package limma²⁶ was used to achieved differential analysis on 91 control and 122 IPF samples of GSE47460, genes differentially expressed with P values < 0.05 and fold changes > 1.5 or < 2/3 were visualized using an R package heatmap. The metascape tool²⁷ was used to carry out enrichment analysis with DEGs on multiple databases including GO database, KEGG database, Reactome database, Wikipathway database^28,29.

Random forest screens the top signatures

126 DEGs from GSE47460 (93 up-regulated and 33 down-regulated) were used to construct the random forest model (Table 1). The construction of random forest model and the chosen of top signatures were used the methods of Tian¹. The R package pheatmap was used to perform k-means unsupervised clustering of the GSE47460 dataset and visualize.

Table 1. 126 DEGs from GSE47460.

Full size table

Establishment of IPF classification model with artificial neural network

We used the top six signatures expression in another dataset of GSE32537 to construct an artificial neural network model using the R software package neuralnet. Taking the four hidden layers as the model parameters, the IPF disease classification model is constructed through the obtained gene weight information. Five-fold cross-validation were performed by the Caret package, pROC package was used to estimate the value of AUC³⁰.

Additional data verification

On three independent datasets (GSE47460, GSE53845 and GSE110147), the validity of the constructed classification scoring model of IPF disease and normal samples was verified. The clinical data of GSE70866 were used to evaluate this potential of this model to indicate the patient prognosis and survival.

Clinical parameters

Clinical straits of IPF patients and healthy control such as the age, gender, pulmonary function tests (PFT) et al. were obtained in GES47460 dataset. The commonly PFT including forced vital capacity (FVC) (% pred.), FVC (post.), forced expiratory volume in 1 s (FEV1) (%pred.), FEV1(post.), and diffusing capacity of the lung for carbon monoxide (DLCO) (%pred.)³¹ were combined into a single "meta" lung function indicator by R package factoextra. FVC(post) and FEV1(post) refer to the post-bronchodilator forced vital capacity (FVC) and forced expiratory volume in one second (FEV1). The design of this study including the main four-step process was represented by a flow chat (Fig. 1).

Statistical analysis

The R package limma is used for differential expression analysis. To fit the expression data to a linear model and perform empirical Bayes moderation to shrink the standard errors and increase the accuracy of the results. Visualize the results using limma’s built-in plotting functions or by exporting the data and using ggplot2 for visualization. To remove batch effects using the Combat package, the combat() function to adjust for batch effects by specifying the variables that contain the batch information and the variables to be adjusted. OS time and cause of death were obtained and matched to respective patients from the supplemental clinical data available from the GSE70866. Survival time was measured in days starting at diagnosis and ending on the patient’s death or end of the follow-up period. Kaplan–Meier method was used to estimate overall patient survival by genes expression. The high—and low-risk groups were differentiated according to the expression value, with each group containing at least one third of the total sample. The log–rank and Wilcoxon tests were used to compare survival distributions. Correlations were calculated using Spearman’s rank correlation (presented as Spearman rho). The resulting coefficient will range from − 1 to 1, where − 1 indicates a perfect inverse relationship, 0 indicates no relationship, and 1 indicates a perfect direct relationship.

Results

Significantly alteration of the genes and enriched signal pathway in IPF

The design of this study including the main four-step process was represented by a flow chat (Fig. 1). Totally 213 subjects in the GSE47460 dataset, including 91 healthy control and 122 IPF patients, the Bayesian test in the limma package were used to identify DEGs between normal and IPF samples. One hundred and twenty-six significantly differentially expressed proteins (DEPs) (adj.P.Val < 0.05 and a differential expression ratio [IPF/N] > 2 or < 0.5), including 93 up-regulated genes and 33 down-regulated genes were identified and heatmap in Fig. 2A, B. Then the metascape tool was used for pathway enrichment analysis of 126 important DEGs, the thresholds set at a -Log₁₀ (P value) of > 2.5. Of the DEGs upregulated in patients with IPF were enriched in pathways associated with collagen degradation, NABA CORE MATRISOME and lung fibrosis (Fig. 2C), whereas DEGs those involved in cellular response to lipid, regulation of cytokine, positive regulation of cell death (Fig. 2D). These pathways were combined with GO database, KEGG database, Reactome database and Wikipathway database.

Random forest classifies the DEGs between IPF and healthy control

The 126 DEGs were further classified by random forest classifier. In order to further obtain a model with stable error, appropriate parameters are selected by changing the number of decision trees, and finally 500 trees are set as the optimal parameters of the model (Fig. 3A). Nine was determined as the parameter of variable number, the importance of features by calculating the purity of nodes through Gini coefficient method were computed, the top 20 potential indicators were showed in Fig. 3B. Next, DEGs with importance greater than 4 or equal to 4 were screened for further analysis. which are CDH3, ADAMTS14, IL13RA2, HS6ST2, DIO2 and IGFL2 sequentially (Fig. 3B). The association between the top six genes’ expression with the age, gender, smoking history and disease stage, and status were heatmap in Fig. 3C by k-means unsupervised cluster, which indicated that the genes panel can be used to distinguish IPF patients from the control samples. Although older age and male increased susceptibility to IPF³², there was no significant correlation between the expression of the six genes and age or sex (Fig. S1). This suggests that the six genes are not affected by these factors in distinguishing between normal and IPF.

Construction of the artificial neural network model

The convolutional neural network model was constructed using another dataset GSE32537 by the neuralnet package. Before training the neural network model, we need to set important parameters, especially the number of hidden layers and the number of neurons. There were no fixed rules for the setting of these two values, which relied more on experience and constant attempt. After many tests on the number of hidden layers, we found that when the number of hidden layers was 5, the training effect of the model was the best. Six neurons were finally set as model parameters based on the size of the input layer, commonly two-thirds of the input size was recommended. In order to further strengthen the stability of the neural network model, the GSE32537 dataset was randomly divided into the training set and the validation set, the fivefold cross-validation method was used for 5 iterations of optimization. The more important DEGs and their corresponding weight coefficients were learned from the training set. The classification effect of the scoring model was proved by the validation on other datasets, and the classification accuracy of the neural network model on the verification set was recorded each time. The receiver operating characteristic (ROC) curve is used to evaluate the classification performance of the model. The fivefold cross-validation results show that the AUC value of the area under the ROC curve is close to 1 (average AUC ≈ 0.99) (Fig. 4A), which indicating that the classification accuracy of this convolutional neural network model was high.

The training of the whole neural network model was performed in 28,730 steps, In the connection weights between neurons of the network, the positive weights were connected by black lines, the negative weights were connected by gray lines, and the thickness of the lines reflected the value of the weight. The termination condition of neural network training was the absolute partial derivative of the error function was less than 0.01(almost reached threshold), and the output result of the entire training process shown in Fig. 4B.

Model accuracy verification

The trained neural network model was put into three independent datasets of GSE47460, GSE110147 and GSE53845 for verification. The data in the three datasets were standardized before verification. In GSE47460 dataset, the sensitivity was 90%, the specificity was 85%, and the AUC was 0.856 (Fig. 5A). In GSE110147 dataset, the sensitivity was 100%, the specificity was 100%, and the AUC was 1 (Fig. 5C). In GSE53845, the sensitivity was 75%, the specificity was 90%, and the AUC was 0.880 (Fig. 5E). The confusion matrix results of GSE47460, GSE110147 and GSE53845 are shown in Fig. 5B,D and F respectively. These data demonstrated that the accuracy of this model is reliable.

Survival analysis

To further estimate the prognostic effect of the identified candidate genes in IPF, the complete dataset (GSE70866) of RNA-seq samples (bronchoalveolar lavage fluid) with follow-up comprised 194 specimens from IPF patients (n = 176) and normal controls (n = 18). Cox proportional hazards regression model and Kaplan–Meier method (product-limit method) were used to calculate the correlation between gene expression and survival status. Univariate COX results showed that CDH3 was a potential prognostic marker (HR = 1.3, pvalue = 0.0013, Fig. S2). Of the six signatures, CDH3, ADAMTS14 and DIO2 showed a different significant association with overall survival in IPF (Fig. 6A–F). Patients with high expression of CDH3 and ADAMTS14 had a poor prognosis (Fig. 6A, B), while those with high expression of DIO2 had a good prognosis (Fig. 6E).

Six signatures correlation analysis with clinical features

The dataset (GSE47460) including the RNA-seq counts and clinical data from IPF patients was used to yielded the global correlation network heatmap shown in Fig. 7A. Quantification of multiple combinations of clinical lung function parameters into a single "meta" lung function measure by principal component analysis. Next, we performed linear multivariate regression analysis the mRNA expression levels associated with the meta lung function variable. CDH3, ADAMTS14, IL13RA2, HS6ST2, DIO2 and IGFL2 are positive correlated to lung function with R value at about 0.6 (Fig. 7B).

Discussion

In this study, we calculated differentially expressed genes (DEGs) related to idiopathic pulmonary fibrosis (IPF), and obtained six important candidate DEGs using a random forest classifier. we used a neural network model to determine the predicted weights of related genes and construct a classification model score for IPF. We then evaluated the classification efficiency of the model score in three independent sample datasets¹⁷. We found that the AUC efficiency of our model, called neura IPF, was excellent. However, the lack of gene data for IPF in the GEO database meant that the genetic characteristics of IPF were not included in the construction of the diagnostic model, potentially compromising its diagnostic effectiveness for IPF.

Of these six genes, a major function of DIO2 is to convert T4 to active T3 so that TH (Thyroid hormone) can be activated. IPF, diabetic nephropathy, and myocardial infarction have all been associated with a poor prognosis with hypothyroidism^{21,33,34,35,36}. The expression and activity of DIO2 are increased in the lungs of patients with IPF and are correlated with disease severity. DIO2 mainly localizes to AECs, which are thought to play a central role in the cycle of injury and repair that is characteristic of IPF²¹. DIO2 is significantly upregulated in the fibrotic state, but this upregulation is thought to be protective. This was further confirmed in our prognostic analysis²¹.

Insulin-like growth factors (IGFs) and their binding proteins (IGFBPs) play a critical role in pulmonary fibrosis development and progression³⁷. It has previously been shown that IGFBP2 and IGF-like family member 2 (IGFL2) are upregulated in SSc-PF and IPF³⁸. IGFL2 is secreted form in the ECM, its expression is also increased in IPF. IGFL2 expression levels were significantly reduced in human skin fibroblasts aged with mitochondrial function, suggesting that mitochondrial physiological processes are associated with IGFL2³⁸. IGFL2 play critical roles in cellular energy metabolism and in growth and development, especially prenatal growth. However, there has been no relevant research exploring its role in the pathogenesis of IPF.

Classical cadherins are the principle adhesive proteins at cohesive intercellular junctions and are essential proteins for morphogenesis and tissue homeostasis³⁹. P-cadherin is a calcium dependent cell–cell adhesion glycoprotein, which has a crucial role in the conservation of the structural integrity of epithelial tissues. Like other members of the cadherin family, P-cadherin (CDH3) regulates several cellular homeostatic processes that participate in embryonic development and maintain adult tissue architecture, being important for cell differentiation, cell shape, cell polarity, growth, and migration⁴⁰. It is worth noting that DIO2 and IGFL2 also play an important role in growth and development. These three genes play a major role in development, regeneration, morphogenesis and so on. This highlights the prominent role of tissue formation and development in the pathogenesis of fibrosis.

Interleukin (IL)-13 has been shown to play a role in several inflammatory and fibrotic diseases⁴¹. IL-13 modulates its effector functions via a complex receptor system that includes the IL-4 receptor (R) α, IL-13Rα1, and the IL-13Rα2. IL-13Rα1 binds IL-13 with low affinity, yet, when it forms a complex with IL-4α, it binds with much higher affinity, inducing the effector functions of IL-13. IL-13Rα2 binds IL-13 with high affinity but has a short cytoplasmic tail and has been shown to act as a nonsignaling decoy receptor. Transfection of fibroblasts and epithelial cells with IL-13Rα2 inhibited the IL-13 induction of soluble collagen, TGF-β, and CCL17. Adenoviral overexpression of IL-13Rα2 in the lung reduced bleomycin-induced fibrosis⁴¹.

Heparan sulfate (HS) proteoglycan is a glycosaminoglycan widely distributed on the surface of animal cells and extracellular matrix, and regulates cell growth, differentiation, adhesion, and migration by interacting with various ligands complement. Compared with normal lung, IPF lung showed significantly increased HS6-O-sulfuration and HS6-O-sulftransferase 1 and 2 (HS6ST1/2) mRNA overexpression. Immunohistochemistry showed that HS6ST2 was specifically expressed in bronchial epithelial cells, including IPF lung honeycomb cyst lining cells⁴². Both IL13RA2 and HS6ST2 highlight the importance of ligand receptor interactions, highlighting that cell–cell interactions in the microenvironment may be a major cause of fibrosis progression.

ADAMTS14 gene encodes a member of the ADAMTS (a disintegrin and metalloproteinase with thrombospondin motif) protein family. As reported by previously studies, the ADAMTS14 gene was discovered to play critical roles in the progress of inflammation and the immune system, through a crosstalk of the TGF-β pathways and mesenchymal cells⁴³. ADAMTS14 gene polymorphism was associated with knee osteoarthritis⁴⁴ or the osteoarthritis of the temporomandibular joint in Chinese Han women⁴⁵. But so far, no studies have been able to clarify ADAMTS14's role in pulmonary fibrosis. Like HS6ST2, ADAMTS14 is also mainly present in extracellular matrix, which may suggest that the composition or structure of extracellular matrix is also an important pathological factor that should not be ignored in pulmonary fibrosis.

This model has made significant progress compared to other models in previous studies³⁷. This progress is primarily reflected in the use of fewer feature variables, the validation of the model using a large-scale dataset, and its strong predictive performance. There still are some limitations in this study, the sample sizes of the cohorts are still relatively big enoughwhich may not be sufficient to represent the overall population precisely and could affect the generalizability of diagnostic model. Additionally, this diagnostic model is based on preliminary findings and short of the sound experimental verification to support its reliability. As such, given these limitations, this diagnostic model requires further investigation to determine whether it can be used in clinical decision-making.

In conclusion, we constructed an artificial neural network model that demonstrated robust performance across multiple cohorts. We assessed the relationship between each gene of the model and demographic variables. The majority of the genes showed no association with age or gender, but all presented close correlation with clinical features. CDH3, ADAMTS14 and DIO2 were found to be related to prognosis. These results are useful to prioritize targeting these indicators for diagnosis and drug development in future.

Data availability

The datasets generated and analysed during the current study are available in the GEO repository, [https://www.ncbi.nlm.nih.gov/geo/].

References

Xia, Y., Lei, C., Yang, D. & Luo, H. Construction and validation of a bronchoalveolar lavage cell-associated gene signature for prognosis prediction in idiopathic pulmonary fibrosis. Int. Immunopharmacol. 92, 107369. https://doi.org/10.1016/j.intimp.2021.107369 (2021).
Article CAS Google Scholar
Hogan, B. L. et al. Repair and regeneration of the respiratory system: Complexity, plasticity, and mechanisms of lung stem cell function. Cell Stem Cell 15, 123–138. https://doi.org/10.1016/j.stem.2014.07.012 (2014).
Article CAS Google Scholar
Rosmark, O. et al. Quantifying extracellular matrix turnover in human lung scaffold cultures. Sci. Rep. 8, 5409. https://doi.org/10.1038/s41598-018-23702-x (2018).
Article ADS CAS Google Scholar
Zhou, Y. et al. Extracellular matrix in lung development, homeostasis and disease. Matrix Biol. 73, 77–104. https://doi.org/10.1016/j.matbio.2018.03.005 (2018).
Article CAS Google Scholar
Glass, D. S. et al. Idiopathic pulmonary fibrosis: Molecular mechanisms and potential treatment approaches. Respir. Investig. 58, 320–335. https://doi.org/10.1016/j.resinv.2020.04.002 (2020).
Article Google Scholar
López-Muñiz-Ballesteros, B. et al. Sex differences in the incidence and outcomes of patients hospitalized by idiopathic pulmonary fibrosis (IPF) in Spain from 2016 to 2019. J. Clin. Med. https://doi.org/10.3390/jcm10163474 (2021).
Article Google Scholar
Zhang, Y., Yu, G., Kaminski, N. & Lee, P. J. PINK1 mediates the protective effects of thyroid hormone T3 in hyperoxia-induced lung injury. Am. J. Physiol. Lung Cell Mol. Physiol. 320, L1118-l1125. https://doi.org/10.1152/ajplung.00598.2020 (2021).
Article CAS Google Scholar
Stancil, I. T. et al. Pulmonary fibrosis distal airway epithelia are dynamically and structurally dysfunctional. Nat. Commun. 12, 4566. https://doi.org/10.1038/s41467-021-24853-8 (2021).
Article ADS CAS Google Scholar
Smith, R. A., Sathananthan, M., Kaur, P. & Jeganathan, N. The characteristics and outcomes of patients with idiopathic pulmonary fibrosis admitted to the ICU with acute respiratory failure. Heart Lung 50, 192–196. https://doi.org/10.1016/j.hrtlng.2020.05.006 (2021).
Article Google Scholar
Tudorache, V. et al. Impact of moderate to severe obstructive sleep apnea on the cognition in idiopathic pulmonary fibrosis. PLoS ONE 14, e0211455. https://doi.org/10.1371/journal.pone.0211455 (2019).
Article CAS Google Scholar
Mermigkis, C. et al. Obstructive sleep apnea should be treated in patients with idiopathic pulmonary fibrosis. Sleep Breath 19, 385–391. https://doi.org/10.1007/s11325-014-1033-6 (2015).
Article Google Scholar
Lynch, D. A. et al. Diagnostic criteria for idiopathic pulmonary fibrosis: A fleischner society white paper. Lancet Respir. Med. 6, 138–153. https://doi.org/10.1016/s2213-2600(17)30433-2 (2018).
Article Google Scholar
Pareek, C. S., Smoczynski, R. & Tretyn, A. Sequencing technologies and genome sequencing. J. Appl. Genet. 52, 413–435. https://doi.org/10.1007/s13353-011-0057-x (2011).
Article CAS Google Scholar
Wang, Y., Zhao, Y., Bollas, A., Wang, Y. & Au, K. F. Nanopore sequencing technology, bioinformatics and applications. Nat. Biotechnol. 39, 1348–1365. https://doi.org/10.1038/s41587-021-01108-x (2021).
Article CAS Google Scholar
Chan, H. P., Samala, R. K., Hadjiiski, L. M. & Zhou, C. Deep learning in medical image analysis. Adv. Exp. Med. Biol. 1213, 3–21. https://doi.org/10.1007/978-3-030-33128-3_1 (2020).
Article Google Scholar
Deo, R. C. Machine learning in medicine. Circulation 132, 1920–1930. https://doi.org/10.1161/circulationaha.115.001593 (2015).
Article Google Scholar
Gupta, R. et al. Artificial intelligence to deep learning: Machine intelligence approach for drug discovery. Mol. Divers 25, 1315–1360. https://doi.org/10.1007/s11030-021-10217-3 (2021).
Article CAS Google Scholar
Haider, R. Z., Ujjan, I. U. & Shamsi, T. S. Cell population data-driven acute promyelocytic leukemia flagging through artificial neural network predictive modeling. Transl. Oncol. 13, 11–16. https://doi.org/10.1016/j.tranon.2019.09.009 (2020).
Article Google Scholar
Mäkelä, K. et al. Artificial intelligence identifies inflammation and confirms fibroblast foci as prognostic tissue biomarkers in idiopathic pulmonary fibrosis. Hum. Pathol. 107, 58–68. https://doi.org/10.1016/j.humpath.2020.10.008 (2021).
Article CAS Google Scholar
Toro-Domínguez, D. et al. ImaGEO: Integrative gene expression meta-analysis from GEO database. Bioinformatics 35, 880–882. https://doi.org/10.1093/bioinformatics/bty721 (2019).
Article CAS Google Scholar
Yu, G. et al. Thyroid hormone inhibits lung fibrosis in mice by improving epithelial mitochondrial function. Nat. Med. 24, 39–49. https://doi.org/10.1038/nm.4447 (2018).
Article CAS Google Scholar
Furusawa, H. et al. Chronic hypersensitivity pneumonitis, an interstitial lung disease with distinct molecular signatures. Am. J. Respir. Crit. Care Med. 202, 1430–1444. https://doi.org/10.1164/rccm.202001-0134OC (2020).
Article CAS Google Scholar
Prasse, A. et al. BAL cell gene expression is indicative of outcome and airway basal cell involvement in idiopathic pulmonary fibrosis. Am. J. Respir. Crit. Care Med. 199, 622–630. https://doi.org/10.1164/rccm.201712-2551OC (2019).
Article CAS Google Scholar
Davis, S. & Meltzer, P. S. GEOquery: A bridge between the gene expression omnibus (GEO) and BioConductor. Bioinformatics 23, 1846–1847. https://doi.org/10.1093/bioinformatics/btm254 (2007).
Article CAS Google Scholar
Garcia-Rudolph, A., Garcia-Molina, A., Opisso, E. & Tormos Muñoz, J. Personalized web-based cognitive rehabilitation treatments for patients with traumatic brain injury: Cluster analysis. JMIR Med. Inform. 8, e16077. https://doi.org/10.2196/16077 (2020).
Article Google Scholar
Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucl. Acids Res. 43, e47. https://doi.org/10.1093/nar/gkv007 (2015).
Article CAS Google Scholar
Zhou, Y. et al. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat. Commun. 10, 1523. https://doi.org/10.1038/s41467-019-09234-6 (2019).
Article ADS CAS Google Scholar
Alterovitz, G., Xiang, M., Mohan, M. & Ramoni, M. F. GO PaD: The gene ontology partition database. Nucl. Acids Res. 35, D322-327. https://doi.org/10.1093/nar/gkl799 (2007).
Article CAS Google Scholar
Kanehisa, M., Furumichi, M., Tanabe, M., Sato, Y. & Morishima, K. KEGG: New perspectives on genomes, pathways, diseases and drugs. Nucl. Acids Res. 45, D353-d361. https://doi.org/10.1093/nar/gkw1092 (2017).
Article CAS Google Scholar
Robin, X. et al. pROC: An open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinf. 12, 77. https://doi.org/10.1186/1471-2105-12-77 (2011).
Article Google Scholar
Hyams, C., Hettle, D., Bibby, A., Adamali, H. A. & Barratt, S. L. Utility of illness severity scores to predict mortality in patients hospitalized with respiratory deterioration of idiopathic pulmonary fibrosis. QJM 114, 559–567. https://doi.org/10.1093/qjmed/hcaa214 (2021).
Article CAS Google Scholar
Richeldi, L., Collard, H. R. & Jones, M. G. Idiopathic pulmonary fibrosis. The Lancet 389, 1941–1952. https://doi.org/10.1016/s0140-6736(17)30866-8 (2017).
Article Google Scholar
Fliers, E., Bianco, A. C., Langouche, L. & Boelen, A. Thyroid function in critically ill patients. Lancet Diabetes Endocrinol. 3, 816–825. https://doi.org/10.1016/s2213-8587(15)00225-9 (2015).
Article CAS Google Scholar
Friberg, L., Werner, S., Eggertsen, G. & Ahnve, S. Rapid down-regulation of thyroid hormones in acute myocardial infarction: Is it cardioprotective in patients with angina?. Arch Intern Med 162, 1388–1394. https://doi.org/10.1001/archinte.162.12.1388 (2002).
Article CAS Google Scholar
Han, C. et al. Subclinical hypothyroidism and type 2 diabetes: A systematic review and meta-analysis. PLoS ONE 10, e0135233. https://doi.org/10.1371/journal.pone.0135233 (2015).
Article CAS Google Scholar
Oldham, J. M. et al. Thyroid disease is prevalent and predicts survival in patients with idiopathic pulmonary fibrosis. Chest 148, 692–700. https://doi.org/10.1378/chest.14-2714 (2015).
Article Google Scholar
Dai, X. et al. Identification of diagnostic gene biomarkers related to immune infiltration in patients with idiopathic pulmonary fibrosis based on bioinformatics strategies. Front. Med. (Lausanne) 9, 959010. https://doi.org/10.3389/fmed.2022.959010 (2022).
Article Google Scholar
Renaud, L., da Silveira, W. A., Takamura, N., Hardiman, G. & Feghali-Bostwick, C. Prominence of IL6, IGF, TLR, and bioenergetics pathway perturbation in lung tissues of scleroderma patients with pulmonary fibrosis. Front. Immunol. 11, 383. https://doi.org/10.3389/fimmu.2020.00383 (2020).
Article CAS Google Scholar
To, S. & Agarwal, S. K. Macrophages and cadherins in fibrosis and systemic sclerosis. Curr. Opin. Rheumatol. 31, 582–588. https://doi.org/10.1097/bor.0000000000000657 (2019).
Article CAS Google Scholar
Vieira, A. F. & Paredes, J. P-cadherin and the journey to cancer metastasis. Mol Cancer 14, 178. https://doi.org/10.1186/s12943-015-0448-4 (2015).
Article CAS Google Scholar
Lumsden, R. V. et al. Modulation of pulmonary fibrosis by IL-13Rα2. Am. J. Physiol. Lung Cell Mol. Physiol. 308, L710-718. https://doi.org/10.1152/ajplung.00120.2014 (2015).
Article CAS Google Scholar
Lu, J., Auduong, L., White, E. S. & Yue, X. Up-regulation of heparan sulfate 6-O-sulfation in idiopathic pulmonary fibrosis. Am. J. Respir. Cell Mol. Biol. 50, 106–114. https://doi.org/10.1165/rcmb.2013-0204OC (2014).
Article CAS Google Scholar
Dupont, L. et al. Spontaneous atopic dermatitis due to immune dysregulation in mice lacking Adamts2 and 14. Matrix Biol. 70, 140–157. https://doi.org/10.1016/j.matbio.2018.04.002 (2018).
Article CAS Google Scholar
Ma, S., Ouyang, C. & Ren, S. Relationship between ADAMTS14/rs4747096 gene polymorphism and knee osteoarthritis in Chinese population. Biosci. Rep. https://doi.org/10.1042/bsr20181413 (2018).
Wang, D. D., Gan, Y. H., Ma, X. C. & Meng, J. H. Association between ADAMTS14 gene polymorphism and the temporomandibular joint osteoarthritis in Chinese Han females. Beijing Da Xue Xue Bao Yi Xue Ban 50, 279–283 (2018).
CAS Google Scholar
Baskerville, C. L., Chockalingham, S., Harris, P. A. & Bailey, S. R. The effect of insulin on equine lamellar basal epithelial cells mediated by the insulin-like growth factor-1 receptor. PeerJ 6, e5945. https://doi.org/10.7717/peerj.5945 (2018).
Article CAS Google Scholar
Zhang, G. et al. Induced pluripotent stem cell consensus genes: Implication for the risk of tumorigenesis and cancers in induced pluripotent stem cell therapy. Stem Cells Dev. 21, 955–964. https://doi.org/10.1089/scd.2011.0649 (2012).
Article CAS Google Scholar
Leduc, C. et al. In vivo N-terminomics highlights novel functions of ADAMTS2 and ADAMTS14 in skin collagen matrix building. Front. Mol. Biosci. 8, 643178. https://doi.org/10.3389/fmolb.2021.643178 (2021).
Article CAS Google Scholar

Download references

Funding

The work was supported by a Startup package (GY), Henan Normal University, the 111 Project “State Innovation Base for Pulmonary Fibrosis”, the Ministry of Science and Technology, PR China, 2019YFE0119500, and Henan Province Science and Technology Project, 212102310894.

Author information

Authors and Affiliations

State Key Laboratory of Cell Differentiation and Regulation, Henan International Joint Laboratory of Pulmonary Fibrosis, Henan Center for Outstanding Overseas Scientists of Pulmonary Fibrosis, College of Life Science, Henan Normal University, 46 Jianshe Road, Xinxiang, 453007, Henan, China
Zhongzheng Li, Shenghui Wang, Huabin Zhao, Peishuo Yan, Hongmei Yuan, Mengxia Zhao, Ruyan Wan, Guoying Yu & Lan Wang

Authors

Zhongzheng Li
View author publications
You can also search for this author in PubMed Google Scholar
Shenghui Wang
View author publications
You can also search for this author in PubMed Google Scholar
Huabin Zhao
View author publications
You can also search for this author in PubMed Google Scholar
Peishuo Yan
View author publications
You can also search for this author in PubMed Google Scholar
Hongmei Yuan
View author publications
You can also search for this author in PubMed Google Scholar
Mengxia Zhao
View author publications
You can also search for this author in PubMed Google Scholar
Ruyan Wan
View author publications
You can also search for this author in PubMed Google Scholar
Guoying Yu
View author publications
You can also search for this author in PubMed Google Scholar
Lan Wang
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

G.Y., L.W. designed the research work, Z.L., S.W. and H.Z. performed the statistical analysis and manuscript preparation, Y.P., H.Y. managed the clinical information and revising it critically for intellectual content, M.Z., R.W. are responsible for investigating and resolving data integrity and image display content accuracy. All authors have read and approved the final version of the manuscript.

Corresponding authors

Correspondence to Guoying Yu or Lan Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Li, Z., Wang, S., Zhao, H. et al. Artificial neural network identified the significant genes to distinguish Idiopathic pulmonary fibrosis. Sci Rep 13, 1225 (2023). https://doi.org/10.1038/s41598-023-28536-w

Download citation

Received: 28 September 2022
Accepted: 19 January 2023
Published: 21 January 2023
DOI: https://doi.org/10.1038/s41598-023-28536-w

This article is cited by

Identifying a survival-associated cell type based on multi-level transcriptome analysis in idiopathic pulmonary fibrosis
- Fei Xu
- Yun Tong
- Qingkang Meng
Respiratory Research (2024)
Explicate molecular landscape of combined pulmonary fibrosis and emphysema through explainable artificial intelligence: a comprehensive analysis of ILD and COPD interactions using RNA from whole lung homogenates
- Nakul Tanwar
- Yasha Hasija
Medical & Biological Engineering & Computing (2024)

Comments

By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.