Introduction

Dilated cardiomyopathy (DCM), which manifests clinically as ventricular dilatation and progressive impairment of systolic and diastolic function, is one of the most prevalent diseases worldwide. Its etiology is heterogeneous and includes viral infections, inflammatory reactions, and genetic factors1. It can also cause arrhythmias and atrioventricular block, resulting in sudden cardiac death and heart failure, and these complications often carry a poor prognosis2. Men with DCM are reported to have a higher mortality rate than women3. Endomyocardial biopsy (EMB) is the gold standard for the diagnosis of myocarditis and DCM. However, in clinical practice, DCM is often not diagnosed and treated promptly because EMB carries a high risk of cardiac complications and treatment options are limited4. Therefore, developing innovative, non-invasive biomarkers for DCM is essential to improve diagnostic accuracy.

Autophagy is a cellular self-degradation process that removes errant proteins and damaged organelles. It also eliminates intracellular pathogens and is often considered a survival mechanism5. Numerous studies have shown that autophagy genes are involved in various phenotypes and human diseases6, including neurodegenerative diseases7, liver diseases8, muscle diseases9, cancer10, and cardiac diseases11. Evidence shows that autophagy is essential for maintaining cardiomyocyte homeostasis12 and influences the prognosis of cardiac diseases. In addition, an increasing number of animal models and clinical studies have reported the involvement of autophagy-related genes (ARGs) in the ventricular remodeling process, which is related to the mechanism of DCM13,14. However, the diagnostic performance and prognostic value of ARGs in DCM have not been fully elucidated.

In this study, we downloaded gene expression profile data of DCM from the Gene Expression Omnibus (GEO) database, applied bioinformatics methods to search for autophagy-related differentially expressed genes (AR-DEGs) in DCM, and visualized the correlations between these genes. Subsequently, gene enrichment analysis was performed to annotate gene functions and explore pathogenesis. Machine learning algorithms were then applied to filter and identify diagnostic biomarkers of DCM. In addition, based on these diagnostic biomarkers, a transcription factor (TF)-gene regulatory network and gene-targeted drugs were predicted to provide ideas for clinical precision therapy and experimental studies. The DisGeNET database was used for association analysis of DCM with other related diseases to provide a transcriptomic basis for further investigation of the potential pathogenesis of the disease.

The flow chart of this study is shown in Fig. 1.

Figure 1

Workflow diagram of the current study. GO, Gene Ontology; TFs, transcription factors.

Results

Identification of autophagy-related differentially expressed genes (AR-DEGs) for dilated cardiomyopathy (DCM)

The GSE4172 dataset was used to screen for DEGs in DCM. Using thresholds of |log2FoldChange| ≥ 0.8 and p-value < 0.05, 770 DEGs were identified, comprising 366 up-regulated and 404 down-regulated genes. The heatmap (Fig. 2a) shows the expression of the top 60 DEGs, and the gradient volcano plot (Fig. 2b) shows the distribution of DEGs.

Figure 2

Differential expression analysis of the GSE4172 dataset. (a) Heatmap of DEGs in the GSE4172 dataset (n = 60, p < 0.05, |log2 FoldChange| ≥ 0.8). (b) Gradient volcano plot of gene expression in the GSE4172 dataset. The two vertical lines indicate log2 fold changes of 0.8 and -0.8, respectively, and the horizontal line indicates a p value of 0.05. The color of the dots represents the level of the p value. The top 10 most significant DEGs are labeled on the plot.

A total of 803 ARGs were obtained from two autophagy-related gene databases, HADb and HAMdb. The Venn diagram generated with the Omicshare online tool identified 23 AR-DEGs of DCM (ADIPOQ, TRIM17, PPFIA4, CAPN12, PLEKHF1, RCAN1, RAB12, CXCR4, HSPG2, EIF4EBP1, HSF1, ZC3H12A, PRKAB1, TRIM65, ARSA, GABARAPL1, DICER1, VDAC1, CHMP4B, AGTR1, BAD, TFEB, AP2M1) (Fig. 3). The relevant functions of the 23 AR-DEGs are shown in Supplementary Table S1.

Figure 3

AR-DEGs shown by Venn diagram. The 366 up-regulated DEGs (DEGs-Up) and 404 down-regulated DEGs (DEGs-Down) were intersected with the 232 and 796 autophagy-associated genes from the HADb and HAMdb autophagy gene pools, respectively, yielding 23 shared genes. The number of intersecting genes is marked in the red box. DEGs-Up, differentially expressed up-regulated genes; DEGs-Down, differentially expressed down-regulated genes.

The expression of the 23 AR-DEGs in the disease and control groups is shown in Fig. 4a, and their correlation matrix is shown in Fig. 4b. Correlation coefficients with absolute values exceeding 0.5 were considered meaningful and were labeled in Fig. 4b. Several gene pairs showed strong correlations.
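The |r| > 0.5 labeling rule can be illustrated with a minimal Python sketch. It assumes Pearson correlation (the text does not name the coefficient), and the gene names and expression values are made-up placeholders, not data from GSE4172; the function names are hypothetical and this is not the corrplot-based workflow used in the study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def strong_pairs(expr, cutoff=0.5):
    """Return gene pairs whose absolute correlation exceeds the cutoff."""
    genes = sorted(expr)
    pairs = {}
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = pearson(expr[g1], expr[g2])
            if abs(r) > cutoff:
                pairs[(g1, g2)] = r
    return pairs


# Hypothetical expression values for three placeholder genes in four samples.
toy_expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],    # rises in step with GENE_A
    "GENE_C": [1.0, -1.0, -1.0, 1.0],  # unrelated to GENE_A
}
toy_pairs = strong_pairs(toy_expr)
```

Only the GENE_A/GENE_B pair passes the cutoff in this toy input; pairs below the cutoff are simply left unlabeled, mirroring the figure.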

Figure 4

Expression of the 23 AR-DEGs in the dilated cardiomyopathy (DCM) group and the control group, and their correlation. (a) Box plot of the expression levels of the 23 AR-DEGs in the DCM and control groups. The blue box plots above the corresponding gene names indicate expression in the control group, whereas the red box plots indicate expression in the DCM group. (b) Correlation heatmap of the 23 AR-DEGs. The color of each circle and the magnitude of the correlation value represent the strength of the correlation; red represents positive correlation and blue represents negative correlation. Darker colors indicate larger absolute correlation values, that is, stronger correlations.

Gene Ontology (GO) and pathway enrichment analysis

GO analysis and pathway analysis across multiple databases (KEGG, WikiPathways, BioPlanet, Reactome) were performed through the Enrichr platform. Three categories of GO analysis were obtained by clustering the AR-DEGs of DCM, namely biological process (BP), cellular component (CC), and molecular function (MF). The top ten terms of each category are listed in Table 1.

Table 1 GO category, GO pathways, corresponding p-values, and AR-DEGs.

Based on the number of genes involved, BP terms were mainly related to the regulation of autophagy, positive regulation of autophagy, positive regulation of the cellular catabolic process, and macroautophagy. For cellular components, lysosome and lytic vacuole were significantly associated with the AR-DEGs, which ultimately points to a link with inflammatory cardiomyopathy in the human heart. For molecular function, AR-DEGs were most enriched in low-density lipoprotein particle receptor binding, with a similar level of enrichment in lipoprotein particle receptor binding and endoribonuclease activity.

Notably, the results of the pathway analysis in this study were combined across databases (Table 2). Across the databases listed above, the Longevity regulating pathway, Macroautophagy, the PI3K-AKT-mTOR signaling pathway and therapeutic opportunities, and AMPK signaling were identified as the top pathways.

Table 2 Top 10 pathways from KEGG, BioPlanet, Reactome, WikiPathways databases and their corresponding p-values and genes for AR-DEGs.

A comparison of GO terms was presented in Fig. 5a. Figure 5b provided pathway analysis from multiple databases.

Figure 5

(a) Identification results of GO terms related to biological processes, cellular components and molecular functions based on gene enrichment analysis. Higher p value indicated a higher number of genes involved in this GO ontology. (b) Identification of results from combined multi-pathway analysis by KEGG, WikiPathways, BioPlanet and Reactome.

Machine learning screening for autophagy-related biomarkers of DCM

The expression matrix of the 23 AR-DEGs was used to construct diagnostic models with both least absolute shrinkage and selection operator (LASSO) regression and support vector machine recursive feature elimination (SVM-RFE) in order to obtain potential diagnostic biomarkers of DCM. The LASSO regression algorithm narrowed down the AR-DEGs of DCM to 9 variables as potential diagnostic biomarkers (Fig. 6a). The SVM-RFE algorithm identified 13 signature genes (Fig. 6b).
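The recursive feature elimination loop behind SVM-RFE can be sketched in Python as follows. This is a simplified illustration, not the e1071-based implementation used in the study: the ranking weights come from a nearest-centroid linear rule (difference of class means) standing in for linear SVM weights, and all function names, gene names, and values are hypothetical.

```python
def centroid_weights(X, y):
    """Per-feature difference between class-1 and class-0 means.

    Stand-in for the weight vector of a linear SVM (an assumption of this sketch).
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    n_features = len(X[0])
    return [
        sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
        for j in range(n_features)
    ]


def rfe(X, y, names, n_keep, weight_fn=centroid_weights):
    """Drop the feature with the smallest squared weight until n_keep remain."""
    remaining = list(range(len(names)))
    eliminated = []
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        weights = weight_fn(X_sub, y)  # refit on the surviving features
        worst = min(range(len(remaining)), key=lambda k: weights[k] ** 2)
        eliminated.append(names[remaining.pop(worst)])
    return [names[j] for j in remaining], eliminated


# Hypothetical data: two disease samples (label 1) and two controls (label 0).
toy_X = [
    [5.0, 1.0, 0.2],
    [5.2, 1.1, 0.1],
    [1.0, 2.0, 0.2],
    [0.8, 2.1, 0.1],
]
toy_y = [1, 1, 0, 0]
toy_names = ["GENE_A", "GENE_B", "GENE_C"]
kept, dropped = rfe(toy_X, toy_y, toy_names, n_keep=1)
```

Because the stand-in weights are computed independently per feature, the loop here reduces to a one-shot ranking; with a real SVM the weights change after each refit, which is what makes the recursion worthwhile.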

Figure 6

Screening of diagnostic biomarkers for DCM by machine learning algorithms. (a) Screening of optimal genes by the LASSO regression model. (b) Plot of the optimal genes selected by the SVM-RFE algorithm. (c) Venn diagram showing the eight diagnostic biomarkers common to both machine learning algorithms. LASSO, least absolute shrinkage and selection operator; SVM-RFE, support vector machine-recursive feature elimination.

Finally, 8 overlapping genes (PLEKHF1, HSPG2, HSF1, TRIM65, DICER1, VDAC1, BAD, TFEB) were obtained (Fig. 6c).

Construction of transcription factor (TF)-gene regulatory network

Based on the JASPAR TF binding site profile database, a TF-gene regulatory network was constructed on the NetworkAnalyst 3.0 platform for the 8 diagnostic biomarkers of DCM (PLEKHF1, HSPG2, HSF1, TRIM65, DICER1, VDAC1, BAD, TFEB) (Fig. 7). The network included 46 nodes and 76 edges; the nodes comprised the 8 seed genes and 38 transcription factors. TFEB was regulated by 19 transcription factors and DICER1 by 15 transcription factors.
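The node and edge counts reported for the network (8 seed genes + 38 transcription factors = 46 nodes) follow directly from a TF-to-gene edge list. The Python sketch below shows how such a summary could be derived; the edges and transcription factor names are hypothetical placeholders rather than the JASPAR-derived network, and the function name is illustrative only.

```python
from collections import Counter


def summarize_network(edges, seed_genes):
    """Count nodes, TFs, edges, and regulators per gene in a TF -> gene edge list."""
    unique_edges = set(edges)
    seeds = set(seed_genes)
    tfs = {tf for tf, _ in unique_edges} - seeds
    nodes = seeds | tfs | {gene for _, gene in unique_edges}
    regulators = Counter(gene for _, gene in unique_edges)
    return {
        "n_nodes": len(nodes),
        "n_tfs": len(tfs),
        "n_edges": len(unique_edges),
        "tfs_per_gene": dict(regulators),
    }


# Hypothetical edges: placeholder TFs regulating two of the seed genes.
toy_seeds = ["TFEB", "DICER1"]
toy_edges = [
    ("TF_1", "TFEB"),
    ("TF_2", "TFEB"),
    ("TF_3", "TFEB"),
    ("TF_2", "DICER1"),
    ("TF_3", "DICER1"),
]
toy_summary = summarize_network(toy_edges, toy_seeds)
```

In this toy input the node count equals the number of seed genes plus the number of distinct TFs, the same bookkeeping that gives 46 nodes in the reported network.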

Figure 7

Network of transcription factors interacting with 8 potential diagnostic biomarkers. The highlighted orange nodes indicated the 8 potential diagnostic biomarkers and the other pink nodes indicated transcription factors. The network consisted of 8 core genes, 46 nodes and 76 edges.

Gene targeted drugs screening

Based on the DSigDB drug database, the Enrichr (https://maayanlab.cloud/Enrichr/) web platform was used to identify drug molecules associated with the 8 diagnostic biomarkers for DCM. Gene-targeted drugs were selected based on p-values; among drugs meeting the p-value threshold, a higher combined score indicates a stronger gene-drug association. The analysis showed that melatonin (CTD 00006260) and metformin (CTD 00006282) were strongly associated with the DCM biomarker genes. Table 3 lists the top 10 drugs for DCM from the DSigDB database.

Table 3 Candidate drugs for dilated cardiomyopathy.

Genetic disease association analysis

Gene list enrichments were identified in the DisGeNET dataset. All genes in the genome were used as the enrichment background. Terms with a p-value < 0.01, a minimum count of 3, and an enrichment factor > 1.5 (the enrichment factor is the ratio between the observed counts and the counts expected by chance) were collected and grouped into clusters based on their membership similarities. The top 10 enriched clusters are shown in Fig. 8. The algorithm used here was the same as that used for pathway and process enrichment analysis. The top 5 comorbidities of DCM were Cyst; Uveal melanoma; Diabetes Mellitus, Experimental; Adult T-Cell Lymphoma/Leukemia; and Amyloidosis.
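The three filters above can be made concrete with a short Python sketch. It assumes that p-values come from a cumulative hypergeometric test, which the text does not state explicitly; the counts are hypothetical and the function names are illustrative.

```python
from math import comb


def enrichment_factor(k, n, K, N):
    """Observed hits (k) divided by the hits expected by chance (n * K / N)."""
    return k / (n * K / N)


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def passes_filters(k, n, K, N, p_cut=0.01, min_count=3, min_ef=1.5):
    """Apply the p-value, minimum count, and enrichment factor thresholds."""
    return (
        hypergeom_pvalue(k, n, K, N) < p_cut
        and k >= min_count
        and enrichment_factor(k, n, K, N) > min_ef
    )


# Hypothetical counts: 8 input genes, 3 of which hit a term annotated to
# 100 of 20000 background genes.
toy_ef = enrichment_factor(3, 8, 100, 20000)
toy_p = hypergeom_pvalue(3, 8, 100, 20000)
toy_keep = passes_filters(3, 8, 100, 20000)
```

With these toy counts only 0.04 hits are expected by chance, so three observed hits give a large enrichment factor and a small p-value; a term with a single hit would be rejected by the minimum-count rule regardless of its p-value.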

Figure 8

The process of identifying comorbidities in DCM.

Discussion

DCM is characterized by ventricular dilatation and impaired systolic and diastolic function, leading to arrhythmias and heart failure in severe cases. Unfortunately, because EMB is rarely performed, most patients with early-stage cardiomyopathy are not effectively treated. The prognosis of myocarditis and DCM is often poor in cases of concomitant arrhythmias and heart failure2. Therefore, early diagnosis, precise evaluation, and therapeutic management of patients with DCM are crucial, and researchers are increasingly looking for diagnostic markers of DCM. Meanwhile, the molecular pathogenesis of DCM and the roles of viral infections and other factors in disease progression and prognosis remain incompletely understood15.

Autophagy plays an important role in cancer, neurodegenerative diseases, inflammatory diseases, and cardiac diseases5. Autophagy mechanisms are increasingly studied in cardiac diseases, where autophagy plays a crucial role in maintaining normal cardiac structure and function and in therapy16,17. Two key autophagy-related molecules, mTOR and Beclin1, have been shown to play a regulatory role in myocardial ischemia–reperfusion injury17. Of these, mTOR participates in the PI3K/Akt pathway to regulate myocardial ischemia/reperfusion-induced apoptosis and autophagy18, whereas Beclin1 has a beneficial effect during myocardial ischemia and an adverse effect during myocardial ischemia/reperfusion19. Studies on the role of autophagy in cardiomyopathy-related diseases are increasing13, and research has shown that damage to the autophagic lysosomal pathway (ALP) and activation of inflammasomes are important factors contributing to DCM14. In mice with DCM, deficiency of NCOA4 (nuclear receptor coactivator 4, an autophagy-associated gene that mediates ferritin degradation) improved left ventricular size and cardiac function by inhibiting free ferrous iron overload and the increase in lipid peroxidation20. Carolina et al.21 found that autophagy-related genes such as CALCOCO2 and NRBP2, the former of which regulates the expression of the latter, adversely affected left ventricular function parameters in patients with DCM.

In recent years, interest in the diagnostic and prognostic role of genetic biomarkers for DCM has been rising. For example, CYR61 and APN were identified as two target genes for DCM by gene expression profiling of the raw GSE4172 data22. RBM20 has been shown to induce aberrant TTN splicing, which is a determinant of DCM and increases the risk of arrhythmias23. In previous bioinformatics studies, genes or transcription factors such as CTGF, POSTN, CORIN, and FIGF were closely associated with DCM24. However, few studies have addressed the value of autophagy-related genes in diagnosing DCM.

To the best of our knowledge, this study is the first to investigate the diagnostic role of ARGs in DCM by mining the GEO database and integrating machine learning and bioinformatics approaches. We used the NetworkAnalyst 3.0 platform to analyze the GSE4172 dataset, which compares gene expression in parvovirus B19-associated DCM samples with that in healthy samples. Using differential analysis, we obtained 770 DEGs and intersected them with the gene set from the autophagy databases to obtain 23 AR-DEGs of DCM. Finally, using the machine learning methods LASSO regression and SVM-RFE, we obtained 8 diagnostic biomarkers of DCM (PLEKHF1, HSPG2, HSF1, TRIM65, DICER1, VDAC1, BAD, TFEB). Previous studies have linked all eight genes to DCM or cardiomyocyte remodeling.

PLEKHF1 (Pleckstrin homology and FYVE domain containing 1) is located in the lysosome and plays a vital role in caspase-independent apoptosis, a process involved in autophagy25. Previous studies have identified PLEKHF1 as a susceptibility gene for several diseases. For example, Qi et al. identified PLEKHF1 as a potential biomarker for diabetic atherosclerosis26, and PLEKHF1 was also shown to be a potential biomarker for chronic graft-versus-host disease, a finding confirmed by several independent clinical validation studies27. In addition, levosimendan has been shown to ameliorate myocardial infarction and ventricular remodeling in diabetic rats, and Plekhf1 expression was regulated by levosimendan, suggesting that Plekhf1 is a potential target gene for myocardial infarction and diabetic cardiomyopathy28.

HSPG2 (Heparan sulfate proteoglycan 2) plays an important role in cancer growth, development, and metastasis29. Previous studies have shown30 that HSPG2 is located in the key cardiac-related regions of chromosome 1p36, and related studies have demonstrated that chromosome 1p36 deletion is responsible for cardiovascular malformations and cardiomyopathy31, suggesting an important role for HSPG2 in the pathogenesis and prognosis of cardiomyopathy30. In addition, HSPG2 is an independent predictor in a variety of diseases. For example, HSPG2 is overexpressed in acute myeloid leukemia and can be used as a prognostic biomarker32. Recent studies have shown that HSPG2 deficiency is a risk factor for aortic coarctation33.

HSF1 (Heat shock transcription factor 1) is a major heat stress response factor that inhibits apoptosis and pathological remodeling of cardiomyocytes and is thus a protective factor for cardiomyocytes. In a previous quantitative transcriptomic analysis, HSF1 was found to be significantly enriched in cardiomyocytes34. HSF1, isolated by the death trap method, was shown to prevent hydrogen peroxide-induced cardiomyocyte death, and overexpression of HSF1 in transgenic mice reduced ischemia–reperfusion-induced cardiomyocyte injury35. In the present study, HSF1 expression was lower in the DCM group than in the healthy control group, which is consistent with the findings of previous studies. In addition, overexpression of HSF1 in BAG3 mutation-associated DCM has been shown to help attenuate pathological remodeling of cardiomyocytes and alleviate proteostatic stress36. In contrast, recent studies have shown that HSF1 overexpression led to reduced expression of BAG3, which is associated with myofilament localization. Decreased expression of BAG3 was strongly associated with non-inherited heart failure, and male patients with DCM were more susceptible37. Therefore, the study of relevant molecules and pathways targeting HSF1 contributes to our understanding of DCM.

TRIM65 is an E3 ubiquitin ligase involved in the positive regulation of autophagy; it is expressed in vascular endothelial cells and located in the cytosol and nucleoplasm. Unfortunately, relatively few studies have addressed TRIM65. The available literature suggests that TRIM65 mainly influences disease progression through proteopathy and the regulation of ubiquitination and is a target in a variety of diseases38,39. Interestingly, although few studies have addressed the mechanisms linking TRIM65 and DCM, recent work has closely linked40 TRIM65 to the NLRP3 inflammasome, which is known to play a role in various types of DCM14. TRIM65 has also been associated with antiviral innate immune mechanisms41 and has been shown to regulate VCAM-1 to control inflammatory responses42. All these studies point the way to exploring the molecular mechanism linking TRIM65 and DCM.

DICER1 is a member of the ribonuclease III (RNaseIII) family and is involved in the production of microRNAs, which regulate gene expression at the post-transcriptional level and are more frequently studied in oncological diseases43. Evidence suggests that DICER deletion results in a dramatic decrease in the levels of the miRNAs it regulates, leading to severe DCM and heart failure in mice; a similar trend was seen in the expression of DICER proteins in patients, implying an important role of DICER family genes in the pathogenesis of DCM44. Follow-up studies have shown that microRNAs act as negative regulators of genes and that specific regulation of microRNA expression can inhibit the loss of cardiac function due to DICER deficiency45,46, leading to cardioprotection. These studies suggest that competitive regulation of DICER family genes by endogenous microRNAs will be an essential strategy for gene-targeted therapy in DCM.

VDAC (voltage-dependent anion channel), including VDAC1 and VDAC2, is a mitochondrial outer membrane pore-forming protein present in all eukaryotes. As a mitochondrial transporter protein, VDAC is mostly expressed in cardiac tissue and has significant tissue specificity47,48. Ca2+ plays a detrimental role in heart failure and myocardial ischemia/reperfusion: Ca2+ overload activates the mitochondrial matrix chaperone cyclophilin D (CypD), which regulates the VDAC1, Grp75, and IP3R1 complex and thus damages cardiomyocytes, whereas inhibition of the CypD, VDAC1, Grp75, and IP3R1 complex can protect cardiomyocytes49. Numerous studies have shown50,51 that regulating VDAC1 expression through microRNA targeting can regulate mitochondrial function and promote the release of mitochondrial calcium for cell protection. Furthermore, in DCM mice, the lncRNA H19/miR-675 axis competitively downregulated VDAC1, reducing apoptosis, which provides a new strategy for exploring the role of VDAC1 in DCM. VDAC1 expression was also shown to be upregulated in the hearts of patients with hypertrophic cardiomyopathy52. In the present study, VDAC1 expression was likewise upregulated in samples from patients with DCM. These findings support a role for VDAC1 as a target gene for DCM.

BAD (Bcl-2 associated agonist of cell death) is a pro-apoptotic member of the Bcl-2 family that acts in concert with the anti-apoptotic protein Bcl-2. In a TNF-α-mediated mouse model of DCM in which apoptosis occurs, BAD expression was reported53 to be reduced in association with Bcl-2, which is consistent with the findings of the present study. According to previous studies, BAD plays a key role in inducing β-cell apoptosis in Friedreich's ataxia, a neurodegenerative disease closely related to cardiomyopathy and diabetes54. MicroRNAs negatively regulate the protein expression of mRNAs and play an important role in cardiovascular diseases, especially in heart failure and cardiac remodeling55. Studies have shown56 that multiple microRNAs regulate BCL2 and that all of them are upregulated in heart failure57. The roles of BAD and Bcl-2 in regulating apoptosis during the pathogenesis of DCM require further study.

TFEB (transcription factor EB), a transcription factor located in the cytosol, is the master regulator of lysosomal biogenesis and the autophagic machinery; it coordinates the autophagic process, including autophagosome formation, autophagosome-lysosome fusion, and substrate degradation, by driving the expression of autophagy and lysosomal genes58. TFEB expression is reported to be highest in 18-week-old fetal heart tissue, with significant tissue specificity59. There is growing evidence that TFEB plays an important role in various types of DCM. Lysosomal storage disorders (LSD) lead to cardiac involvement in the form of hypertrophic cardiomyopathy and DCM60. Further studies have shown that the Yes-associated protein (YAP) and TFEB signaling pathway plays a role in LSD by eliminating autophagic lysosomes, reducing cell death, and restoring cardiac function61. TFEB deficiency was also found to lead to cardiomyocyte hypertrophy and DCM, causing heart failure62. Therefore, TFEB is a highly significant target in DCM.

In addition, we performed a functional enrichment analysis of the pathogenesis of DCM and related molecular pathways and found that the AR-DEGs of DCM were mainly enriched in autophagy regulatory pathways and cell growth signaling, such as regulation of autophagy, macroautophagy, the AMPK signaling pathway, and PKB-mediated events. The AMPK (adenosine monophosphate-activated protein kinase) signaling pathway has been reported to be an important intracellular signaling pathway in the heart63. As an emerging target for the treatment of heart failure64, AMPK plays an important role in regulating cardiomyocyte growth65. Numerous studies have shown that the AMPK pathway and its associated autophagy-related pathways play a protective role in the pathological development of cardiomyopathy66,67,68,69. These studies provide ideas for exploring the mechanisms of autophagy-related DCM. PKB (protein kinase B), also known as the serine/threonine kinase Akt, serves as a central node for a variety of biological processes70. PKB has been reported to be involved in protective mechanisms against myocardial ischemia/reperfusion71. However, relatively few studies have addressed the association of PKB-mediated events with DCM. According to previous studies, pleiotrophin, a pro-angiogenic factor, was significantly expressed in rat models of myocardial infarction and in DCM patients, and is considered to protect the myocardium by inhibiting endogenous AKT/PKB activity72. In contrast, Alexander et al. found that PKB phosphorylation restored cardiac contractility in a zebrafish model of DCM73.

In addition, we constructed a TF-gene regulatory network based on the 8 autophagy-related genes in DCM and predicted drugs targeting them, such as melatonin and metformin. Studies showed that melatonin inhibited left heart dysfunction and ventricular remodeling in DCM rats with cardiorenal syndrome74, and metformin partially reversed ventricular remodeling in mice with DCM through an autophagic mechanism75. These studies provide a basis and direction for clinical precision-targeted therapy and novel drug development in DCM. We also explored the comorbidities associated with DCM, such as fatty liver disease. Some scholars have found76 that non-alcoholic fatty liver disease (NAFLD) affects the cardiovascular system through metabolic and inflammatory responses and also increases abnormalities of cardiac anatomy, including cardiomyopathy77. The disease pathways shared between the two need further investigation.

However, our study has certain shortcomings. First, it is a secondary bioinformatic analysis of a public DCM dataset, and the results need to be validated with external evidence. In addition, as multi-omics research progresses, the results of this study need to be combined with single-cell sequencing. Finally, the mechanisms of action of these 8 DCM genes and their interrelationships with other autophagy-related genes need further investigation.

Methods

Dilated cardiomyopathy dataset acquisition

The DCM dataset GSE4172 was downloaded from the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo) database. It was contributed by Wittchen et al.22 and is based on the GPL570 [HG-U133_Plus_2] platform (Affymetrix Human Genome U133 Plus 2.0 Array). It contains eight endomyocardial biopsy samples from patients with parvovirus B19-associated cardiac inflammation as the experimental group and four healthy human samples as the control group. Clinical information on the patients in the GSE4172 dataset is presented in Table 4.

Table 4 Clinical information for the GSE4172 dataset.

Autophagy genes acquisition

A total of 232 autophagy genes were downloaded from the Human Autophagy Database (HADb, http://autophagy.lu/). Similarly, 796 autophagy genes were obtained from the Human Autophagy Modulator Database (HAMdb, http://hamdb.scbdd.com)78. A total of 803 autophagy-related genes were obtained as the autophagy gene set for this study by taking the union of the two.
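The reported sizes (232, 796, and 803 genes) are consistent with combining the two lists as a set union, which by inclusion-exclusion implies 232 + 796 - 803 = 225 genes shared by both databases. A minimal Python sketch of this bookkeeping, with placeholder gene names and hypothetical function names, might look like this:

```python
def combine_gene_sets(set_a, set_b):
    """Return the union of two gene lists and the genes they share."""
    a, b = set(set_a), set(set_b)
    return a | b, a & b


def implied_overlap(size_a, size_b, size_union):
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|."""
    return size_a + size_b - size_union


# Placeholder gene lists standing in for the HADb and HAMdb downloads.
toy_hadb = {"GENE_1", "GENE_2", "GENE_3"}
toy_hamdb = {"GENE_2", "GENE_3", "GENE_4", "GENE_5"}
toy_union, toy_shared = combine_gene_sets(toy_hadb, toy_hamdb)

# Overlap implied by the sizes reported in the text.
reported_overlap = implied_overlap(232, 796, 803)
```

Using sets rather than lists removes duplicate symbols automatically, so a gene listed in both databases is counted once in the combined ARG set.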

Identification of differentially expressed genes (DEGs) in autophagy-related genes (ARGs)

NetworkAnalyst 3.0 is a user-friendly bioinformatics visualization web platform for transcriptome analysis, gene network construction, and meta-analysis of gene expression data79. The expression data and grouping information of the GSE4172 dataset were submitted to NetworkAnalyst 3.0 to identify DEGs between the DCM group and the healthy control group. For mRNA in microarrays, the threshold was set to |log2FoldChange| ≥ 0.8 with a p value < 0.05, and genes meeting these criteria were considered DEGs. We used the ggplot2 and pheatmap packages (R version 4.1.3) to draw the gradient volcano plot and heatmap of the DEGs. Autophagy-related genes (ARGs) and DEGs from the GSE4172 dataset were intersected to obtain the set of autophagy-related differentially expressed genes (AR-DEGs). Venn diagrams were created using the Omicshare online tool (https://www.omicshare.com/). The expression of the 23 AR-DEGs in GSE4172 was displayed as box plots using the ggpubr package and its associated helper R packages. The correlation analysis of AR-DEGs was visualized using the corrplot package (R version 4.1.3).
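The screening rule and the subsequent intersection can be summarized in a few lines of Python. The sketch below is illustrative only: the study performed this step in NetworkAnalyst 3.0, and the gene names, fold changes, p-values, and function name here are hypothetical.

```python
def classify_degs(results, lfc_cut=0.8, p_cut=0.05):
    """Split genes into up- and down-regulated DEGs.

    `results` maps gene -> (log2 fold change, p-value); a gene is a DEG when
    |log2FC| >= lfc_cut and p < p_cut.
    """
    up, down = set(), set()
    for gene, (log2fc, pval) in results.items():
        if pval < p_cut and abs(log2fc) >= lfc_cut:
            (up if log2fc > 0 else down).add(gene)
    return up, down


# Hypothetical differential expression results.
toy_results = {
    "GENE_1": (1.2, 0.01),    # up-regulated DEG
    "GENE_2": (-0.9, 0.03),   # down-regulated DEG
    "GENE_3": (2.0, 0.20),    # fails the p-value cutoff
    "GENE_4": (0.5, 0.001),   # fails the fold-change cutoff
    "GENE_5": (-0.8, 0.049),  # passes both cutoffs at the boundary
}
toy_up, toy_down = classify_degs(toy_results)

# Hypothetical autophagy gene set; AR-DEGs are DEGs that are also ARGs.
toy_args = {"GENE_2", "GENE_4", "GENE_9"}
toy_ar_degs = (toy_up | toy_down) & toy_args
```

Note that GENE_4 is an autophagy gene in this toy input but is not an AR-DEG, because it fails the fold-change cutoff before the intersection is taken.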

Functional enrichment analysis

Functional enrichment analysis covers biological processes, molecular functions, and cellular components80. Gene annotation uses Gene Ontology (GO) terminology, which comprises biological processes, molecular functions, and cellular components. The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database was used to understand metabolic pathways and plays an important role in the gene annotation process81,82. In addition, the BioPlanet, WikiPathways83, and Reactome84 databases were also used for pathway analysis. The Enrichr (https://amp.pharm.mssm.edu/Enrichr/) platform provides comprehensive gene enrichment analysis based on databases containing rich gene set annotations, pathway information, and gene target drugs85,86. The GO terms of the AR-DEGs of DCM and all pathway information for this study were obtained from the Enrichr platform.

Machine learning identifies molecular markers of AR-DEGs in DCM

In this study, least absolute shrinkage and selection operator (LASSO) logistic regression was used for feature gene selection to reduce the number of genes in the disease prediction model, address multicollinearity in the regression analysis, and screen the molecular markers of DCM87. The "glmnet" package was used to implement the LASSO regression algorithm. The elastic-net mixing parameter α, which controls how the model treats highly correlated predictors, was set to 1, corresponding to the pure LASSO (L1) penalty. In addition, the support vector machine-recursive feature elimination (SVM-RFE) algorithm was used to rank the AR-DEGs and remove irrelevant genes to make the diagnostic prediction model more robust88. SVM-RFE was implemented with the e1071 R package.
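To show why an L1 penalty performs gene selection, the following Python sketch fits a LASSO model by coordinate descent with soft-thresholding. It is a deliberately simplified stand-in: it uses a squared-error (linear) loss without an intercept rather than the logistic model fitted by glmnet, and the data, penalty value, and function names are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
            rho /= n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / scale if scale > 0 else 0.0
    return beta


# Hypothetical data: the response depends strongly on feature 0, weakly on feature 1.
toy_X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
toy_y = [2.0, -2.0, 0.1, -0.1]
beta_lasso = lasso_cd(toy_X, toy_y, lam=0.1)  # weak feature is dropped
beta_ols = lasso_cd(toy_X, toy_y, lam=0.0)    # no penalty: ordinary least squares
```

With the penalty switched on, the weak coefficient is set exactly to zero while the strong one is only shrunk, which is the behavior that lets LASSO reduce a list of candidate genes to a smaller subset.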

Transcription Factor (TF)-gene regulatory network construction

The JASPAR (http://jaspar.genereg.net/) database was used to generate a visual analysis of the TF-gene co-regulatory network89. Based on 8 biomarkers of DCM, TFs that regulated the activity of functional pathways and gene expression levels in DCM were identified from the JASPAR database to form the TF-gene regulatory network. It is important to note that the JASPAR database is included in the NetworkAnalyst 3.0 platform.

Target drug screening

Gene target-based drug screening has become a new approach for identifying drug molecules; it helps to expand the scope of relevant drugs and shorten the drug development process. In this study, molecular markers of DCM were screened for drug candidates through the Drug Signatures Database (DSigDB), which consists of 17,389 drugs and 19,531 genes associated with the drugs90. DSigDB can be accessed through the Enrichr (https://www.amp.pharm.mssm.edu/Enrichr/) website by entering the relevant gene targets and downloading the target drug information. Drugs with p-values less than 0.05 and larger combined scores were considered significant. The combined score represents how closely the small-molecule drug is linked to the gene of interest.
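The selection rule described here (keep drugs with p < 0.05, then prefer larger combined scores) amounts to a filter followed by a sort. A minimal Python sketch, assuming the Enrichr results have been exported as rows with `p_value` and `combined_score` fields (hypothetical field names, drug names, and values), could be:

```python
def rank_drugs(rows, p_cut=0.05, top_n=10):
    """Keep rows with p < p_cut and order them by descending combined score."""
    kept = [row for row in rows if row["p_value"] < p_cut]
    kept.sort(key=lambda row: row["combined_score"], reverse=True)
    return kept[:top_n]


# Hypothetical enrichment rows; not results from DSigDB.
toy_rows = [
    {"drug": "drug_A", "p_value": 0.001, "combined_score": 120.0},
    {"drug": "drug_B", "p_value": 0.200, "combined_score": 300.0},  # not significant
    {"drug": "drug_C", "p_value": 0.030, "combined_score": 250.0},
    {"drug": "drug_D", "p_value": 0.049, "combined_score": 15.0},
]
toy_ranked = [row["drug"] for row in rank_drugs(toy_rows)]
```

In this toy input drug_B has the largest combined score but is excluded because it fails the p-value threshold, so the score only ranks candidates that are already significant.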

Genetic disease association analysis

The DisGeNET (http://www.disgenet.org) database is an open and versatile platform for studying specific human diseases and their comorbidities through genetic and molecular pathways, probing the characteristics of disease genes and offering the possibility to elucidate the mechanisms of disease91. In the present study, molecular markers of DCM were uploaded to the Metascape (https://metascape.org/gp/index.html#/main/step1) platform92, which contains the DisGeNET database. We identified DCM-related comorbidities through the DisGeNET database, laying the foundation for mechanistic studies of DCM.

Copyright permission of KEGG

We have contacted Kanehisa Laboratories. Because we do not directly use KEGG pathway map images in the article, we do not need to obtain copyright permission from KEGG. However, because our article was written using their data, they kindly asked us to cite the following articles81,93,94.