Identification of C3 as a therapeutic target for diabetic nephropathy by bioinformatics analysis

The pathogenesis of diabetic nephropathy is not completely understood, and the effects of existing treatments are not satisfactory. Various public platforms already contain extensive data for deeper bioinformatics analysis. From the GSE30529 dataset based on diabetic nephropathy tubular samples, we identified 345 genes through differential expression analysis and weighted gene coexpression correlation network analysis. GO annotations mainly included neutrophil activation, regulation of immune effector process, positive regulation of cytokine production and neutrophil-mediated immunity. KEGG pathways mostly included phagosome, complement and coagulation cascades, cell adhesion molecules and the AGE-RAGE signalling pathway in diabetic complications. Additional datasets were analysed to understand the mechanisms of differential gene expression from an epigenetic perspective. Differentially expressed miRNAs were obtained to construct a miRNA-mRNA network from the miRNA profiles in the GSE57674 dataset. The miR-1237-3p/SH2B3, miR-1238-5p/ZNF652 and miR-766-3p/TGFBI axes may be involved in diabetic nephropathy. The methylation levels of the 345 genes were also tested based on the gene methylation profiles of the GSE121820 dataset. The top 20 hub genes in the PPI network were discerned using the CytoHubba tool. Correlation analysis with GFR showed that SYK, CXCL1, LYN, VWF, ANXA1, C3, HLA-E, RHOA, SERPING1, EGF and KNG1 may be involved in diabetic nephropathy. Eight small molecule compounds were identified as potential therapeutic drugs using Connectivity Map.

Weighted gene coexpression network analysis of the GSE30529 dataset. Compared with differential analysis that focuses on the differential expression of genes, the advantage of weighted gene coexpression network analysis (WGCNA) is that it uses expression correlation information between multiple genes to identify genes of interest. Therefore, we applied two analytical methods to screen the target genes. Similarly, we  www.nature.com/scientificreports/ performed sample cluster analysis first to learn sample similarity. The results showed that there were 3 outliers (Fig. 2a); therefore, three samples (GSM757025, GSM757027 and GSM757034) were removed. When performing WGCNA, to construct a scale-free network, the scale-free topological fitting index reaches 0.85 and the mean connectivity reaches 100 by setting the soft threshold power value to 10 (Fig. 2b). Based on the weighted gene coexpression correlation, hierarchical clustering analysis was carried out to obtain different gene modules, which are represented by branches of the clustering tree and different colours. A total of 22 modules were found in the network, with module sizes ranging from 30 to 10,000 and merge cut hights of 0.25 (Fig. 2c). The 22 modules were divided into two clusters in general according to the relationships between the modules (Fig. 2d). In addition, the weighted coexpression correlations of all genes were displayed in a heatmap plot (Fig. 2e). Finally, 3,538 highly related genes were selected in the TOM matrix with a threshold greater than 0.1. The results of the two analyses can be combined to obtain more accurate targets. Therefore, a list of 345 target genes was obtained, and these genes may play a regulatory role in DN (Fig. 3a).
Potential epigenetic regulatory mechanism. It has now been recognized that the occurrence and development of DN are the result of complex interactions between genetic and environmental factors. Envi-  www.nature.com/scientificreports/ ronmental signals could change intracellular pathways through chromatin modifiers and regulate gene expression patterns leading to diabetes and its complications 12 . After determining the target genes, we studied more datasets to understand the potential mechanisms of the differential expression of the target genes, including the GSE51674 dataset 9 , which contains miRNA profiles, and the GSE121820 dataset, which contains DNA methylation profiles. Generally, gene expression could be inhibited by miRNAs via base pairing with mRNA. Differential expression analysis was performed on the miRNA profiles of the GSE51674 dataset 9 . Similarly, quality examinations of GSE51674 were performed. There were no very heterogeneous samples in the sample cluster dendrogram (Fig. 4a). PCA showed that the two main components contributed 62.71% and 15.68%, respectively (Fig. 4b).
Next, 16 downregulated miRNAs and 67 upregulated miRNAs were found with the criteria of |log2 FC| greater than 3 and adjusted p value less than 0.01 (Fig. 4c). The 16 downregulated miRNAs are shown in the hierarchical clustering heatmap in Fig. 4d. To construct a downregulated miRNA-mRNA network, the TargetScan, miRWalk, miRBase and miRTarBase databases [13][14][15][16] were used for target gene prediction of the miRNAs. Eighty-eight downregulated miRNA-mRNA pairs were obtained according to the miRNA target webtools (Fig. 5a). Among them, TGFBI, SH2B3 and ZNF652 were upregulated in the GSE30529 dataset (Fig. 5b). Therefore, the miR-1237-3p/ www.nature.com/scientificreports/ SH2B3, miR-1238-5p/ZNF652 and miR-766-3p/TGFBI axes may be involved in diabetic nephropathy. Similar work was carried out on the upregulated miRNAs, but their predicted genes did not overlap with the target genes from GSE30529. DNA methylation is the main epigenetic form of gene expression regulation. To understand the methylation level changes of the target genes, the GSE121820 dataset was downloaded as a validation dataset. Among 345 target genes, 227 genes had methylation differences between the DN group and the control group (Supplemental Table 1).
PPI network and identification of hub genes. First, the list of target genes was exported to the STRING database. By setting the interaction confidence score at the highest level at 0.9, a protein-protein interaction (PPI) network was constructed, which contained 190 nodes and 680 edges (Fig. 6a). Each node represents a protein, and an edge represents an interaction between proteins. The size and gradient colour of the nodes are adjusted by the degree, while the thickness and gradient colour of the edge are adjusted by the interaction score. To search for important nodes in the networks, all nodes were ranked by the 12 topological analysis methods provided by CytoHubba. Each algorithm computed all node scores, and then 1-50 points were assigned based on the rank. According to all points, the top 20 nodes (KNG1, C3, FN1, SYK, HLA-E, EGF, ITGB2, CXCL1, CXCL8, ITGAV, LYN, VWF, RHOA, HLA-DQA1, ITGAM, SERPING1, P2RY13, ANXA1, P2RY14 and FCER1G) were identified (Fig. 6b). Because the products of genes were at the core of the PPI network, these hub genes were considered potential therapeutic targets.
Clinical data validation and drug prediction. To verify the potential roles of the hub genes in DN, clinical data including two datasets (Woroniecka and Schmid) from Nephroseq were obtained, and Pearson correlation analysis was performed between the hub genes and clinical data. The gene expression of SYK, CXCL1, LYN, VWF, ANXA1, C3, HLA-E, RHOA and SERPING1 in DN tubule samples was negatively related to GFR, suggesting a pathogenic role of the upregulated genes (Fig. 7a, c, e). Conversely, the gene expression of EGF and KNG1 in DN tubule samples was positively related to GFR, suggesting a protective role of the downregulated genes (Fig. 7b, d, f).
Given that the effectiveness of existing treatment strategies is not entirely satisfactory, it is necessary to propose new strategies and develop new therapeutic methods. Connectivity Map 17 was used to compare the DEG list with the database reference dataset, and a correlation score (− 100 to 100) was obtained. Negative numbers indicate that the DEG list and the reference gene expression spectrum may be opposite; that is, the expression spectrum of drug disturbance is negatively correlated with the expression spectrum of disease disturbance. Twenty-three upregulated DEGs (logFC greater than 2.5) and 13 downregulated DEGs (logFC less than 1.5) were  (Table 1).

Discussion
As one of the microvascular complications of diabetes, DN is the main cause of ESRD. Existing treatments are not sufficient to control the development of disease. New treatment strategies are needed. High-throughput omics data have been widely used to study the mechanisms of disease and predict possible therapeutic targets. We performed differential expression analysis and WGCNA of GSE30529 and obtained 345 target genes. GO annotations mainly included neutrophil activation, regulation of immune effector process, positive regulation of cytokine production and neutrophil-mediated immunity. KEGG pathways mostly included phagosome, complement and coagulation cascades, cell adhesion molecules (CAMs), ECM-receptor interaction, focal adhesion and AGE-RAGE signalling pathway in diabetic complications. The results supported that the immune response may be involved in DN. Cytokine release and extracellular matrix deposition may be subsequent events and continue with the development of disease. We also studied additional datasets to understand the potential mechanisms of the differential expression of the target genes. The miRNA-mRNA network suggested that the miR-766-3p/ TGFBI, miR-1238-5p/ZNF652 and miR-1237-3p/SH2B3 axes may be involved in diabetic nephropathy and that most target genes have differences in DNA methylation levels between the DN group and the control group. Next, a PPI network was established, and the 20 hub genes were identified. Furthermore, correlation analysis with clinical data demonstrated the disease-promoting effect of SYK, CXCL1, LYN, VWF, ANXA1, C3, HLA-E, RHOA and SERPING1, which were upregulated in DN tubule samples. In contrast, EGF and KNG1, which were downregulated in DN tubule samples, were suggested to have protective effects in DN.
To date, there have been some reports about hub genes and DN. Spleen tyrosine kinase (SYK) was reported to mediate high glucose-induced TGF-β1 and IL-1β secretion 18,19 . In a diabetic animal model, C-X-C motif chemokine ligand 1 (CXCL1) was found to possibly serve as a proinflammatory mediator 20,21 . In addition, VWF was reported to be involved in intrarenal thrombosis leading to the deterioration of renal function 22 . Purvis et al. observed higher circulating plasma levels of ANXA1 in T1D and T2D patients, whereas the exogenous supplementation of ANXA1 improves insulin resistance and prevents the progression of subsequent microvascular complications in mice 23,24 . Previous studies have demonstrated that statins prevent DN by reducing the activity of Ras homolog family member A (RhoA) protein activation [25][26][27][28] . Another study reported that the activation of RhoA/ROCK may regulate the NF-κB signalling pathway 29 . In addition, sinomenine, kaempferol, catalpol and rutin have been shown to have protective effects through the RhoA/ROCK signalling pathway [30][31][32][33] . EGF was considered a urine biomarker in two studies 34,35 . Recently, the newest report about cytosine methylation differences in kidney tubule samples supported this viewpoint 36 . In addition, one large-scale linkage study revealed polymorphisms in kininogen 1 (KNG1) associated with DN in European populations 37 .
C3 was the gene of interest through differential expression analysis and WGCNA. The KEGG pathways of the target genes also included the complement and coagulation cascade. In addition, the selection of the core genes in the PPI network also indicated that C3 was centrally located. These results may prove that complement C3 serves as a therapeutic target in diabetic nephropathy. The results are consistent with knowledge that the complement system participates in DN. The development of diabetes is intimately linked to low-grade inflammation 38 . High levels of inflammatory markers such as C-reactive protein and adiponectin proved this viewpoint 39,40 . Inflammation might promote the occurrence and development of diabetic complications such as DN. However, the underlying mechanisms of the initiation of low-grade inflammation are still poorly understood. Increasing research evidence has proven that the innate immune system is closely involved in diabetes 41 . Simultaneously, the roles for pattern recognition receptors (PRRs) associated with DN have been discussed 42,43 . The complement system is not only involved in innate immune defence by PRRs (mannose-binding lectin and ficolin) but also considered an important proinflammatory factor. Several studies have pointed out that the complement system is involved in the pathogenesis of DN and might be a therapeutic target [44][45][46] 48 . Furthermore, a large-scale cohort study substantiated that diabetic patients with high plasma levels of C3 are more prone to kidney damage than the general population 49 . Another study indicated that the serum levels of C3 may help to differentiate DN patients from diabetic patients without kidney damage 50 . Blockade of C3a and C5a receptors in a T1DM model indicated a potential protective effect on renal fibrosis by improving endothelial-to-myofibroblast transition through the Wnt/β-catenin signalling pathway 51 . Similarly, blockade of C3a receptors in rats with T2DM improved renal morphology and function by inhibiting cytokine release and TGFβ/Smad3 signalling 52 . However, the best approach for targeting the complement system to prevent the development of DN still needs to be explored. Therefore, 8 potential small molecule compounds were identified by the Connectivity Map database in our study. In summary, our study has important significance in understanding the underlying mechanisms of DN and is helpful for developing new treatment strategies for DN. However, further molecular biological experiments are needed to verify the association between the identified genes and DN.  KNG1  C3  FN1  SYK  HLA-E  EGF  ITGB2  CXCL1  CXCL8  ITGAV  LYN  VWF  RHOA  HLA-DQA1  ITGAM  SERPING1  P2RY13  ANXA1  P2RY14  FCER1G   0  Data processing. All differential analyses were performed by the limma package 10 . Adjusted p values less than 0.05 and |log2-fold change (FC)| greater than 1 were considered statistically significant in the differential analysis of GSE30529. Adjusted p values less than 0.01 and |log2 FC| greater than 3 were considered statistically significant in the differential analysis of GSE51674. In addition, the TargetScan, miRWalk, miRBase and miRTar-Base databases [13][14][15][16] were used for the target gene prediction of the differentially expressed miRNAs. Weighted gene coexpression network analysis (WGCNA) allows biologically meaningful module information mining based on pairwise correlations between genes in high-throughput data using the WGCNA package 54 . The WGCNA workflow consists of gene coexpression network construction, module identification, module relationship analysis and the identification of highly related genes. The gene coexpression network was constructed with the filtering principle that the soft threshold makes the network more consistent with a scale-free topology. The modules were identified with the criterion of module size 30-10,000, merge cut height equal to 0.25 and verbose equal to 3. Highly related genes were obtained with thresholds greater than 0.1 in the topological overlap matrix (TOM).

Functional enrichment analysis and hub gene screening. Gene Ontology (GO) annotation and
Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed with the clusterProfiler package 11 . The STRING database 55 (version 11.0, https ://strin g-db.org/) was used to search for interactions between the candidate proteins based on laboratory data, other databases, text mining and predictive bioinformatics data. Cytoscape software was used to visualize the protein-protein interaction (PPI) network and perform network analysis. CytoHubba, a built-in tool in Cytoscape, uses 12 methods to explore important nodes in biological networks, such as the Degree method (Deg), Maximum Neighborhood Component (MNC), Density of Maximum Neighborhood Component (DMNC), Maximal Clique Centrality (MCC), Closeness, EcCentricity, Radiality, BottleNeck, Stress, Betweenness, Edge Percolated Component (EPC) and ClusteringCofficient 56 .