Introduction

The BK virus (BKV) is a double-stranded DNA virus belonging to the Polyomaviridae family1. Antibodies against BKV are detectable in more than 90% of children by age 10, indicating that primary BKV infection occurs asymptomatically in early childhood2. After the primary infection, BKV persists latently in the renal epithelium3. BKV can be reactivated after kidney transplantation and lead to BK virus nephropathy (BKVN), which is characterized by interstitial fibrosis and cellular infiltration4,5. BKVN is one of the main causes of graft dysfunction and morbidity in renal-transplant recipients6,7. The cause of the increasing incidence of BKVN is still unknown7. At present, excessive immunosuppression, e.g. with tacrolimus and mycophenolate mofetil, may be the primary risk factor for BKVN8,9,10.

The common therapy for BKVN is the reduction of immunosuppression, which may result in severe acute graft rejection1. Leflunomide combined with everolimus or intravenous immunoglobulin may be safe rescue therapies for BKVN6,11. However, these therapies have not been validated in preclinical experiments or large randomized controlled studies. Thus, it is important to investigate the mechanism of the disease and to identify the key molecules in BKVN as potential new therapeutic targets.

A few studies have explored the pathogenesis of BKVN, covering both the innate and the adaptive immune systems. Some researchers showed that an increasing number of dendritic cells could inhibit the immune evasion of BKV, increase the magnitude of the virus-specific CD8+ T cell response and enhance natural killer cell-mediated cytotoxicity in immune responses to BKV12,13,14. Some immune factors also participate in the pathogenesis of BKVN, such as interleukin-6 (IL-6), IL-8/CXCL8, RANTES/CCL5, MCP-1/CCL2, and IP-10/CXCL1015. However, previous studies only demonstrated that the expression levels of these cytokines were changed (Supplementary Table S1), and did not elucidate the detailed mechanisms, e.g. what the biological functions of these factors are, how they interact with each other and which cytokine plays a key role in the interaction network.

Bioinformatics provides tools to collect, classify and analyze biological datasets such as gene expression microarray datasets16,17. Gene expression analysis by bioinformatic methods has been widely used in genomics and biomedical research, providing insights into the molecular events underlying human biology and disease18. Data mining of the available microarray datasets could help scientists to narrow down the research scope and to carry out targeted experiments.

In this study, we analyzed public array data by bioinformatics methods to identify the important gene network of BKVN. Differentially expressed genes (DEGs) were first identified between stable and BKVN renal-transplantation recipients. Then protein-protein interactions (PPIs) were analyzed. Finally, we attempted to identify the key genes and to obtain better insights into the pathogenesis of BKVN.

Results

Five hundred and twenty-four DEGs were selected

Microarray data of BKVN and stable kidney transplantation patients were compared with the limma package, using a linear model and a contrast model followed by DEG selection. A total of 502 DEGs were selected according to the criteria of P < 0.01 and fold change >2.0, including 249 up-regulated genes and 253 down-regulated genes (Supplementary Dataset S1). Hierarchical cluster analysis was performed to show the distribution of DEGs (Fig. 1). Above the heatmap, the yellow bar represents samples of stable renal allograft patients, and the blue bar represents samples of BKVN patients. In the heatmap, each column represents a tissue sample, and each row represents a single gene. The color gradient from green to red indicates the degree of change from down-regulation to up-regulation. Black indicates no difference in the expression of the gene between patients with BKVN and those with stable allografts.

Figure 1

Heatmap of differentially expressed genes. Each column represents a sample, and each row represents a gene. Above the heatmap, the yellow bar represents samples of stable renal allograft patients, and the blue bar represents samples of BKVN patients. In the heatmap, green indicates down-regulation, while red indicates up-regulation. Black indicates no difference in the expression of the gene between BKVN and stable allograft patients.

DEGs of BKVN are mainly enriched in the immune response

To investigate the biological functions of the DEGs, we analyzed them in DAVID with the criteria of P < 0.05 and count ≥5, covering the MF, BP and CC ontologies and KEGG pathways. In the MF ontology, DEGs were mainly enriched in 5 categories related to protein interactions (Supplementary Dataset S2, Fig. 2a), such as protein binding (262 genes), serine-type endopeptidase activity (14 genes) and signal transducer activity (11 genes). In the BP ontology, the most enriched categories were negative regulation of transcription from RNA polymerase II promoter (31 genes), inflammatory response (29 genes), innate immune response (28 genes) and immune response (28 genes), most of which relate to immune processes. Since a total of 39 categories were involved in the BP analysis, only the top 10 categories by gene count are shown in Supplementary Dataset S3 and Fig. 2b. The CC ontology describes the distribution of DEG products in cells. According to the results of the CC analysis, proteins encoded by the DEGs are mostly located in the nucleus (152 genes) and the cytosol (107 genes). Other important CC categories are the extracellular exosome, the nucleoplasm, the membrane and so forth (Supplementary Dataset S4, Fig. 2c). Furthermore, 17 dysfunctional pathways in BKVN were identified by KEGG pathway analysis (Supplementary Dataset S5, Fig. 2d). Important pathways include the chemokine signaling pathway (15 genes) and the phagosome (11 genes).

Figure 2

Results of gene ontology and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses. Blue bar chart represents the value of −log (P), whilst the orange line chart represents the gene count of each category. (a) Results of molecular function analysis. (b) Results of biological process analysis. (c) Results of cellular component analysis. (d) Results of KEGG pathway analysis.

According to the results of the enrichment analysis, the products of DEGs between BKVN and stable renal allograft patients are mainly located in the nucleus and cytosol, and probably participate in protein interactions in the immune response.

Fourteen significant genes formed a co-citation network in literature mining

First, 502 DEGs were uploaded to the STRING website. Then 219 DEGs with a score >0.4 (medium confidence) were selected to construct the PPIs (Fig. 3). In the PPIs, genes closely associated with others were identified by a degree ≥10 (ref. 19), including EGF, TYROBP, PTPRC, STAT1, CXCL10, HCK, CCL5, IRF7, CXCL9, GBP5, PLEK, CD163, SMAD4, CASP1, CDH1, GBP2, GBP1, BIRC5, GZMB, C1QB, CALM1 and C1QA. In the clustering analysis, the main biological keywords of hot genes reported in the literature were immune response, tumor necrosis factor and signal transduction (Fig. 4a). Among the 22 significant genes, 16 genes formed a co-citation network according to previous studies. However, in the network, C1QA and C1QB only interacted with each other, but not with the other 14 genes (Fig. 4b, Table 1).

Figure 3

The protein-protein interaction network was constructed with differentially expressed genes. Red represents up-regulated genes in BKVN patients compared with stable allograft patients, and blue represents down-regulated genes.

Figure 4

Literature mining results of proteins with degree ≥10. (a) Clustering analysis of biological functions of 22 genes in previous studies. In the heatmap, black means that the biological function has not yet been reported for the gene, while light green means that the gene has the biological function according to previous studies. Hot genes mainly clustered in the immune response, tumor necrosis factor and signal transduction. (b) Co-citation network of hot genes in the protein-protein interaction network. In the co-citation network, 14 genes closely interacted, while C1QA only interacted with C1QB. The numbers on the lines indicate the number of studies in which the two genes were co-cited.

Table 1 Hub genes identified by literature mining.

CXCL10, EGF and STAT1 are significant genes in BKVN

In the CytoNCA analysis, every DEG was scored according to degree centrality, betweenness centrality and subgraph centrality (Table 2). Based on the results of the CytoNCA analysis, CXCL10, EGF and STAT1 were chosen as hub proteins. The network of CXCL10, EGF, STAT1 and the proteins directly associated with these hub proteins is shown in Fig. 5, and includes 17 up-regulated and 5 down-regulated proteins.

Table 2 Top 5 genes evaluated by degree centrality, betweenness centrality and subgraph centrality in the protein-protein interaction network.
Figure 5

The protein-protein interaction (PPI) network of important proteins. Red represents up-regulated genes, while blue represents down-regulated genes. The PPI network consists of 17 up-regulated proteins and 5 down-regulated proteins. CXCL10, EGF and STAT1 are identified as hub proteins.

Discussion

In this study, we aimed to identify the key protein interaction networks in BKVN after kidney transplantation. By comparing the array datasets between BKVN and non-rejection transplantation patients, 267 up-regulated and 257 down-regulated DEGs were identified. The GO and KEGG analyses then showed the important role of the innate immune system in BKVN. Finally, PPIs were constructed from 219 DEGs and 22 key proteins were selected, including CXCL10, EGF and STAT1.

Using the GO annotation in DAVID, we analyzed the biological functions of the DEGs, which helped us to infer the pathogenesis of BKVN. First, in the MF ontology, the enriched categories focused on the alteration of protein activities, including protein binding, protein homodimerization activity, serine-type endopeptidase activity and receptor activity. Both protein homodimerization and serine-type endopeptidases could activate or inhibit signaling pathways by changing the structures of important proteins, such as receptors20, and further affect cellular processes, including inflammation, cell death and development21,22. In the BP ontology, the majority of DEGs were enriched in immune processes, such as the innate immune response, the inflammatory response and the immune response. It has been reported that BKVN is associated with the innate and specific immune systems1. BKV may lead to nephropathy via cell lysis, stimulation of the immune system and induction of inflammation23. CXCR3, one of the DEGs and a receptor for CXCL10, is expressed on T cells, dendritic cells and natural killer cells, and can stimulate the migration and activation of these immune cells in immune responses against BKV24,25,26. In the CC ontology, proteins encoded by the DEGs are mostly located in the nucleus and the cytosol. When BKV infects host cells, the virus enters the host nucleus and persists episomally1. Once reactivated, BKV regulates the transcription of host cells. The other important CC category is the extracellular exosome, which contains all types of biomolecules, including proteins, lipids and so forth27. A number of pathogen-derived components, even RNAs, have been found in exosomes after viral infection28. Exosomes are involved in virus transmission during infection. However, little is known about the functions of exosomes in BKVN. Exosomes might offer new insights into strategies for inhibiting BKV reactivation.

Taken together, DEGs may affect structural changes of proteins in the nucleus, cytosol and exosomes and thereby participate in the immune response in BKVN.

Both the literature mining and the CytoNCA analysis revealed the core positions of CXCL10, EGF and STAT1 in the PPI network. CXCL10, a proinflammatory cytokine, has been reported to participate in the pathogenesis of BKVN29. The levels of CXCL10 were found to be increased in the serum and renal tissue of patients with BKVN as compared with those with non-rejection allografts23,30. We also found that CXCL10 was more highly expressed in patients with BKVN. All these findings indicate that CXCL10 plays a pivotal role in the immune response against BKV. According to our results, EGF appears to be another important hub protein. However, few researchers have reported on the relationship between EGF and BKVN thus far. Rintala and colleagues found that EGF played an important role in chronic allograft injury31. EGF interacts with TGF-β, VEGF and some other cytokines to promote tissue repair32,33. As per our analysis, the up-regulated CXCL10 may interact with the down-regulated EGF in the pathogenesis of BKVN. This suggests that BKV may induce kidney injury by promoting inflammation and inhibiting tissue repair in renal-transplantation recipients. Thus far, STAT1 has not been reported to be associated with BKVN. Giacobbi et al. found that STAT1 was necessary for the antiviral state and that induction of STAT1 mediated innate immune responses34.

We investigated the crucial proteins in BKVN through various data mining methods, including the DEG analysis, GO, KEGG, literature mining, STRING and the PPI analysis. These bioinformatics methods may corroborate each other and make the result more reliable. The fundamental aim of our study was to infer the potential mechanism of BKVN via bioinformatic analysis. We did not attempt to find diagnostic or prognostic biomarkers for BKVN in this study, because renal diseases are frequently associated with immune dysfunction, and it appears difficult to identify a single renal disease using cytokines alone as biomarkers. Though CXCL10 and EGF have been reported to be altered in a variety of renal diseases, even in kidney rejection, the network of CXCL10, EGF and STAT1 in BKVN has not been reported. We believe that this network may provide new ideas for the elucidation of the immunological and biological mechanisms of BKVN.

Our study has some limitations. First, other non-technical site-based microarray data were not integrated in our study. Second, as Sigdel and colleagues reported, the relation between the transcriptome and the proteome may not be strong enough35. In this regard, building a protein-protein network from transcriptomic data might be risky. Third, due to the bioinformatic nature of our study, the specific mechanisms and pathways of CXCL10, EGF, STAT1 and other important proteins in the PPI network were not further investigated. Therefore, animal and laboratory experiments are needed to further clarify the pathogenesis of BKVN. Finally, acute T cell-mediated rejection is a well-known confounder of BKVN, and we cannot rule out this confounding factor in this bioinformatic study.

In summary, we investigated the potential crucial protein network of BKVN patients. A protein network was selected by DEG, GO, KEGG and PPI analyses. CXCL10, EGF and STAT1 are hub proteins in the pathogenesis of BKVN. BKV may induce kidney injury by promoting inflammation and inhibiting tissue repair.

Materials and Methods

Affymetrix microarray data

To identify DEGs between BKVN patients and stable allograft recipients, the microarray dataset GSE75693 was downloaded from the public Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/). The dataset GSE75693 was deposited by Sigdel et al.35 and contains information on renal biopsy tissues from 30 stable renal allografts, 15 acute rejection patients, 15 BKVN patients and 12 chronic allograft nephropathy patients. Here, we selected the 30 stable renal allografts and the 15 BKVN patients as study subjects. The array data were based on the GPL570 Affymetrix Human Genome U133 Plus 2.0 Array (Affymetrix Inc., Santa Clara, CA, USA) and were sourced from renal biopsy tissues of patients. Microarray data were processed by a series of bioinformatic methods to identify the possible protein interaction network and to infer the functional process in the pathogenesis of BKVN (Supplementary Fig. S1). The raw data were preprocessed by the Robust Multi-array Average36 algorithm in the affy package of Bioconductor (http://www.bioconductor.org/), including background correction, normalization and calculation of gene expression values. For all samples in the dataset, probes for the same gene were reduced to a single value by taking the maximum37.
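The probe-to-gene reduction described above can be sketched as follows. This is a minimal illustration in Python rather than the Bioconductor workflow actually used; it assumes that "the maximum" means the per-sample maximum across all probes of a gene (taking the probe with the highest mean would be another reading), and the function name, probe IDs and expression values are hypothetical.

```python
from collections import defaultdict


def collapse_probes_to_genes(probe_values, probe_to_gene):
    """Reduce probe-level values to one value per gene and sample.

    probe_values:  dict mapping probe ID -> list of expression values,
                   one value per sample (same sample order for all probes)
    probe_to_gene: dict mapping probe ID -> gene symbol
                   (hypothetical annotation map for illustration)
    """
    rows_by_gene = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # skip probes without a gene annotation
        rows_by_gene[gene].append(values)
    # Assumption: keep the per-sample maximum across the probes of a gene.
    return {
        gene: [max(column) for column in zip(*rows)]
        for gene, rows in rows_by_gene.items()
    }


# Toy example with made-up probe IDs and log2 expression values.
toy_values = {"probe_a": [5.0, 7.0], "probe_b": [6.0, 6.5], "probe_c": [3.0, 3.2]}
toy_annotation = {"probe_a": "CXCL10", "probe_b": "CXCL10", "probe_c": "EGF"}
toy_genes = collapse_probes_to_genes(toy_values, toy_annotation)
```

In the toy example, the two probes annotated to the same gene are merged sample by sample, while the single-probe gene is passed through unchanged.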

DEGs analysis

DEGs between BKVN and non-allograft injury patients were analyzed with the limma package of Bioconductor. Linear models were constructed for the gene expression data of the BKVN and stable renal allograft samples. Then the contrast model was used to compare gene expression differences between the two groups. The Student's t-test was used to calculate the P values. DEGs were selected based on the thresholds P < 0.01 and fold change >2.0. Here, the P value was used to test whether a gene with a fold change >2.0 was differentially expressed between the BKVN and the stable groups.
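The selection thresholds can be illustrated with a short Python sketch. It is not the limma procedure itself: the P values are taken as given (computed elsewhere), and the function name and toy numbers are hypothetical. The sketch assumes log2-scale expression values, as produced by RMA, so that a fold change >2.0 in either direction corresponds to an absolute log2 fold change >1.

```python
import math
from statistics import mean


def select_degs(stable, bkvn, p_values, p_cutoff=0.01, fc_cutoff=2.0):
    """Split genes into up- and down-regulated DEGs (BKVN vs. stable).

    stable, bkvn: dict mapping gene -> list of log2 expression values
    p_values:     dict mapping gene -> P value computed elsewhere
    Returns (up, down) as two sorted lists of gene names.
    """
    log2_cutoff = math.log2(fc_cutoff)  # fold change 2.0 -> log2 value 1.0
    up, down = [], []
    for gene, p in p_values.items():
        # On the log2 scale, the fold change is a difference of group means.
        log2_fc = mean(bkvn[gene]) - mean(stable[gene])
        if p < p_cutoff and abs(log2_fc) > log2_cutoff:
            (up if log2_fc > 0 else down).append(gene)
    return sorted(up), sorted(down)


# Toy data: gene names and numbers are invented for illustration only.
toy_stable = {"g1": [5.0, 5.0], "g2": [8.0, 8.0], "g3": [4.0, 4.0], "g4": [2.0, 2.0]}
toy_bkvn = {"g1": [7.0, 7.0], "g2": [6.5, 6.5], "g3": [4.5, 4.5], "g4": [4.0, 4.0]}
toy_p = {"g1": 0.001, "g2": 0.005, "g3": 0.001, "g4": 0.05}
toy_up, toy_down = select_degs(toy_stable, toy_bkvn, toy_p)
```

Here g3 fails the fold-change criterion and g4 fails the P value criterion, so only g1 (up) and g2 (down) are retained.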

Enrichment analysis of DEGs

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses are two forms of DEG enrichment analysis, which help us to learn the potential mechanism of BKVN. GO annotates genes with a defined, structured and controlled vocabulary38, including molecular function (MF), biological process (BP) and cellular component (CC), while KEGG assigns DEGs to specific pathways39. GO and KEGG analyses can be performed with the Database for Annotation, Visualization and Integrated Discovery (DAVID, http://david.abcc.ncifcrf.gov/). We analyzed the biological functions of the DEGs with the criteria of P < 0.05 and count ≥5.
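The idea behind such an enrichment P value can be sketched with a plain one-sided hypergeometric test in Python. This is an illustration only: DAVID reports a modified Fisher exact P value (the EASE score), which is more conservative than the test shown here, and the function names and numbers below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, selected, overlap):
    """One-sided hypergeometric P value, P(X >= overlap).

    population: number of background genes
    annotated:  background genes annotated to the category
    selected:   number of DEGs
    overlap:    DEGs annotated to the category (the "count")
    """
    total = comb(population, selected)
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def passes_filter(p_value, count, p_cutoff=0.05, min_count=5):
    """Apply the reporting criteria used in the text: P < 0.05 and count >= 5."""
    return p_value < p_cutoff and count >= min_count


# Small worked case: 10 background genes, 3 annotated, 4 selected, 2 overlapping.
# P(X >= 2) = (C(3,2)*C(7,2) + C(3,3)*C(7,1)) / C(10,4) = 70 / 210
toy_p_value = hypergeom_enrichment_p(10, 3, 4, 2)
```

A category is reported only if both the P value and the gene count pass the thresholds, which is why categories with very few genes are excluded even when their P value is small.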

PPI network construction and literature mining

PPI analysis shows the potential network and connections of DEGs. It is usually performed with STRING (Search Tool for the Retrieval of Interacting Genes/Proteins, http://string-db.org/), which is a web-based biological database. The list of DEGs was uploaded to STRING. According to the official explanation of STRING, the confidence score is the approximate probability that a predicted link exists between two proteins in the same metabolic map in the KEGG database (https://string-db.org/cgi/help.pl?UserId=rtxtBR80pDyg&sessionId=si2cM9wdJB3P). Thus, PPIs of DEGs were selected with a score threshold of >0.4 (medium confidence; refs 36,39). Then the analysis results of the PPIs were downloaded from STRING and edited in Cytoscape (http://www.cytoscape.org/). Based on the STRING analysis, nodes with higher degree in the PPI network were submitted to GenCLiP 2.0 (http://ci.smu.edu.cn/GenCLiP2.0/confirm_keywords.php), which is an online tool for literature mining of genes40. GenCLiP generates keywords for genes from the previous literature to help infer possible gene functions40. In GenCLiP, biological keywords of hot genes in previous studies were analyzed by the “Gene Cluster with Literature Profiles” module with P ≤ 1 × 10⁻⁴ and hit ≥4. The co-citation network of hot genes was then selected by the “Literature Mining Gene Networks” module.
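The score filtering and the degree-based selection of highly connected nodes can be sketched in Python as below. This is an illustrative reimplementation, not the STRING or Cytoscape procedure; it assumes an edge list with combined scores on a 0 to 1 scale, and the function names and toy edges are hypothetical.

```python
from collections import defaultdict


def build_ppi(edges, min_score=0.4):
    """Build an undirected adjacency map from scored protein pairs.

    edges: iterable of (protein_a, protein_b, score), with score assumed
           to lie between 0 and 1. Only edges with score > min_score are kept.
    """
    neighbors = defaultdict(set)
    for a, b, score in edges:
        if score > min_score and a != b:  # drop weak links and self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    return dict(neighbors)


def high_degree_nodes(neighbors, min_degree=10):
    """Return nodes whose number of interaction partners is >= min_degree."""
    return sorted(n for n, nbrs in neighbors.items() if len(nbrs) >= min_degree)


# Toy edge list with invented scores; a lower degree cutoff is used
# because the toy network is tiny.
toy_edges = [
    ("CXCL10", "STAT1", 0.9),
    ("CXCL10", "CCL5", 0.8),
    ("STAT1", "IRF7", 0.7),
    ("EGF", "CDH1", 0.3),  # below threshold, removed
]
toy_network = build_ppi(toy_edges)
toy_hot = high_degree_nodes(toy_network, min_degree=2)
```

Proteins whose only links fall below the score threshold do not appear in the adjacency map at all, which mirrors the drop from the full DEG list to the smaller set of connected nodes.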

Hub protein selection by CytoNCA

In Cytoscape, scattered proteins were removed from the final PPIs. Hub proteins, which interact most frequently with other proteins and act like hubs in the network, were selected by CytoNCA according to degree centrality, betweenness centrality and subgraph centrality36. Finally, proteins associated with the hub proteins and having a degree ≥10 were selected to construct the significant network of the BKVN mechanism39.
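The three centrality measures can be illustrated with a small pure-Python sketch. It is not the CytoNCA implementation: degree is the raw number of neighbors, betweenness is computed with Brandes' algorithm for unweighted graphs, and subgraph centrality is approximated by a truncated power series of closed walks, which is only practical for small toy networks. All function names and the toy graphs are hypothetical.

```python
from collections import deque


def degree(neighbors):
    """Raw degree: number of direct interaction partners per node."""
    return {v: len(nbrs) for v, nbrs in neighbors.items()}


def betweenness(neighbors):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes)."""
    bc = dict.fromkeys(neighbors, 0.0)
    for s in neighbors:
        order, preds = [], {v: [] for v in neighbors}
        sigma = dict.fromkeys(neighbors, 0)
        dist = dict.fromkeys(neighbors, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # breadth-first search counting shortest paths
            v = queue.popleft()
            order.append(v)
            for w in neighbors[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(neighbors, 0.0)
        for w in reversed(order):  # accumulate dependencies back to the source
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2.0 for v, b in bc.items()}  # each pair was counted twice


def subgraph_centrality(neighbors, terms=50):
    """Sum over k of (A^k)_ii / k!, truncated after `terms` terms."""
    nodes = sorted(neighbors)
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    adj = [[0.0] * n for _ in range(n)]
    for v, nbrs in neighbors.items():
        for w in nbrs:
            adj[idx[v]][idx[w]] = 1.0
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # A^0 / 0!
    score = [1.0] * n
    for k in range(1, terms + 1):
        # power becomes A^k / k! by multiplying with A and dividing by k
        power = [[sum(power[i][m] * adj[m][j] for m in range(n)) / k
                  for j in range(n)] for i in range(n)]
        for i in range(n):
            score[i] += power[i][i]
    return {v: score[idx[v]] for v in nodes}


# Toy graphs: a three-node path and a triangle.
toy_path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
toy_triangle = {"x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"}}
path_betweenness = betweenness(toy_path)
triangle_subgraph = subgraph_centrality(toy_triangle)
```

In the path, only the middle node lies on a shortest path between two other nodes, so it alone has non-zero betweenness. In the triangle, all nodes are equivalent and therefore share the same subgraph centrality.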

Data availability statement

The GSE75693 dataset analyzed during the current study is available in the Gene Expression Omnibus database (http://www.ncbi.nlm.nih.gov/geo/).