Prediction of co-expression genes and integrative analysis of gene microarray and proteomics profile of Keshan disease

Keshan disease (KD) is a kind of endemic cardiomyopathy which has a high mortality. However, molecular mechanism in the pathogenesis of KD remains poorly understood. Serum samples were collected from 112 KD patients and 112 normal controls. Gene microarray was used to screen differently expressed genes. Genevestigator was applied to forecast co-expression genes of significant gene. iTRAQ proteomics analysis was used to verify significant genes and their co-expression genes. GO, COG, IPA and STRING were applied to undertake function categorization, pathway and network analysis separately. We identified 32 differentially expressed genes; IDH2, FEM1A, SSPB1 and their respective 30 co-expression genes; 68 differential proteins in KD. Significant proteins were categorized into 23 biological processes, 16 molecular functions, 16 cellular components, 15 function classes, 13 KD pathways and 1 network. IDH2, FEM1A, SSBP1, CALR, NDUFS2, IDH3A, GAPDH, TCA Cycle II (Eukaryotic) pathway and NADP repair pathway may play important roles in the pathogenesis of KD.

In order to screen significant genes and proteins associated KD and build protein network in this study, the expression levels of the 78 differentially expressed genes observed in our previous KD whole genomics study 4 were evaluated in the peripheral blood of KD patients and healthy controls via an oligonucleotide microarray analysis. Then, Genevestigator online tool was used to forecast the co-expression genes of the differently expressed genes. Finally, an iTRAQ-coupled 2D LC-MS/MS technique was employed to analyze the differently expressed protein in the sera of KD and normal individuals. Differently expressed proteins pathways and networks were analyzed using IPA (Ingenuity pathway analysis) and STRING (search tool for the retrieval of interacting genes) online systems separately. Besides KD, the approach of this study may also provide new insights to the underlying molecular mechanisms of other heart diseases.

Results
Microarray data analysis. The analysis of microarray data using the fold change criteria ≥3 and ≤0.33, revealed 32 up-regulated genes in KD compared to normal controls were noticed ( Table 1). The fold changes ranged from 3.26 to 20.47. Notably, no down-regulated genes were observed.
qRT-PCR validation of microarray data. To confirm the differential expression of the genes revealed by microarray analysis, the genes of IDH2, FEM1A and SSBP1, which had different magnitudes of fold changes of gene expression, were selected for the assessment by qRT-PCR. The up-regulated expressions of these three genes were in accordance with those obtained from the microarray analysis ( Figure S1). Clusters of co-expressed genes according to tissue or organ. Hierarchical clustering of IDH2 ( Fig. 1A), FEM1A (Fig. 1B) and SSPB1 (Fig. 1C) showing their 30 most co-expressed genes were screened by using Genevestigator tool. The color rectangles from white to red showed the percent of the gene expression potential. The genes expressed in the peripheral blood mononuclear cell or cardiomyocyte were considered as significant co-expressed genes. Protein identification and quantification. A total of 3134 unique peptides recognizing in 787 proteins were detected by iTRAQ proteomic analysis. We identified 68 significant differentially expressed proteins (Table 2), which had relative ratios higher than 1.5 or lower than 0.67. Among them, 49 proteins (72.1%) were up-regulated, while 19 proteins were down-regulated (27.9%).
Functional categorization, pathway and network analysis. Among the 68 differently expressed expression proteins, the proteins were categorized and displayed in the way of percentage by GO in three domains:23 biological processes ( Fig. 2A), 16 molecular functions (Fig. 2B) and 16 cellular components (Fig. 2C). Fifteen function classes were categorized by COG. The function classifications and the number percentages of the protein in each classification were shown as follows: inorganic ion transport and metabolism (12%); cell motility (10%); translation, ribosomal structure and biogenesis (9%); nucleotide transport and metabolism (8%); cell wall/membrane/envelope biogenesis (8%); signal transduction mechanisms (8%); carbohydrate transport and metabolism (7%); energy production and conversion (6%); lipid transport and metabolism (6%); replication, recombination and repair (5%); amino acid transport and metabolism (5%); defense mechanism (4%); chromatin structure and dynamics (4%); cytoskeleton (3%) and those with unknown function (5%). Thirteen canonical pathways were obtained from IPA ( Figure S2). Z-score value was used to predict the activation state of the upstream regulator. The absolute z-score above 2 is considered significant. According to z-score,  orange bar indicates that the activity pattern of this pathway was predicted; blue bar indicates that the inhibition pattern of this pathway was predicted and grey bar indicates that the pathway can't be assessed by this method. Ratio value indicates the number of molecules in a given pathway that meet cut criteria, divided by total number of molecules that make up that pathway 8 . Among the pathways, TCA Cycle II (eukaryotic) pathway ( Figure S3) and NADH repair pathway ( Figure S4) were associated with the energy metabolism of cardiomyocyte. Isocitrate dehydrogenase 3 (NAD(+)) alpha (IDH3A) was the significant and up-regulated protein in TCA Cycle II (eukaryotic) pathway. Glyceraldehyde-3-phosphate dehydrogenase (GAPDH) was the significant and up-regulated protein in NADH repair pathway. STRING successfully identified 65 differently expressed proteins and resulted in one network (Fig. 3). The most dense network of associations was observed around GAPDH, while calreticulin (CALR) had five, IDH2 three, SSBP1 one and IDH3A one experimentally determined relationships. Altogether 16 genes were single nodes, with no associated genes. IDH2 has relationship with IDH3A. GAPDH has relationship with CALR and IDH2.

Discussion
KD is an endemic mitochondrial cardiomyopathy. The crista membranes of mitochondria in myocardium were swollen and enlarged. Meanwhile, the activity of oxidative phosphorylation enzymes was significant reduced in the mitochondria of KD patients 9 . Mitochondria-related gene expression profiles have been shown to be different between KD patients and healthy controls 10 .
These studies on interactions between multiple genes are becoming more and more meaningful than a single gene analysis in order to gain deeper understanding on the pathogenesis of complex diseases. Investigation of the genes that co-expressed in the same tissue with the target gene may be an important approach to identify the interacting genes. In this study, we used microarray to screen target genes associated with KD. Then, we combined the co-expression analysis and proteomics analysis to verify significant co-expressed genes and the proteins they encode with the target genes. Finally, we applied function categorization, pathway and network analysis to analyze these differently expressed proteins.
In our study, microarray analysis revealed that IDH2, FEM1A and SSBP1 were up-regulated genes in KD patient compared with the normal group. While proteomic iTRAQ analysis showed that proteins IDH2, FEM1A, SSBP1, CALR and NADH: ubiquinone oxidoreductase core subunit S2 (NDUFS2) were present at elevated level in KD patients' serum. Genevestigator analysis predicted also CALR and NDUFS2 to co-express with IDH2 in peripheral blood mononuclear cell and cardiomyocyte, and their elevated protein levels in KD sera were confirmed by iTRAQ. Target gene analysis of FEM1A and SSBP1 gave no predicted co-expressed genes, which were common with the differences at protein level.
In mitochondria, IDH2 encodes isocitratedehydrogenase2, which catalyzes isocitric acid to generate α-ketoglutarate in tricarboxylic acid cycle and restores the NAD + or NADP + to NAD or NADP. These two processes are important for the energy metabolism, biological synthesis and anti-oxidative stress 11 . Simvastatin can decrease the NAD in myocardium of mice, which can relieve myocardial ischemia-reperfusion injury. In this study, IDH2 was up-regulation in KD than normal control. IDH2 caused the up-regulation of NAD, which may lead to the cardiomyopathy of KD.
FEM1A is also a mitochondrial protein, and preferentially expressed in organs enriched in mitochondria, such as cardiac muscle, brain and liver. It has also been observed that FEM1A expression is increased in hearts from mice subjected to ischemia-reperfusion injury. It is localized within mitochondria of C2C12 myoblasts and cardiomyocytes 12 . In this study, FEM1A was up regulated in KD, which indicates the mitochondrial dysfunction in KD may associate with FEM1A.
SSBP1 is believed to maintain the stability of genome, and coordinate the functions of DNA polymerase and the mitochondrial DNA helicase 13 . Over-expression of SSBP1 increased the amount of elongated or fragmented mitochondria in murine C2C12 myoblast cells. On the other hand, the silencing of SSBP1 by RNA interference led to an increase in fragmented or elongated mitochondria in the cell, suggesting that SSBP1 was involved in the processes of mitochondrial fusion and fission. The expression of SSBP1 changes may cause the abnormality of  14 . Thus, it is obvious that the increased expression of SSBP1, as noticed in KD samples of this study, may cause the abnormalities of mitochondrial morphology and function. CALR is a 46 kDa, ER-lumenal Ca2 + -binding protein and molecular chaperone, which is required for proper heart development, and is induced in animal models of hypertrophic heart disease 15 . CALR expression increased in the pathologic heart, where it modulates hypertrophic growth, potentially reducing the impact of the pathology 16 . The pathological feature of KD is ventricular dilatation 17 , thus, the ventricular dilatation of KD may associate with elevated level of CALR noticed in this study. NDUFS2 is located in the hydrophilic arm of mitochondrial respiratory chain complex I, close to the membrane domain 18 , and it has been shown that NDUFS2 mutation affects mitochondrial respiratory chain complex I enzymatic function 19 . In this study, NDUFS2 was up regulated in KD patients. The result indicates that the mitochondrial respiratory chain complex I enzymatic dysfunction in KD may be associated with NDUFS2.
In TCA Cycle II (Eukaryotic) pathway, IDH3A was up regulated. Among the mammalian, IDH3A is supposed to play a major role in mitochondrial isocitrate decarboxylation in the TCA cycle 20 . GAPDH is a multifunctional protein that also mediates cell death under oxidative stress. In NADP repair pathway, it was up regulated. It reported previously that the active-site cysteine (Cys-152) of GAPDH plays an essential role in oxidative stress-induced aggregation of GAPDH associated with cell death. In isolated mitochondria, aggregates of WT-GAPDH directly induced mitochondrial swelling and depolarization, whereas mixtures containing aggregates of Cys-152 GAPDH reduced mitochondrial dysfunction 21 . The dysfunction of mitochondria in KD patient is verified by many studies as mentioned above. The up regulated of IDH3Aand GAPDH may play roles in the dysfunction of mitochondria in KD.
IDH2 has relationship with IDH3A. IDH3A and IDH2 both belong to the isocitrate dehydrogenase gene family 22 . GAPDH has relationship with CALR and IDH2 separately in protein network. GAPDH and IDH2 are both associated with the function of mitochondria 23 . The interactions of GAPDH with IDH2 and CALR will be our next step study aims.
In conclusion, Genevestigator can forecast the protein expression of the co-expression genes CALR and NDUFS2 of IDH2 in the same tissue or organ. The genes IDH2, FEM1A, SSBP1, CALR, NDUFS2, IDH3A, GAPDH and their encoded proteins may play important roles in the pathogenesis of KD. In addition, TCA Cycle II (Eukaryotic) pathway and NADP repair pathway may also participate in the pathogenesis of KD. Overall, the gene microarray combined iTRAQ comparative proteomics analysis may be a beneficial approach to reveal the molecular mechanisms of other diseases.

Materials and Methods
Collection of peripheral blood samples. Patients with KD came from the KD-affected areas, which are Xunyi and Huangling Counties at Shaanxi Province of China. Clinical evaluation was according to the KD diagnosis criteria in China (WS/T 210-2011). Peripheral blood samples were collected from 112 KD patients (61 females and 51 males, 43-70 years old) and 112 normal controls (60 females and 52 males, 45-67 years old) were matched by age and gender. Each sample is 2.5 ml. Whole blood samples from 106 KD patients and 106 normal controls were collected in heparinized vacutainer tubes containing RNA stabilizing solution. Sera of other 6 KD and 6 normal controls were collected after centrifugation of whole blood, 3000 rpm, 5 min. All samples were stored at −80 °C.
The normal control subjects didn't contain cardiovascular disease, hypertension or diabetes patients. Every subject involved in the investigation signed the informed consent. This investigation obtained the approval of Human Ethics Committee of Xi'an Jiaotong University, Xi'an, Shaanxi, China. We confirmed that all experiments were performed in accordance with the relevant guidelines and regulations.
RNA extraction and microarray analysis. Peripheral blood mononuclear cell (PBMC) RNA was extracted from the peripheral blood of 106 KD and 106 normal controls. One hundred KD (53 females and 47 males, 43-70 years old) and 100 normal controls (52 females and 48 males, 46-66 years old) were used for the oligonucleotide microarray. The rest of the RNA samples were used for quantitative reverse transcription-polymerase chain reaction (qRT-PCR) analysis. The protocol of PBMC RNA extraction and oligonucleotide microarray analysis were done as previously described 24 . The oligonucleotide microarray contained altogether 78 probes for the 78 differentially expressed genes 5 . Fold changes ≥3 and ≤0.33 between KD and control samples were considered as up-and down-regulated genes, respectively.
Quantitative RT-PCR. Six KD (4 females and 2 males, 45-60 years old) and 6 normal controls (4 females and 2 males, 45-58 years old) were matched by gender and age. RNA was extracted from the peripheral blood of all the samples. To validate the microarray results, three differently expressed genes isocitrate dehydrogenase [NADP(+)] 2, mitochondrial (IDH2), fem-1 homolog A (FEM1A) and single stranded DNA binding protein 1 (SSBP1) genes were selected as three differently expressed target genes for qRT-PCR. qRT-PCR was done by following the manufacturer's recommended protocol (Invitrogen, Germany). The qRT-PCR data were log-transformed to ensure normal distribution and analyzed using paired t-tests.
Co-expression gene forecast analysis. Genevestigator (https://genevestigator.com/gv/) is a powerful tools search engine for gene expression with advanced analysis possibilities, including the search for genes that are specifically expressed under certain conditions, and the search for groups of genes sharing similar expression patterns by means of clustering and biclustering algorithms 25 . In this study, the co-expression function of Genevestigator was used to find co-regulated genes with a gene of interest. IDH2, FEM1A and SSBP1 were chosen as target genes separately. Firstly, in order to find conditions relevant for the gene of interest, perturbations tool was applied. P-value 0.05 and fold-change 2 or 0.5 were selected as criteria. Secondly, in order to find other genes regulated under the same conditions, co-expression tool was used to find the top 30 genes most likely co-expressed with the target gene across these relevant conditions. Finally, hierarchical clustering tool was applied to cluster the top 30 co-expressed genes by similarity across anatomical parts.
Mass spectrometry (LC-MS/MS) analysis. Sera of 6 KD (4 females and 2 males, 58-66 years old) and 6 normal controls (4 females and 2 males, 55-67 years old) were used for iTRAQ analysis. The total protein of each sample was extracted using ProteoMiner TM Kits (Bio-Rad, USA) according to the instruction. One hundred μg total protein was taken from each sample. After digested with Trypsin Gold (Promega, USA), the tryptic peptides were processed using iTRAQ reagent (Applied Biosystems, USA) according to the manufacturer's protocol. LC-20AB HPLC Pump system (Shimadzu, Japan) was used for SCX chromatography. TripleTOF 5600 System (AB SCIEX, USA) was used for data acquisition. Mascot 2.3.02 (Matrix Science, UK) was used for protein identification and quantification. The protein abundance with fold changes ≤0.67 and ≥1.5 (KD patients/normal ones) was considered as down-and up-regulated ones, respectively. Function analysis. Functional annotations of the differently expressed proteins were conducted using Blast2 GO (Gene Ontology) program against the non-redundant protein database (NR, NCBI). GO is an international standardization system of gene function classification. It has three ontologies, which can describe molecular function, cellular component, and biological process. COG (Cluster of Orthologous Groups of proteins) database (http://www.ncbi.nlm.nih.gov/COG/) was used to classify and group these identified proteins.
IPA (Ingenuity Pathway Analysis, http://www.ingenuity.com) is a repository of biological interactions and functional annotations from many relations between proteins, genes, complexes, cells, tissues, metabolites, drugs and diseases. IPA selected sources and databases from NCBI databases (for instance, Entrez Gene, Ref-Seq, OMIM disease associations, miRNA-mRNA target databases, GWAS databases and KEGG) 24 . The function of "Canonical pathway" in IPA was used to analyze the significant pathway. The genes from the dataset that met the log ratio cut-off of 3were considered relevant to be included in the analyses. We chose pathways that were statistically significant with a P-value ≤ 0.05 using the Fisher's exact test.
STRING database (functional protein association networks, http://string.embl.de/) provides a score for each gene-gene or protein-protein interaction, which is computed as the joint probability of the probabilities from the different evidence channels (e.g., protein interaction, fusion, co-expression, text mining), correcting for the SCIENTIFIC RePoRTS | (2018) 8:231 | DOI:10.1038/s41598-017-18599-x probability of randomly observing an interaction 26 . All the differently expressed proteins were analyzed using STRING-10 server. An interactome network was built for these sets of proteins to find out protein-protein interaction and to predict functional associations. Data Availability. The datasets generated during the current study are available from the corresponding author on reasonable request.