Identification of novel genes and pathways in carotid atheroma using integrated bioinformatic methods

Atherosclerosis is the primary cause of cardiovascular events and its molecular mechanism urgently needs to be clarified. In our study, atheromatous plaques (ATH) and macroscopically intact tissue (MIT) sampled from 32 patients were compared and an integrated series of bioinformatic microarray analyses were used to identify altered genes and pathways. Our work showed 816 genes were differentially expressed between ATH and MIT, including 443 that were up-regulated and 373 that were down-regulated in ATH tissues. GO functional-enrichment analysis for differentially expressed genes (DEGs) indicated that genes related to the “immune response” and “muscle contraction” were altered in ATHs. KEGG pathway-enrichment analysis showed that up-regulated DEGs were significantly enriched in the “FcεRI-mediated signaling pathway”, while down-regulated genes were significantly enriched in the “transforming growth factor-β signaling pathway”. Protein-protein interaction network and module analysis demonstrated that VAV1, SYK, LYN and PTPN6 may play critical roles in the network. Additionally, similar observations were seen in a validation study where SYK, LYN and PTPN6 were markedly elevated in ATH. All in all, identification of these genes and pathways not only provides new insights into the pathogenesis of atherosclerosis, but may also aid in the development of prognostic and therapeutic biomarkers for advanced atheroma.

extension of cDNA array studies on animal material to clinical use 11 . Features of unstable plaques, such as surface ulceration, rupture, intraplaque hemorrhage and thrombus, may also occur in both asymptomatic and symptomatic patients, which may also confound studies 6 that classify samples according to patient symptomatology 12 . Additionally, the relative lack of systematic bioinformatic analysis of cDNA microarrays in current studies limits the effective exploitation of gene-expression data sets 10 . Therefore, an integrated bioinformatic analysis based on cDNA microarray studies of human tissues may help to clarify the mechanisms underlying the development and progression of atherosclerosis.
To our knowledge, the variations between different individuals or blood vessels may affect the reliability of studies and it is very difficult to obtain healthy and diseased tissue from the same blood vessel of the same individual in human studies. To overcome the difficulty, we used a gene expression dataset from a previously published study 13 , comparing atheroma and its surrounding tissues from the same individual to track gene changes with disease progression and validated our findings with similar tissues. Besides, to interpret the biological relevance of these changes in gene expression, the microarray data were analyzed by integrated bioinformatic analysis expanding on traditional microarray analysis methods, namely gene-ontology and pathway analysis, thereby allowing the construction of interaction networks, that might identify novel prognostic markers and therapeutic targets.

Results
Identification of differentially expressed genes. Through our microarray analysis, a total of 816 differentially-expressed genes (DEGs) were identified between atheroma plaque (ATH) and macroscopically-intact tissue (MIT), including that 443 genes were up-regulated and 373 genes were down-regulated (Fig. 1A,B). The greatest fold differential expressions were a six-fold up-regulation of the FABP4 gene (fatty acid-binding protein 4) and a 3.3-fold down-regulation of the CNTN1 gene (contactin 1) in ATH compared with MIT.
Gene ontology and pathway analyses. Two hundred-ninety and 26 GO terms were significantly enriched among the up-regulated and down-regulated genes, respectively. Table 1 shows the ten most overrepresented GO terms for the up-regulated and down-regulated DEGs, including immune-related biological process, such as "cell activation" and "cytokine production", and "muscle system process". Meanwhile, the KEGG pathway analysis identified 77 and 26 significantly enriched pathways for up-regulated and down-regulated genes, respectively. The ten most overrepresented KEGG pathways for up-regulated and down-regulated DEGs are shown in Table 2, with "B cell receptor signaling pathway" and "TGF-beta signaling pathway" being most significantly enriched.
Protein-protein interaction (PPI) network analysis. We constructed a PPI network to identify more important proteins and biological modules that may play crucial roles in the development of atherosclerosis. To confine the interactions only to those close to the DEGs, only first level interactions between DEGs and their neighbors were selected. There were 3,990 PPI pairs and 2,491 nodes in our constructed PPI network. The degree represents the number of neighboring nodes in the network and changes in the proteins/genes with higher degrees have more effects than changes in those with smaller degrees. SMAD9, LYN, PTPN6, ZBTB16, SYK, PRKCB, SVIL, VAV1, BMPR1B and BTK were located in the more important positions of network with higher degrees of 108, 102, 95, 81, 74, 66, 63, 62, 60 and 56, respectively, indicating those proteins play irreplaceable and critical roles in the maintaining the whole protein interactions in the network ( Fig. 2A).
CFinder software was used to identify the disease-relevant modules in the PPI network. Figure 2B shows the module containing the most nodes with parameter k = 5. This module contained eight DEGs, including CSF2RB, The dots indicate that up-(red) and down-regulated (blue) DEGs were significant both at false-discovery rate < 0.05 and Fold-change > 1.5 or < 0.667.  Table 2. Top 10 most overrepresented KEGG pathways for the DEGs. Count: the number of differentially expressed genes; P-value was obtained by hypergeometric test; P-value was adjusted by Benjamini-Hochberg method. If the p-value is less than 2.2E-16 in R tool, it will be automatically changed to 0, and FDR should also be 0.
LCP2, LYN, PLCG2, PTPN6, PTPRC, SYK and VAV1. Jointly using topology of network and module analysis to select candidates would identify genes that have higher significance in the PPI network. Finally, we focused on four genes (SYK, LYN, PTPN6 and VAV1) that were overlapped between the top 10 nodes ranked by degrees and the disease-relevant modules in the PPI network ( Table 3). The results from qRT-PCR showed that mRNA level of VAV1 (12.6 ± 5.0), LYN (3.7 ± 1.1), SYK (13.1 ± 4.4) and PTPN6 (11.2 ± 4.9) increased by 12.6, 3.7, 13.1 and 11.2 folds respectively in ATH compared with in MIT (Fig. 3A). The results are consistent with data from microarray analysis although the differences in mRNA level were even higher than the differences determined in the microarray analysis. Moreover, the whole lysates from four sets of atherosclerotic samples (4 ATH and 4 MIT, n = 8) were analyzed by western blot. As Fig. 3B

Discussion
Microarray studies have great potential to provide novel insights into the pathogenesis of complex diseases. In the present study, we systematically applied integrated bioinformatic methods to mine new candidate players in the process of atherosclerosis and validated our findings in an independent set of samples at both mRNA and protein levels. In our study, we identified a total of 816 genes differentially expressed in ATH compared with MIT, including 443 up-regulated and 373 down-regulated DEGs. GO functional-enrichment analysis of these DEGs showed that genes mainly related to inflammation and immune responses were altered with disease progression. "Cell adhesion", "proliferation", "differentiation", "motility", "cell death", "lipid metabolism" and "immune response" have   Table 3. The candidate genes selected by our analysis. * If the p-value is less than 2.2E-16 in R tool, it will be automatically changed to 0, and FDR should also be 0. all been reportedly associated with atherosclerosis 14 , and these processes were also identified in our enrichment results. Interestingly, although we identified similar numbers of up-regulated and down-regulated genes in ATHs, we observed an excess of significant GO categories for up-regulated genes, suggesting that the up-regulated genes are functionally more important in atherosclerosis progression. Pathway-enrichment results showed an overabundance of immune and inflammatory signals, represented by the "chemokine-signaling pathway", "natural killer cell-mediated cytotoxicity", and "Fcε RI-signaling pathway" in atherosclerosis. Our finding indicates that innate and adaptive immune cells might contribute to the development and progression of atherosclerosis, especially in the advanced stages. Hypercholesterolemia was initially considered to be the major risk factor for atherosclerosis, but recent advances have proven that chronic inflammation and autoimmunity play major roles in the initiation and progression of the disease 15 , as supported by our pathway-enrichment results. In addition, the "TGF-β signaling", "calcium signaling", and "osteoclast differentiation" were all significantly enriched among DEGs.
Mast cells are frequently in an activated state and participate in the process of atherosclerosis 16 . "Fcε RI-mediated signaling" in mast cells is initiated by the interaction of an antigen with IgE bound to the extracellular domain of the α -chain of Fcε RI. The mast cells activated by crosslinking of the Fcε RI via IgE-antigen complexes could release and secrete biogenic amines, cytokines, lipid mediators and proteoglycans, which contribute to inflammatory responses. IgE and Fcε RI have been implicated in several aspects of autoimmunity and chronic inflammatory diseases 17 . LYN, SYK and VAV1, the key modulators in our PPI network and module, are also implicated in this pathway ( Figure S1). Our results indicated that this pathway might play a crucial role in the process of atherosclerosis; however, there is currently no proof of a relationship between "Fcε RI-mediated signaling" and atherosclerosis. And it is worth noting that calcium signaling is associated with Fcε RI -mediataed signaling. "Calcium signaling" 18 , "TGF-β signaling" 19 and "osteoclast differentiation" 20 have all been shown to be involved in the process of atherosclerosis. Our pathway-enrichment analysis supported the involvement of some pathways known to be associated with atherosclerosis initiation and progression, and also highlighted the Fcε RI-mediated signaling pathway, which, to the best of our knowledge, has not previously been reported in association with carotid atheroma.
Our PPI network and further module analysis showed that VAV1, SYK, LYN and PTPN6 overlapped between the top 10 nodes and the disease-relevant module, suggesting that these genes play more crucial roles in the pathogenesis of carotid atheroma. Several studies have indicated important related functions for SYK and VAV1, suggesting that they play significant roles in atherosclerosis. SYK has been reported to be involved in the pathogenesis of atherosclerosis by activating monocyte chemotactic protein-1 expression 21 . Choi et al. 22 found that TLR4/SYK-mediated macrophage responses may contribute to chronic inflammation in human atherosclerosis. Furthermore, the SYK inhibitor fostamatinib attenuated atherogenesis in mice, suggesting that SYK is a potential anti-inflammatory therapeutic target in atherosclerosis 23 . The validation study also indicated SYK was up-regulated both at the level of mRNA and protein which strengthens the assertion that it might play an important role in atherosclerosis development.
VAV1, a member of the VAV gene family, is expressed exclusively in hematopoietic cells. It is a signal transduction molecule that acts as guanine nucleotide exchange factor for Rac1 and Rho GTPases, and also functions as an adaptor platform 24 . VAV1 impacts on processes that are highly relevant to atherogenesis, such as NADPH oxidase-mediated generation of reactive oxygen species, cell death, and leukocyte activation. An in vivo carotid artery thrombosis model showed that genetic deletion of Vav1 and Vav3 together may prevent the development of occlusive thrombi in mice fed a high-fat diet 25 . Deletion of Vav1 alone led to modest inhibition of oxidized low-density lipoprotein uptake and foam-cell formation, while deletion of both Vav1 and Vav3 led to nearly complete inhibition of oxidized low-density lipoprotein uptake and foam-cell formation, suggesting that Vavs act as a critical regulator in the process of atherogenesis, and thus represents a novel therapeutic target 26 . In our validation study, VAV1 was not detected in the atherosclerotic samples at the protein level by current antibody, which worked well in the experiment of positive control tissues. Another VAV1 antibody was also applied, which still could not dectect the expression of this gene. Though we found similar expression patterns for upregulated VAV1 in both the qPCR and microarray analyses of ATH samples, VAV1 may not play a major role in the progression of atherosclerosis because its expression may be blocked at the translation phase.
LYN encodes a tyrosine protein kinase that is involved in the regulation of mast-cell degranulation. Lyn is the major Src-family kinase regulating glycoprotein VI signaling, and its absence caused a delay in activation and a marked reduction in platelet aggregation on collagen in a laser-injury model 27 . However, an apparently contradictory study showed that Lyn inhibited platelet activation, and that Lyn was increasingly inactivated as platelet aggregation progressed 28 . Miki et al. 29 suggested that Lyn plays an important role in the metabolism of serum lipids, and could induce the expression of monocyte chemotactic protein-1, which is related to atherosclerosis, during the development of atherosclerotic lesions on high-fat diets. Previous studies reflect the complex roles of LYN and our validation experiments showed that LYN was up-regulated in the atheroma at the levels of mRNA and proteins, which might promote the progression of this disease.
PTPN6 is a member of the protein tyrosine phosphatase family of signaling molecules. It regulates a variety of cellular processes including cell growth, differentiation, mitotic cycle, and oncogenic transformation 30 . Kamata et al. 31 suggested that PTPN6 acts as a negative regulator in the development of allergic responses such as allergic asthma. Dubois et al. 32 concluded that PTPN6 played a crucial role in the negative modulation of insulin action and clearance in the liver, thus regulating whole-body glucose homeostasis. However, here we found PTPN6 up-regulated in atheroma at both the mRNA and proteins level and for the first time linked PTPN6 to atherosclerosis.
Additionally, to ensure the robustness of our candidate DEGs, we confirmed the expression of our candidate genes in another dataset GSE28829 (Supplementary Figure S2). The analysis revealed that a similar representation of the gene expression patterns of our candidate DEGs was seen in the dataset GSE28829, suggesting that VAV1, SYK, LYN and PTPN6 genes may play an important role in progression of atherosclerosis. To our knowledge, this is the first integrated bioinformatic analysis comparing gene expression between carotid plaque and macroscopically intact arterial tissue. The large-scale gene expression profile analysis in our study is a significant strength in addition to the fact that paired tissue samples were obtained from the same individual. Our integrated methods are based on pre-specified specific algorithms, established topology information of networks and existing knowledge from databases and literature. The integrated methods have an advantage over traditional, single-analysis microarray approaches and other enrichment-analysis methods, such as DAVID 33 and NetGestalt 34 and ensure the reliability and accuracy of the results. Key candidates were also validated in Chinese patients, indicating the generalization of molecular mechanisms among different ethnic groups. Meanwhile, our integrated bioinformatic analysis might reveal the relationship between DEGs at molecular interaction and pathway levels, which provided some clues for the deeper mechanism of our candidate DEGs.
The discrepancies of expression between the qRT-PCR and microarray results may have been caused by a sensitivity bias between the two methods, difference in ethnicity, diet and lifestyle between French and Chinese people or by the use of different statistical methods in qRT-PCR and microarray. However, there are also some limitations in our study. First, the study population from the microarray analysis underwent carotid endarterectomy at the university hospital of Lyon, so the gene expression profile may be influenced by their ethnicity, diet and lifestyle. Secondly, the cohort consisted of older subjects that were predominantly male and the majority had hypertension. The generalization of our findings is unknown as the present results are limited to high risk populations with signs of atherosclerosis and severe symptoms. Finally, our work is a reanalysis of previously published dataset and although some previous work and our validation experiments support our gene expression analysis results, the work requires further study to identify the mechanism of action and to assess the relevance of our findings. This work serves as an excellent foundation and reference for further studies to expand on these findings in the future.

Conclusion
This study identified SYK, LYN, PTPN6 and the "Fcε RI-mediated signaling pathway" as potential candidate players involved in the pathogenesis of atherosclerosis. These findings enhance our understanding of the molecular mechanisms of this important disease. Further studies, such as gene functional studies, are needed to support the results of our study, with the aim of identifying candidate biomarkers with sufficient predictive power to act as prognostic and therapeutic biomarkers for advanced atheroma.

Methods
Source of data. An existing dataset GSE43292 within the Gene Expression Omnibus database was used for this work and obtained through approved access. The dataset was generated using the Affymetrix Human Gene 1.0 ST Array 13 . Ethical approval, sample tissue collection and preparation methods and characteristics of study participants were described in a previous report 13 . In brief, the dataset included 32 from 34 consecutive patients admitted to the university hospital of Lyon in 2009 for carotid endarterectomy. Paired samples were taken from individuals meaning that 64 carotid artery samples were analyzed. The mean age of participants was 70 years (±10 years) and the majority were male and with hypertension, with elevated blood lipid levels and just over one third of the sample were diabetic (Supplementary Table S1) 13 . Tissue samples were removed during surgery and dissected into two fragments: atheroma plaque tissue (ATH, subsequently identified as mostly stage IV and/or V lesions according to the American Heart Association classification) and macroscopically intact tissue (MIT, almost exclusively composed of stage I and II lesions) 13 .
Identification of differentially expressed genes. The raw gene dataset obtained from the previous work 13 was converted into expression measures, and background correction and quantile data normalization were performed using the robust multiarray average algorithm from the Affy package to obtain the expression profile data 35 . After deleting duplicated probes and averaging the multiple probes values for the same Entrez Genes (the unique integers as identifiers for gene records) 36 , we finally obtained expression profiles for 19,924 genes in the 64 samples.
Because the differentially expressed genes (DEGs) might have stronger relationship with the development of disease, the significance analysis of microarrays 37 and fold-change methods were jointly used to identify DEGs between ATH and MIT.
Functional-enrichment analysis. We integrated GO annotation into the total DEGs by mining for enriched GO terms of proteins using the R-based GO function software packages, which extracts biologically relevant terms from statistically significant GO terms for a disease 38,39 .
The DEGs were chosen for further analysis of Kyoto Encyclopedia of Gene and Genome (KEGG) enrichment. The SubpathwayMiner is a pathway identification system 40 and accurately assessed the pathway structure to locate disease-relevant KEGG pathways 41 and subpathways in DEGs relative to the genomic background.

Protein-protein interaction network construction.
In the study, we downloaded protein-protein interaction (PPI) data from human protein reference database (Release 9) on the website (http://www.hprd. org/) 42 . These interactions were derived from literature of experimental validation, including physical interactions and enzymatic reactions found in signal transduction pathways. The PPI data were preprocessed, including removing redundancy and self-loops, resulting in a connected network of 9,618 nodes (unique Entrez IDs) and 39,240 documented interactions. PPI networks are visualized in Cytoscape 43 with the nodes representing the proteins/genes and the edges representing interactions between any two proteins/genes.
We constructed the PPI network by mapping the DEGs to the PPI network using the following steps. First, we extracted the nodes and relationships between DEGs and their direct interacting neighbors to confine the interactions only to those close to the DEGs using R software 44 , with each pair of interacting proteins in two lists of a text file. The DEGs (gene symbols) were listed in a NOA file with different node attribution annotations (down-regulated genes, up-regulated genes) and mapped to the constructed PPI network by the menu of "File-Import-Node Attributes". Second, the degrees of nodes in the PPI networks were calculated by Network Analysis plugin by the menu of "Plugins-Network Analysis-Analyze network". In our network, the degree of a node was the number of neighboring nodes in the network and node size was proportional to the degree of the protein. Third, CFinder software 45 was used to find disease-related modules based on the Clique Percolation Method 46 , which is a free software for finding and visualizing overlapping dense groups of nodes in networks. PPI data from a text file was imported using the menu of "File-Open new network-run" with default parameters. The results of CFinder are highly correlated to the value of the parameter k. Larger k values correspond to smaller subgraphs with a higher density of links within them.

Ethics statement and validation study sample collection. The collection of clinical samples was
under approval by the Medical Ethics Committee of NanFang Hospital (Number: NFEC-2014-117) and informed consent was obtained from all subjects. The study was carried out in accordance with the standards set by the Declaration of Helsinki and Good Clinical Practice guidelines.
Human carotid atherosclerotic plaques were obtained from patients who underwent endarterectomy at the Vascular Surgery Department of Nanfang hospital of Southern medical university (Guangzhou, China). Patients (n = 8, mean age: 67.3 years, range: 53-80 years) with internal carotid artery stenosis > 70% were included. The carotid atheromas were separated as ATH and MIT according to macroscopic observation. Dissected vascular tissues were rapidly frozen in liquid nitrogen and stored at -80 °C. Sample characteristics used for each experiment are shown in supplementary information Table S2.
Quantitative real-time PCR (qRT-PCR). Total RNA from specimens (n = 16) was isolated using Trizol reagent (TaKaRa Bio Inc, Japan) according to the manufacturer's instructions. Complementary DNA was synthesized from 1000 ng of total RNA using the PrimeScript TM RT reagent Kit with gDNA Eraser (TaKaRa Bio Inc, RR047Q) according to the manufacturer's instruction, including the DNase step. Amplification was performed using SYBR ® Premix Ex Taq TM (TaKaRa Bio Inc, RR420A). Quantitative real-time polymerase chain reaction (q-PCR) analysis was performed on Lightcycler96 (Roche Applied Science) according to the manufacturer's protocol. GAPDH was used as internal control to normalize mRNA levels. All experiments were repeated three times. Primer sequences are listed in Table 4. Analysis was performed by the comparative delta-delta-Ct method 47 .
Western blotting. Samples (n = 8) were taken from ultra-low temperature freezer and crushed in lysis buffer under liquid nitrogen. After detecting the concentration, proteins were separated on 7% SDS-polyacrylamide gel electrophoresis and transferred onto nitrocellulose membranes. Membranes were blocked with TBS-T(TBS/0.1% Tween-20) containing 5% non-fat dry milk for 1.5 hour at room temperature. Then, they were probed successively with mouse monoclonal anti-VAV1 (Cell Signaling Technology, Danvers, MA, USA), rabbit monoclonal anti-LYN (Abcam, Cambridge, MA, USA), rabbit monoclonal anti-PTPN6 (Abcam, Cambridge, MA, USA), and rabbit monoclonal anti-SYK (Abcam, Cambridge, MA, USA) antibodies at 4 °C overnight. Mouse monoclonal antibodies against GAPDH (Cell Signaling Technology, Danvers, MA, USA) were used as a loading control. Membranes were washed in TBS-T (TBS/0.1% Tween-20) three times for 5 min and probed with an anti-rabbit or -mouse HRP-conjugated secondary antibody in TBS-T with 5% of nonfat dry milk at room temperature for 1.5 h.
Protein detection was performed using ECL reagents. Western blot bands were scanned using the ChemiDoc ™ XRS Imaging System (BioRad, USA). Western blot bands were quantified using ImageJ software by measuring the band intensity for each group and normalized by GAPDH. The final results are expressed as fold changes by normalizing the data to the control values.
Statistical analyses. Microarray analysis: DEGs for the microarray were identified using the fold-change and significance analysis of microarrays methods, with multiple testing corrections applied using the Benjamini-Hochberg method 48 . False-discovery rate < 0.05 and fold-change > 1.5 or < 0.667 were set as the cutoff values of DEGs. For the functional enrichment analysis, significantly enriched GO terms in DEGs relative to the genomic background by GO function software packages were identified using the hypergeometric tests with an adjusted p-value < 0.01, calculated by the Benjamini-Hochberg method 48 . Pathway-enrichment analysis was  done using the R-based SubpathwayMiner software packages. Significantly enriched pathways were identified using hypergeometric tests and a p-value < 0.01 was applied as the cut-off value for statistical significance.
Validation study: The data in this study are shown as the mean ± S.D. For the real time PCR, groups were compared using the Wilcoxon signed-rank test for continuous variables (SPSS 19.0, Chicago, IL) and a 2-sided p value < 0.05 was considered statistically significant.