Integrative approach for differentially overexpressed genes in gastric cancer by combining large-scale gene expression profiling and network analysis

Gene expression profiling is a valuable tool for identifying differentially expressed genes in studies of disease subtype and patient outcome for various cancers. However, it remains difficult to assign biological significance to the vast number of genes. There is an increasing awareness of gene expression profile as an important part of the contextual molecular network at play in complex biological processes such as cancer initiation and progression. This study analysed the transcriptional profiles commonly activated at different stages of gastric cancers using an integrated approach combining gene expression profiling of 222 human tissues and gene regulatory dynamic mapping. We focused on an inferred core network with CDKN1A (p21WAF1/CIP1) as the hub, and extracted seven candidates for gastric carcinogenesis (MMP7, SPARC, SOD2, INHBA, IGFBP7, NEK6, LUM). They were classified into two groups based on the correlation between expression level and stage. The seven genes were commonly activated and their expression levels tended to increase as disease progressed. NEK6 and INHBA are particularly promising candidate genes overexpressed at the protein level, as confirmed by immunohistochemistry and western blotting. This integrated approach could help to identify candidate players in gastric carcinogenesis and progression. These genes are potential markers of gastric cancer regardless of stage.

Gastric cancer remains a major cause of cancer deaths worldwide despite early detection and curative surgery. Prognosis is favourable in early-stage disease with 5-year survival rates of 90% reported following gastrectomy and lymph node dissection. In contrast, patients diagnosed with advanced-stage cancer have 5-year survival rates of 20-30%, and the overall poor survival outcome for gastric cancer is attributed to these patient populations (Dicken et al, 2005). An efficient system for detecting disease status in gastric cancer regardless of its clinical stage is clearly needed to improve overall survival.
Gastric cancer is routinely classified according to the tumour-node-metastasis parameters of the primary tumour, lymph nodes, and metastasis. This classification helps the clinician to stage the tumour and develop a management strategy, as well as to provide an indication of prognosis. However, this conventional classification is not strong enough to predict individual prognosis, rendering uniform adjuvant therapy of limited value because of unnecessary adverse events. The use of molecular markers or gene profiling coupled with multivariate predictive models is designed to attain more accurate prognostic models. Recent molecular analyses revealed that gastric cancers closely associate with alterations in several interesting genes, such as p53 (Tamura et al, 1991; Uchino et al, 1993), p21 (Czerniak et al, 1989), c-met (Kaji et al, 1996), TGF-β (Park et al, 1994; Nakamura et al, 1998), and β-catenin (Park et al, 1999). However, these single candidate molecules yield different results among studies and the available data are unconvincing. Thus, the potential use of combinations of multiple markers instead of a single marker has been previously commented upon for the understanding of cancer biology or the prediction of patient prognosis (Lee et al, 2007).
The past decade has seen a revolution in high-throughput technologies for molecular profiling in cancer research. Particularly, gene expression profiling has enabled researchers to quantify biological states and consequently uncover subtle phenotypes important in cancer. Such analyses of tumour tissues have provided unique opportunities to develop profiles that can distinguish, identify, and classify discrete subsets of disease, predict the disease outcome, and even predict the response to therapy (Golub et al, 1999; Perou et al, 2000; van 't Veer et al, 2002; van de Vijver et al, 2002; Pittman et al, 2004). For example, expression profiling in gastric cancer identified novel target molecules involved in gastric carcinogenesis by comparing cancerous and healthy tissues (Boussioutas et al, 2003; Kim et al, 2003, 2005).
Despite its potential power, gene expression profiling has major limitations. Interpreting the significance of identified genes without any unifying biological theme can be difficult, makeshift, and dependent on the biologist's area of expertise. It is frequently challenging to understand a specific regulatory network involving enormous numbers of proteins. Furthermore, an approach that ignores biological cues may generate poor reproducibility among different studies of the same biological system. To overcome these analytical challenges, several recent studies have focused on phenotypic analysis of primary tumours using gene expression profiling, with a view to further understanding the roles of signalling pathways deregulated by the oncogenic process.
This study sought to identify transcriptional profiles commonly activated across a wide range of stages in gastric cancer, as well as core networks in gastric carcinogenesis. It used an integrated approach combining gene expression profiling of over 200 human tissues with dynamic gene mapping. We identified seven candidates among the network that reflected essential transcriptional features of neoplastic transformation and progression, and validated these quantitatively by real-time reverse transcription (RT)-PCR. We also evaluated the expression of the encoded proteins in gastric cancer tissues by immunohistochemistry and western blotting, and identified novel potential markers for detecting gastric cancers.

Tissue samples
Samples were obtained from 222 patients with gastric cancer who underwent curative resection at the following institutions: Osaka University Hospital, National Osaka Hospital, Osaka Medical Center for Cancer and Cardiovascular Diseases, Sakai Municipal Hospital, Toyonaka Municipal Hospital, Mino Municipal Hospital, NTT West Osaka Hospital, Kinki Central Hospital, Suita Municipal Hospital, and Kansai Rosai Hospital. None of the patients received chemotherapy or radiotherapy before surgery. Tissues were evaluated macroscopically and microscopically according to the general rules for gastric cancer study in surgery and pathology in Japan. All cancers showed a depth of invasion beyond the subserosa. The clinical and pathological features are listed in Table 1. All aspects of our study protocol were performed according to the ethical guidelines set by the committee of the three Ministries of the Japanese Government, and each subject provided informed consent.

Extraction of RNA and quality assessment
The tumour specimens were cut into pieces (approximately 8 mm³) within 2 h after surgical resection and stored in RNAlater (Ambion, Austin, TX, USA) at −80°C until use. Total RNA was purified from clinical samples using TRIzol reagent (Invitrogen, San Diego, CA, USA) according to the protocol supplied by the manufacturer. RNA integrity was assessed using an Agilent 2100 Bioanalyzer and RNA 6000 LabChip kits (Yokokawa Analytical Systems, Tokyo, Japan). Only high-quality RNAs with intact 18S and 28S sequences were used for the subsequent analysis. Fifteen RNA samples extracted from normal gastric epithelium were mixed as a reference control.

Preparation of fluorescently labelled aRNA targets and hybridisation
Extracted RNA samples were amplified with T7 RNA polymerase using the Amino Allyl MessageAmp aRNA kit (Ambion) according to the protocol provided by the manufacturer. The quality of each Amino Allyl-aRNA sample was checked on the Agilent 2100 Bioanalyzer. Five μg of control and experimental aRNA samples were labelled with Cy3 and Cy5, respectively, mixed, and then hybridised on an oligonucleotide microarray covering 30 000 human probes (AceGene Human 30K; DNA Chip Research and Hitachi Software Engineering Co, Yokohama, Japan). The experimental protocol is available at http://www.dna-chip.co.jp/thesis/AceGeneProtocol.pdf. The microarrays were scanned using a ScanArray 4000 (GSI Lumonics, Billerica, MA, USA).

Analysis of microarray data
Signal values were calculated by DNASISArray software (Hitachi, Tokyo). Following background subtraction, data with low signal intensities were excluded from additional investigation. In each sample, the Cy5/Cy3 ratio values were log-transformed and globally equalised to remove deviation of the signal intensity between whole Cy3- and Cy5-fluorescence by subtracting the median of all log (Cy5/Cy3) values from each log (Cy5/Cy3) value. Supplementary information is available on our website (http://www.dna-chip.co.jp/).
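The global equalisation step can be illustrated with a minimal Python sketch. It is not the DNASISArray implementation: the function name, the choice of base-2 logarithms, and the intensity values are assumptions made only for illustration.

```python
import math
from statistics import median

def median_centre_log_ratios(cy5, cy3):
    """Return log2(Cy5/Cy3) per spot with the array-wide median subtracted.

    Base 2 is an assumption; the text only says the ratios were log-transformed.
    """
    log_ratios = [math.log2(r / g) for r, g in zip(cy5, cy3)]
    centre = median(log_ratios)
    return [value - centre for value in log_ratios]

# Hypothetical background-subtracted intensities for five spots on one array.
cy5_signal = [200.0, 400.0, 800.0, 1600.0, 3200.0]
cy3_signal = [100.0, 100.0, 100.0, 100.0, 100.0]
normalised = median_centre_log_ratios(cy5_signal, cy3_signal)
```

After centring, the median log ratio of the array is zero, so a systematic brightness difference between the two dyes no longer shifts every gene in the same direction.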

Network analysis
The Ingenuity Pathway (INGP) analysis was used to depict several networks in gastric cancer. The INGP software is a web-delivered application that enables biologists to discover, visualise, and explore therapeutically relevant networks significant to gene expression data sets. A detailed description of INGP analysis is available at the Ingenuity Systems website (http://www.ingenuity.com). The average log2 expression values were used to calculate the fold change between gastric cancer and normal epithelium. The data set containing gene identifiers and their corresponding expression values was then uploaded into the INGP as a tab-delimited text file for analysis. Each gene identifier was mapped to its corresponding gene object in the Ingenuity Pathway Knowledge Base.
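As a rough illustration of the fold-change step, the Python sketch below converts the mean log2 ratio of each gene into a fold change and writes tab-delimited rows. It assumes that each value is a log2 (tumour/reference) ratio, so that the fold change is 2 raised to the mean; the gene identifiers, values, and column layout are hypothetical and do not reproduce the file actually uploaded.

```python
from statistics import mean

def fold_change_from_log2(log2_ratios):
    """Fold change implied by the average log2(tumour/reference) ratio."""
    return 2.0 ** mean(log2_ratios)

# Hypothetical identifiers and log2 ratios across three samples.
log2_by_gene = {
    "GENE_A": [1.0, 2.0, 3.0],
    "GENE_B": [3.0, 3.0, 3.0],
}

# One tab-delimited row per gene: identifier, then fold change.
rows = [
    f"{gene}\t{fold_change_from_log2(values):.3f}"
    for gene, values in log2_by_gene.items()
]
table = "\n".join(rows)
```

Averaging on the log scale before exponentiating gives a geometric-mean fold change, which is less sensitive to a few extreme samples than averaging raw ratios.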
To understand how the genes identified by inferential statistics are related as focus genes, we uploaded the target genes into the Ingenuity Knowledge Base and generated several networks. On the basis of focus genes, new and expanded pathway maps, connections, and specific gene-gene interactions were inferred, functionally analysed, and used to build on the existing pathway knowledge base. To generate networks, the knowledge base was queried for interactions between focus genes and all other gene objects stored therein. The output, displayed graphically as nodes (genes) and edges (the biological relationship between the nodes), represented a significantly consistent number of biological pathways and functions implicated by the empirical data sets.
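The node-and-edge representation can be sketched in a few lines of Python. This is a toy illustration only: the Ingenuity network-building algorithm is not described here, the edge list is invented, and the helper names are hypothetical. The sketch simply counts distinct neighbours per node and reports the most connected node as a candidate hub.

```python
from collections import defaultdict

def node_degrees(edges):
    """Count distinct neighbours for every node in an undirected edge list."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {node: len(adjacent) for node, adjacent in neighbours.items()}

def most_connected(edges):
    """Return the node with the highest degree (ties broken by name)."""
    degrees = node_degrees(edges)
    return max(degrees, key=lambda node: (degrees[node], node))

# Invented interactions between placeholder gene objects.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
degrees = node_degrees(toy_edges)
hub = most_connected(toy_edges)
```

Degree is only one possible measure of centrality; it is used here because it is the simplest way to express "a node that interacts with many surrounding nodes".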

RT reaction
Complementary DNAs (cDNAs) were generated with avian myeloblastosis virus reverse transcriptase (Promega, Madison, WI, USA) using the protocol recommended by the manufacturer. Briefly, 1 μg of RNA was mixed with RT reagents including oligo-(dT)15 primer and incubated at 42°C for 15 min, followed by heating at 95°C for 5 min for enzyme inactivation.

Analysis of microarray data
The gene expression profiles of 222 primary gastric cancers were analysed on a 30K oligonucleotide DNA microarray. Of the full gene sequences (29 638 expressed genes excluding control spots), 271 (0.9%) genes showed >1.5-fold change in differential expression in at least 100 samples. Among these 271 genes, 50 had been described previously in gastric cancers, whereas 187 genes were previously not described in gastric cancer and 34 genes were categorised into ESTs (expressed sequence tags).
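The selection rule (a fold change above 1.5 in at least 100 samples) can be sketched in Python as follows. The sketch assumes that the criterion refers to overexpression relative to the normal reference and that the inputs are log2 (tumour/reference) ratios; the function name and the toy data are hypothetical, and the toy call lowers the sample threshold to 3 so that the example stays small.

```python
import math

def commonly_overexpressed(log2_ratios_by_gene, fold=1.5, min_samples=100):
    """Return genes whose ratio exceeds `fold` in at least `min_samples` samples."""
    cutoff = math.log2(fold)  # compare on the log2 scale
    selected = []
    for gene, ratios in log2_ratios_by_gene.items():
        n_over = sum(1 for ratio in ratios if ratio > cutoff)
        if n_over >= min_samples:
            selected.append(gene)
    return selected

# Hypothetical log2 ratios for two placeholder genes across four samples.
toy_data = {
    "GENE_A": [1.0, 0.9, 0.7, 0.1],
    "GENE_B": [0.2, 0.3, 1.2, 0.0],
}
selected = commonly_overexpressed(toy_data, fold=1.5, min_samples=3)
```

Requiring the threshold to be met in a minimum number of samples, rather than on average, favours genes that are consistently activated over genes driven by a few outlying tumours.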

Network analysis
Analysis of the commonly overexpressed 271 genes using the Ingenuity Knowledge Base generated several networks that identified 203 genes as focus genes. The knowledge base generated 17 networks composed of focus genes and all other gene objects stored in the base (Table 2). On the basis of overlapping networks, network-5 was found to be central (Supplementary Figure 1). The centred network-5 (network-5 and close relevant networks) included a substantial number of genes already implicated in gastric carcinogenesis (Figure 1), with numerous focus genes connected by several neighbourhood genes. Furthermore, the network analysis mapped CDKN1A (p21WAF1/CIP1) to the core of the centred network-5, acting as a hub by interacting with surrounding focus genes. CDKN1A is associated with disease progression and prognosis in gastric cancer (Czerniak et al, 1989; Kasper et al, 1998).
We selected seven focus genes showing >2-fold change in differential expression for further analysis. Three of these are known to be involved in gastric cancer: MMP7 (Yamashita et al, 1998), SPARC (Wang et al, 2004), and SOD2 (Janssen et al, 2000), and the other four have no such reported associations (INHBA, IGFBP7, NEK6, and LUM).
expression of the seven genes correlated with the pathological stage (P = 0.011) (Figure 2D).

Validation of mRNA levels for selected genes using quantitative RT-PCR
To provide further quantitative validation of our microarray data for the 7 genes, we analysed 13 test tumour samples by quantitative RT-PCR and compared the results with the quantified mRNA expression levels on the microarray (Figure 3). All 7 genes were highly expressed across the 13 cancers and the microarray data agreed with those obtained by quantitative RT-PCR. Similar agreement was found in a subsequent comparative analysis of 14 validation tumour samples (Figure 3). We also compared the expression of the candidate genes with the mean expression level of the corresponding genes in 8 normal tissues that were used for microarray reference control. The results showed upregulation of each candidate gene compared with that in the normal tissues (Figure 3).

Protein expression of selected genes by immunohistochemistry and western blotting
Finally, we tested the encoded protein expression for each identified focus gene using immunohistochemistry and western blotting. Immunohistochemistry showed high expression of INHBA and NEK6 proteins in 14 of 20 and 24 of 27 tumour tissues, respectively (Figure 4A-D), whereas IGFBP7 and LUM proteins showed little immunoreactivity in tumour tissue relative to adjacent healthy tissue (data not shown). Each of these proteins was expressed in >50% of cells in each tissue examined and all were localised to the cytoplasm. Western blotting showed strong bands for both NEK6 and INHBA in gastric cancer tissues compared to normal tissue in all three pairs (Figure 4E).

DISCUSSION
Comprehensive gene expression profiling is a useful tool for analysing several thousands of genes in multiple samples simultaneously. In gastric cancer, this approach successfully discriminated cancerous and noncancerous tissues (Hippo et al, 2002). Since then, several studies have searched for novel genes related to carcinogenesis of gastric cancer and novel clinical subtypes related to biological malignancy using comprehensive gene expression profiling (Hasegawa et al, 2002; Ji et al, 2002; Boussioutas et al, 2003; Kim et al, 2003, 2005; Oien et al, 2003; Jinawath et al, 2004; Motoori et al, 2005). However, these data were generally obtained from human cell lines or small-scale tissue samples. Here, we analysed the gene expression profiles of more than 200 tissue samples covering every pathological stage, and verified the findings at both the mRNA and protein levels to increase the universality of our microarray data. Such a study is more likely to identify specific expression profiles that are commonly activated and thus more reflective of crucial transcriptional features of neoplastic transformation and progression in gastric cancers. In fact, increasing recognition that this large-scale, systematic approach is necessary to view the overall molecular events responsible for carcinogenesis has spawned several recent studies combining large-scale analysis of gene expression with knowledge-based and relevance network analysis (Bredel et al, 2005; Abdel-Aziz et al, 2007). Such an approach also identified significantly upregulated genes linked to activated pathways as potential key molecules in hepatocellular carcinoma (Kittaka et al, 2008). Dynamic mapping of 271 genes differentially expressed in gastric cancer tissues in this study revealed links among the majority of genes (203 genes, 84%) based on the Ingenuity Pathway Knowledge Base. This finding indicates that such gene populations do not act as individual units, but rather collaborate closely in overlapping networks during gastric carcinogenesis. Among the 17 networks identified here, network-5 was mapped to the centre of the overlapping network and contained the largest number of focus genes, implicating it as a key network. Furthermore, the identified networks contained a robust cluster of genes already implicated in gastric cancer. Our network analysis also revealed CDKN1A (p21WAF1/CIP1) as a hub gene that links to a large number of nodes and possibly determines the fundamental behaviour of the network.
Figure caption: Bar chart shows mRNA levels of candidate genes using quantitative reverse transcription-PCR in normal gastric tissue (n = 8, microarray reference control), test samples (n = 13), and validation samples (n = 14). Data are mean expression level of candidate gene relative to that of GAPDH in the examined tissues.
The clinical significance of activation of our seven selected genes was further investigated by correlating the microarray expression data with the pathological stage. As indicated in Figure 2, we found these genes could be classified into two groups: the expression levels of genes of group 1 (MMP7, IGFBP7, and NEK6), but not those of group 2 (SOD2, SPARC, LUM and INHBA), correlated significantly with pathological stage. This finding indicates that although genes of group 2 may be involved in tumour formation and survival, those of group 1 may be involved in tumour progression. Their common activation seems to serve gastric carcinogenesis and tumour survival regardless of the pathological stage, based on the finding of overexpression of all seven genes in all samples. Furthermore, the gradual increase in the mean expression with cancer stage suggests that these genes cooperate in tumour progression. These results strengthen our proposal that such candidate genes are commonly activated during gastric carcinogenesis.
We also analysed the expression of the seven candidate genes based on age, sex, location, and histopathological type. Although the expression levels of MMP7, NEK6, SOD2, SPARC, and INHBA did not correlate with any of the above factors, IGFBP7 and LUM were significantly upregulated in undifferentiated tumours compared to differentiated tumours (data not shown). These results suggest the involvement of these genes in tumour differentiation.
We also postulated that these genes are regulated by complex linkage between specific signalling pathways such as cell cycle signalling and TGF-β signalling, and that several genes around CDKN1A (p21WAF1/CIP1), which functions as a hub, may compensate for one another when targeted. The differential expressions were also corroborated by quantitative RT-PCR data in some of the previously tested tissue samples and in 14 validation samples. Together, these findings implicate all seven genes in gastric carcinogenesis, including the four that were not previously related to human gastric cancer.
Transcript profiling studies require complementary protein analysis to fully understand the associated regulatory process in living organisms. By itself, profiling does not adequately reflect the fluctuating signalling events occurring at the proteomic level, based on the evidence that only a subset of proteins correlate significantly with mRNA abundance (Nishizuka et al, 2003; Tian et al, 2004). These seemingly anomalous results are explained partly by translational processes whereby microRNAs repress the translation of mRNA into proteins, and partly by post-translational modifications such as phosphorylation, methylation, acetylation, and ubiquitination. For that reason, the expression levels of proteins encoded by highly overexpressed genes related to gastric carcinogenesis require further investigation. This study detected protein expression for two gene products among the four previously noncancer-related genes. Furthermore, NEK6 protein was strongly stained in most of the cancer tissues, but showed less mRNA signal compared to the remaining six genes. This finding suggests that NEK6 might be significantly modified post-translationally.
Matrix metalloproteinases including MMP7 play important roles in determining tumour invasion and metastasis, and MMP7 gene expression correlates with vessel invasion and both lymphatic and hematogenous metastases (Yamashita et al, 1998). Increased SPARC expression is linked to advanced gastric cancer (Wang et al, 2004). The expression of SOD2 (Mn-SOD; manganese superoxide dismutase) was significantly enhanced in cancer tissues compared with normal mucosa, and the Mn-SOD ratio was proposed as an independent prognostic parameter (Janssen et al, 2000). The IGFBP7 gene was upregulated in diffuse-type gastric cancer (Boussioutas et al, 2003) and in 22 gastric cancer/nontumour mucosa paired tissue samples (Kim et al, 2003). Interestingly, a recent study revealed that TGF-β signalling including INHBA accounted for some of the main differences between normal tissue and gastric cancer at the transcript level.
As stated, this study identified several genes, such as LUM and NEK6, which were not previously associated with human gastric cancer. LUM is a member of the small leucine-rich proteoglycan family that induces apoptosis and suppresses cell proliferation. Its reduced expression has been associated with poor outcome in invasive carcinoma (Vuillermoz et al, 2004; Schuetz et al, 2006). NIMA (never in mitosis, gene A) was originally identified in Aspergillus nidulans as a serine/threonine kinase critical for cell cycle progression (Osmani et al, 1988). Human NIMA-related kinases (Neks) have high homology to NIMA in the N-terminal catalytic domain sequences. NEK6 is a Neks-family gene required for mitotic progression in human cells (Roig et al, 2002). Inhibition of NEK6 by either overexpression of an inactive NEK6 mutant or elimination of endogenous NEK6 using siRNA arrested cells in M phase and triggered apoptosis (Belham et al, 2003; Yin et al, 2003). A recent study demonstrated overexpression of NEK6 transcripts in hepatocellular carcinoma (Chen et al, 2006), and NEK6 was found to be frequently expressed among 125 serine/threonine kinase genes implicated in breast cancer, colorectal cancer, lung cancer, and laryngeal cancer by in situ hybridisation (Capra et al, 2006). However, no previous studies have shown NEK6 expression in gastric cancers or NEK6 protein expression in any cancerous tissues. In data not shown here, we also found higher levels of NEK6 protein in advanced cancer compared to early-stage samples by immunohistochemistry.
In conclusion, this study used an integrated approach combining gene expression profiling and dynamic mapping of gene expression data on large sample numbers to identify novel candidate genes that may contribute to gastric carcinogenesis. The identified genes were universally validated in additional samples. In particular, NEK6 and INHBA are promising potential markers of gastric cancer regardless of disease stage.