Predicting biomarkers for ovarian cancer using gene-expression microarrays

Ovarian cancer has the highest mortality rate of gynaecological cancers. This is partly due to the lack of effective screening markers. Here, we used oligonucleotide microarrays complementary to ∼12 000 genes to establish a gene-expression microarray (GEM) profile for normal ovarian tissue, as compared to stage III ovarian serous adenocarcinoma and omental metastases from the same individuals. We found that the GEM profiles of the primary and secondary tumours from the same individuals were essentially alike, reflecting the fact that these tumours had already metastasised and acquired the metastatic phenotype. We have identified a novel biomarker, mammaglobin-2 (MGB2), which is highly expressed in, and specific to, ovarian cancer. MGB2, in combination with other putative markers identified here, could have potential for screening.

Ovarian cancer is the leading cause of death from gynaecological malignancy, with an estimated 24 000 and 6800 new cases in the US and UK, respectively, during 2001 (Greenlee et al, 2001;Swerdlow et al, 2001). Early-stage disease is largely asymptomatic, and most patients are diagnosed when disease has spread beyond the pelvis with an associated 5-year survival of less than 20% (Ozols, 2002). This is partly due to the lack of reliable screening strategies. While around 90% of women with advanced disease have elevated serum CA125, this marker alone is neither sufficiently sensitive nor specific for use as a screening tool (Menon and Jacobs, 2001). Despite new cytotoxic regimens, survival has remained largely unchanged over the past 20 years. Identification of new molecular signatures of early disease is a key goal of ovarian cancer research.
Gene-expression microarray (GEM) profiles have previously been used to compare the expression profile of ovarian cancer with that of the normal ovary (Ono et al, 2000;Welsh et al, 2001). We extended this approach by using a more extensive set of probes (Affymetrix U95Av2), and also characterised metastatic disease in a search for molecular markers of progression. We investigated the potential specificity of a number of putative biomarkers by examining their expression in a panel of other epithelial tissues and tumours.

Ovarian tissue samples
Four snap-frozen normal ovarian samples, and six pairs of primary and omental serous adenocarcinoma (Stage IIIC) from the same individuals were collected at the time of surgery at the University College Hospitals NHS Trust. The six paired samples of primary and secondary ovarian cancer were taken at the time of primary surgery prior to chemotherapeutic intervention. The normal ovarian samples were taken at the time of surgery for benign disease. H&E-stained sections were examined and verified histopathologically to be stage III serous adenocarcinomas. All samples comprised at least 70% tumour, except one omental sample which had 5% tumour content. The normal ovarian samples were verified to be free of any pathology, including benign cysts. Ovarian epithelium was macrodissected from the underlying stroma, and was used for subsequent analysis. For the real-time quantitative RT-PCR data, we used, in addition, three serous tumours of low malignant potential (LMP). All patients gave preoperative informed consent, and the study was approved by the ethics committee of the Royal Free and University College Medical School.

RNA sample preparation
Tissue specimens were homogenised in lysis buffer using a rotary homogeniser. Total RNA was extracted using the Qiagen RNeasy® kit (Qiagen, Valencia, CA, USA), according to the manufacturer's instructions. The integrity of the RNA was assessed by ethidium bromide staining after agarose gel electrophoresis. Total RNA (20 µg) was used to synthesise double-stranded cDNA using the Superscript® Choice System (Life Technologies), with the template being used for an in vitro transcription reaction to yield biotin-labelled antisense cRNA (BioArray™ High Yield RNA Transcript Labelling Kit, Enzo Diagnostics, Farmingdale, NY, USA). Fragmentation, hybridisation and scanning were performed according to the Affymetrix GeneChip® protocol, using the U95Av2 oligonucleotide microarrays containing ∼12 000 genes (Affymetrix, Santa Clara, CA, USA).

Real-time quantitative RT-PCR
Four genes, shown in the microarray system to be significantly upregulated, were selected for analysis with real-time quantitative reverse transcription-polymerase chain reaction (qRT-PCR). Primer pairs for each gene were designed using the Primer Express® Software (Applied Biosystems) and selected to have the same annealing temperature (60°C). The primer sequences used were: mammaglobin B2 (MGB2), forward 5′-CCGCTGCAGAGGCTATGG-3′, reverse 5′-CATCAGTCCAAAGTTTTTCAGAGTTCT-3′; kallikrein 6 (KLK6), forward 5′-GCGGACCCTGCGACAAG-3′, reverse 5′-GGATAAGGACCCCACCACAGA-3′; serum amyloid A1 (SAA1), forward 5′-TTCTCACGGGCCTGGTTTT-3′, reverse 5′-GCCTCGCCAAGGAACGA-3′; and hepsin (HPN), forward 5′-GGCTCGAGTCCCCATAATCAG-3′, reverse 5′-GGTAGCCAGCACAGAACATCTTG-3′. Primers were tested by conventional PCR and the PCR products were sequenced prior to real-time quantitation to confirm the specificity (data not shown). Primer optimisation and efficiency testing were performed prior to the relative quantitation of the expression of the genes (data not shown). Real-time qRT-PCR was performed on an ABI PRISM® 7000 Sequence Detector (Applied Biosystems, Applera UK, Cheshire, UK) using the SYBR® Green PCR Master Mix (Applied Biosystems) in duplicate, with triplicate nontemplate controls (NTC) in a 25 µl PCR reaction. cDNA (1 µl) was used in a 25 µl PCR mixture containing 1× SYBR® Green PCR mix (Applied Biosystems) and 0.3 µM of each primer for all genes, apart from HPN where 0.6 µM forward and reverse were used. The cDNAs were amplified by denaturation for 10 min at 95°C, followed by 40 cycles of denaturation at 95°C for 15 s and annealing/extension at 60°C for 1 min. The threshold cycle (CT), which represents the PCR cycle at which an increase in reporter fluorescence above a baseline signal can first be detected, was calculated as previously described (Heid et al, 1996). The relative expression of each gene was determined on the basis of the CT value.
The housekeeping gene GAPDH was used to normalise the quantity of cDNA used. The average GAPDH CT value was subtracted from that of each target gene to obtain a ΔCT value, that is, normalised target gene expression relative to GAPDH. An average ΔCT value was obtained for each of the five groups of 19 cDNA ovarian samples (normal: n = 5, LMP: n = 3, primary: n = 5 and metastasis: n = 2). The ΔCT of a calibrator (the average ΔCT value of all the normal samples, which provides the physiological expression of each gene target) was then subtracted from each average ΔCT to give the ΔΔCT value, that is, normalised target gene expression in the different groups relative to normal. Since CT values are measured when PCR amplification is still in the exponential phase, the relative quantitative value can be expressed as 2^(−ΔΔCT), as 2 corresponds to the PCR product doubling in each cycle in the exponential phase.
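The calculation above can be illustrated with a minimal Python sketch. It assumes the conventional sign, ΔΔCT = ΔCT(group) − ΔCT(calibrator), so that higher expression than normal gives a value above 1. The function name and all CT values are hypothetical and are not taken from this study.

```python
from statistics import mean

def relative_expression(target_ct, reference_ct, calibrator_delta_ct):
    """Return (delta_ct, delta_delta_ct, fold_change) for one sample group."""
    # Delta CT: average target-gene CT minus average reference-gene (GAPDH) CT.
    delta_ct = mean(target_ct) - mean(reference_ct)
    # Delta-delta CT: group delta CT minus calibrator (normal tissue) delta CT.
    delta_delta_ct = delta_ct - calibrator_delta_ct
    # One CT unit corresponds to one doubling in the exponential phase.
    return delta_ct, delta_delta_ct, 2.0 ** (-delta_delta_ct)

# Hypothetical CT values, for illustration only.
normal_target = [30.0, 30.5, 29.5]
normal_gapdh = [20.0, 20.25, 19.75]
tumour_target = [25.0, 25.5, 24.5]
tumour_gapdh = [20.0, 20.5, 19.5]

calibrator = mean(normal_target) - mean(normal_gapdh)
delta_ct, delta_delta_ct, fold = relative_expression(
    tumour_target, tumour_gapdh, calibrator
)
print(delta_ct, delta_delta_ct, fold)
```

In this made-up example the tumour group crosses the threshold five cycles earlier than normal after GAPDH normalisation, which corresponds to a 32-fold higher relative expression.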

Immunohistochemistry (IHC)
IHC was performed for hepsin (HPN) on 30 formalin-fixed, paraffin-embedded tissues histologically characterised into three distinct tissue groups: normal ovarian, primary ovarian serous cystadenocarcinoma and metastatic (omentum), to confirm expression at the protein level. Sections were cut at 4 µm, deparaffinised and rehydrated in a series of graded alcohols, before being heated in a microwave in Tris-EDTA (TE) for 25 min. Endogenous peroxidase activity was blocked by 10 min incubation with 0.5% hydrogen peroxide (H2O2) in methanol, prior to the application of goat polyclonal primary antibody (1:50; Santa Cruz Biotechnology Inc., Insight Biotechnology Ltd, Wembley, UK) for 1 h at 22°C. A biotinylated, anti-goat secondary antibody (1:400; DAKO, Cambridgeshire, UK) was applied for 30 min, after which slides were incubated with the streptavidin-peroxidase complex (DAKO) for a further 30 min. Sections were visualised by application of diaminobenzidine (DAB) substrate (DAKO) for 7 min, followed by a wash in running H2O and counterstaining for 2 min with Mayer's haematoxylin (DAKO). All sections were then dipped in acid alcohol to remove excess haematoxylin, and immediately placed in running H2O. After dehydration in graded alcohols, slides were cleared in xylene and mounted in DPX.

Data analyses
Background subtraction, normalisation and expression values of our data were calculated using the rma algorithm (Irizarry et al, 2003), available as part of the affy package of the Bioconductor open-source software library for the statistical language R (http://www.bioconductor.org). The rma algorithm differs from the standard Affymetrix algorithm in a number of ways; most importantly, the data are quantile-quantile normalised at the probe level, prior to calculation of a final expression summary from the perfect match (PM) probes alone. This algorithm improves measurement precision, reducing the variation between replicate data, particularly for low-expressed genes. Differential expression was calculated using the Benjamini-Hochberg step-up false-discovery rate (FDR) algorithm, with the FDR set to 0.05, implemented using the Bioconductor multtest package. This algorithm adjusts P-values upwards to account for the effects of multiple testing. It is a less conservative adjustment (admitting more false positives) than the more common, but here impractically conservative, Bonferroni or Holm algorithms.
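The probe-level normalisation step can be illustrated with a minimal Python sketch. It covers only quantile normalisation, not the background correction or probe-set summarisation that rma also performs, and it is not the Bioconductor implementation. The function name and intensity values are hypothetical, and ties are broken by position rather than averaged, which is a simplification.

```python
def quantile_normalise(arrays):
    """Give every array the same intensity distribution (one list per array)."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must contain the same number of probes")
    # Sort each array, then average across arrays at each rank.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[rank] for s in sorted_arrays) / len(arrays) for rank in range(n)
    ]
    normalised = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n), key=lambda i: a[i])
        out = [0.0] * n
        for rank, probe in enumerate(order):
            out[probe] = rank_means[rank]  # replace value by the mean at its rank
        normalised.append(out)
    return normalised

# Hypothetical probe intensities for three arrays of four probes each.
chips = [[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 4.0, 2.0], [3.0, 4.0, 6.0, 8.0]]
normalised = quantile_normalise(chips)
for row in normalised:
    print(row)
```

After normalisation each array contains exactly the same set of values, assigned according to the rank of each probe within its own array, which is what makes replicate arrays directly comparable.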

Comparative GEM data
Publicly available GEM data from normal epithelia-rich tissues were obtained from the Genomics Institute of the Novartis Research Foundation expression atlas (http://expression.gnf.org). Prostate and lung adenocarcinoma data were obtained from the Whitehead Institute Centre for Genomic Research (http://www-genome.wi.mit.edu/cgi-bin/cancer). Both data sets were in the original Affymetrix CEL format, and were normalised and analysed using the same methods as our own data described above.

RESULTS
Four normal ovarian samples, plus six paired primary (stage IIIC) and secondary samples from the same individuals, were analysed. The histopathology of adjacent sections showed that 70-90% of primary samples and 90% of metastases (except one sample) constituted tumour cells. The normal ovarian samples were verified to be free from any benign pathology. Differences in gene expression discussed below were all tested for significance at an FDR of 0.05, using the Benjamini-Hochberg step-up algorithm (Benjamini and Hochberg, 1995) (see Materials and methods). For clarity, gene names and abbreviations used throughout the text are summarised in Table 1.
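The multiple-testing adjustment behind this significance threshold can be sketched in a few lines of Python. This is an illustrative re-implementation of Benjamini-Hochberg adjusted P-values, not the multtest code used in the study; the function name and the raw P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    # Indices of the P-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Hypothetical raw P-values for five genes.
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adjusted = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in adjusted]
print(adjusted)
print(significant)
```

In this made-up example, four of the five raw P-values are below 0.05, but only the first two remain significant after adjustment. A Bonferroni correction would multiply every P-value by five and retain only those two as well, whereas with thousands of genes it would typically discard far more than the FDR approach.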

Primary ovarian disease
There were 421 genes more than two-fold and 118 genes more than three-fold overexpressed in primary compared to normal tissue. Figure 1 shows significantly overexpressed genes in primary ovarian cancer sorted into functional groups. These groups include genes associated with epithelia and cell-cell contact, such as secreted phosphoprotein 1 (osteopontin, OP), folate receptor 1, claudins 3 and 4 (CLDN3, 4), keratins 8, 18 and 19 (KRT8, 18, 19), and agrin (AGRN). These are also shown in Figure 2B, and could reflect the epithelial origin of these tumours. Genes involved in cell division and growth include cyclin D1, cellular retinoic acid-binding protein 2 and lipocalin 2 (oncogene 24p3). Metastasis and angiogenesis genes include jagged 2, tumour-associated calcium signal transducer 2 (TACSTD2), vascular endothelial growth factor (VEGF), CD24 antigen and neuromedin U.
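The fold-change thresholds can be illustrated with a short Python sketch. It assumes expression values on a log2 scale, so that a two-fold change corresponds to a difference of 1 and a three-fold change to a difference of log2(3), roughly 1.58. It shows only the fold-change filter and not the FDR significance test applied alongside it. Gene names and values are hypothetical.

```python
import math
from statistics import mean

def overexpressed(tumour, normal, fold):
    """Genes whose mean log2 expression in tumour exceeds normal by > log2(fold)."""
    cutoff = math.log2(fold)
    return sorted(
        gene for gene in tumour
        if mean(tumour[gene]) - mean(normal[gene]) > cutoff
    )

# Hypothetical log2 expression values (two samples per group).
tumour = {"GENE_A": [10.0, 10.5], "GENE_B": [8.0, 8.5], "GENE_C": [7.0, 7.25]}
normal = {"GENE_A": [8.0, 8.0], "GENE_B": [7.0, 7.25], "GENE_C": [7.0, 7.0]}

two_fold = overexpressed(tumour, normal, 2)
three_fold = overexpressed(tumour, normal, 3)
print(two_fold, three_fold)
```

Because every gene passing the three-fold cut also passes the two-fold cut, the three-fold list is a subset of the two-fold list, as with the 118 and 421 genes reported above.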
We compared the consistency of our data with that of another study (Welsh et al, 2001), in which genes overexpressed in cancer were ranked according to a combined metric, using normal ovary as a baseline. The four genes CD24, WAP four-disulphide core domain 2 (HE4), CD9 and Lutheran blood group (LU) were found to be the most highly expressed by their method, and are also highly overexpressed in our own data set (Figure 2A). Where the data sets overlap, they are highly consistent.
We identified 172 genes that were three-fold downregulated in primary ovarian cancer compared to the normal ovary (Figure 3). Among these were putative tumour suppressors including the p53 mediator paternally expressed gene-3 (PEG-3) (Relaix et al, 1998;Deng and Wu, 2000), wnt-inducible signalling protein-2 (WISP-2), a member of the connective tissue growth factor family (Pennica et al, 1998), and the Rho-associated transcriptional coactivator four-and-a-half LIM domains 2 (FHL2) (Muller et al, 2002). However, the recently reported putative tumour suppressor in ovarian cancer, opioid-binding protein (OPCML), did not appear to have significant loss of expression in any of the samples studied here (Sellar et al, 2003) (Figure 2E).

Omental metastasis
While there were 300 genes with more than three-fold difference between normal and primary samples, there were only 35 equally large differences between primary and omental metastases, all greater in metastases. These genes fell into two main groups. The first included serum amyloid A1 (SAA1), a marker of inflammation, and the immunoglobulin (Ig) lambda locus, which may reflect leucocyte infiltration. The second, accounting for many of the differences between primary and paired omental samples, reflects the high adipocyte content of the omentum and includes adipsin, lipoprotein lipase and perilipin (Figure 4). We found a number of putative invasion and metastasis predictive genes, including enhancer of zeste homolog 2 (EZH2) (prostate cancer) (Varambally et al, 2002), pituitary tumour-transforming 1 interacting protein (PTTG1) and Lamin B1 (LMNB1) (adenocarcinoma) (Ramaswamy et al, 2003), to be unchanged between primary and omental specimens (Figure 5). Essentially, the malignant primary tumour and its omental metastasis are alike. Hepsin, a prostate cancer serum biomarker, while marginally overexpressed in primary, was further overexpressed in secondary ovarian cancer tissue. Immunohistochemistry for hepsin showed staining of both the normal ovarian surface epithelium (OSE) and malignant epithelial cells in primary and omental metastasis. The pattern in malignant cells was distinct, however, being localised to the membrane (Figure 6).
Figure 4 Genes upregulated in omental metastasis relative to normal ovary and primary ovarian cancer. The predominance of genes associated with adipocytes reflects the omental background. All differences are significant at the P<0.05 level after multiple testing adjustment (see Materials and methods).

Validation of array data with qRT-PCR
In order to validate the gene-expression levels from the microarray experiments, we performed real-time qRT-PCR with GAPDH as a control in five normal ovaries, three LMP ovarian serous cancers, five primary ovarian serous cystadenocarcinomas and two omental metastases. Figure 3 shows the corresponding gene expression patterns of four genes: mammaglobin B2 (MGB2), serum amyloid A1 (SAA1), kallikrein-6 (KLK6) and hepsin (HPN) for normal ovary, primary and secondary disease on the microarrays, compared to that on qRT-PCR. Figure 7 demonstrates that the differential expression pattern and the quantitative expression level of each of these four genes, as determined by qRT-PCR, were comparable to those observed with the microarrays, confirming the reliability of our array expression data. Notably, qRT-PCR showed high expression of MGB2 and KLK6 in the LMP samples.

New biomarkers
We identified a potential new biomarker, MGB2, with: (a) higher expression in both primary and metastatic samples compared to the normal ovary, (b) high gross expression, above the 80th percentile of all genes, in primary and metastatic samples and (c) high homology (58% amino-acid identity) to the known serum marker MGB. Figure 8 shows the GEM profile of MGB2 compared to that of six other proteins that have been suggested as potential biomarkers: HPN (Tanimoto et al, 1997), IFI-15K, KLK6 (Diamandis and Yousef, 2002), CP (Hough et al, 2001), SLPI (Shigemasa et al, 2001) and HE4, across a panel of epithelia-rich tumours and tissues. This panel comprised publicly available Affymetrix data (see Materials and methods, Comparative GEM data) from: (a) prostate adenocarcinoma (Singh et al, 2002), (b) lung adenocarcinoma (Bhattacharjee et al, 2001) and (c) the GNF gene expression atlas containing various primary epithelial tissues (Su et al, 2002). Within this panel, MGB2 in particular appears specific to ovarian adenocarcinoma.
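The first two selection criteria lend themselves to a simple filter, sketched below in Python. The sketch assumes one summary expression value per gene per tissue group and a strict "above the 80th percentile" rule; the homology criterion (c) is not computed. Function names, gene names and values are all hypothetical.

```python
def percentile_rank(value, population):
    """Percentage of values in the population that lie strictly below value."""
    return 100.0 * sum(1 for v in population if v < value) / len(population)

def is_candidate(gene, normal, primary, metastasis, percentile=80.0):
    """Apply criteria (a) and (b) to a single gene."""
    # (a) Higher expression in both tumour groups than in normal ovary.
    higher = primary[gene] > normal[gene] and metastasis[gene] > normal[gene]
    # (b) Gross expression above the chosen percentile of all genes
    #     in both primary and metastatic samples.
    abundant = (
        percentile_rank(primary[gene], list(primary.values())) > percentile
        and percentile_rank(metastasis[gene], list(metastasis.values())) > percentile
    )
    return higher and abundant

# Hypothetical mean log2 expression for 20 genes, G0 to G19.
genes = [f"G{i}" for i in range(20)]
normal = {g: 5.0 for g in genes}
normal["G18"] = 30.0  # abundant in tumour, but even higher in normal ovary
primary = {g: float(i + 1) for i, g in enumerate(genes)}
metastasis = dict(primary)

candidates = [g for g in genes if is_candidate(g, normal, primary, metastasis)]
print(candidates)
```

In this toy data set three genes exceed the 80th percentile in both tumour groups, but one of them is expressed even more highly in normal ovary and is therefore excluded by criterion (a).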

DISCUSSION
We have used oligonucleotide microarrays representing ∼12 000 genes to investigate the GEM profiles of epithelial ovarian cancer. A number of groups have previously investigated gene-expression profiling of ovarian cancer using microarrays (Wang et al, 1999;Ismail et al, 2000;Ono et al, 2000;Mok et al, 2001;Shridhar et al, 2001;Welsh et al, 2001). These studies have focused on either the identification of gene products which can serve as ovarian cancer-specific markers (Mok et al, 2001), or on the initiation and progression of ovarian cancer (Ismail et al, 2000;Shridhar et al, 2001). This has been achieved by comparing the normal ovarian epithelium with ovarian cancer samples, as the majority of ovarian cancers are thought to arise from the ovarian surface epithelium, which exists as a single layer of cells covering the ovaries. This layer of cells is easily sloughed off by manual handling at the time of surgery, and it is a challenge to obtain enough cells for use in any experimental procedures. Researchers have overcome this problem firstly by using short-term cell culture to increase the number of cells available (Ismail et al, 2000), secondly by RNA amplification (Ono et al, 2000) and thirdly by using commercially available RNA (Welsh et al, 2001). These approaches, however, have drawbacks: (i) short-term culture favours the growth of only a subset of epithelial cells, (ii) RNA amplification leads to unequal amplification of the RNA transcripts in the cell population and (iii) commercially available RNA includes a stromal component.
Figure 7 Comparison of qRT-PCR (clear bars; normal (n = 5), primary (n = 5), LMP (n = 3) and metastasis (n = 2)) and GEM data (shaded bars; normal (n = 4), primary (n = 6) and metastasis (n = 6)) for MGB2, SAA1, KLK6 and HPN in normal, primary and omental metastasis samples. Gene-expression microarray data are in original Log2 scale, and qRT-PCR is a single Log2 unit per round of amplification; error bars show the standard deviation. The normal level is taken as a 0 baseline reference for both.
In this study, we used macrodissected epithelium from the normal ovarian tissue in addition to matched primary and secondary metastatic serous ovarian adenocarcinomas. Tumour specimens were verified histopathologically in five cases to comprise at least 70% tumour. We confirmed a number of ovarian cancer genes previously identified by GEM, for example, CD24 (Welsh et al, 2001), HE4, PRAME (Ismail et al, 2000), B-factor (properdin) (Shridhar et al, 2001), and, where our studies overlap, the data are highly consistent, despite the difference in methodology.
A large number of genes overexpressed in primary tumours were associated with epithelia. This might reflect the epithelial origin of these tumours or a transformed phenotype. HPN, for example, was marginally overexpressed in both primary and secondary ovarian cancer tissue, compared to the normal ovary (approx. two-fold). HPN is a serine protease that has been shown to be overexpressed in prostate cancer cells, and significantly correlates with poor clinical outcome (Dhanasekaran et al, 2001). We investigated hepsin further by performing IHC, and found the staining to be localised to the epithelial cells, suggesting that it may be a marker of epithelia rather than of malignancy (Figure 6). However, there was a notable difference in the pattern, with malignant cells showing a distinct membranous staining, suggestive of heightened secretion.
We found few differences in the gene signature of stage III primary serous ovarian adenocarcinomas and their corresponding omental metastases. Various studies have shown that metastatic signatures within primary tumours are predictive of subsequent metastasis. We found that, within the stage III serous ovarian adenocarcinomas, a number of predictive genes including EZH2 (Varambally et al, 2002), PTTG1 and Lamin B1 (Ramaswamy et al, 2003) are overexpressed in primary tumours at least as highly as in omental metastases (Figure 4). This supports the notion that most tumour cells in advanced primary ovarian lesions have acquired the genetic signature enabling invasion and metastasis. A GEM study comparing stage Ia (no ascites) with Ic (ascites, that is, metastatic spread) might identify genes that indicate the propensity of ovarian tumour cells to metastasise, although it would be challenging to obtain sufficient material.
We identified a potential new biomarker, MGB2, which is significantly overexpressed in primary and metastatic ovarian cancer compared to the normal ovarian tissue. This gene is part of the uteroglobin family, and is also overexpressed in endometrioid endometrial carcinomas (Moreno-Bueno et al, 2003), and the axillary lymph nodes of metastatic breast cancers (Ooka et al, 2000). A preliminary qRT-PCR analysis of MGB2 confirmed this finding and further demonstrated high expression in LMP samples (n = 3). LMP tumours are a distinct subtype of epithelial ovarian cancer thought to be an intermediate stage between clearly benign and malignant tumours. No biomarker to date is sufficiently specific for screening and monitoring disease progression in LMP tumours. MGB2 warrants further investigation in this subgroup.
The only widely used ovarian cancer marker, CA125, lacks specificity (CA125, or MUC16, is not present on the U95Av2 array). Within the panel of data available to us, MGB2 appears to be a specific biomarker for ovarian tumours, with low expression in most normal epithelial tissues and in prostate and lung tumours. This survey was far from exhaustive, relying on available published GEM data. The screening and selection of candidates for further serological study will benefit from more publicly available data, in particular from breast cancer. The recent development of multiplex techniques to screen sera for combinations of biomarkers shows promise for cancer screening (Petricoin et al, 2002). A combination of biomarkers including MGB2, rather than a single biomarker alone, is more likely to give a specific signature for epithelial ovarian carcinoma. Our study demonstrates that GEM studies are a practical and economical prelude that streamlines the selection of candidate genes for larger serological studies.
Figure 8 Gene-expression profile of putative biomarker MGB2 in ovarian serous adenocarcinoma and a panel of other tissues. Comparison with six previously described biomarkers: HPN, IFI-15K, KLK6, CP, SLPI and HE4. Serous ovarian AdC = primary serous ovarian adenocarcinoma, omental metastasis = serous ovarian omental metastasis, lung AdC = lung adenocarcinoma, prostate AdC = prostate adenocarcinoma. Adrenal gland, kidney, liver, pancreas, pituitary gland, lung, spleen, thyroid, trachea and uterus all represent the corresponding normal tissue specimens.