Identification of potential diagnostic markers of prostate cancer and prostatic intraepithelial neoplasia using cDNA microarray

The identification of novel genes or groups of genes expressed in prostate cancer may allow earlier diagnosis or more accurate staging of the disease. We describe the assembly and use of a 1877-member microarray representing cDNA clones from a range of prostate cancer stages and grades, precursor lesions and normal tissue. Using labelled cDNA from tumour samples obtained from TURP or radical prostatectomy, analysis of expression patterns identified many up-regulated transcripts. Cell lines were found to over-express fewer genes than diseased tissue samples. 17 known genes were found to over-express more than 4-fold in 4 or more cancers out of 15 cancers. Only 2 genes were over-expressed in 6 out of 15 cancers or more, whilst no genes were consistently found to be over-expressed in all cancer samples. Novel prostate cancer associations for several well characterized genes or full length cDNAs were identified, including PLRP1, JM27, human UbcM2, dynein light intermediate chain 2 and human homologue of rat sec61. Novel associations with high-grade PIN include: breast carcinoma fatty acid synthase and cDNA DKFZp434B0335. We shortlist and discuss the most significant over-expressed genes in prostate cancer and PIN, and highlight expression differences between malignant and benign samples. © 2001 Cancer Research Campaign http://www.bjcancer.com

These problems may be addressed using a genomics-based approach investigating global gene expression changes in clinical samples using microarrays (Sagar, 1997; Burczak et al, 1998). With suitable databases and bioinformatics tools, candidate genes can be selected following in silico analysis for favourable tissue distribution, secretion signals and other features, allowing empirical design of microarrays for candidate marker screening. Gene expression profiling is being increasingly used to analyse hundreds or thousands of genes simultaneously in cancer cell lines (Bertucci et al, 1999), and diseased and normal tissue (Zhang et al, 1997; Alon et al, 1999; Wang et al, 1999). Clustering analysis of gene expression data can provide novel insights into disease, for example molecular definition of subtypes of leukaemia, providing a tool for an important diagnostic problem (Golub et al, 1999).
To identify potential candidate prostate cancer markers we assembled a custom microarray for analysis of prostate tissue, biasing the clone choice on our array towards genes that are expected to express at higher levels in cancer. The frequency of clones in tissue-specific cDNA libraries in the proprietary Incyte databases, which typically rank genes according to frequency in a given library, was taken to reflect their true abundance in tissues. Quantitative electronic subtraction of cDNA libraries (e.g. cancer minus normal, late-stage minus early-stage) was used to increase the odds of including relevant genes on the array, which could then be checked by hybridization to diseased prostate cDNA. In addition, in silico transcript imaging analysis was used to check specificity against other tissue types. Known cancer- and prostate-associated genes were also included in the array.
Although down-regulated genes are of great biological interest, it is not convenient to assay their transcripts because biopsy or other samples may contain normal tissue to varying degrees, and disease tissue may form only a small part of the sample under investigation. Therefore we set out to identify strongly overexpressed genes, which may be detectable even in samples containing a minority of affected tissue. This analysis of normal, BPH, PIN and prostate cancer tissue provides a starting point for further investigation of candidate marker genes for diagnosing, staging and treating prostate cancer.

Bioinformatics, databases and choice of cDNA clones
Genes (or, in the case of unknown genes, clone clusters) of potential interest for microarray were selected either because of a known association with prostate or cancer, or because of a relative abundance in tumour libraries compared to normal prostate. For the latter, manipulations were carried out using a proprietary database (LifeSeq) provided by Incyte Pharmaceuticals (Palo Alto, CA) which primarily contains grossly dissected and microdissected prostate cDNA libraries. Several public domain libraries including those from the Cancer Genome Anatomy Project are also available in this database, including PIN libraries. Using Incyte's electronic tools for transcript imaging (virtual Northern blot analysis) and library subtraction based on the BLAST algorithm, the array was biased towards over-expressed genes which might form useful diagnostic markers. This empirical approach was complemented by inclusion of known prostate- and/or cancer-associated genes from the literature. Physical clones for arraying were chosen from a proprietary collection supplied by Incyte Pharmaceuticals, many of which were sequence verified by Incyte. Where available, 2 different clones per gene were used, typically from the 3′ region of the transcript. An additional set of 11 'housekeeping' genes was also included. A master list of clones was maintained using Microsoft Excel software.
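The principle of electronic library subtraction can be illustrated with a minimal Python sketch. It is not the proprietary Incyte algorithm; it simply assumes that clone counts per gene cluster are available for a tumour library and a normal library, converts them to within-library frequencies, and ranks clusters by the frequency difference. The function name, cluster names and counts are hypothetical.

```python
def subtract_libraries(tumour_counts, normal_counts):
    """Rank gene clusters by (tumour frequency - normal frequency).

    Each argument maps a cluster name to the number of clones seen
    in that cDNA library (hypothetical inputs, not study data).
    """
    tumour_total = sum(tumour_counts.values())
    normal_total = sum(normal_counts.values())
    diffs = {}
    for cluster in set(tumour_counts) | set(normal_counts):
        tumour_freq = tumour_counts.get(cluster, 0) / tumour_total
        normal_freq = normal_counts.get(cluster, 0) / normal_total
        diffs[cluster] = tumour_freq - normal_freq
    # Largest positive difference first: clusters enriched in tumour.
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


# Illustrative clone counts only.
tumour = {"cluster_A": 12, "cluster_B": 3, "cluster_C": 5}
normal = {"cluster_A": 2, "cluster_B": 6, "cluster_D": 12}
ranked = subtract_libraries(tumour, normal)
```

Clusters at the top of such a ranking would be candidates for inclusion on the array; as noted elsewhere in the paper, library frequency is only an approximation of true mRNA abundance.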
The transcript imaging tool was used to interrogate the Incyte Lifeseq Gold database for tissue distribution across 1113 cDNA libraries. All sequence annotation, gene ID and transcript imaging data presented in this paper was correct at the time of writing, according to Incyte Lifeseq Gold version 5.1, February 2000 release.

PCR and clone arraying
Clones were assembled into 96-well microtitre plates containing selective growth medium, grown overnight at 37˚C. The cDNA inserts were amplified using universal primers for pINCY, pSPORT and pBluescript (TTGGGTAACGCCAGGGTTTTCCCAGTCAC and CCCCAGGCTTTACACTTTATGCTTCCGGC) at 35 pmol per 50 µl reaction, in 96-well plates containing PCR buffer (1.05 units Taq DNA polymerase, 20 mM Tris-HCl (pH 8.3), 50 mM KCl, 1.5 mM MgCl2 and 0.2 mM each dATP, dCTP, dGTP and dTTP) (ABgene, Epsom, UK). PCR conditions were 94˚C for 2 min, followed by 35 cycles of 94˚C for 1 min, 60˚C for 1 min and 72˚C for 3 min, followed by a final extension step of 72˚C for 10 min. PCR products were spotted onto Nytran+ membrane (Schleicher and Schuell, Keene, NH) using an automated robotic system (Q-Bot, Genetix, Christchurch, UK) together with appropriate software. Null spots containing dye were used as a visual aid to assess array quality and orientation. The DNA was denatured using standard solutions and crosslinked to the membrane by ultraviolet irradiation (Stratagene Stratalinker, La Jolla, CA).

Prostate RNA, labelling and hybridization conditions
Tumour samples were either from sections of radical prostatectomy specimens, or tissue from transurethral resection, with concurrent histopathological data. All clinical samples were obtained with the approval of the ethics committee of the hospital concerned. Tissue was flash frozen in liquid nitrogen and stored at -70˚C until needed. Culture of prostate carcinoma cell lines was performed under standard conditions. PC-3, DU 145 and LNCaP cells were cultured in RPMI medium (Gibco, Paisley, UK), 5% fetal calf serum (Gibco), 1% glutamate (Gibco). LNCaP cells were initially cultured in this medium for 48 h, then medium was replaced with charcoal-stripped medium, and cells incubated for a further 48 h, followed by 48 h in fresh medium with or without α-5-dihydrotestosterone. About 100 mg tissue or cells was used for RNA extraction using Trizol (Gibco). RNA integrity was checked by agarose gel electrophoresis. Normal prostate RNA was a mixture from 20-30 accident victims with no diagnosable prostate abnormality (Clontech, Palo Alto, CA). 15-20 µg total RNA was treated with 10 units DNase I (RNase free) (Roche Molecular Biochemicals, Lewes, UK) in a suitable volume, at room temperature for 15 min. EDTA was added to 2.5 mM, and the sample heated to 65˚C for 10 min to stop the reaction. If necessary, at this stage Microcon 30 columns (Amicon, Millipore, Bedford, UK) were used to concentrate the sample. 15-20 µg RNA was allowed to anneal to 0.5 µg of oligo(dT)12-18 primer (Gibco) at 65-70˚C for 10 minutes. First strand labelling was performed with 1st Strand Labelling Buffer (Gibco) with the addition of dTTP, dATP and dGTP at a final concentration of 0.5 mM and dCTP at 50 µM, 30 mCi [α-33P]dCTP (Amersham Pharmacia Biotech) and 40 units RNase inhibitor (Roche). After 5 minutes at 42˚C, 200 units Superscript II enzyme (Gibco) was added and the mix was incubated at 42˚C for 1.5-2 hours.
Unincorporated nucleotides were removed by centrifugation through a GFX column (Amersham Pharmacia Biotech) according to manufacturer's guidelines. The newly synthesized probe was denatured by boiling for 5 minutes and stored on ice. Microarray filters were wetted in Church hybridization solution (Church and Gilbert, 1984), and incubated with a further ~5 ml of this solution in a cylindrical bottle rotated in a hybridization oven at 65˚C for 4-5 h prior to the addition of fresh solution and denatured probe, which was mixed by quickly swirling. Hybridization was at 65˚C for ~ 16 h. Filters were washed in Church wash solution (Church and Gilbert, 1984) at 65˚C for 2 × 20 min.

Image quantification and data analysis
Filters were covered in clingfilm wrap and exposed to low-energy phosphorimaging screens (Eastman Kodak, Rochester, NY) for 5-6 days, prior to phosphorimager scanning (Storm 830, Molecular Dynamics, Sunnyvale CA). Images were manipulated using ImageQuant software (Molecular Dynamics), and quantitative data for spot intensity extracted using Incyte LifeArray software, then exported to Microsoft Excel. The data were processed in an Excel workbook sheet designed to (a) calculate a local background value for each of 96 small 7 × 7 grid areas on the array, and discard data points within this area less than 2-fold this value, then using this output (b) normalize by calculating a ratio for each value against the mean value for housekeeping genes across the whole array. Data could then be directly compared between filters. A dataset for mean normal prostate was generated from 2 duplicate array hybridization experiments of 2 independent mixed batches of normal prostate RNA. The 2 normal prostate RNA batches comprised samples from 23 men (aged 23-64), and 47 men (aged 15-50), respectively, who were not diagnosed for prostate cancer (Clontech). Typically, data from tumour samples was exported to a second Excel workbook, and ratios generated against normal values. Genes could then be ranked for level of over-expression, and by assembling lists of ranked over-expressed genes, tumours compared and data amalgamated. In some cases, data for genes in the normal RNA were below the 2-fold background cut-off, and a ratio could not be generated. These genes were classified on/off (expressed in tumour, undetectable in normal).
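The spreadsheet processing described above can be sketched in Python as follows. This is a minimal illustration rather than the workbook actually used: the data layout, function names and example intensities are hypothetical, and it assumes that only housekeeping spots passing the background filter contribute to the normalization mean.

```python
from statistics import mean


def normalize_filter(spots, background, housekeeping):
    """Apply steps (a) and (b) to one filter.

    spots:        {gene: (grid_id, raw_intensity)}
    background:   {grid_id: local background value}
    housekeeping: set of housekeeping gene names
    """
    # (a) discard spots below 2-fold the local background of their grid
    kept = {gene: raw for gene, (grid, raw) in spots.items()
            if raw >= 2 * background[grid]}
    # (b) ratio of each value against the mean housekeeping signal
    hk_mean = mean(kept[g] for g in housekeeping if g in kept)
    return {gene: raw / hk_mean for gene, raw in kept.items()}


def tumour_vs_normal(tumour_norm, normal_norm):
    """Ratio against normal; genes undetected in normal are 'on/off'."""
    return {gene: (value / normal_norm[gene] if gene in normal_norm
                   else "on/off")
            for gene, value in tumour_norm.items()}


# Hypothetical intensities, for illustration only.
spots = {"HK1": (0, 1000.0), "HK2": (1, 3000.0), "geneW": (1, 1000.0),
         "geneX": (0, 4000.0), "geneY": (1, 150.0)}
background = {0: 100.0, 1: 100.0}
tumour_norm = normalize_filter(spots, background, {"HK1", "HK2"})
normal_norm = {"HK1": 0.5, "HK2": 1.5, "geneW": 0.25}
ratios = tumour_vs_normal(tumour_norm, normal_norm)
```

In this example geneY falls below the 2-fold background cut-off and is dropped, geneW is 2-fold over normal, and geneX is classified on/off because no normal value exists for it.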

RESULTS
Samples of prostatic material were collected from well characterized patients with either BPH (mean age (range) 75 (66-84) years), PIN or prostate cancer (mean age (range) 68 (52-88) years) (sample details are given in Table 1). Array data were normalized to a representative selection of housekeeping genes, to allow comparison of expression levels between hybridization experiments. To generate a normal prostate dataset, mixed samples of normal prostate from individuals (aged 15-64 years) with no history of prostatic disease were used (Clontech), as normal prostate material was not available. The mean of 2 hybridization results for each of 2 different batches of normal prostate samples was taken. No differences of greater than 2-fold were observed between these pairs of normal prostate control filters. To evaluate overexpression compared to normal prostate tissue, results from cell culture, prostate cancer, PIN or BPH samples were ratioed against the control normal prostate dataset, and genes with values of greater than 1.5 × normal signal were considered to be overexpressed.
To evaluate overexpression of cancer relative to benign tumour, a mean of 11 BPH samples (Table 1) was taken as the control value, and results from cancer samples were ratioed against this. Subsequently, all of the genes identified as overexpressed were checked against the mean value for the set of 11 BPH samples. No BPH sample deviated by 1.5 fold or more from the mean for any of the genes of interest.
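The comparison against the BPH mean, and the check that no individual BPH sample deviates from that mean by 1.5-fold or more, can be sketched as below. This is an illustrative Python outline under stated assumptions: each sample is a dictionary of normalized values, all BPH samples report the same genes, and the gene names and numbers are hypothetical.

```python
from statistics import mean

THRESHOLD = 1.5  # fold-change cut-off quoted in the text


def bph_reference(bph_samples):
    """Mean normalized value per gene across the BPH samples."""
    genes = bph_samples[0].keys()
    return {g: mean(s[g] for s in bph_samples) for g in genes}


def deviating_bph(bph_samples, reference, threshold=THRESHOLD):
    """(sample index, gene) pairs at or above threshold x the BPH mean."""
    return [(i, g) for i, s in enumerate(bph_samples)
            for g, v in s.items() if v / reference[g] >= threshold]


def overexpressed(sample, reference, threshold=THRESHOLD):
    """Genes whose value exceeds threshold x the reference value."""
    return sorted(g for g, v in sample.items()
                  if g in reference and v / reference[g] > threshold)


# Hypothetical normalized values, for illustration only.
bph = [{"g1": 1.0, "g2": 2.0},
       {"g1": 1.25, "g2": 1.5},
       {"g1": 0.75, "g2": 2.5}]
reference = bph_reference(bph)
outliers = deviating_bph(bph, reference)
cancer_calls = overexpressed({"g1": 2.0, "g2": 2.5}, reference)
```

An empty `outliers` list corresponds to the statement that no BPH sample deviated from the mean for the genes of interest.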

Over-expressed genes are more common in prostate tissue than cells
The majority of genes found to be over-expressed in cancer, PIN and BPH samples were expressed in normal tissue also. A variable number of genes were expressed in diseased tissue but not detected in normal tissue, but these genes were always in a minority compared to genes detected in normal but over-expressed in tumour (Figure 1). In contrast, prostate cancer cell lines displayed a high proportion of genes whose expression was not detected in normal prostate. Compared to normal prostate, LNCaP cells over-expressed roughly twice as many genes as DU-145, PC3 and LNCaP cells deprived of androgen (Figure 1 and Table 2). However, samples of prostate containing or comprising PIN, BPH or cancer consistently displayed much higher levels of global over-expression than the cell systems, typically in the range 200-600 genes (Figure 1). There is no strong difference in numbers of over-expressed genes between 2 PIN, 11 BPH and 16 cancer tissue samples (Table 2).
Table 3 lists 17 genes most commonly over-expressed in prostate cancer samples compared to normal prostate samples. In addition to several blood cell-related protein genes which may be associated with host response to tumour presence (e.g. FYN binding protein and human UbcM2 genes) the list contains several genes and full-length cDNAs that are likely to be expressed in the prostate tissue and may constitute specific markers of cancer. These include the strongly over-expressed p97 and p78 genes.
A novel association with prostate cancer is apparent for the full-length cDNA DKFZp564CO362, which encodes an uncharacterized protein. In addition, this is the first report of a prostate cancer association for the genes PLRP1, JM27, human UbcM2, dynein light intermediate chain 2 and human homologue of rat sec61.
Notes to Table 1: grading reference (Gleason, 1992). Tumour stage details are according to the TNM classification system: T0 indicates no evidence of primary tumour; T1-T2 tumours are confined to the prostate; T3 tumours extend through the prostatic capsule; T4 indicates invasion of adjacent structures; N0 indicates no regional lymph node metastasis; N1 indicates limited lymph node involvement (full details of the TNM system are in Jones and Smith, 1994).
Table 3: Genes over-expressed in 4 or more cancer samples from a total of 15. Figures are mean fold signal relative to mean normal prostate (where normal prostate value was below the 2-fold background threshold, no normal value was available, so the figure given is fold signal over 2-fold background - see section 2.4). The mean value is calculated for a single clone spot per filter (that which gave the highest number of positives across the samples).
Table 3 (row fragments): novel mRNA from fetal brain, unpublished; SRF accessory protein 1A (SAP-1), s22, g429185, 4, 6.5, binds to serum response factor (Dalton and Treisman, 1992); DNA sequence, also homology to cadherin-10.
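The selection rule behind Table 3 (genes over-expressed in 4 or more of 15 cancers) amounts to counting, for each gene, the number of samples in which it was called over-expressed. A minimal Python sketch is given below; the per-sample call lists, gene names and the lower cut-off used in the example are hypothetical and chosen only to keep the illustration small.

```python
from collections import Counter


def recurrent_genes(calls_per_sample, min_samples):
    """Genes called over-expressed in at least min_samples samples.

    calls_per_sample: one list of over-expressed gene names per sample.
    Returns {gene: number of samples in which it was called}.
    """
    # set() ensures a gene is counted once per sample even if two
    # clones for the same gene were both positive on one filter.
    counts = Counter(gene for calls in calls_per_sample
                     for gene in set(calls))
    return {gene: n for gene, n in counts.items() if n >= min_samples}


# Hypothetical over-expression calls for five samples.
calls = [["gene_a", "gene_b"],
         ["gene_b"],
         ["gene_b", "gene_c"],
         ["gene_a", "gene_c"],
         ["gene_b", "gene_c", "gene_c"]]
shortlist = recurrent_genes(calls, min_samples=3)
```

With the study's criterion, `min_samples` would be 4 and `calls_per_sample` would hold the 15 cancer samples.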

Over-expressed genes in cancer
Following collation of these data, the transcript imaging tool was used in the Incyte Lifeseq Gold database to check tissue distribution across 1113 cDNA libraries. Examples of genes for which these data reflected and supported array data are CC chemokine and dynein light intermediate chain 2. CC chemokine tended to be represented in prostate cancer and PIN libraries, but not normal prostate or other tissue types. Dynein light intermediate chain 2 was present in many tissue types, but within the set of prostate libraries tended to be over-represented in cancer. For other genes, transcript imaging analysis gave mixed results which did not always reflect array data: JM27 was present almost exclusively in prostate libraries, but in both normal and disease types; in contrast FYN binding protein and cytochrome p-1450 were not present in any prostate libraries. Their presence on the array was presumably fortuitous, resulting from reallocation to new gene clusters by Incyte following our initial clone selection at the array construction stage (see Discussion). The rest of the markers characterized by the microarray results appeared in many tissue libraries and had no obvious prostate cancer association as indicated by transcript imaging data.
Table 4 lists 9 genes most commonly over-expressed in high-grade PIN relative to normal prostate. Again, host response genes feature (e.g. SRp40-1 and CD-24), but are outnumbered by genes of likely prostatic origin, e.g. k-ras. Novel associations with high-grade PIN include breast carcinoma fatty acid synthase and the full-length cDNA DKFZp434B0335. Transcript imaging analysis showed wide tissue library distribution for all these genes, but an association with prostate cancer was particularly strong for fatty acid synthase and TAT interactive protein genes.

Over-expression in cancer compared to benign tumour
6 transcripts were over-expressed in at least 5 out of 13 prostate cancer samples compared to mean BPH transcript level (Table 5).
Table 4: Genes over-expressed more than 2-fold, in common between 2 PIN samples (figures given are ratios against normal prostate; representative data are given for one of 2 identical spots on the array).
Table 5: Markers increased by more than 1.5-fold in 5 or more cancer samples out of 15, relative to mean value for BPH. CaP/normal values are mean expression levels for those samples in which signal >1.5 × normal, or >1.5 × mean of 13 BPH samples. 3 genes were found to be expressed in cancer but not detected in BPH.
None of these 6 genes was expressed by more than 1.5 × mean BPH level in any of the 11 BPH samples listed in Table 1. Only one of these genes, CC Chemokine, features in the set of genes identified as over-expressed in 4 or more cancers compared to normal prostate (Table 3). Levels of over-expression in cancer relative to BPH are much lower than cancer relative to normal prostate (compare Tables 3 and 5), suggesting that most genes over-expressed in cancer are also over-expressed in BPH, and so may be of limited use in a molecular diagnostic situation necessitating differentiation between the 2 disease states. However, we detected 3 genes in 5/13 cancers which were not detected in any of 11 BPH samples used in this study. These are DKFZp586D091 (a full length cDNA encoding a protein of unknown function), laminin B2 chain, and a repressor of oestrogen activity (REA). Transcript imaging supported this data: DKFZp586D091 was represented in only a few libraries per tissue type with the exception of nervous system, and at high frequency in only one prostate (tumour) library; REA was more ubiquitously and frequently represented, but was common in prostate tumour libraries and rare in normal and BPH. These 3 genes were not co-expressed in the same cancers (data not shown), but a combination of them could form the basis of a diagnostic test capable of detecting cancer in a prostate sample containing or comprising mostly BPH cells.
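The suggestion that markers detected in different cancers could be combined into one test can be illustrated by computing the union of the samples each marker detects. The Python sketch below uses entirely hypothetical marker names, sample identifiers and detection sets, since the per-sample data are not shown in the paper.

```python
def panel_coverage(detections, panel):
    """Samples detected by at least one marker in the panel.

    detections: {marker: set of sample IDs in which it was detected}
    panel:      iterable of marker names to combine
    """
    covered = set()
    for marker in panel:
        covered |= detections[marker]
    return covered


# Hypothetical detection sets for three non-overlapping markers.
detections = {"marker_1": {"T01", "T02"},
              "marker_2": {"T03"},
              "marker_3": {"T04", "T05"}}
all_samples = {"T01", "T02", "T03", "T04", "T05", "T06"}

covered = panel_coverage(detections, ["marker_1", "marker_2", "marker_3"])
missed = all_samples - covered
fraction_detected = len(covered) / len(all_samples)
```

Because the markers in this example do not overlap, each one adds new samples to the covered set, which is the situation in which a combined panel offers the greatest gain over any single marker.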

Relative levels of gene expression from array data
The data presented in Tables 3, 4 and 5 are in the form of ratios of transcript levels in malignant or pre-malignant disease compared to normal or benign disease. In order to compare relative levels of expression for genes of interest (including genes which were not detected in normal tissue, and for which an accurate over-expression ratio could not be generated) it was necessary to refer to normalized filter data on a sample-to-sample basis. Table 6 allows comparison of expression levels for many of the genes featured in Tables 3-5. For example, it can be seen that levels of JM27 and CC chemokine (possible discriminatory markers for cancer versus normal cells - Table 3) transcripts are approximately 50% that of laminin B2 (a possible discriminatory marker for cancer versus BPH cells - Table 5) and 35% that of prostate-specific antigen (PSA). These results will form the benchmark for further investigations with RT-PCR.

DISCUSSION
We are interested in using molecular criteria to distinguish between different disease states within the prostate. Accordingly, we have focused on genes which exhibited higher than normal levels of expression in cancer and PIN, as indicated by tissue distribution data in silico. Following array analysis, our aim was to shortlist genes that could enable molecular diagnosis of cancer and/or PIN despite the quantities of normal and BPH tissue which may be present in a biopsy sample. The results we present form the starting point for further studies using accurate RNA quantitation methods such as RT-PCR (e.g. Bieche et al, 1999). Our data support the notion that cell lines may be of limited value in the identification of novel markers and drug targets by transcript profiling, because of the limited number of differentially expressed genes. LNCaP cells over-expressed roughly twice as many genes as the hormone-independent, advanced cancer cell line models PC3 and DU-145. Nevertheless, the proportion of gene transcripts undetected in normal tissue, but present in cancer cells, is higher for the cell lines studied than for diseased tissue. Tissue samples exhibited much wider global gene expression activity, with many more transcripts identified as over-expressed. However, this conclusion has several caveats. Firstly, our array was biased toward differential expression between disease states and normal tissue, rather than towards cell lines. Secondly, experimental factors such as variation in probe labelling can account for differences in the number of over-expressed genes detected. Thirdly, many of the genes we have identified are associated with host immune response (see Tables 3 and 4) and may not be cancer-specific. The variable proportion of stromal and epithelial cells present in prostatic tissue samples presents less of a problem, because all data presented here are compared to mean values for normal prostate tissue or BPH.
We chose to analyse clinical material from TURP and radical prostatectomy specimens rather than needle biopsies. This has the advantage that more material is available, and can be taken from parts of the prostate with macroscopically visible morphological features, adjacent to tissue for which pathology details are available. We believe this approach will anticipate findings in biopsy samples, while lessening the problems associated with heterogeneity in the diseased gland. Ordinarily, 6 needle biopsies taken at different angles are needed to give a good chance of detecting cancer or PIN (Jarmulowicz, 1999). Extra samples can be taken for study, but these are also subject to sampling error and may increase the risk of morbidity in the patient. Our approach has yielded a small number of potential markers which can be justifiably progressed into assessment on biopsy samples in a more convenient, higher throughput format such as RT-PCR.
To keep the physical size of the filter (and therefore the amount of starting mRNA needed) to a minimum, we wished to keep array size within a reasonable limit of 2000 genes. The physical size of the array (~7 × 11 cm) allowed ease of handling, and accommodated a simple first-strand cDNA-labelling method.
Clone sequences were screened prior to arraying using the Incyte sequence databases wherever possible to ensure that Alu or other repeat sequences were not present. Cot-1 DNA was used in some experiments to address problems of repeat sequence hybridization and cross-hybridization. However, comparison of datasets showed this was not a significant problem (data not shown). Certain I.M.A.G.E. clones used were from the Cancer Genome Anatomy Project cDNA libraries Pr1-10 (details are found at http://www.ncbi.nlm.nih.gov/ncicgap/). Whilst these libraries offer the advantage of being derived from microdissected PIN or cancer lesions, clones are made from amplified second strand cDNA, and clone frequency may not reflect true mRNA abundance to the same extent as the Incyte cDNA libraries, generally made by cloning of first strand cDNA. We biased the clone choice on our array towards genes which are expected to express at higher level in cancer, taking clone frequency in cDNA libraries as an indication of their true abundance, but this potential pitfall was kept in mind.
A period of over one year elapsed between the time of clone choice for microarray assembly, and collation of results for this paper, at which time all sequences were checked in the relevant Incyte database for current gene identity. Some clones had been reassigned to a new gene cluster, based on BLAST homology, during this period. This is a result of updating of sequence data from Incyte and new data of public origin which are also assimilated into the Incyte Lifeseq Gold database. Thus, though most array genes were chosen purely on the basis of library frequency, the current database release does not give the same results for some genes. This, combined with an unknown component due to cross-hybridization between closely related cDNA species, probably accounts for the disparity often seen between array data and transcript imaging results. This disparity cautions against inferred associations between genes on the basis of library frequency alone, as has been reported by other groups (e.g. Walker et al, 1999), and strongly suggests that verification of potential markers with a highly specific approach is necessary. This work is in progress using RT-PCR methodology.
Assessing the range of gene expression in normal prostate tissue was hindered by the lack of availability of this tissue due to ethical constraints. Mixtures of mRNA from various individuals (as supplied commercially) were the only available source of normal prostate tissue for this study. However, the data from paired filters for 2 different batches of this material was checked for variation of the genes characterized in this study, with the result that overexpression of >1.5 fold above mean was not seen for any gene of interest.
At present, the most specific molecular marker for PIN is absence of GST P1 expression in tumour cells (Brooks et al, 1998). Because PIN cells and normal or BPH cells are often found in the same sample, it is not feasible to test for absence of expression without associated morphological data from microscopy (e.g. via immunohistochemistry). Our shortlist of candidate PIN-associated genes requires verification from more samples, RT-PCR and in situ hybridization, but may lead to a solution to this problem.
There was no obvious difference between global gene expression profiles of cancer and BPH. It was interesting that most of the genes which were over-expressed in cancer were also over-expressed in BPH (data not shown), perhaps reflecting the cellular proliferation which occurs in both states. However, 6 genes were identified as being potential discriminatory markers between BPH and cancer, as detailed in section 3.4. In particular, 3 genes, DKFZp586D091, laminin B2 and REA, were detected at high levels in cancer tissue but not detected in either BPH or normal prostate. Further study of the expression of these genes by RT-PCR and in situ hybridization could lead to a rapid molecular diagnostic capable of discriminating cancer from BPH, which avoids extensive histological analysis.
Defining genes that are differentially expressed in normal, BPH, PIN and cancer could lead to molecular testing of biopsies or TUR chips for malignancy. In addition, some of these genes have the potential to be therapeutic drug targets. Further, markers of disease progression in organ-confined cancer patients are needed to plan appropriate therapy, as many of these patients undergo radical prostatectomy without being cured. RT-PCR-based tests in a rapid format could be used during surgery to provide information on likelihood of metastasis. Also, patients could benefit from early therapy if there were molecular indications that metastases were present. Although microarray analysis is not ideal for diagnostic applications, markers thus identified may be validated by further studies, and a panel of markers capable of distinguishing disease states could result. In addition, candidates for serum or immunohistochemical tests can be identified.
In summary we have identified novel associations in prostate cancer for several well characterized genes or full length cDNAs including PLRP1, JM27, human UbcM2, dynein light intermediate chain 2 and human homologue of rat sec61 and also novel associations with high-grade PIN, which include: breast carcinoma fatty acid synthase and cDNA DKFZp434B0335. These genes may prove advantageous in defining prostatic proliferative disease states and may have important diagnostic and therapeutic potential in prostate cancer.