Molecular classification of synovial sarcomas, leiomyosarcomas and malignant fibrous histiocytomas by gene expression profiling

In this study, we have used genome-wide expression profiling to categorise synovial sarcomas, leiomyosarcomas and malignant fibrous histiocytomas (MFHs). Following hierarchical clustering analysis of the expression data, the best match between tumour clusters and conventional diagnosis was observed for synovial sarcomas. Eight of nine synovial sarcomas examined formed a cluster that was characterised by higher expression of a set of 48 genes. In contrast, sarcomas conventionally classified as leiomyosarcomas and MFHs did not match the clusters defined by hierarchical clustering analysis. One major cluster contained a mixture of both leiomyosarcomas and MFHs and was defined by the lower expression of a set of 202 genes. A cluster containing a subgroup of MFHs was also detected. These results may have implications for the classification of soft tissue sarcomas, and are consistent with the view that sarcomas conventionally defined as MFHs do not represent a separate diagnostic category.

Adult soft tissue sarcomas are malignant tumours that occur in supporting connective tissues throughout the body, other than the bone or cartilage. They account for around 1% of all cancers and 2% of cancer deaths with metastasis leading to death in about half of the cases (Dirix and van Oosterom, 1999;Singer et al, 2000;Weiss and Goldblum, 2001). These tumours are heterogeneous and a major complication in the management of this disease is that a definitive classification scheme has been slow to emerge. Synovial sarcomas are relatively well defined with two clear subcategories designated monophasic and biphasic, distinguishable following immunohistochemical examination (Fisher, 1998). The majority of cases of synovial sarcoma contain a t(X;18)(p11.2;q11.2) translocation that results in the fusion of the chromosome 18 gene SYT to three closely related genes SSX1, SSX2 and SSX4 on the X chromosome (Clark et al, 1994;Crew et al, 1995;Skytting et al, 1999). However, this tumour is of uncertain histogenesis and despite its name, it does not appear to originate from the synovium. Leiomyosarcomas are malignant tumours with smooth muscle differentiation that are defined as a single group based on morphology and immunohistochemical examination. However, they exhibit a wide range of clinical behaviour that appears to be partly related to their site of occurrence, with tumours of the retroperitonium having poorer prognosis than tumours of the uterus and extremities (Weiss and Goldblum, 2001). Malignant fibrous histiocytoma (MFH) until recently was a major diagnostic category for sarcomas. However, the perceived absence of defining clinical or histopathological criteria has led to the proposal that MFH is not a single entity but rather a heterogeneous collection of poorly differentiated sarcoma types, the majority of which might be suitably recategorised into other tumour groups, including leiomyosarcomas, if suitable markers were in hand (Fletcher, 1992).
Since the overall behaviour of a cancer must be determined by the expression of the genes within it, it should be possible to use cDNA microarray technology to classify tumours and identify sets of genes whose expression define individual tumour groups. This approach has, for example, already been used by Alizadeh et al (2000) to identify two new subgroups of diffuse large B-cell lymphoma that had distinct clinical behaviour, and by Golub et al (1999) to distinguish acute myeloid leukaemia (AML) and acute lymphoblastic leukaemia (ALL). Recently, Nielsen et al (2002) have used data obtained using microarrays to molecularly characterise soft tissue tumours. These results showed that synovial sarcomas, gastrointestinal stromal tumours (GISTs), neural tumours and a subset of leiomyosarcomas showed distinct gene expression patterns. In the current study, we have used cDNA microarrays to investigate the gene expression profiles for synovial sarcomas, MFHs and leiomyosarcomas. Our result also show that synovial sarcomas clustered together exhibiting a characteristic gene expression profile, but the set of genes whose increased expression define this group is quite distinct from that described by Nielsen et al (2002). In addition, we have identified a second major cluster of tumours that contains both MFHs and leiomyosarcoma and we provide evidence for the existence of a subset within the MFH tumour category.

MATERIALS AND METHODS
Tumour and control RNA preparation Sarcoma tissues were collected from patients undergoing surgery. Diagnoses were carried out by pathologists using conventional criteria, immunohistochemistry and electron microscopy. The tumour samples were snap-frozen in liquid nitrogen and stored at À801C until RNA extraction. As a common reference sample in hybridisations, total RNAs from a combination of six cell lines including three sarcoma cell lines (HTB-175, HTB-115, CCL-121, CCL-224, T91-95 and HB4a) were used. HTB-175 (small cell lung cancer), HTB-115 (leiomyosarcoma), CCL-121 (fibrosarcoma) and CCL-224 (colorectal adenocarcinoma) were obtained from the American Type Culture Collection. T91-95 (alveolar rhabdomyosarcoma) was obtained from Dr T Gordon (Institute of Cancer Research, Sutton, UK). HB4a (immortalised human mammary luminal epithelial cell line) was obtained from Dr MJ O'Hare. Cells were grown according to the suppliers' instruction. Total RNA was extracted from the tumours and cell lines by the TRIZOL s method (GibcoBRL, Invitrogen, Paisley, UK).

cDNA microarray slide preparation and RNA labelling
Microarray slides were gridded with the 'ICR-geneset' that consisted of 5603 I.M.A.G.E cDNA clones (including 169 duplicates) acquired from the UK Human Genome Mapping Project Resource Centre and Research Genetics (http://www.resgen.com). Information on the geneset can be found at http://www.icr.ac.uk/array/ array.html. The preparation of the microarray slides including gridding and blocking were as described in Clark et al (2002).
Total RNA was labelled by reverse transcription using Superscript II (Invitrogen, Paisley, UK) using Cy5-or Cy3-labelled dCTP. Cy5 was used for labelling tumour RNA samples, while Cy3 was used for labelling control cell line pool RNA. Total cellular RNA (4 mg) was reverse transcribed overnight at 371C with 400 U Superscript II (Invitrogen) with 500 mM dGTP, dATP and dTTP, 200 mM dCTP, 100 mM DTT, 100 mM Cy5-or Cy3-labelled dCTP (Amersham, Bucks, UK) and 30 mM random primer (5 0 -IIINNNNNN-3 0 , where I is inosine) (100 Â excess) in a 20 ml reaction of 1 Â first-strand buffer (Invitrogen). Labelling reaction was stopped by the addition of EDTA (pH 8.0) to 90 mM. Cot-1 DNA (12.5 mg) (Invitrogen) was added. The sample was heated at 701C for 10 min in 40 mM NaOH. 0.5 Â SSPE (Sigma, Dorset, UK) (400 ml) was added and the sample was filtered through a 0.1 mM ultrafree-MC filter column (Millipore, Billerica, MA, USA). Sample volume was then reduced to 30 ml using a Microcon YM-30 filtration unit (Millipore). A 400 ml volume of 0.5 Â SSPE (Sigma, Dorset, UK) was added and this process was repeated twice with a final reduction in volume to 15 ml.

Microarray hybridisation
Prehybridisation of the microarray slide was performed by adding 500 ml of prehybridisation mix (6 Â SSPE pH 7.4, 12.5 mM EDTA pH 8.0, 0.1% (vv À1 ) Tween 20) on the slide and incubating at 651C overnight in a sealed humid box. The microarray slide was then washed in 4 Â SSPE, 10 mM EDTA for 1 min; 2 Â SSPE, 10 mM EDTA for 1 min; and 0.1 Â SSPE for 1 min and drained on a rack. The microarray slide was then submerged in 70% (vv À1 ) deionised formamide, 2 Â SSC pH 7.0 at 651C for 1 min (denaturation). Slides were then rinsed twice with 70% ethanol, then with 80 and 100% (vv À1 ) ethanol, blown dry with canned air (RS Components, Northants, UK) and prewarmed to 371C in a hybridisation chamber (BDH Precision Engineers, Cambridge, UK) for 30 -60 min. The labelled samples (from 4 mg each of total RNA from tumour and control pool) were made up to 50 ml in hybridisation mix (6 Â SSPE pH 7.4, 12.5 mM EDTA pH 8, 0.1% (vv À1 ) Tween 20). This mixture was heated to 991C for 2 min and then at 651C for 3 h. The mixture was then filtered through a 0.1 mM ultrafree-MC filter column (Millipore, Billerica, MA, USA). The filtrate was heated to 991C for 2 min, then at 651C for 10 min, and 371C for 10 min, pipetted onto a microarray slide and covered with a Hybrislip (22 mm Â 60 mm, Sigma, Dorset, UK). 6 Â SSPE (300 ml) was pipetted underneath the slide, then the hybridisation chamber was sealed and incubated at 651C overnight. The slide was then soaked in 4 Â SSPE, 10 mM EDTA for 1 min at 321C until the coverslip fell off; and then washed with 4 Â SSPE, 10 mM EDTA for 1 min at 321C; 2 Â SSPE, 10 mM EDTA for 1 min at room temperature and 0.1 Â SSPE for 1 min at room temperature. The slide was then dried with canned air.
Hybridised microarray slides were scanned in a GenePix 4000A scanner (Axon Instruments, Foster City, CA, USA). Slides were scanned at photomultiplier tube (PMT) voltage levels that provided a Cy5 : Cy3 hybridisation ratio across the slide of roughly 1. Ratios of fluorescent intensities (Cy5 : Cy3) for individual cDNA were then determined after subtraction of background using the GenePix Pro 3.0 software (Axon Instruments, Foster City, CA, USA).

Analysis of microarray data
The scanned image was analysed with the GenePix Pro 3.0 software (Axon Instruments, Foster City, CA, USA). Fluorescent signals for both channels of the spots were determined. A local background in each channel was also determined for each spot, which is the median fluorescence of pixels in a halo surrounding the same array spot. Spots or areas of array with defects were flagged bad and were excluded from subsequent analysis. To enhance the reliability of the expression data, another round of quality filtering was performed. Spots with fluorescent spot intensity in each channel that were more than 1.4 times the local background (medians) of that channel were considered well measured (Alizadeh et al, 2000), and the data were further filtered to include only these spots. The median background intensity was subtracted from the median spot intensity to generate the background-corrected signal intensity for use in further analysis.
Further analyses including cluster analysis were performed using the GeneSpring software (Silicon Genetics, Redwood City, CA, USA). Fluorescent intensity ratios of Cy5 : Cy3 for individual spots of the filtered data were determined by dividing the background-corrected intensity for the Cy5 by that of the Cy3 channel. These ratios were then normalised by making the median of all measurements in each sample to be 1. The resulting ratios were further normalised so that the median of all measurements taken for a particular gene is 1. In order to better explore the differences between the samples, a subset of genes showing normalised expression ratios of above 2 in at least three of the samples or below 0.5 in at least three of the samples were selected. Hierarchical clustering was then applied to the log-transformed data for these genes, using average-linkage clustering with Pearson's correlation as the similarity metric. To ensure that potentially important genes were not excluded, the selection criteria used was slightly less stringent than those adopted by Nielsen et al (2002): their clustering studies were carried out on a subset of genes with an absolute value of fluorescence ratio at least three times greater than the geometric mean ratio of specimens looked at, in at least two arrays.

RESULTS
Hierarchical clustering of soft tissue sarcoma expression profiles cDNA microarrays containing 5603 I.M.A.G.E. cDNA clones (Clark et al, 2002) were used to obtain expression profiles for 27 soft tissue sarcomas including nine synovial sarcomas, nine leiomyo-sarcomas and nine MFH tumours. We had previously established the high reliability of these microarray procedures for identifying overexpressed genes (Clark et al, 2002). In each experiment, Cy5labelled sarcoma cDNA was cohybridised with Cy3-labelled reference cDNA from pooled human cell lines that served as an internal standard for the comparison of different experiments. Following filtering and normalisation, average-linkage hierarchical clustering analysis was performed on the data set. As we were specifically interested in the expression differences that may exist between the different sarcomas, a subset of 833 genes that showed the most variation in expression among the tumours was used in the cluster analysis. The resulting dendrogram is shown in Figure 1. The tumours separated into four clusters. Interestingly, the clustering of the synovial sarcomas corresponded best to histological diagnosis. Eight of nine synovial sarcomas clustered together in a distinct group. A second cluster was composed of a mixture of five leiomyosarcomas and five MFH tumours. The third cluster contained a small subset of MFHs, while the fourth group contained mostly leiomyosarcomas (four) together with an MFH and the outlying synovial sarcoma.

Gene clustering analysis
Gene clustering groups together genes whose expression patterns vary in a similar way among the sarcomas examined (Figure 1, vertical axis). In particular, these analyses identified a synovial sarcoma cluster of 48 sequences representing 44 different genes that appeared to have increased expression in synovial sarcoma compared to leiomyosarcomas and MFHs (Table 1). Each of the 48 sequences in this cluster was resequenced to confirm its identity. Application of the Wilcoxon -Mann -Whitney test utilising the Benjamini and Hochberg false discovery rate to correct for multiple testing (corrected P ¼ 0.01) confirmed that 36 of these showed statistically significantly different levels of expression in synovial sarcomas compared to sarcomas in other groups. The single synovial sarcoma that had a very different expression profile from other cases of synovial sarcoma was excluded from this analysis.
The observation that SSX4 is included within the synovial sarcoma gene set flagged in these analyses demonstrates proof of principle. Fusion of SYT to SSX1, SSX2 and SSX4 causes inappropriate transcription of SSX sequences in synovial sarcoma that is characteristic of this tumour group (Crew et al, 1995;Skytting et al, 1999). Although SSX1 and SSX2 were not present on the microarray, they contain significant regions that exactly match SSX4 sequences and their transcripts would be likely to crossreact with SSX4 in these microarray studies. The genes present in the synovial sarcoma represent many functional groups. For example, they included genes implicated in embryonic development (FGF9), transcriptional regulation (SSX4, NCOA3), cell signalling (EFNB1) and cellular adhesion (CDH1, ICAM1). Interestingly, a gene encoding receptor for the drug cyclosporin A (PPIF) was overexpressed in synovial sarcoma.
The MFH subgroup appeared to be characterised by the increased expression of a set of 21 genes (Table 2). This subgroup is potentially very interesting despite its small size (three tumours), particularly when considered together with the existence of the mixed leiomyosarcoma/MFH cluster. Further analysis of a larger set of MFHs may still be required to confirm its existence and to assess the significance of these genes. The hierarchical cluster that contained five leiomyosarcomas and five MFHs was characterised by low expression of a set of 202 genes (Figure 1). These genes are listed at http://www.icr.ac.uk/array/array.html (Table A).

DISCUSSION
Genome-wide analysis of gene expression using microarray technology is proving an important aid in the molecular diagnosis and classification of human malignancies including leukaemias and lymphomas (Golub et al, 1999;Alizadeh et al, 2000), breast cancer (Perou et al, 2000;Sorlie et al, 2001) and melanoma  (Bittner et al, 2000). In the current study, we have used a cDNA microarray technique to obtain expression profiles for three diagnostic categories of adult soft tissue sarcomas: synovial sarcomas, leiomyosarcomas and MFHs. Our results show that most (eight out of nine) synovial sarcomas clustered as a single group based on their gene expression portrait and that this cluster was characterised by the raised expression of a set of 48 genes. Nielsen et al (2002) obtained expression profiles of 41 sarcomas including eight synovial sarcomas using either 22 or 42k gene microarrays. In common with our analyses this group found that synovial sarcomas exhibited a distinct expression profile and identified a set of genes (104 genes, representing 89 different genes) whose overexpression appeared to be characteristic of synovial sarcomas. Remarkably, there was very little overlap between the genes in this set and those in our synovial sarcoma gene cluster: indeed the only gene in common was SSX4. It is noteworthy that only 20 of the 44 distinct genes listed in Table 1 were included in the genes used by Nielsen et al (2002) for their clustering analyses of 41 sarcomas and that only 11 of the 89 distinct genes in the synovial sarcoma cluster of Nielsen et al (2002) were present in the set of 833 genes selected for our clustering studies. The lack of correlation may also partly reflect the need to examine much larger series of individual tumours in microarray studies before coming to a firm conclusion on the identity of the genes whose over or underexpression define a tumour group. This is illustrated by the gene ATP1B2 (Table 1) that had consistently increased expression in our eight clustered synovial sarcomas, but was probably not selected by Nielsen et al (2002) because elevated expression was observed in a lower proportion of their synovial sarcomas. Conversely BMP7, present in the synovial sarcoma cluster of Nielsen et al (2002), was upregulated in only a low proportion of our synovial sarcomas.
Other genes, such as GFRA1, BMP4 and IGF2 that were present in our cluster, were probably not selected by Nielsen et al (2002) because they were also upregulated in GIST tumours, a category not examined in our study. EGFR present in the synovial sarcoma cluster of Nielsen et al (2002) was not selected in our analysis because, although upregulated in five of six clustered synovial sarcoma that had data for this gene, it was also upregulated in one leiomyosarcoma and four MFHs. It is also noteworthy that in contrast to the study of Nielsen et al (2002), our analysis did not distinguish two separate groups of leiomyosarcomas. This might be related to the fact that only seven of the genes (24 clones representing 20 different genes) that distinguished their 'calponin' subgroups were present in the group of 833 genes used in our clustering studies. We have identified a mixed cluster containing both MFH tumours and leiomyosarcomas that was characterised by low expression of a set of 202 sequences. When considered together with our preliminary evidence for a subcategory of MFH tumours, this observation could be considered to support the proposal of Fletcher (1992) that MFH does not exist as a single diagnostic category and that many MFH should be reclassified into groups with other soft tissue sarcomas, including leiomyosarcomas.
During immunohistochemical diagnosis the detection of markers such as keratin, epithelial membrane antigen (EMA) and Bcl2 are characteristic of synovial sarcoma (Fisher, 1998). In the current study, we have confirmed that analysis of microarray expression profiles can also be used to group most synovial sarcomas in a single cluster. In addition, groups of genes that could potentially be used in differential diagnosis or have implications in the development of sarcoma were identified. We have also provided evidence on heterogeneity of MFH from a gene expression perspective that may aid in the development of a definitive classification scheme for soft tissue sarcomas.