Protein expression patterns in primary carcinoma of the vagina

Protein patterns in six samples from primary vaginal cancers, five from normal vaginal tissue and five from primary cervical cancers were analysed using two-dimensional polyacrylamide gel electrophoresis (2-DE). Protein expression profiles were evaluated by computer-assisted image analysis (PDQUEST), and proteins were subsequently identified using matrix-assisted laser desorption/ionisation mass spectrometry. The aim was to analyse the protein expression profiles of vaginal carcinoma by hierarchical clustering and to compare them with the protein pattern of cervical carcinoma, in order to find a helpful tool for correct classification and to increase biomedical knowledge. A distinct set of 33 protein spots was differentially expressed, and these differences between normal tissue, vaginal cancer and cervical cancer were statistically significant (Mann–Whitney U test, P<0.05). Furthermore, the protein profiles of pairs of primary vaginal and cervical cancers were found to be very similar. Proteins identified so far include tropomyosin 1, cytokeratins 5, 15 and 17, apolipoprotein A1, annexin V and glutathione-S-transferase, as well as the stress-related proteins calreticulin, HSP 27 and HSP 70. We conclude that cluster analysis of proteomics data allows accurate discrimination between normal vaginal mucosa, primary vaginal cancer and primary cervical cancer. In addition, vaginal and cervical carcinomas appear to be relatively homogeneous in their protein expression, indicating similar carcinogenic pathways. It may further be possible to identify tumour-specific markers among the differentially expressed proteins. These results need to be confirmed by more comprehensive studies in the future.

Primary carcinoma of the vagina (PCV) is a rare disease affecting predominantly postmenopausal women (Pecorelli, 2001). Histologically, the majority of PCVs are squamous cell carcinomas (Pecorelli, 2001). Owing to the rarity of this disease, little is known about its aetiological and prognostic factors. Like cervical carcinoma, PCV has been shown to be associated with HPV, but only in about 50% of cases (Daling and Sherman, 1992; Hildesheim et al, 1997). The prognosis for PCV is poor, with an overall 5-year survival rate of about 50%, which is worse than for cervical carcinoma (Pecorelli, 2001). Early detection is crucial for the prognosis.
It has been suggested that vaginal and cervical carcinomas have a common aetiology, since vaginal tumours often occur as a second primary malignancy in patients with a history of cervical dysplasia and/or neoplasia or of hysterectomy owing to these disorders (Choo and Anderson, 1982; Benedet et al, 1983; Brinton et al, 1990; Eddy et al, 1991; Kirkbride et al, 1995). In the clinical situation, it is sometimes difficult to discriminate between cervical and vaginal carcinomas, especially in patients with prior cervical disease. As 95% of recurrences of cervical carcinoma occur within 5 years, many authors have chosen this limit to distinguish a recurrent cervical carcinoma from a new primary vaginal carcinoma. Correct diagnosis is important for the choice of therapy, prognosis and follow-up. The treatment of choice for primary cervical carcinoma is surgery, sometimes in combination with radiotherapy and chemotherapy, whereas radiotherapy alone is the most common treatment for vaginal carcinoma. The treatment and prognosis of vaginal carcinoma in particular, but also of cervical carcinoma, depend mainly on crude histopathological and clinical findings. There is, thus, a need for additional sensitive markers of prognostic and therapeutic importance and for classification.
Proteomic studies are widely used in the search for new tumour markers.
Carcinogenesis is a multistep process leading to the development of multiple cell clones and heterogeneity as a result of tumour cell genetic instability.
Two-dimensional gel electrophoresis (2-DE) has been used to examine heterogeneity in gene expression in tissues from different tumours with a view to finding tumour-specific molecular markers. With 2-DE, complex polypeptide expression patterns can be analysed both qualitatively and quantitatively. Significant differences in polypeptide expression between tumour tissues and the corresponding normal tissues have been identified, for example, in carcinoma of the bladder (Celis et al, 2000), breast (Franzen et al, 1997), colon-rectum (Stulik et al, 2001), lung (Schmid et al, 1995) and ovary (Alaiya et al, 1999), raising the possibility of finding tumour-specific biological markers.
Cluster analysis, which is a method to describe the similarity between samples based on their pattern of gene expression (Eisen et al, 1998), has enabled accurate classification of breast tumour tissues (Dwek and Alaiya, 2003).
The purpose of this study was to characterise protein expression in PCV and to compare the protein profiles with those of normal vaginal tissue and primary cervical cancer by 2-DE, in order to point out similarities or differences that might be helpful in diagnosis and differential diagnosis and that could be indicative of related or unrelated aetiology of these carcinomas.

Patient tissue samples
A total of 16 tissue biopsies (about 3 mm × 3 mm) were analysed: five from normal vaginal epithelium, six from primary vaginal carcinomas and five from primary cervical carcinomas. For histopathological data, see Table 1. We included normal vaginal tissue to allow an effective comparison with vaginal cancer. The cervical cancer samples were added in an attempt to elucidate similarities and differences between cervical and vaginal cancers at the proteome level. The 11 tumour biopsies were taken from patients with a histopathologically confirmed diagnosis of either vaginal or cervical cancer.
To ensure representative sampling, the samples were taken by experienced gynaecologists and gynaecological surgeons. Each tissue sample was examined macroscopically, and only representative, non-necrotic tissue was used. Furthermore, both cytological and histological evaluations were made of all samples, and only cases in which the histological and cytological features were concordant were included in the study. We did not examine HPV status in this study, bearing in mind that the limited number of samples would not permit any significant conclusion to be drawn.
The five normal vaginal biopsies were obtained from the upper part of the vagina, approximately 1 cm from the vaginal fornix, in postmenopausal women undergoing total hysterectomy for either benign disease or endometrial/ovarian carcinoma. All fresh tissue samples were snap-frozen in liquid nitrogen and stored until further processing for 2-DE. All samples were obtained with patient consent. One of the vaginal cancer cases (V32T) had been treated with radiation therapy for squamous cell carcinoma of the cervix 35 years earlier. None of the other vaginal or cervical cancer cases had a history of prior gynaecological cancer. None of the vaginal cancer cases had a history of vaginal or cervical dysplasia or hysterectomy.

Sample preparation
All tissue samples were prepared according to a frozen tissue preparation method (Franzen et al, 1991), with slight modifications. Briefly, whole tissue biopsies were kept frozen in liquid nitrogen and mechanically homogenised using a pestle and mortar. Each sample was then dissolved in 300–500 µl of lysis buffer containing 7 M urea, 2 M thiourea, 4% SDS, reducing agents and protease inhibitors. Protein concentration was determined using the Bradford method (Bradford, 1976).
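The Bradford method reads protein concentration off a standard curve, and that concentration then sets how much lysate is loaded per gel. As an illustration only, the sketch below assumes a linear assay range, hypothetical BSA standards and absorbance readings (not data from this study), and an illustrative target load of 100 µg; it fits a least-squares line, inverts it for a diluted sample and computes the lysate volume to load.

```python
# Minimal Bradford standard-curve sketch. All numbers are hypothetical and a
# linear assay range is assumed.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def concentration(absorbance, slope, intercept, dilution=1.0):
    """Invert the standard curve; returns ug/ul of the undiluted lysate."""
    return (absorbance - intercept) / slope * dilution

def volume_for_load(target_ug, conc_ug_per_ul):
    """Lysate volume (ul) containing target_ug of total protein."""
    return target_ug / conc_ug_per_ul

# Hypothetical BSA standards (ug/ul) and their absorbance readings.
std_conc = [0.0, 0.25, 0.5, 0.75, 1.0]
std_abs = [0.00, 0.15, 0.30, 0.45, 0.60]
slope, intercept = fit_line(std_conc, std_abs)

# Hypothetical lysate measured after a 1:10 dilution.
c = concentration(0.24, slope, intercept, dilution=10)
print(round(c, 3), "ug/ul;", round(volume_for_load(100, c), 2), "ul for 100 ug")
```

In practice the standards must bracket the sample absorbance, since the Bradford response flattens at high protein concentrations and the linear fit would then underestimate.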

Electrophoresis, scanning and image analysis
For each sample, the equivalent of 100 µg of total solubilised protein in 350 µl of rehydration buffer (2% (v/v) IPG buffer, pH 4–7 linear) was loaded onto a 17 cm linear pH 4–7 IPG strip (Bio-Rad, Hercules, CA). This linear gradient gives better resolution and a better overview of protein spots across the entire chosen pH window, and it also gives a better estimate of the isoelectric point (pI). Isoelectric focusing was performed for each individual sample to a total of 45.5 kVh using a Bio-Rad IEF unit at 20°C.
The second dimension was carried out on a 10–13% gradient SDS gel, and proteins were visualised by silver staining (Rabilloud et al, 1994). After electrophoresis and staining, only high-quality gels were used; occasionally, some samples had to be rerun to obtain quality comparable with the other 2-D gels. Stained gels were scanned at 100 µm resolution using a laser densitometer, and the data were analysed using PDQUEST software (version 7.1.0, Bio-Rad). Gel images were compared for qualitative and quantitative differences. Polypeptide quantities were calculated in parts per million (ppm) of the total integrated optical density.
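Expressing each spot in ppm of the gel's total integrated optical density removes differences in overall loading and staining intensity between gels, so spot quantities become comparable across samples. A minimal sketch, assuming spot volumes have already been exported per gel; the PDQUEST-style spot identifiers and values are hypothetical, and this is not PDQUEST's internal implementation.

```python
def to_ppm(spot_volumes):
    """Express each spot's integrated optical density in ppm of the gel total."""
    total = sum(spot_volumes.values())
    if total <= 0:
        raise ValueError("gel has no integrated optical density")
    return {spot: 1e6 * v / total for spot, v in spot_volumes.items()}

# Hypothetical raw spot volumes (arbitrary densitometry units) for one gel.
gel = {"SSP0101": 1200.0, "SSP0202": 300.0, "SSP0303": 500.0}
ppm = to_ppm(gel)
print(ppm)
```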

Mass spectrometry
Protein spots with statistically significant variability in the expression pattern between normal vaginal epithelium, cervical and vaginal cancers were selected for identification by mass spectrometry.
Micropreparative gels for protein identification were prepared essentially like the analytical gels, except that larger amounts (750 µg) of total protein were loaded and subjected to isoelectric focusing. Following 2-DE, gels were stained with colloidal Coomassie stain. The 2-D gels were analysed with PDQUEST software, and spots of interest were manually excised using a clean, sharp scalpel and transferred into an Eppendorf tube. In-gel digestion for peptide mass fingerprint analysis was carried out manually with trypsin (Oppermann et al, 2000), and digests were desalted using ZipTip (Millipore) as recommended by the manufacturer. Peptides were eluted in 70% acetonitrile/5% formic acid. The eluate was mixed 1 : 1 (v/v) with a saturated matrix solution of α-cyano-4-hydroxycinnamic acid in 30% acetonitrile/0.1% trifluoroacetic acid. Mass mapping of tryptic peptides was performed using MALDI-TOF (protocol above) or Cap-LC-MS/MS on a Micromass Q-TOF Ultima mass spectrometer with an LC Packings PepMap C18 75 µm ID column, using a 7–80% gradient (95% acetonitrile and 0.1% formic acid) over 35 min. Trypsin fragments of masses 842.50 and 2211.10 Da were used as internal standards for spectrum calibration. The data generated were screened against databases using a mass tolerance of ≤20 ppm. The licensed ProteinLynx software (Micromass) or MASCOT (http://www.matrixscience.com) was used for mass mapping.
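Internal calibration with the two trypsin fragment masses and the ppm tolerance can be made concrete with a short sketch. It assumes a simple linear correction between the two standards (real TOF calibration is often non-linear, so this is a simplification) and uses hypothetical observed peak values; it is not the ProteinLynx or MASCOT procedure.

```python
# Trypsin fragment masses used as internal standards in the text (Da).
REFERENCE_MASSES = (842.50, 2211.10)

def two_point_calibration(observed_refs, reference=REFERENCE_MASSES):
    """Return a function mapping observed m/z to calibrated m/z.

    Assumption: a linear correction fitted through the two internal
    standards; this is a simplification of real TOF calibration.
    """
    (o1, o2), (r1, r2) = observed_refs, reference
    gain = (r2 - r1) / (o2 - o1)
    offset = r1 - gain * o1
    return lambda mz: gain * mz + offset

def ppm_error(measured, theoretical):
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6

def within_tolerance(measured, theoretical, tol_ppm=20.0):
    """True when the absolute mass error is at most tol_ppm."""
    return abs(ppm_error(measured, theoretical)) <= tol_ppm

# Hypothetical spectrum in which both standards read 0.05 Da high.
calibrate = two_point_calibration((842.55, 2211.15))
peak = calibrate(1475.80)  # hypothetical peptide peak
print(round(peak, 4), within_tolerance(peak, 1475.75))
```

At 20 ppm the allowed error scales with mass: about 0.02 Da at m/z 1000 but about 0.04 Da at m/z 2000.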
The MALDI-TOF protocol described above has a sensitivity in the femtomole range for standard 2-DE gel-separated proteins. For a positive identification by peptide mass fingerprinting, protein scores greater than 72 were considered significant (P<0.05), as calculated by the MASCOT scoring algorithm. In addition, at least four matching peptides had to be found, and more than 50% of the measured masses had to match the theoretical peptide fragments.
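The three acceptance criteria for a peptide mass fingerprint can be written as a single check. This is a sketch of the rule as stated in the text, assuming each matched peptide accounts for one measured mass; the record type and the example values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PmfResult:
    """Hypothetical summary of one peptide mass fingerprint search."""
    protein: str
    mascot_score: float
    matched_peptides: int
    measured_masses: int  # peaks submitted for the search

def is_positive_identification(r, min_score=72.0, min_peptides=4,
                               min_match_fraction=0.5):
    """Score above 72, at least four peptides, more than half the masses matched."""
    if r.measured_masses == 0:
        return False
    return (r.mascot_score > min_score
            and r.matched_peptides >= min_peptides
            and r.matched_peptides / r.measured_masses > min_match_fraction)

print(is_positive_identification(PmfResult("spot A", 95.0, 9, 14)))
```

Note the strict inequalities: a score of exactly 72, or exactly half of the masses matched, would not pass.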

Data processing/data analysis
Both quantitative and qualitative 2-DE data sets were generated from PDQUEST, a 2-DE software analysis program.
The data set generated from the matchset of all individual samples was imported into J-Express in Excel text format as a data table, with rows representing gels and columns representing spots (Alaiya et al, 2000b). The preprocessed data were analysed by hierarchical clustering (Golub et al, 1999; Alaiya et al, 2002) using J-Express Pro software v2.1, available at http://www.molmine.com.
The J-Express program was primarily designed to analyse microarray data but equally accepts data sets generated from 2-DE analysis.
Hierarchical cluster analysis is a statistical method that uses measured variables to identify groups of relatively similar samples. The method rests on the strong assumption that an appropriate distance measure for comparing cases has been carefully selected; the outcome of the clustering therefore depends on how the distance between the samples being compared is calculated. In this study, the degree of similarity was calculated using the Bray–Curtis distance metric and complete-linkage clustering. The clustering patterns are represented as dendrograms, whose branches depict the degree of relatedness between samples. The sets of protein spots used in the cluster analysis were selected using Student's t-test and the Mann–Whitney U test (P<0.05) between normal vaginal tissue and vaginal cancer samples. A similar analysis was made between the groups of primary vaginal cancer and primary cervical cancer. These variables were then used to classify the samples into different groups.
Both quantitative and qualitative differences were taken into account for the statistical analysis.
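Bray–Curtis distance with complete linkage can be sketched from first principles: the distance between two clusters is the largest distance between any of their members, and the closest pair of clusters is merged at each step. The plain-Python sketch below is an illustration under that definition, not the J-Express implementation; the six spot profiles are hypothetical ppm values chosen to form three obvious groups.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity sum|u_i - v_i| / sum|u_i + v_i| (non-negative data)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(abs(a + b) for a, b in zip(u, v))
    return num / den if den else 0.0

def complete_linkage(samples):
    """Naive agglomerative clustering with complete linkage.

    samples: dict of sample name -> list of spot quantities.
    Returns the merge history as (cluster_a, cluster_b, height) tuples.
    """
    names = list(samples)
    d = {(a, b): bray_curtis(samples[a], samples[b])
         for a in names for b in names if a != b}
    clusters = [frozenset([n]) for n in names]
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest pairwise distance.
                h = max(d[(a, b)] for a in clusters[i] for b in clusters[j])
                if best is None or h < best[0]:
                    best = (h, i, j)
        h, i, j = best
        history.append((clusters[i], clusters[j], h))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history

def cut(history, names, k):
    """Replay merges until k clusters remain (cutting the dendrogram)."""
    clusters = [frozenset([n]) for n in names]
    for a, b, _ in history:
        if len(clusters) <= k:
            break
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters

# Hypothetical ppm profiles over six spots: N = normal, V = vaginal, C = cervical.
profiles = {
    "N1": [100, 90, 5, 5, 5, 5], "N2": [90, 100, 5, 5, 5, 5],
    "V1": [5, 5, 100, 90, 5, 5], "V2": [5, 5, 90, 100, 5, 5],
    "C1": [5, 5, 5, 5, 100, 90], "C2": [5, 5, 5, 5, 90, 100],
}
history = complete_linkage(profiles)
print([sorted(c) for c in cut(history, list(profiles), 3)])
```

Complete linkage tends to produce compact clusters, because a sample joins a group only if it is close to every member, not just to the nearest one.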

Correspondence analysis (CA)
We used correspondence analysis to further evaluate the same data sets used in the hierarchical cluster analysis. Given the small sample size of this study, this served as a means of testing whether the selected set of protein spots could indeed discriminate between the sample groups.
Correspondence analysis is a computational method, similar to principal component analysis (PCA), that can be used to study associations between groups of samples based on selected variables.
The results of CA are presented as a two-dimensional graphical display, which can reveal different structures within a complex data set.
The principle behind CA is to group similar objects together while separating dissimilar ones. The degree of similarity or difference is measured by the distances between objects or groups of objects. The method has been used to evaluate complex microarray data (Fellenberg et al, 2001).
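Correspondence analysis can be computed from first principles: scale the table to proportions, form standardised residuals against the row and column masses, and take the leading singular vectors. The sketch below does this in plain Python with power iteration, so it needs no linear-algebra library. It assumes non-negative spot quantities with no all-zero rows or columns, it illustrates the standard CA construction rather than the software used in the study, and the toy table is hypothetical.

```python
import math

def ca_row_coordinates(table, n_dims=2, iters=500):
    """Correspondence analysis row principal coordinates via power iteration.

    table: list of rows (samples) of non-negative quantities.
    Returns (coords, eigenvalues, total_inertia).
    """
    n = sum(sum(row) for row in table)
    P = [[x / n for x in row] for row in table]
    r = [sum(row) for row in P]            # row masses
    c = [sum(col) for col in zip(*P)]      # column masses
    # Standardised residuals S_ij = (p_ij - r_i c_j) / sqrt(r_i c_j).
    S = [[(P[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j])
          for j in range(len(c))] for i in range(len(r))]
    total_inertia = sum(s * s for row in S for s in row)
    # Eigenvalues of M = S S^T are the principal inertias.
    m = len(S)
    M = [[sum(a * b for a, b in zip(S[i], S[k])) for k in range(m)] for i in range(m)]
    eigvals, eigvecs = [], []
    for _ in range(n_dims):
        v = [1.0 + 0.1 * i for i in range(m)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(M[i][k] * v[k] for k in range(m)) for i in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(M[i][k] * v[k] for k in range(m)) for i in range(m))
        eigvals.append(lam)
        eigvecs.append(v)
        # Deflate so the next pass finds the next dimension.
        M = [[M[i][k] - lam * v[i] * v[k] for k in range(m)] for i in range(m)]
    # Row principal coordinates F = D_r^(-1/2) U Sigma.
    coords = [[eigvecs[d][i] * math.sqrt(max(eigvals[d], 0.0)) / math.sqrt(r[i])
               for d in range(n_dims)] for i in range(m)]
    return coords, eigvals, total_inertia

# Hypothetical table: 6 gels (rows) x 4 spots (columns), two obvious groups.
demo = [[10, 8, 1, 1], [9, 10, 1, 2], [8, 9, 2, 1],
        [1, 1, 10, 9], [2, 1, 8, 10], [1, 2, 9, 8]]
coords, eigvals, inertia = ca_row_coordinates(demo)
for row in coords:
    print([round(x, 3) for x in row])
```

Plotting the first two columns of `coords` gives the kind of two-dimensional display described above; the share of total inertia carried by each dimension indicates how faithful that picture is.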

RESULTS
Variation in protein expression between normal vaginal tissue, vaginal cancer and cervical cancer
Tissue samples from 11 cancer patients and five normal vaginal tissues were evaluated. The clinical characteristics of the samples are presented in Table 1. Protein extracts were prepared from fresh-frozen biopsies and analysed by 2-DE for both qualitative and quantitative differences in the expression of multiple polypeptides. On average, 1373 spots were resolved on the 18 × 20 cm 2-D gels, and 75–82% of the spots were matched across all gels. Gel spots were visualised by silver staining.
Marked quantitative and qualitative changes were observed in the protein expression pattern between normal samples and vaginal and cervical cancer samples. In contrast, vaginal and cervical cancer samples showed similar expression profiles to each other when compared with normal samples (data not shown). This similarity was assessed by correlation analysis between pairs of samples. Pairs of vaginal and cervical cancer samples had an average correlation coefficient of 0.68, compared with 0.62 for pairs of normal tissue and vaginal cancer and 0.55 for pairs of normal tissue and cervical cancer (Table 2). No significant difference in within-group correlation was observed between pairs of vaginal cancer samples and pairs of cervical cancer samples (correlation coefficients of 0.76 and 0.79, respectively).
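The text does not state which correlation coefficient was used. Assuming Pearson's r computed on matched spot quantities, the between-group averages (for example, vaginal vs cervical) and within-group averages (for example, vaginal vs vaginal) could be obtained as in this sketch; the helper names and toy vectors are hypothetical.

```python
import math
from itertools import combinations, product

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_between(group_a, group_b):
    """Average r over all pairs with one gel from each group."""
    rs = [pearson(a, b) for a, b in product(group_a, group_b)]
    return sum(rs) / len(rs)

def mean_within(group):
    """Average r over all distinct pairs of gels inside one group."""
    rs = [pearson(a, b) for a, b in combinations(group, 2)]
    return sum(rs) / len(rs)

print(mean_within([[1, 2, 3], [2, 4, 6], [3, 6, 9]]))
```

With five or six gels per group, each average rests on at most 30 pairs, so small differences such as 0.76 vs 0.79 are unlikely to be meaningful on their own.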
As shown in Table 1, the material included one vaginal adenocarcinoma and one cervical adenocarcinoma. Pairwise comparison of the adenocarcinomas of the vagina and of the cervix with the respective squamous cell carcinomas did not show any significant difference in the correlation coefficient analysis (data not shown).
This is in line with the high degree of similarity found between different subtypes of common epithelial ovarian tumours, a study in which a relatively large number of samples were analysed (Alaiya et al, 1999).
Representative 2-DE maps from normal vaginal tissue, vaginal cancer and cervical cancer are shown in Figure 1.

Cluster analysis of differentially expressed proteins in normal vaginal tissue, vaginal cancer and cervical cancer
A total of 67 protein spots were differentially expressed between normal vaginal tissue and vaginal/cervical cancers. The differential analysis takes into account both qualitative and quantitative changes observed between two sets of samples. These differences were statistically significant by Mann–Whitney analysis (P<0.05). A similar analysis of the three groups of samples using Student's t-test identified 94 protein spots that differed significantly.
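With only five or six gels per group, a Mann–Whitney P value can be computed exactly by enumerating every way of assigning the pooled ranks to the two groups. The sketch below shows that exact test for a single spot, with tied values given average ranks; it is an illustration of the test, not the software the authors used, which may have applied a normal approximation. The example values are hypothetical.

```python
from itertools import combinations

def ranks(values):
    """Average ranks (1-based); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def mann_whitney_exact(x, y):
    """Two-sided exact Mann-Whitney test by full enumeration.

    Feasible only for small groups. Returns (U for x, P value).
    """
    n1, n2 = len(x), len(y)
    r = ranks(list(x) + list(y))

    def u_stat(idx):
        return sum(r[i] for i in idx) - n1 * (n1 + 1) / 2

    u_obs = u_stat(range(n1))
    us = [u_stat(idx) for idx in combinations(range(n1 + n2), n1)]
    lo = sum(u <= u_obs + 1e-9 for u in us) / len(us)
    hi = sum(u >= u_obs - 1e-9 for u in us) / len(us)
    return u_obs, min(1.0, 2 * min(lo, hi))

# Hypothetical ppm values for one spot: 5 normal gels vs 6 vaginal cancer gels.
u, p = mann_whitney_exact([1, 2, 3, 4, 5], [10, 11, 12, 13, 14, 15])
print(u, round(p, 5))
```

The enumeration also shows a floor on significance: with groups of 5 and 6 there are only C(11, 5) = 462 rank assignments, so even complete separation cannot give a two-sided P below 2/462, about 0.0043.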
We have used two simple methods of statistical analysis to select the variables that may discriminate the three groups of samples, and proceeded to use these two separate data sets for possible classification of the samples into their respective groups. The samples were correctly classified using the hierarchical cluster analysis (data not shown).
In an effort to reduce the data set to a reasonable number, we further examined how many protein spots fall in the intersection of the two data sets, resulting in 33 spots common to both data sets. Of these 33 protein spots, only 11 were upregulated in both cervical and vaginal cancers, whereas the remaining 22 spots were downregulated compared with normal vaginal tissue samples. The differential expressions of some of these protein spots are shown in Figure 2.
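Reducing the two candidate lists to spots flagged by both tests is a set intersection, and each retained spot can then be labelled by comparing group means with normal tissue. A minimal sketch, assuming per-group mean ppm values are available; "up" here requires higher means in every cancer group, matching the wording "upregulated in both cervical and vaginal cancers". Spot identifiers and values are hypothetical.

```python
def consensus_spots(mw_selected, t_selected):
    """Spots called significant by both tests: the intersection of the two lists."""
    return sorted(set(mw_selected) & set(t_selected))

def direction(spots, normal_means, cancer_means_by_group):
    """Label each spot relative to normal tissue.

    'up'    : higher mean in every cancer group
    'down'  : lower mean in every cancer group
    'mixed' : anything else
    """
    labels = {}
    for s in spots:
        diffs = [group[s] - normal_means[s] for group in cancer_means_by_group]
        if all(d > 0 for d in diffs):
            labels[s] = "up"
        elif all(d < 0 for d in diffs):
            labels[s] = "down"
        else:
            labels[s] = "mixed"
    return labels

# Hypothetical spot IDs and mean ppm values.
common = consensus_spots(["s1", "s2", "s3"], ["s2", "s3", "s4"])
normal = {"s2": 100, "s3": 400}
vaginal, cervical = {"s2": 250, "s3": 90}, {"s2": 180, "s3": 120}
print(common, direction(common, normal, [vaginal, cervical]))
```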
The 33 spots were used in the cluster analysis of all the samples. As shown in Figure 3a, all the samples were correctly classified.
Owing to the small sample size of this study, we used correspondence analysis to evaluate the same data sets used in the hierarchical cluster analysis, as a means of testing whether the selected set of protein spots could indeed discriminate between the sample groups. As shown in Figure 3b, the samples clustered distinctively, and the relatedness of the samples to one another is shown in a two-dimensional correspondence analysis plot.
This type of analysis allows the identification of potential protein spots that contribute to the overall clustering of the samples.

Classification of vaginal and cervical cancer
A total of 23 protein spots were significantly differentially expressed between the 2-DE gels of vaginal and cervical cancers by both the Mann–Whitney test and the t-test (P<0.05). The expression levels of this set of 23 protein spots were used to classify all the samples. Interestingly, all the samples could be correctly classified into three distinct groups (normal tissue, vaginal cancer and cervical cancer; Figure 4).

Identification of differentially expressed polypeptides by mass spectrometry
Protein spots with statistically significant variability in the expression pattern between normal vaginal epithelium, cervical cancer and vaginal cancer were selected for identification. Some of these proteins were identified by matching with 2-DE maps of proteins previously identified using benchtop MALDI-TOF mass spectrometry. One obvious limitation of working with clinical samples is obtaining sufficient material for detailed analysis; consequently, the majority of the protein spots in the data sets used for cluster analysis could not easily be identified.
Among the protein spots identified so far are high-molecular-weight tropomyosin 1, cytokeratins 5, 15 and 17, apolipoprotein A1, annexin V and glutathione-S-transferase, as well as the stress-related proteins calreticulin, HSP 27 and HSP 70. Some of the identified protein spots are shown in Figure 1.

DISCUSSION
This is the first proteomic study of vaginal carcinoma in the literature. As vaginal carcinoma is a rare disease, the number of samples in this study is small.
In this investigation, we have used hierarchical cluster analysis based on the protein expression in 2-DE to classify vaginal carcinoma. All samples could be correctly classified into three distinct groups (normal tissue, vaginal and cervical cancer). One of the vaginal cancer cases (V32T) had a history of cervical cancer 35 years ago. This case was originally classified as a new primary vaginal carcinoma and not as a recurrent cervical carcinoma due to the long interval between the two carcinomas. In our study, this classification is supported by the results from the cluster analysis, where this vaginal cancer case was classified as a vaginal carcinoma (Figures 3 and 4).
Interestingly, pairs of vaginal and cervical cancers were relatively homogeneous in their protein expression. Studies of ovarian carcinoma have shown large heterogeneity between pairs of different ovarian carcinomas, with a correlation coefficient of 0.54 (Alaiya et al, 1999). Studies of breast carcinoma have likewise shown large intertumoural heterogeneity, with correlation coefficients of 0.57 for diploid tumours and 0.48 for aneuploid tumours (Franzen et al, 1996). Consequently, pairs of vaginal and cervical carcinomas seem to be more homogeneous than pairs of ovarian or breast carcinomas. This might point to similar genetic alterations and pathways in the carcinogenesis of vaginal and cervical carcinomas. This hypothesis is supported by a recent study by Habermann et al (2003), in which comparative genomic hybridisation was used to analyse the pattern of genomic imbalances in vaginal squamous cell carcinomas and revealed that 70% of vaginal carcinomas carry relative copy-number increases mapping to chromosome arm 3q. As almost all squamous cell carcinomas of the uterine cervix contain extra copies of chromosome arm 3q (Heselmeyer et al, 1996), the pattern of genomic imbalances in PCV is strikingly similar to that observed in cervical carcinomas. According to a recent study by Hellman et al (2003), there seem to be two types of vaginal carcinoma with age-related aetiology: one occurring at a younger age with aetiological factors similar to those of cervical carcinoma, and another occurring at an older age with a different aetiology. This might explain why some vaginal and cervical carcinomas appear more homogeneous in their protein expression, whereas others are more heterogeneous.
Previous 2-DE studies have described marked variations in the expression of cell cycle-related proteins, stress proteins and cytoskeletal proteins between benign and malignant epithelial tumours of the lung, breast, ovary and prostate (Alaiya et al, 2000a; Bergman et al, 2000). Similarly, in this study, we observed high expression of HSP 27, GST and apolipoprotein A1 in both cervical and vaginal cancers compared with normal vaginal tissue. In contrast, CK 17, a member of the family of intermediate filament proteins characteristic of epithelial cells, and tropomyosin 1 (TM 1) were expressed at higher levels in normal vaginal tissue than in vaginal and cervical cancers. Other identified proteins whose expression levels did not differ significantly between the three sample groups are annexin V, actins, calreticulin and stratifin, a member of the 14-3-3 protein family.
This observation may indicate that some proteins that are differentially expressed between benign and malignant epithelial tumours may not be similarly altered in other epithelial tumours, such as squamous cell tumours of the vagina and the cervix. The finding of a similar expression pattern for some sets of proteins in both squamous cell tumours and other epithelial malignancies may indicate their potential use as markers of malignancy. In this study, we found that 23 spots enabled clustering of almost all of the samples. This set of proteins is clearly of interest for further studies in the search for potential markers, and may give better insight into the aetiology and progression of vaginal and cervical cancers.
According to an earlier study of ovarian carcinoma, cluster analysis of a set of differentially expressed proteins could also be used as a prognostic tool (Alaiya et al, 2002). The present material is too small for survival analysis, but this approach might be useful in future studies.
Most methods of statistical analysis are capable of identifying potential marker variables that show significant differential expression between two or more groups of samples. However, variables selected to make predictions between two sample groups may be susceptible to overfitting. This problem is most pronounced when there are no real biological differences and when the groups being compared are small. It would therefore be interesting to test whether the set of protein spots selected from the training data can truly differentiate between the two groups when new samples are added. Unfortunately, this was not possible in the present study because of the small sample size. However, the result of the correspondence analysis is consistent with the cluster analysis data. Despite the limited sample size, the result is encouraging and warrants further validation studies.
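One standard way to estimate how a spot signature would perform on new samples, when no independent set exists, is leave-one-out cross-validation in which spot selection is repeated inside every fold, so the held-out gel never influences which spots are chosen. The sketch below is a swapped-in illustration, not the authors' method: it ranks spots by the absolute Welch t statistic and classifies with a nearest-centroid rule. It assumes exactly two groups with at least three gels each, and the toy data are hypothetical.

```python
import math

def welch_t(a, b):
    """Welch t statistic, used here only to rank spots."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (ma - mb) / se if se > 0 else 0.0

def nearest_centroid(train, labels, sample, spots):
    """Assign the label whose centroid over the chosen spots is closest."""
    best, best_d = None, float("inf")
    for lab in sorted(set(labels)):
        members = [x for x, l in zip(train, labels) if l == lab]
        centroid = [sum(m[s] for m in members) / len(members) for s in spots]
        d = sum((sample[s] - c) ** 2 for s, c in zip(spots, centroid))
        if d < best_d:
            best, best_d = lab, d
    return best

def loocv_accuracy(data, labels, k=2):
    """Leave-one-out accuracy with spot selection redone inside each fold."""
    correct = 0
    n_spots = len(data[0])
    for i in range(len(data)):
        train = data[:i] + data[i + 1:]
        lab = labels[:i] + labels[i + 1:]
        g1, g2 = sorted(set(lab))

        def score(s):
            # Uses the training gels only, never the held-out gel.
            a = [x[s] for x, l in zip(train, lab) if l == g1]
            b = [x[s] for x, l in zip(train, lab) if l == g2]
            return abs(welch_t(a, b))

        spots = sorted(range(n_spots), key=score, reverse=True)[:k]
        correct += nearest_centroid(train, lab, data[i], spots) == labels[i]
    return correct / len(data)

# Hypothetical gels x 4 spots: spots 0 and 1 differ between groups, 2 and 3 do not.
data = [[10, 20, 5, 7], [11, 21, 6, 5], [9, 19, 7, 6], [10, 22, 5, 6],
        [30, 5, 6, 6], [31, 4, 5, 7], [29, 6, 6, 5], [32, 5, 7, 6]]
labels = ["normal"] * 4 + ["cancer"] * 4
print(loocv_accuracy(data, labels, k=2))
```

Selecting spots once on all gels and only then cross-validating the classifier would leak information from the held-out gel and inflate the accuracy, which is the over-fitting risk described above.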
In conclusion, we have used 2-DE to study protein expression profiles of vaginal and cervical tissue samples and found that hierarchical cluster analysis allowed accurate discrimination between normal vaginal, vaginal and cervical cancer tissue specimens. This study thus indicates that cluster analysis might be utilised for correct classification of the tumours. Further, there might be a possibility to find tumour-specific markers among the differentially expressed proteins.
Vaginal and cervical carcinomas were also found to be quite homogeneous in their protein expression, which might indicate similar aetiological pathways.

ACKNOWLEDGEMENTS
We are grateful to Ms Birgitta Sundelin and Susanne Becker for technical assistance. This work was supported by the Cancer Society in Stockholm, the Swedish Cancer Society and the Gustav V Jubilee Foundation.

Figure 4 (A) Cluster analysis of normal, vaginal cancer and cervical cancer samples using the expression data set from 23 polypeptides. (B) Correspondence analysis plot of the same data set; blue = normal, red = vaginal cancer and green = cervical cancer.