Main

Breast cancer is the most frequently diagnosed cancer, and the leading cause of cancer death, in women worldwide (Jemal et al, 2011), with the lifetime risk of developing breast cancer estimated to be 1 in 8 in Western countries (Feuer et al, 1993). Patient survival has increased steadily over recent decades, attributable in part to advances in both mammographic screening (Kopans, 2011) and adjuvant systemic treatment protocols (Peto et al, 2012). Whereas pathological features such as tumour size, node positivity, hormone receptor positivity, and human epidermal growth factor receptor 2 (HER2) overexpression have been used to guide clinicians’ prescription of adjuvant therapy, true personalised medicine requires the development of better biomarkers of risk and response to therapy.

Gene expression profiling is emerging as a tool for classifying breast cancers, guiding therapy, and predicting treatment responses (Cheang et al, 2008; Haas et al, 2011). However, genome and transcriptome analyses alone provide only a partial picture, as alternative splicing of mRNA, combined with more than 100 unique post-translational protein modifications, means that each gene may give rise to multiple protein species (Banks et al, 2000).

Analysing the proteome may provide a more dynamic reflection of the impact of the cell’s genetic programme on its immediate environment (Aebersold et al, 2005). Cancer proteomics encompasses the identification and quantitative analysis of differentially expressed proteins relative to healthy tissue counterparts at different stages of disease. Proteomic technologies can also be used to identify markers for cancer diagnosis, to monitor disease progression and efficacy of therapy, and to identify new therapeutic targets (Srinivas et al, 2001).

Surface-enhanced laser desorption/ionisation time-of-flight (SELDI-TOF) mass spectrometry (MS) is a high-throughput proteomic method that involves solid-phase extraction of subsets of the proteome before analysis by TOF MS (Callesen et al, 2008). It has the ability to rapidly analyse hundreds of samples, essential for obtaining biologically and statistically relevant data in medical proteomic research. A recent review of protein profiling studies of breast cancer demonstrates that, despite a considerable diversity among these studies, there is a pattern of conformity developing, with increasing numbers of studies reporting similar peaks in protein profiles (Galvao et al, 2011). This suggests convergence to a set of common discriminatory peaks for breast cancer, with reproducibility across different clinical studies.

In this study we have employed SELDI-TOF MS to discover tissue biomarkers of breast cancer, and validate them on an independent sample set. We have used two immunological methods to verify the identified proteins. Finally, the expression levels of these proteins have been associated with clinical pathological variables in order to explore their potential value in breast cancer classification and prognosis.

Materials and methods

Patient samples

The study involved 404 patient samples comprising 202 pairs: breast tumour tissue (BC) and adjacent unaffected breast tissue (AT) from each subject. For the discovery phase, 102 sample pairs were obtained from the Kolling Institute Breast Tumour Bank at Royal North Shore Hospital, Sydney, Australia. For independent validation, 100 sample pairs were provided by the Australian Breast Cancer Tissue Bank, Sydney, Australia. All breast tissue samples were collected on the day of surgery with prior informed consent, and the study was approved by the Human Research Ethics Committee of the Northern Sydney Central Coast Area Health Service, Sydney, Australia. At the time of surgical resection, tissues were immediately taken to a pathologist, who sampled both the tumour itself and adjacent tissue of normal appearance. Both samples were snap-frozen in liquid nitrogen within 20 min of resection and stored at −80 °C. Oestrogen receptor (ER) and progesterone receptor (PR) were scored as either negative or positive by immunohistochemistry, using rabbit monoclonal SP1 (Biocare Medical, Concord, CA, USA) and mouse monoclonal Clone PgR636 (Dako, Carpinteria, CA, USA), respectively. The HER2 status was defined as positive or negative by immunohistochemistry using the HercepTest (Dako). Any equivocal result using this test was confirmed by FISH.

Tissue preparation

Approximately 20 mg of each tissue sample (BC or AT) was prepared for proteomic analysis by grinding with a mortar and pestle while frozen in liquid nitrogen, and then solubilising in 10 volumes of lysis buffer (9.5 M urea, 2% 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate (CHAPS), and 1% dithiothreitol). Lysates were added to a QiaShredder spin column (Qiagen, Hilden, Germany) and centrifuged (12 000 r.p.m., 5 min) to remove insoluble material. Samples were applied to weak cation-exchange (CM10) protein chips (Bio-Rad Laboratories, Hercules, CA, USA) for immediate analysis as described below, or aliquotted and stored at −80 °C for future analysis. The protein concentration of each extract was determined by BCA Protein Assay (Thermo Scientific, Rockford, IL, USA).

Preparation of protein chips

The CM10 protein chips were pre-equilibrated twice with 5 μl of binding buffer (50 mM sodium acetate, pH 6.0) for 5 min. Protein extracts were diluted 1 : 5 with binding buffer and 5 μl of each diluted extract was pipetted onto the chips. All samples were run in duplicate. Chips were then incubated with shaking using a MicroMix 5 (settings: form 20, amplitude 4; EURO/DPC Instrument Systems, Flanders, NJ, USA) for 90 min at room temperature. Each spot was treated with 2 × 1 μl of 50% α-cyano-4-hydroxycinnamic acid in 50% acetonitrile containing 0.5% trifluoroacetic acid (TFA), and air dried.

Generation of MS profiles

Protein profiles were initially obtained using a PBSIIc protein chip reader (Bio-Rad Laboratories, Hercules, CA, USA), and in the latter part of the study, a SELDI Enterprise Edition protein chip reader (Bio-Rad). Mass spectra were generated for each sample in the mass/charge (m/z) range of 1000–30 000 with a laser intensity setting of 175 (arbitrary units). The laser was optimised for data collection between 1000 and 15 000 m/z, with detector sensitivity set at 8. Peaks <1000 m/z were deflected away from the detector. Data were averaged from 328 spectra evenly distributed across each spot. Mean values from duplicate spectra for each sample were used in all subsequent analyses. The m/z value for each peak was determined using external calibration with protein standards including bovine insulin (5734.51 Da), equine cytochrome c (12 362 Da), equine apomyoglobin (16 952.3 Da), and bovine carbonic anhydrase (29 023.70 Da; Sigma-Aldrich, St Louis, MO, USA). After calibration, spectra were baseline-subtracted and normalised using the total ion current between 1500 and 15 000 m/z. Spectra that required a normalisation factor of >2 were repeated, and if the high normalisation factor persisted, these data were discarded. Peak detection was initially performed using Biomarker Wizard Version 3.2.2 (Bio-Rad Laboratories) on all peaks with signal/noise ratio ⩾5 and present in at least 10% of all spectra. Subsequently, all MS spectra were exported to ProteinChip Data Manager v4.1, used with the ProteinChip SELDI System Enterprise Edition (Bio-Rad), to refine the combined data analysis.
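The normalisation and quality filter described above can be illustrated with a minimal Python sketch. It assumes that the normalisation factor of a spectrum is the mean total ion current (TIC) of the set divided by that spectrum's own TIC; the source does not state how the vendor software defines the factor, so this definition, the function names, and the toy spectra are all hypothetical.

```python
def tic(mz, intensity, lo=1500.0, hi=15000.0):
    """Total ion current: summed intensity of points with lo <= m/z <= hi."""
    return sum(i for m, i in zip(mz, intensity) if lo <= m <= hi)


def normalise_spectra(spectra, lo=1500.0, hi=15000.0, max_factor=2.0):
    """Scale each spectrum so that its TIC equals the mean TIC of the set.

    spectra maps a name to a (mz list, intensity list) tuple.
    Returns (normalised, flagged): scaled intensities per name, and the
    names whose factor exceeds max_factor (candidates for repeat or discard).
    Assumes every spectrum has a non-zero TIC in the window.
    """
    tics = {name: tic(mz, inten, lo, hi) for name, (mz, inten) in spectra.items()}
    mean_tic = sum(tics.values()) / len(tics)
    normalised, flagged = {}, []
    for name, (mz, inten) in spectra.items():
        factor = mean_tic / tics[name]  # assumed definition of the factor
        normalised[name] = [i * factor for i in inten]
        if factor > max_factor:
            flagged.append(name)
    return normalised, flagged


# Toy spectra (hypothetical values); only points inside 1500-15 000 m/z count.
mz_axis = [1000, 2000, 8558, 9226, 20000]
toy = {
    "a": (mz_axis, [5, 10, 20, 10, 1]),
    "b": (mz_axis, [5, 10, 30, 20, 1]),
    "c": (mz_axis, [1, 2, 3, 5, 1]),  # weak spectrum, needs a large factor
}
normalised, flagged = normalise_spectra(toy)
print(flagged)
```

In this toy set only spectrum "c" needs a factor above 2, so it alone is flagged for repeat acquisition.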

MS data analysis

Data analysis was designed in three stages. For initial discovery, biomarker panels were developed on the training data set using 102 BC and AT sample pairs. Cluster analysis was performed using Biomarker Wizard version 3.2.2 (Bio-Rad). Univariate analysis of individual peaks was performed by Mann–Whitney U-test using SPSS (Version 18.0, SPSS Inc., Chicago, IL, USA). All protein peaks that significantly discriminated BC from AT at P<0.001 were then subjected to multivariate analysis using forward and reverse binary logistic regression (SPSS) to develop the training model. The discriminatory power of each putative marker was further described using receiver operating characteristic (ROC) area-under-the-curve (AUC) analysis. To test protein panels that were best able to discriminate BC from AT, 10-fold internal cross-validation was used as previously described (Ambroise and McLachlan, 2002; Scarlett et al, 2006). External validation was carried out using an independent set of 100 paired BC and AT samples.
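The univariate Mann–Whitney test and the ROC AUC used for each peak are closely related: for a single marker, the AUC equals the U statistic divided by the number of BC/AT comparisons. The sketch below illustrates that relationship in plain Python. It is not the SPSS procedure used in the study, and the peak intensities are hypothetical.

```python
def mann_whitney_u(cases, controls):
    """U statistic for cases: count of (case, control) pairs in which the
    case value is higher; tied pairs contribute 0.5."""
    u = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u


def roc_auc(cases, controls):
    """ROC AUC of a single marker, computed as U / (n_cases * n_controls)."""
    return mann_whitney_u(cases, controls) / (len(cases) * len(controls))


# Hypothetical peak intensities for one m/z cluster.
bc_intensity = [4.1, 3.3, 5.0, 2.2]
at_intensity = [1.0, 2.5, 2.2, 0.8]

u_stat = mann_whitney_u(bc_intensity, at_intensity)
auc = roc_auc(bc_intensity, at_intensity)
print(u_stat, auc)  # 14.5 0.90625
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to complete separation of the two groups.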

After external validation, to consolidate and unify the initial discovery and validation data, a further analysis was performed on the combined data sets. This coincided with the acquisition of new peak cluster analysis software, ProteinChip Data Manager Version 4.1 (Bio-Rad). Similar to the initial discovery phase, both univariate analysis using nonparametric statistics and multivariate analysis using binary logistic regression were applied, confirming a final two-protein marker panel and allowing calculation of overall estimates of sensitivity and specificity, accuracy, and ROC AUC values. The final stage of data analysis was to re-evaluate the two-protein panel on the separate training and validation sets to ensure consistency between the findings from the new and original software packages. In this re-testing, all common peaks obtained from the combined data set study were used for each regression analysis to achieve classification of tumour samples separately in the training and validation sets.

Protein identification

For purification of the putative biomarkers, tissue lysates were fractionated using a cation-exchange resin (Mustang S, Pall Corp., Ann Arbor, MI, USA) with stepwise pH elution from pH 4 to pH 9 in a 96-well filter plate format (AcroPrep, Pall) as previously described (Chung et al, 2009). Proteins of interest in the eluates were monitored by SELDI-TOF MS on normal-phase (NP20) chips. Fractions containing an ∼8.5 kDa putative biomarker were further purified using reverse-phase liquid chromatography (LC) on a 250 × 4.6 mm Jupiter 5 μm 300-Å C18 column (Phenomenex, Lane Cove, Australia), eluted with a 35-min linear gradient from 15% to 60% acetonitrile in 0.1% TFA at 1.5 ml min−1, followed by separation on 12% SDS–PAGE detected with SYPRO ruby protein stain (Invitrogen, Eugene, OR, USA). Protein bands of interest were excised from the gel and analysed using both nanoLC-ESI-MS/MS and MALDI-TOF peptide mass fingerprinting by the Australian Proteome Analysis Facility (Macquarie University and University of New South Wales, Sydney, Australia). The protein peak at 9.2 kDa was purified and identified in a similar manner.

Immunological validation of protein markers

To detect ubiquitin and S100P by western blotting, BC and AT tissue extracts were separated by 12% SDS–PAGE and transferred to PVDF membranes (Bio-Rad). Membranes were blocked for 1 h at room temperature with 5% skim milk. Ubiquitin was detected by incubating the transferred membranes for 2 h at room temperature on a shaking platform with anti-human ubiquitin monoclonal antibody (R&D Systems, Minneapolis, MN, USA) in a 1 : 500 dilution in 5% skim milk. For S100P western blotting, samples were concentrated five-fold by centrifugal ultrafiltration with 3-kDa MW cutoff (Nanosep 3K Omega, Pall Corp.) before electrophoresis. This was necessary to increase detection sensitivity. Concentrated samples were separated and transferred, and membranes blocked, as described above, and S100P was detected by incubating overnight at 4 °C with rabbit anti-human antibody (Invitrogen) in a 1 : 500 dilution in 5% skim milk. Secondary antibody, peroxidase-linked anti-rabbit IgG (1 : 2000) was added for 1 h at room temperature and the protein bands were visualised by enhanced chemiluminescence using the SuperSignal West Pico Luminol/Enhancer solution (Thermo Scientific). Western blot data were imaged using the LAS 3000 imaging system (Fujifilm, Stamford, CT, USA) and the images were analysed with Multi-Gauge version 3.0 software (Fujifilm). The quantitative data were normalised to the loading control of β-actin, and analysed using the Wilcoxon signed-rank test (SPSS).
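For readers unfamiliar with the paired test applied to the densitometry data, the following Python sketch implements a two-sided Wilcoxon signed-rank test using the large-sample normal approximation. It assumes no continuity or tie correction, which may differ from the exact settings of the SPSS procedure, and the β-actin-normalised band intensities are hypothetical values chosen only for illustration.

```python
import math


def wilcoxon_signed_rank(pairs):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    pairs is a list of (AT, BC) values for the same patient.
    Returns (W+, z, p). Zero differences are dropped; tied absolute
    differences share their average rank. No continuity correction.
    """
    diffs = [bc - at for at, bc in pairs if bc != at]
    n = len(diffs)
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return w_plus, z, p


# Hypothetical actin-normalised intensities (AT, BC) for eight sample pairs.
ratios = [(0.40, 0.90), (0.55, 1.30), (0.30, 0.45), (0.80, 0.70),
          (0.50, 1.60), (0.35, 0.95), (0.60, 1.45), (0.45, 0.85)]
w_plus, z, p = wilcoxon_signed_rank(ratios)
print(w_plus, round(z, 2), round(p, 3))
```

With eight pairs the rank sum can be at most 36, so a single pair in which AT exceeds BC by the smallest margin gives W+ = 35.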

To confirm the identity of the m/z 8558 protein peak by protein chip immunocapture, pre-activated RS100 protein chips (Bio-Rad) were pre-coupled with 2 μg of monoclonal anti-human ubiquitin antibody (R&D) in 50 mM NaHCO3 buffer (pH 9.2) at 4 °C. The spots were washed with 50 μM BSA to block the remaining active sites. Tissue lysates were diluted 1 : 5 in buffer containing 50% human serum in 0.1% Triton X-100 in PBS, spotted onto RS100 protein chips, and incubated for 2 h at room temperature on a shaker to achieve optimal binding. After washing with PBS, all spots were rinsed with 50 mM Tris-HCl, 1 M urea, 0.1% CHAPS, and 0.5 M NaCl, pH 7.2. After further washing in 5 mM HEPES, pH 7.2, the spots were coated with 2 × 1 μl of 50% sinapinic acid in 50% acetonitrile, 0.5% TFA, and air dried. The chips were then analysed on the SELDI-TOF MS. A His-tagged recombinant ubiquitin standard (10.6 kDa; R&D) was used as a control. The m/z 9226 protein peak was similarly verified using RS100 protein chips to confirm its identity as S100P. Before protein chip preparation, all tissue extracts were pre-concentrated as described above for western blotting. The RS100 protein chips were pre-coupled with 2 μg of rabbit anti-human S100P antibody (Invitrogen) in 50 mM NaHCO3 buffer (pH 9.2) at 4 °C. The samples were then treated and analysed as described above. His-tagged recombinant S100P (12.6 kDa; Novus Biologicals, Littleton, CO, USA) was used as a control.

Statistical analysis of clinical features

The association between levels of the two protein markers, individually and in combination, and tumour pathologic variables (tumour size, histological grade, lymphovascular invasion, lymph node involvement, ER and PR status, and HER2 expression) was examined using the Mann–Whitney U-test (SPSS). Subgroup analyses were also performed, in which lymph node-negative (n=84) or lymph node-positive (n=85) groups were analysed separately. Significance was set at P<0.05.

Results

Patient characteristics

A total of 202 pairs of tissue samples were used in this study, generating 808 spectra, of which 684 (duplicate spectra on 171 pairs of samples) were subjected to full analysis. Of the 102 pairs of samples selected for the training stage, 82 pairs were fully analysed. Of the remaining 20 pairs, 8 were excluded on clinicopathologic grounds: 4 had DCIS, 2 had neoadjuvant treatment, and 2 had recurrent tumours; a further 12 sample pairs were excluded when their mass spectra did not meet normalisation criteria. Of the 100 sample pairs in the validation set, 89 pairs were analysed. Seven sample pairs were excluded on clinicopathologic grounds: 4 had neoadjuvant therapy, 1 had metastatic disease, and 2 had recurrent disease; 3 sample pairs were lost during preparation; and 1 pair was excluded when the mass spectra did not meet normalisation criteria. The median age for the patients included in the training and validation sets was 60 (range 28–92) and 58 (range 27–85), respectively. The clinical pathologic characteristics of the tumours, including histologic type and grade, size, presence of lymphovascular invasion (LVI), hormone receptor (ER and PR) status, HER2 status, and lymph node status, are presented in Table 1.
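The sample accounting above can be checked with a few lines of Python. The exclusion counts are taken from the text; the function and variable names are illustrative only.

```python
def analysed_pairs(received, exclusions):
    """Pairs remaining after subtracting every exclusion category."""
    return received - sum(exclusions.values())


training = analysed_pairs(102, {
    "DCIS": 4,
    "neoadjuvant treatment": 2,
    "recurrent tumour": 2,
    "failed normalisation": 12,
})
validation = analysed_pairs(100, {
    "neoadjuvant therapy": 4,
    "metastatic disease": 1,
    "recurrent disease": 2,
    "lost during preparation": 3,
    "failed normalisation": 1,
})

spectra_per_pair = 2 * 2  # BC and AT tissue, each run in duplicate
total_pairs = training + validation
print(training, validation, total_pairs)  # 82 89 171
print(total_pairs * spectra_per_pair)     # 684 spectra fully analysed
print((102 + 100) * spectra_per_pair)     # 808 spectra generated
```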

Table 1 Patient characteristics

Selection of protein biomarker panel by MS-based protein profiling

The training set sample pairs (BC and AT) were subjected to MS analysis in duplicate to identify putative protein biomarkers that could distinguish tumour from unaffected tissue. The 82 sample pairs whose spectra were amenable to normalisation yielded 328 spectra, from which 53 common peaks were determined by clustering analysis. Of these, 14 peaks (m/z 1337, 1705, 1842, 2033, 3790, 3804, 8346, 8548, 8599, 9205, 9239, 9292, 9641, and 12 220) were significantly differentially expressed (P<0.005, Mann–Whitney test). These individual putative biomarkers had ROC AUC values ranging from 0.70 to 0.84. The 14 peaks were tested in forward and reverse binary logistic regression analysis with 10-fold cross-validation. This produced a final panel of 3 peaks (m/z 1842, 8599, and 9292) that classified BC and AT, with ROC AUC of 0.87, as shown in Figure 1A (curve Ti).

Figure 1

Performance of two protein peaks individually and in combination. (A) The ROC area-under-curve (AUC) after cross-validation was 0.87 (Ti) for the combination of peaks at m/z 1842, 8599 and 9292. For the independent validation sample set, the average value of ROC AUC was 0.91 (Vi). (B) Combination of the discovery and validation sets. The sensitivity and specificity of the combination peaks of m/z 8558 and 9226 were 77.2% and 88.9% with a ROC AUC value of 0.92. (C) Retesting of initial training and validation sets. The ROC AUC values for these tests were 0.86 (Tr) and 0.91 (Vr) for training and validation sets, respectively. (D) Mean peak intensity values±s.e.m. (normal vs cancer) for the marker at m/z 8558; (E) mean values±s.e.m. for the marker at m/z 9226, and (F) mean values±s.e.m. for the two markers combined. For the comparisons in (D–F), n=171, P<0.001.

Independent validation

The three putative biomarkers were tested using an independent validation set of 100 sample pairs, of which 89 pairs of spectra (in duplicate, 356 spectra) could be analysed after normalisation. For the validation set, 57 common protein peaks were determined by clustering analysis. Testing the three-protein panel derived from the training set on the independent sample set of 89 BC and 89 AT samples gave a ROC AUC of 0.91 (Figure 1A, curve Vi). The sensitivity and specificity were 80.9% and 91%, respectively, and overall accuracy was 86%.

Re-analysis of combined data sets

To increase the statistical power of the training and validation analyses and confirm the results using a newer software version, we combined the data sets into a single analysis of all 171 breast tissue sample pairs. Using new clustering analysis software, ProteinChip Data Manager Version 4.1, we found 28 peaks common to all spectra in the m/z range of 2500 to 15 000. Peaks of lower mass were excluded from this analysis because the putative marker at m/z 1842 had been determined by LC-MS/MS to be non-peptide in nature (data not shown). By univariate analysis (Mann–Whitney), the significant peaks (P<0.001) were selected with the additional criterion that individual ROC AUC was at least 0.80, as summarised in Table 2. Multivariate analysis using binary logistic regression again confirmed the two protein markers at m/z 8558 and 9226. The difference in m/z values from those determined in the initial training set analysis (m/z 8599, 9292) is larger than expected and may be attributable to several factors: the values are averaged from 684 spectra (171 sample pairs in duplicate) rather than 328 spectra (82 sample pairs in duplicate); standard curves were re-calibrated between the initial and subsequent analyses; different analysis software was used; and the technique has relatively low mass accuracy. Both protein peaks were elevated in BC tissue relative to AT. The sensitivity and specificity for the binary classification using the combined 2-marker panel were 77.2% and 88.9%, respectively, with a ROC AUC value of 0.92 (Figure 1B, curve C).
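To put the shift in peak position into perspective, the sketch below expresses the difference between the training-set and combined-set m/z values as a percentage. The choice of the combined-set value as the reference is an assumption made here for illustration; the source does not report relative mass errors.

```python
def relative_shift_percent(mz_initial, mz_combined):
    """Shift of the initial m/z relative to the combined-set m/z, in percent."""
    return 100.0 * (mz_initial - mz_combined) / mz_combined


# m/z values reported for the training-set and combined-set analyses.
shift_ubiquitin_peak = relative_shift_percent(8599, 8558)
shift_s100p_peak = relative_shift_percent(9292, 9226)
print(round(shift_ubiquitin_peak, 2), round(shift_s100p_peak, 2))  # 0.48 0.72
```

Both shifts are well below 1% of the peak mass.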

Table 2 Summary of data analysis

Re-testing of initial training and validation sets

For final confirmation of the potential two-marker panel, it was re-tested on the original separate training and validation sets. The sensitivity and specificity of the classification for breast tissue biopsy samples were 73.2% and 87.8%, respectively, in the training set, compared with 80.9% and 91% in the validation set. Their corresponding ROC AUC values were 0.86 (curve Tr) and 0.91 (curve Vr) for the training and validation sets, respectively (Figure 1C).
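The reported percentages are consistent with whole-number classification counts. In the sketch below the counts of correctly classified samples are back-calculated from the percentages and the set sizes (82 and 89 pairs); they are not given in the source and should be read as an inference.

```python
def sensitivity(true_pos, n_cancer):
    """Fraction of BC samples classified as cancer."""
    return true_pos / n_cancer


def specificity(true_neg, n_normal):
    """Fraction of AT samples classified as unaffected."""
    return true_neg / n_normal


# Inferred counts (assumption): correctly classified BC and AT samples.
sets = {
    "training": {"n": 82, "true_pos": 60, "true_neg": 72},
    "validation": {"n": 89, "true_pos": 72, "true_neg": 81},
}

summary = {}
for name, s in sets.items():
    sens = 100 * sensitivity(s["true_pos"], s["n"])
    spec = 100 * specificity(s["true_neg"], s["n"])
    summary[name] = (round(sens, 1), round(spec, 1))
    print(name, summary[name])
# training (73.2, 87.8)
# validation (80.9, 91.0)
```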

Together, these results suggest that two protein biomarkers in combination provide efficient discrimination between breast cancer tissue and healthy tissue. Figures 1D–F demonstrate the performance of the two protein peaks of m/z 8558 and 9226 alone and in combination. By paired sample t-test, a significant difference between BC and AT groups was found for each protein tested separately (Figures 1D and E, n=171, P<0.001). For the two-protein combined panel, the mean value was 3.3-fold increased in BC compared with AT samples (Figure 1F, n=171, P<0.001).

Identification and verification of putative biomarkers

Both proteins of m/z 8558 and 9226, retained by weak cation-exchange protein chips, were significantly increased in breast cancer tissue. For identification, initial purification was carried out using cation-exchange followed by reversed-phase HPLC. Eluted fractions were pooled and fractionated by SDS–PAGE, and bands of ∼8 kDa were excised for final identification by LC-MS/MS. Ubiquitin was identified from 6 peptides (two overlapping), giving 72% sequence coverage. The calculated mass of monomeric ubiquitin (8560 Da) was in good agreement with the consensus mass obtained experimentally with SELDI (m/z 8558). Similarly, analysis of the marker of ∼9.2 kDa identified it as a fragment or variant of S100P (10 400 Da) from two peptides, giving 24% sequence coverage relative to full-length S100P (Supplementary Figure S1). Notably, the two peptides found in this study were identical to those previously used to identify S100P in a MALDI-MS study of proteins upregulated in colorectal cancer (Lam et al, 2010).

Immunological verification of the two protein identities was performed using both western blotting and protein chip immunocapture. For ubiquitin, western blot confirmed differential expression of this protein between BC and AT tissue extracts. Figure 2A shows that for BC and AT samples from four randomly selected patients, relative overexpression of ubiquitin in the cancer samples was observed. When quantitated and analysed for eight randomly selected sample pairs, the increase in ubiquitin in BC was significant (Figure 2B, P=0.017, Wilcoxon signed-rank test). The identity of this protein as ubiquitin was also verified by immunocapture on RS100 protein chips (Figure 2C). The m/z 8558 peak, captured by immobilised ubiquitin antibody and displayed by SELDI-TOF MS, was increased in two BC samples in Figure 2Cii and iv compared with their corresponding AT samples in Figure 2Ci and iii, and absent when the capture antibody was nonimmune IgG (Figure 2Cvi). Figure 2Cv shows His-tagged recombinant ubiquitin (10.6 kDa) as a control.

Figure 2

Immunological validation of ubiquitin and S100P. (A) For ubiquitin, four BC and corresponding AT extracts were analysed by immunoblotting, indicating relative upregulation of ubiquitin in some breast cancer patients. β-Actin is shown as a loading control. (B) Densitometric analysis of ubiquitin western blots of eight sample pairs. Box plot shows median and upper and lower quartiles; lines show maximum and minimum values. P=0.017, Wilcoxon signed-rank test. (C) Mass spectrometry (MS) spectra of proteins bound to immobilised mouse anti-ubiquitin antibody. Samples were (i) patient 1 normal tissue, (ii) patient 1 cancer tissue, (iii) patient 2 normal tissue, (iv) patient 2 cancer tissue, (v) recombinant His-tagged ubiquitin, and (vi) patient 2 cancer tissue, mouse IgG control. Arrow indicates the mass of monomeric ubiquitin, m/z 8558. N=normal tissue; C=cancer tissue. (D) For S100P, four BC and corresponding AT extracts were analysed by immunoblotting, indicating relative upregulation of S100P in some breast cancer patients. β-Actin is shown as a loading control. (E) Densitometric analysis of S100P Western blots of 8 sample pairs. Box plot shows median and upper and lower quartiles; lines show maximum and minimum values. P=0.012, Wilcoxon signed-rank test. (F) Mass spectrometry spectra of proteins bound to immobilised rabbit anti-S100P antibody. Samples were (i) patient 3 normal tissue, (ii) patient 3 cancer tissue, (iii) patient 4 normal tissue, (iv) patient 4 cancer tissue, (v) recombinant His-tagged S100P, and (vi) patient 4 cancer tissue, rabbit IgG control. Arrow indicates the mass of the S100P form of m/z 9226. N=normal tissue; C=cancer tissue.

Similarly, the expression of S100P was also examined by western blot in eight random sets of BC and AT samples. Figure 2D shows the western blot data for four pairs, indicating variable levels of this protein between patients, with upregulation in BC samples. When quantitated and analysed for all eight sample pairs, the increase in immunoreactive S100P in BC was significant (Figure 2E, P=0.012, Wilcoxon signed-rank test). By immunocapture using the same S100P antibody immobilised on RS100 protein chips, an apparently truncated form (m/z 9226) of S100P protein was observed, similar to that found in the discovery programme using CM10 cation-exchange chips. This peak was more abundant in BC samples (Figure 2Fii and iv) than in the corresponding AT samples (Figure 2Fi and iii), and absent when the capture antibody was nonimmune IgG (Figure 2Fvi). Figure 2Fv shows His-tagged recombinant S100P (12.6 kDa) as a control.

To further confirm the identity of the 9.22 kDa protein as a short form of S100P associated with breast cancer, we also isolated this protein from cell lysates prepared from MCF-7 breast cancer cells. As shown in Supplementary Figures S2A–C, this protein could be immunoprecipitated from MCF-7 lysates using three different S100P antibodies (rabbit monoclonal, mouse polyclonal, and rabbit polyclonal). Together with the S100P sequence data (Supplementary Figure S1), this confirms its relationship to S100P. Also visible in the immunoprecipitates was a minor peak at 10.48 kDa, presumably representing full-length S100P. The 9.22 kDa form could be separated from the full-length protein by further purification on reverse-phase HPLC (Supplementary Figure S2D).

Association of two protein biomarkers and their combination with prognostic variables

To investigate the potential prognostic value of ubiquitin and S100P separately and in combination in breast cancer, we initially examined the association of each protein with variables including tumour stage, nodal stage, histologic type and grade, hormone receptor (ER and PR) and HER2 status, and LVI. As shown in Table 3, significant positive associations were seen between expression of the short form of S100P and tumour size, higher grade, LVI, lymph node involvement, hormone receptor positive status, and HER2 overexpression, whereas for ubiquitin a significant association was only seen with tumour size, grade, and HER2. When analysed together (Table 3), the combined panel was significantly associated with tumour histologic grade, size, and LVI, and also with ER-positive (ER+) and PR-positive (PR+) status and HER2 overexpression (Figure 3).

Table 3 Association of two protein markers and their combination with tumour histopathologic variables
Figure 3

Association of the combined panel with histopathologic variables. Higher expression of the combined panel was significantly associated with higher histologic grade (P=0.016) and larger tumour size (P=0.008), and weakly associated with the presence of LVI (P=0.044). The panel was also relatively increased in tumours that were positive for oestrogen receptors (P=0.016), progesterone receptors (P=0.022), and HER2 overexpression (P=0.009). Box plots show median and upper and lower quartiles; lines show maximum and minimum values.

As levels of the short form of S100P showed stronger associations than ubiquitin with each of the pathological indicators (except for grade), and appeared to point to an ER/PR+, HER2-overexpressing phenotype (possibly corresponding to a ‘HER2-enriched’ molecular subtype; Reis-Filho and Pusztai, 2011), we undertook further analysis of its relationship to these prognostic features. When examined separately for ER− and ER+ tumours, high S100P expression in both groups was equally associated with tumour size and the presence of LVI (not shown). However, the association between S100P and lymph node involvement was only significant for ER− tumours (P=0.010). In contrast, the association between S100P and HER2 overexpression was only significant for ER+ tumours (P=0.004), supporting the concept that a high S100P level might be associated with a hormone receptor-positive, HER2-enriched molecular subtype.

When examined separately for lymph node-negative and lymph node-positive tumours, the positive association between ubiquitin, the short form of S100P, or the combined panel and LVI, ER+ status, and PR+ status was entirely attributable to the lymph node-positive tumours. A significant relationship between the combined panel and HER2 overexpression was also confined to the lymph node-positive tumours (Supplementary Table S1). This subanalysis again points to a link between high expression of the short form of S100P in breast tumours, and an ER/PR+, HER2-overexpressing phenotype that has been associated with markers of poor patient outcome without treatment. However, because sample numbers are low in some subanalyses, these interpretations should be regarded as preliminary.

Discussion

We have used SELDI-TOF MS to discover two proteins that, in combination, show high discrimination between breast cancer and healthy breast tissue samples. A limitation of the protocol was that no microdissection was used, and hence tissue samples could have contained heterogeneous cell types. Despite this technical limitation, a robust panel of two putative breast cancer biomarkers was discovered, and verified on an independent sample set. After purification, the proteins were identified by LC-MS/MS as ubiquitin and a truncated form of the S100-family member, S100P.

To discover tissue biomarkers in various cancers, SELDI-TOF MS has been used previously, although the majority of such studies in breast cancer have examined serum rather than tumour tissue. Included among proteins previously identified from breast tumour tissue lysates are albumin fragments (Gast et al, 2009) and complement C3a (Zhang et al, 2012), both presumably derived from the circulation. Tissue proteomic profiling using SELDI-TOF MS has also yielded peak clusters that can contribute to the classification of breast tumours into molecular subtypes (Brozkova et al, 2008; Goncalves et al, 2008) that resemble the luminal A and B, basal, and HER2-like subtypes defined by gene expression analysis (Reis-Filho and Pusztai, 2011).

Of the two breast cancer-associated proteins identified in this study, ubiquitin is a small protein of 76 amino acids that is involved in both apoptotic signalling (Vucic et al, 2011) and transcriptional regulation (Hammond-Martel et al, 2011). Although monomeric ubiquitin has been identified in several previous biomarker studies in breast cancer, its exact relationship to disease status is unclear. In a SELDI-TOF MS study of breast cancer cell lines, we previously discovered ubiquitin as a strongly downregulated protein following treatment with chemotherapeutic drugs (Leong et al, 2007). Another SELDI analysis found the combination of a high ubiquitin level and low ferritin light chain level to be a positive prognostic marker in node-negative breast cancer (Ricolleau et al, 2006). In contrast, SELDI was also used to show that a protein of similar mass (not identified as ubiquitin) was a significant predictive factor for axillary lymph node metastasis (Nakagawa et al, 2006). In a MALDI MS analysis of microdissected cells from invasive breast cancer and healthy (reduction mammoplasty) tissue, ubiquitin was one of a cluster of proteins with increased expression in the cancer tissue (Sanders et al, 2008).

Several E3 ubiquitin ligases are regarded as tumour suppressors in breast cancer and are either mutated or downregulated; in contrast, some others are regarded as oncogenes and are overexpressed (Chen et al, 2006). Among key downregulated or mutated E3 ligases are BRCA1 and Siah1, involved in DNA repair and transcriptional regulation, among other functions. The E3 ligases downregulated in cancer are involved in both monoubiquitination (Hahn et al, 2012) and polyubiquitination (Wen et al, 2010), and low expression of the E3 ligase Siah1 is associated with poorer disease-free survival in women with breast cancer (Confalonieri et al, 2009). It may be speculated that the increased level of monomeric ubiquitin that we observed associated with larger tumours, higher grade, and HER2 overexpression, but not with other pathological markers (Table 3), reflects a decrease in the activity of some key ubiquitin ligase complexes. Interestingly, a component of the Siah1 ubiquitination complex, calcyclin-binding protein/Siah1-interacting protein (CacyBP/SIP), has increased expression in breast cancer tissue compared with adjacent unaffected breast tissue, and is associated with markers of poor prognosis (Wang et al, 2010). CacyBP/SIP is a documented binding partner of S100P (Filipek et al, 2002), raising the possibility that the disruption of ubiquitination pathways in breast cancer might be involved in the increased levels of both of the cancer-related biomarkers discovered in our study.

In contrast to the relatively weak associations observed between elevated ubiquitin levels and tumour size, higher grade, and HER2 overexpression, a high level of the novel short form of S100P was positively associated with larger tumours, higher grade, LVI, lymph node involvement, ER/PR positivity, and HER2 overexpression. Of the two identified biomarkers, S100P made the stronger contribution to the association of the combined panel with each of these pathological features apart from tumour grade. As the association between S100P and HER2 overexpression was exclusive to the ER+ tumours (P=0.004), and absent in the ER− subgroup, a high tissue S100P level may point to a group of tumours with ER/PR-positive status, HER2 overexpression, and – given the association with size, grade, and LVI – relatively poor outcome, although our study did not include actual outcome variables. This corresponds most closely to the ‘HER2-enriched’ breast cancer subtype (Slamon et al, 1987; Reis-Filho and Pusztai, 2011), and suggests that S100P might have potential, both in the classification of breast cancer and possibly as a target for therapy.

S100P is a member of the calcium-binding S100 protein family, whose members contain a characteristic structural domain known as the EF hand motif (Marenholz et al, 2004). There are at least 24 homologous S100 proteins with similar subcellular localisation, but differing in expression pattern and function (Marenholz et al, 2004). The S100 proteins are low-molecular-weight (10–12 kDa) acidic proteins that exist as intracellular or secreted homo- or hetero-dimers with composition depending on the abundance of individual family members and the cellular context (Santamaria-Kisiel et al, 2006). Although the factors that regulate S100P have not been studied extensively, DNA microarray studies have included S100P among panels of genes upregulated by oestradiol (Terasaka et al, 2004), progesterone (Bray et al, 2005), and HER2 overexpression (Mackay et al, 2003). These preliminary gene expression reports are consistent with the clinical associations we observed between high S100P levels and ER/PR+ and HER2-overexpressing tumours.

Through its effects on tumour growth and metastasis, S100P has been associated with the progression of several types of cancer including pancreatic, prostate, colorectal, and breast (Lam et al, 2010; Jiang et al, 2011). At least some of its effects have been shown to be mediated through extracellular interaction with RAGE (receptor for advanced glycation end products) (Arumugam et al, 2004). Several studies of pancreatic cancer-related molecular profiles have identified S100P as a significantly elevated gene (Crnogorac-Jurcevic et al, 2003; Logsdon et al, 2003) whose upregulation is an early event in the development of pancreatic cancer (Whiteman et al, 2007). In breast cancer, S100P was linked to immortalisation of breast epithelial cells in vitro and to both tumour progression (Guerreiro Da Silva et al, 2000; Schor et al, 2006) and early relapse (Barraclough et al, 2010) in patients. Survival of breast cancer patients with S100P-positive carcinomas was significantly worse than that of patients with S100P-negative carcinomas (Wang et al, 2006; Barraclough et al, 2009). S100P was also prominent among genes overexpressed in primary breast cancer cells from high-grade tumours (Dairkee et al, 2009). In contrast, gastric cancers that stain positive for S100P are associated with a better patient outcome than those that are negative for S100P (Jia et al, 2009).

The S100P form detected in our study by MS on cation-exchange chips, and confirmed by MS after selective binding to immobilised S100P antibody, appeared at an m/z value of 9226. This contrasts with the expected size of mature S100P, which contains 95 amino acids and has a molecular mass of 10.4 kDa, suggesting that the S100P species detected by MS is a previously unreported truncated form of this protein. An amino-terminally truncated form of S100P, termed migration-inducing gene 9 protein or MIG9, has been reported in GenBank (Protein Accession No. AAS00487.1), described as an alternatively spliced product. The predicted protein is identical to S100P[8–95] except for an isoleucine to methionine substitution at S100P residue 12 (MIG9 residue 5), and has a predicted molecular mass of 9.64 kDa. If the true translation start site is methionine-5, the predicted molecular mass would be 9.21 kDa and could explain our observed peak on SELDI-TOF MS. Importantly, it is unlikely that the many immunohistochemical studies that have measured S100P distribution in patient tissues could distinguish between S100P and these truncated forms. Mass spectrometry would be the optimal method for this identification. We have therefore identified for the first time a novel isoform of S100P that is associated with pathologic markers in breast cancer.
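The mass comparison in the preceding paragraph can be illustrated with a short calculation. The sketch below is not part of the study's analysis; it assumes that the observed peak is a singly protonated ion, so that the predicted m/z is the molecular mass plus one proton mass, and it uses only the rounded masses quoted in the text. The names `OBSERVED_MZ`, `CANDIDATE_MASSES_DA`, and `rank_candidates` are illustrative.

```python
# Assumption: the SELDI-TOF peak is a singly charged [M+H]+ ion.
PROTON_MASS_DA = 1.00728

# Observed peak and the rounded predicted masses quoted in the text (Da).
OBSERVED_MZ = 9226.0
CANDIDATE_MASSES_DA = {
    "full-length S100P (95 aa)": 10400.0,
    "MIG9 as annotated": 9640.0,
    "MIG9 starting at methionine-5": 9210.0,
}


def relative_error_percent(observed_mz, predicted_mass_da):
    """Signed deviation of the observed m/z from the predicted [M+H]+ m/z, in percent."""
    predicted_mz = predicted_mass_da + PROTON_MASS_DA
    return 100.0 * (observed_mz - predicted_mz) / predicted_mz


def rank_candidates(observed_mz, candidates):
    """Return (name, percent error) pairs sorted by absolute deviation, best first."""
    errors = [
        (name, relative_error_percent(observed_mz, mass))
        for name, mass in candidates.items()
    ]
    return sorted(errors, key=lambda pair: abs(pair[1]))


if __name__ == "__main__":
    for name, err in rank_candidates(OBSERVED_MZ, CANDIDATE_MASSES_DA):
        print(f"{name}: {err:+.2f}%")
```

Under these assumptions, the form starting at methionine-5 deviates from the observed peak by well under 1%, whereas the annotated MIG9 mass and the full-length mass deviate by several percent or more. Because the input masses are rounded to three significant figures, the calculation indicates only which candidate is compatible with the peak, not a precise mass accuracy.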

In conclusion, this study has discovered two protein biomarkers, ubiquitin and S100P – the latter as a novel truncated isoform – that, in combination, provide high discrimination between breast cancer tissue and healthy breast tissue. Correlation with clinical pathologic variables demonstrated that high values for the two-protein panel were associated with high histologic grade and tumour size, presence of LVI, ER- and PR-positive status, and HER2 overexpression. We propose that this independently validated protein biomarker panel may indicate a HER2-enriched breast cancer subtype with poor prognosis, and that measurement of S100P, in particular, may be valuable both in the classification of breast cancer and as a possible target for treatment.