Main

Endocrine therapy for hormone receptor-positive breast cancers is a standard treatment that is proven to improve patient outcome.1, 2, 3 Currently, both estrogen receptor and progesterone receptors are routinely analyzed using immunohistochemical techniques on formalin-fixed paraffin-embedded specimens to determine patient prognosis and management.1, 2, 3, 4 Patient management decisions are predominantly made on the results of the estrogen receptor assay; the role of progesterone receptor is not clearly established according to ASCO/CAP guidelines4 despite many groups showing the prognostic and predictive value of progesterone receptor as a breast cancer biomarker.5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19

Immunohistochemical assays gradually replaced the ligand-binding assay for hormone receptor determination between the mid-1990s and early 2000s. The adoption of immunohistochemical techniques was followed by the development of automated immunostainers and the adoption of multiple different progesterone receptor antibody clones. Most early assays were laboratory developed, and although some laboratories still use in-house developed assays, many clinical laboratories have opted to use platform-specific, ready-to-use assays developed by the major manufacturers of automated immunohistochemistry stainers: DakoCytomation (Dako), Leica Microsystems (Leica) and Ventana Medical Systems (Ventana). Ready-to-use assays minimize variability, and improve reproducibility and consistency.20, 21, 22, 23, 24, 25, 26, 27, 28 We have previously compared these platform-specific ready-to-use estrogen receptor assays,29 but a direct comparison of the progesterone receptor assays on a single clinical outcome series has never been performed.

Here we present a systematic comparison of three vendor-specific progesterone receptor ready-to-use immunohistochemical assays using a retrospective, tamoxifen-treated breast cancer cohort to: (1) assess observer agreement and concordance; (2) explore the prognostic value of the progesterone receptor ligand-binding assay and immunohistochemical ready-to-use assays; and (3) explore the standard measures of test performance using (a) ligand-binding, or (b) 5-year disease-free survival, as the reference standards.

Materials and methods

Patient Cohort

The Calgary Tamoxifen Breast Cancer Cohort (Calgary cohort) is a retrospective database containing demographic, clinical and pathological data for breast cancer patients diagnosed between 1985 and 2000 at the Tom Baker Cancer Centre in Calgary, Alberta, Canada and has been previously described.29 Briefly, a total of 532 primary tumors were identified from patients who received surgical intervention and adjuvant tamoxifen endocrine therapy without primary or adjuvant chemotherapy, had no prior cancer diagnosis (except non-melanoma skin cancer) and had available archival formalin-fixed paraffin-embedded tissue with confirmed invasive cancer. Archival tissue was reviewed, and pathologist-confirmed invasive tumor tissue was placed into tissue microarrays.30 Adjuvant tamoxifen treatment was offered regardless of hormone receptor status in this cohort as there was no definitive evidence at the time these patients were diagnosed that receptor-negative patients did not respond to tamoxifen.

Clinical Assessment of Hormone Receptor Status

At the time of diagnosis for patients in this cohort (1985–2000), hormone status was predominantly assessed using dextran-coated charcoal (ligand)-binding assay.31 Tumors were considered to be receptor positive if the ligand-binding assay results were >10 fmol/ml. Cases that did not have ligand-binding data available were assessed by early immunohistochemical methods. Human epidermal growth factor 2 (HER2) was not assessed clinically during this time period, and was retrospectively assessed using the Dako HercepTest pharmDx kit, as previously described.29

Immunohistochemistry

Detailed information regarding immunohistochemical staining for estrogen and progesterone receptor has been previously described for this cohort.29 Briefly, all ready-to-use assays were performed on the corresponding vendor-specific autostainer systems according to the manufacturer’s instructions. Cell line controls from the Dako ER/PR pharmDx kit and HercepTest pharmDx kit were run in addition to laboratory-built reference tissue microarrays. All tissue microarrays were reviewed and showed consistent staining, suggesting that the antibody was uniformly applied during the staining process.

Immunohistochemistry Scoring

HER2, estrogen and progesterone receptor were manually scored following the ASCO/CAP guidelines.4, 32 Specifically, HER2 was scored as either 0, 1+, 2+ or 3+, and tumors were considered positive if the HER2 average of the replicate cores for each case was >2. Estrogen and progesterone receptor status was assessed following the Allred scoring method,33 and tumors were considered positive if they had an overall Allred score of 3 or higher in at least one of the replicate cores. Scoring was performed as a consensus between two highly trained researchers (observer 1), and by two expert pathologists (observers 2 and 3). Slides were rescored by observers 1 and 3 three months after initial review to assess intra-observer reproducibility.
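The case-level positivity rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the scoring workflow used in the study: the function names are hypothetical, and the Allred total is assumed to be the usual sum of a proportion score (0 to 5) and an intensity score (0 to 3).

```python
from statistics import mean


def her2_positive(core_scores):
    """HER2 rule from the text: positive if the mean of the replicate
    core scores (each 0, 1, 2 or 3) is greater than 2."""
    return mean(core_scores) > 2


def allred_total(proportion_score, intensity_score):
    """Allred total score, assumed here to be proportion (0-5) plus
    intensity (0-3), giving a total between 0 and 8."""
    return proportion_score + intensity_score


def receptor_positive(core_allred_totals):
    """ER/PR rule from the text: positive if at least one replicate
    core has an Allred total of 3 or higher."""
    return any(total >= 3 for total in core_allred_totals)
```

Note the asymmetry between the two rules: HER2 averages across replicate cores, whereas estrogen and progesterone receptor status is positive as soon as any single core reaches the threshold.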

Statistical Analysis

All statistical analyses were performed using Stata 12 (StataCorp LP). The kappa statistic was used to measure inter- and intra-observer agreement, and inter-platform comparisons.34 The event under study was 5-year disease-free survival, defined as time from diagnosis to local recurrence, metastatic disease or death from breast cancer. Kaplan–Meier curves were analyzed using the log-rank test at 5-year disease-free survival and Cox proportional hazard regression was performed to estimate hazard ratios, adjusting for lymph node status, tumor grade, tumor size and HER2 status. All survival analyses were performed in estrogen receptor-positive cases, as determined by the corresponding vendor estrogen receptor assay results. Subjects were excluded from the multivariate models if there were missing data for any of these variables. The proportional hazard assumptions were tested by assessing log–log survival curves as well as the goodness-of-fit using the Schoenfeld residuals test. The ligand-binding assay results were used as the reference standard for calculating measures of test performance, and additional calculations were performed, comparing ligand-binding and the immunohistochemical assays, with 5-year disease-free survival as the reference standard.35
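As an illustration of the endpoint definition, the following Python sketch derives a (time, event) pair for 5-year disease-free survival. It is not the Stata code used in the study. The function and parameter names are hypothetical, and censoring all event-free follow-up at the 5-year horizon is an assumption about how the endpoint was coded.

```python
def dfs_at_horizon(years_to_event, years_of_follow_up, horizon=5.0):
    """Return (time, event) for disease-free survival truncated at `horizon`.

    years_to_event: years from diagnosis to the first local recurrence,
        metastatic disease or death from breast cancer; None if no such
        event was recorded.
    years_of_follow_up: years from diagnosis to last contact.
    Assumption: events after the horizon are treated as censored at the horizon.
    """
    if years_to_event is not None and years_to_event <= horizon:
        return years_to_event, 1  # event observed within the horizon
    return min(years_of_follow_up, horizon), 0  # censored
```

For example, a patient who relapses at 7 years with 9 years of follow-up contributes 5 event-free years to the analysis under this assumption.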

Results

Agreement

Inter-observer agreement was evaluated for all three platforms by comparing observer interpretation of each core for progesterone receptor staining between three observers, and was measured using the kappa statistic. Inter-observer interpretations on the Ventana platform consistently showed the strongest agreement for progesterone receptor, with κ=0.94 between observers 1 and 2, κ=0.78 between observers 1 and 3, and κ=0.84 between observers 2 and 3. Inter-observer agreement for progesterone receptor interpretation on the Leica platform showed substantial to almost perfect agreement with κ=0.89 between observers 1 and 2, κ=0.70 between observers 1 and 3, and κ=0.79 between observers 2 and 3. Kappa values for interpretation with the Dako platform also showed substantial to almost perfect agreement, with κ=0.90 between observers 1 and 2, κ=0.69 between observers 1 and 3, and κ=0.79 between observers 2 and 3 for progesterone receptor.

The stained slides were rescored 3 months after initial review by observers 1 and 3, and intra-observer agreement was calculated. For observer 1, all three platforms had almost perfect agreement (Dako and Ventana, κ=0.98; Leica, κ=0.94). For observer 3, all platforms also had almost perfect agreement (Dako, κ=0.93; Leica, κ=0.92; Ventana, κ=0.84).
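For readers unfamiliar with the kappa statistic, a minimal Python sketch of Cohen's kappa for two raters follows. It is an illustration rather than the Stata routine used in the study. The descriptive labels follow the widely used Landis and Koch bands, which is an assumption about the convention behind terms such as "substantial" and "almost perfect"; the function names are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected)."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa):
    """Descriptive label for kappa, rounded to two decimals as reported."""
    k = round(kappa, 2)
    if k < 0.00:
        return "poor"
    if k <= 0.20:
        return "slight"
    if k <= 0.40:
        return "fair"
    if k <= 0.60:
        return "moderate"
    if k <= 0.80:
        return "substantial"
    return "almost perfect"
```

With ten hypothetical cores scored positive (1) or negative (0) by two observers who agree on eight and each call half the cores positive, observed agreement is 0.8, chance agreement is 0.5 and kappa is 0.6.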

Concordance

Progesterone receptor status for the three ready-to-use assays was compared as a core by core analysis, and agreement was again assessed using the kappa statistic. Substantial agreement for progesterone receptor was seen between the cores stained with the Dako and Ventana assays, κ=0.78 with 49 discordant cases (n=23 Dako negative/Ventana positive; n=26 Dako positive/Ventana negative). Substantial agreement for progesterone receptor was also seen for Dako and Leica assays, κ=0.81 with 47 discordant cases (n=19 Dako negative/Leica positive; n=28 Dako positive/Leica negative). Agreement between Leica and Ventana was similar, κ=0.82, with 44 discordant cases (n=23 Leica negative/Ventana positive; n=21 Leica positive/Ventana negative). Example images of discordance with progesterone receptor staining are presented in Figure 1.

Figure 1

Example images of progesterone receptor assay discordance between vendor ready-to-use assays for Dako (clone PgR 1294), Leica (clones 16 and SAN27) and Ventana (clone 1E2).

Univariate Analysis of Progesterone Receptor

Cases were dichotomized into positive and negative as previously described,4, 31 and Kaplan–Meier survival curves for 5-year disease-free survival were analyzed for progesterone receptor in the estrogen receptor-positive cases (Figure 2). The log-rank test was used to compare positive and negative groups and hazard ratios were calculated to compare relative survival. The ligand-binding assay, as well as the Dako and Ventana assays, achieved significance with the log-rank test (ligand-binding assay Figure 2a, P=0.049; Dako Figure 2b, P=0.043; Ventana Figure 2d, P=0.033), whereas the Leica assay failed to show significance (Figure 2c, P=0.359). Univariate proportional hazard analysis showed significance for the Dako and Ventana assays (P=0.046 and P=0.036, respectively), and near significance for the ligand-binding assay (P=0.051); the Leica assay failed to reach significance (P=0.361). The corresponding hazard ratios and 95% confidence intervals for all assays are presented in Table 1.
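The log-rank comparison of two groups can be sketched in plain Python as follows. This is a minimal illustration of the standard two-group log-rank statistic, not the Stata procedure used in the study; the function name and the toy data in the usage note are hypothetical.

```python
import math


def logrank_test(times, events, groups):
    """Two-group log-rank test.

    times: follow-up time per subject.
    events: 1 if the event was observed, 0 if censored.
    groups: 0 or 1 per subject (for example PR-negative vs PR-positive).
    Returns (chi_square, p_value) with one degree of freedom.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = sum(1 for x in times if x >= t)
        at_risk_1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        deaths_1 = sum(
            1 for x, e, g in zip(times, events, groups)
            if x == t and e == 1 and g == 1
        )
        fraction_1 = at_risk_1 / at_risk
        # Expected events in group 1 if both groups shared one hazard.
        observed_minus_expected += deaths_1 - deaths * fraction_1
        if at_risk > 1:  # hypergeometric variance term
            variance += (deaths * fraction_1 * (1 - fraction_1)
                         * (at_risk - deaths) / (at_risk - 1))
    if variance == 0.0:
        raise ValueError("log-rank statistic is undefined for these data")
    chi_square = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value
```

For a toy data set in which three subjects in group 0 have events at 1, 2 and 3 years and three subjects in group 1 have events at 4, 5 and 6 years, the statistic is about 5.05, which corresponds to a P-value between 0.02 and 0.03.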

Figure 2

Univariate analysis, including log-rank P-values, of progesterone receptor negative (PR-) vs positive (PR+) in estrogen receptor positive cases: (a) ligand-binding assay (LBA); (b) Dako; (c) Leica; and (d) Ventana. Total sample size for univariate analysis of each assay is indicated in the upper right corner.

Table 1 Univariate Cox proportional hazard models for the LBA and Dako, Leica and Ventana ready-to-use immunohistochemical assays

Multivariate Analysis of Progesterone Receptor

Cox proportional hazard models were analyzed for the effect of progesterone receptor, as determined by the ligand-binding and immunohistochemical assays for each platform, and are presented in Table 2. All analyses were adjusted for lymph node status, tumor grade, tumor size and HER2 status. None of the assays achieved significance in the multivariate models (Table 2); however, the Ventana assay did show a trend toward significance (P=0.090).

Table 2 Multivariate Cox proportional hazard models for the LBA and Dako, Leica and Ventana ready-to-use immunohistochemical assays

Measures of Test Performance

Sensitivity, specificity, positive predictive value, negative predictive value and accuracy were calculated for all platforms, using the ligand-binding assay results as the reference standard (Table 3). All ready-to-use assays performed comparably to the ligand-binding assay. Notably, sensitivity was high for all assays (>96%), and positive predictive value, negative predictive value and accuracy were similar, between 72 and 77%, for all immunohistochemical assays, whereas specificity was consistently low (23–25%).
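The five measures are all derived from a 2×2 table of assay result against reference standard. The Python sketch below shows the arithmetic; the function name is hypothetical and the counts in the usage note are invented for illustration, not taken from Table 3.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard test-performance measures from a 2x2 table.

    tp: assay positive, reference positive
    fp: assay positive, reference negative
    fn: assay negative, reference positive
    tn: assay negative, reference negative
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_predictive_value": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
        "accuracy": (tp + tn) / total,
    }
```

With hypothetical counts of tp=90, fp=30, fn=10 and tn=10, sensitivity is 0.90 and positive predictive value 0.75, yet specificity is only 0.25. This shows how an assay that calls most reference-negative cases positive can still retain high sensitivity and moderate accuracy.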

Table 3 Measures of test performance comparing progesterone receptor ready-to-use immunohistochemical assays to ligand-binding assay results

Measures of test performance were also calculated for the ligand-binding and ready-to-use immunohistochemical assays with 5-year disease-free survival used as the reference standard (Table 4). All three immunohistochemical assays performed similarly to the ligand-binding assay for sensitivity, specificity, positive predictive value and accuracy. Interestingly, all measures of test performance for the immunohistochemical assays greatly improved, and moreover, outperformed the ligand-binding assay with respect to sensitivity (92–93% vs 73%) and accuracy (77–79% vs 68%). Positive predictive and negative predictive values were similar across all assays, whereas the ligand-binding assay showed the highest specificity (39% vs 11–15% for the immunohistochemical assays).

Table 4 Measures of test performance comparing LBA and ready-to-use immunohistochemical assays to 5-year disease-free survival

Discussion

This study presents a systematic comparison of three ready-to-use progesterone receptor immunohistochemical assays using a retrospective, tamoxifen-treated, breast cancer cohort. Utilizing a ready-to-use assay along with an automated staining platform for immunohistochemical analysis of progesterone receptor offers increased reproducibility and standardization, minimizing potential analytical errors that may lead to incorrect test results.20, 24, 36 We have previously reported on our comparison of the ready-to-use estrogen receptor assays with this cohort,29 and many of the points discussed are valid with the progesterone receptor evaluation presented here. Specifically, ready-to-use antibodies are titrated by the vendor to ensure optimal and consistent results, negating the need for laboratory personnel to perform rigorous titrations, and minimizing potential inter- and intra-laboratory variability (lot to lot evaluations should still be performed). In addition, the improved consistency provided by these assays allows for increased reproducibility in reporting the results between observers. Taken together, this leads to improved performance and reliability, which is particularly important for a clinically utilized assay.

We evaluated platform concordance by a core-to-core comparison, and noted that concordance was similar between platforms (κ=0.78–0.82). Similarly, all ready-to-use assays had similar rates of progesterone receptor negativity (Dako 12%; Leica 13%; Ventana 11%). As we previously noted in our estrogen receptor comparison, counterstaining varied between platforms (Figure 1). As we followed the manufacturer's recommended protocols, we did not modify this step to provide counterstain consistency across the platforms, and this may have contributed to some of the discordance seen. In addition, all vendors utilized different antibodies for their ready-to-use assays, each potentially recognizing different epitopes. Specifically, Dako utilized antibody clone progesterone receptor 1294 (recognizes A and B isoforms, binding to N-terminal amino acids 165–534), Leica utilized a progesterone receptor cocktail including clone 16 (recognizes A isoform, binding to N-terminal, exact epitope unknown) and clone SAN27 (recognizes B isoform, binding to 164 amino-acid sequence in N-terminal unique to B isoform), whereas Ventana utilized the progesterone receptor antibody clone 1E2 (recognizes A and B isoforms, specific binding unknown).37, 38 These differences in antibody clone and epitope are the more likely cause of discordance.

Our study utilized archival tissue specimens, so pre-analytical variables could not be controlled and may be another potential source of some of the observed discordance. It has previously been shown that pre-analytical variation can negatively affect immunohistochemical biomarkers, specifically in breast cancer;25, 26, 39 however, a recent publication has shown that the effects of delayed fixation are consistent among the progesterone receptor clones evaluated in this study,38 suggesting that all clones would be equally affected by unknown pre-analytical variables, which are therefore an unlikely source of the discordance observed in our study.

The prognostic and/or predictive significance of progesterone receptor in breast cancer patients is still debated;4, 40 however, there is sufficient evidence to suggest it has prognostic value5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 and we subsequently investigated all progesterone receptor assays with regards to patient survival in estrogen receptor-positive cases, as determined by the corresponding vendor-specific estrogen receptor assay result. This exploration yielded interesting variations between the assays. Univariate analysis of the ligand-binding, Dako and Ventana assays all showed significance, or near significance (Figure 2 and Table 1), whereas the Leica assay did not show significance (HR=0.71 (95% confidence interval: 0.34–1.48), P=0.361). Even more intriguing, only the Ventana progesterone receptor assay neared significance in the multivariate Cox model that adjusted for lymph node status, tumor grade, tumor size and HER2 status (HR=0.50 (95% CI: 0.22–1.12), P=0.090) for 5-year disease-free survival. Unlike our evaluation of the different estrogen receptor assays where we reported functional equivalency with the univariate and multivariate models,29 these results suggest the Ventana progesterone receptor assay utilizing the 1E2 clone may be superior to the other assays and clones investigated. These findings may partially, or wholly, explain discordance in the role of progesterone receptor in the literature.

We also assessed standard measures of test performance: sensitivity, specificity, positive predictive value, negative predictive value and accuracy. All assays performed similarly in all measures when the ligand-binding assay was set as the reference standard (Table 3). Of note, specificity was low (23–25%), and sensitivity was the strongest measure (>96%), for all immunohistochemical assays. Interestingly, when 5-year disease-free survival was utilized as the reference standard, the immunohistochemical assays outperformed the ligand-binding assay in sensitivity and accuracy (92–93% vs 73%, and 77–79% vs 68%, respectively). However, the ligand-binding assay showed the strongest specificity at 39% vs 11–15% for the immunohistochemical assays. These results support the view that current immunohistochemical methods for assessing progesterone receptor are superior to the ligand-binding assay.

Our previous publication suggested functional equivalence of the estrogen receptor ready-to-use assays;29 however, the surprising differences in 5-year disease-free survival with the progesterone receptor immunohistochemical assays (Tables 1 and 2; Figure 2) led us to question whether the progesterone receptor results would change if estrogen receptor positivity was selected by an alternate vendor assay. To investigate this further, we evaluated the univariate and multivariate survival analyses looking at all combinations of estrogen and progesterone receptor. This exploration led to more surprising results, suggesting that our previous conclusion of functional equivalence with the estrogen receptor ready-to-use immunohistochemical assays may be misleading when it is not utilized in conjunction with progesterone receptor. The Kaplan–Meier curves for all possible combinations are presented in Figure 3, and corresponding univariate proportional hazard estimates are presented in Table 5. Univariate analyses suggest that estrogen receptor measured by the Dako ready-to-use assay yields the strongest prognostic group for the Ventana progesterone receptor (P=0.023). Moreover, the Leica progesterone receptor assay that previously showed no significance achieved statistical significance when the Ventana estrogen receptor results were used to define the estrogen receptor-positive group. However, when the estrogen and progesterone receptor combinations were examined in the multivariate model (Table 6), only the Ventana progesterone receptor was able to maintain significance when adjusting for lymph node status, tumor grade, tumor size and HER2 status. Even more intriguing was that statistical significance was only achieved with the combinations of Leica estrogen receptor positivity with Ventana progesterone receptor (P=0.026) and Dako estrogen receptor positivity with Ventana progesterone receptor results (P=0.037) (Table 6), suggesting that the estrogen receptor ready-to-use immunohistochemical assays may not be functionally equivalent, as we previously suggested, when looked at in combination with progesterone receptor. As previously mentioned, these findings may partially explain some of the discordance in the literature regarding the prognostic/predictive value of progesterone receptor. Further investigation to validate our findings may finally clarify progesterone receptor utility and value in breast cancer management.

Figure 3

Univariate analysis, including log-rank P-values, of progesterone receptor negative (PR-) vs positive (PR+) in estrogen receptor positive cases, as defined by each vendor ready-to-use assay. Total sample size for univariate analysis of each assay is indicated in the upper right corner.

Table 5 Univariate Cox proportional hazard models for the progesterone receptor Dako, Leica and Ventana immunohistochemical assays with estrogen receptor positivity for each matching vendor assay
Table 6 Multivariate Cox proportional hazard models for the progesterone receptor Dako, Leica and Ventana immunohistochemical assays with estrogen receptor positivity for each vendor

In conclusion, despite similar agreement and concordance seen between the vendor-specific progesterone receptor assays, clear differences were noted with regards to 5-year disease-free survival, suggesting that relying on agreement and concordance for the utility of alternate assays may lead to important clinically relevant information being lost. Further investigation into the prognostic, and potentially predictive, value of specific progesterone receptor antibodies is warranted, and should be considered in combination with different estrogen receptor antibody clones.