Comparative survival analysis of multiparametric tests—when molecular tests disagree—A TEAM Pathology study

Bartlett, John M. S.; Bayani, Jane; Kornaga, Elizabeth; Xu, Keying; Pond, Greg R.; Piper, Tammy; Mallon, Elizabeth; Yao, Cindy Q.; Boutros, Paul C.; Hasenburg, Annette; Dunn, J. A.; Markopoulos, Christos; Dirix, Luc; Seynaeve, Caroline; van de Velde, Cornelis J. H.; Stein, Robert C.; Rea, Daniel

doi:10.1038/s41523-021-00297-7

Article
Open access
Published: 08 July 2021

John M. S. Bartlett ORCID: orcid.org/0000-0002-0347-3888^1,2,3^na1,
Jane Bayani¹^na1,
Elizabeth Kornaga^1,4^na1,
Keying Xu¹,
Greg R. Pond⁵,
Tammy Piper³,
Elizabeth Mallon⁶,
Cindy Q. Yao⁷,
Paul C. Boutros ORCID: orcid.org/0000-0003-0553-7520^7,8,9,10,
Annette Hasenburg¹¹,
J. A. Dunn¹²,
Christos Markopoulos¹³,
Luc Dirix¹⁴,
Caroline Seynaeve¹⁵,
Cornelis J. H. van de Velde¹⁶,
Robert C. Stein ORCID: orcid.org/0000-0003-2969-0415¹⁷ &
…
Daniel Rea¹⁸

npj Breast Cancer volume 7, Article number: 90 (2021)

Abstract

Multiparametric assays for risk stratification are widely used in the management of both node negative and node positive hormone receptor positive invasive breast cancer. Recent data from multiple sources suggest that different tests may provide different risk estimates at the individual patient level. The TEAM pathology study consists of 3284 postmenopausal ER+ve breast cancers treated with endocrine therapy. Using the genes comprising the OncotypeDx®, Prosigna™ and MammaPrint® multi-parametric tests, signatures were trained to recapitulate true assay results. Patients were then classified into risk groups and survival assessed. Whilst likelihood χ² ratios suggested limited value for combining tests, Kaplan–Meier and LogRank tests within risk groups suggested combinations of tests provided statistically significant stratification of potential clinical value. Paradoxically, whilst Prosigna-trained results stratified Oncotype-trained subgroups across low and intermediate risk categories, only intermediate risk Prosigna-trained cases were further stratified by Oncotype-trained results. Both Oncotype-trained and Prosigna-trained results further stratified MammaPrint-trained low risk cases, and MammaPrint-trained results also stratified Oncotype-trained low and intermediate risk groups but not Prosigna-trained results. Comparisons between existing multiparametric tests are challenging, and evidence on discordance between tests in risk stratification presents further dilemmas. Detailed analysis of the TEAM pathology study suggests a complex inter-relationship between test results in the same patient cohorts which requires careful evaluation regarding test utility. Further prognostic improvement appears both desirable and achievable.

Introduction

Multi-parametric molecular tests are central to the treatment management of early breast cancer and their use is incorporated into most major guidelines¹ as a pre-requisite for the staging of breast cancer patients, to direct prognostication and to select patients for chemotherapy treatment^2,3. Two major challenges related to their use need to be addressed. Firstly, reports highlighting disagreements between tests are disquieting for physicians, health care providers, and patients alike⁴ since they raise the question “have I recommended/received the right test?” Secondly, the lack of consistency at an individual patient level between different tests suggests additional prognostic information may result from novel tests. Recent results from the MINDACT and TAILORx studies validate the utility of tests to direct chemotherapy use in node-negative patients^2,5,6, which may be extended as new evidence emerges from retrospective³ or prospective studies^7,8. In this context an error in assigning appropriate risk classifications would have significant impact on patient treatment and outcomes. Additionally, given recent evidence documenting the long-term risk of relapse for ER+ve breast cancer and the increasing use of extended endocrine therapy⁹ the selection of the appropriate test to detect recurrence risk over extended time periods is also critical.

Reports of disagreements between tests, based on in silico analyses of existing expression array data, were frequently attributed to methodological challenges and incomplete gene coverage^10,11,12,13,14. However, recent direct comparisons, where tests were performed exactly to vendor protocols, have demonstrated marked disagreement in risk categorization and subtyping of individual tumors between widely used multiparameter assays⁴. Furthermore, comparisons between tests in clinical trial-derived cohorts provide consistent evidence that combining test results generally improves prognostic value^15,16. These results may reflect the relatively modest performance of individual multiparametric tests¹⁷.

To date, no direct comparison between different multiparameter assays in a large patient cohort with associated follow-up provides robust information on the impact of discrepant test results for patients. We developed a method to compare signatures using a combined quantitative mRNA array covering key molecular signatures¹⁷, trained against the results of the same signatures measured by original methodology¹⁸. We analyzed >3000 samples from the TEAM pathology cohort¹⁹ using “trained” signatures to demonstrate the impact of disagreements between tests on patient outcome in the context of a recent clinical trial cohort.

Results

Comparing signature-trained risk scores—Likelihood ratios

We compared the ability of trained signatures to predict DMFS10 using the likelihood ratio χ² (LRχ²) based on the Cox models as a measure of the overall prognostic information provided by each model. We illustrated the performance of each “trained” test using Kaplan–Meier survival curves and estimated hazard ratios as described above (see Fig. 1). We calculated the change in LRχ² values (ΔLRχ²) between the reclassified and single signature models to assess prognostic improvement of reclassification with a second signature versus the single signature using existing trinary and binary (Table 1) cut points as outlined above.

**Fig. 1: Test performance in ER+ve, HER2-ve breast cancer from the TEAM cohort.**

Table 1 Likelihood χ² ratios by test and cohort.

In ER+ve/HER2−ve cases (n = 3284), the Prosigna-trained signature provided greater prognostic information than the Oncotype-trained and MammaPrint-trained signatures (LRχ² = 146.9 vs. 118.0 and 119.5, respectively; Table 1). In bivariate models (combining two tests), the greatest LRχ² was observed with Oncotype-trained and Prosigna-trained results (Table 1). Combining Oncotype-trained and Prosigna-trained results increased the LRχ² to a far greater extent relative to Oncotype-trained results alone (ΔLRχ² = 60.0) than relative to Prosigna-trained results alone (ΔLRχ² = 31.0). Similarly, adding Prosigna-trained results to MammaPrint-trained results showed a greater increase in LRχ² (ΔLRχ² = 49.3) than did adding Oncotype-trained results to MammaPrint-trained results (ΔLRχ² = 26.3). Adding MammaPrint-trained results to either Oncotype-trained or Prosigna-trained results produced the smallest improvements in the LRχ² versus either test alone (Table 1). Nonetheless, all test combinations outperformed single tests to a highly statistically significant degree (p < 0.0001; Table 1).
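The ΔLRχ² arithmetic reported above can be checked with a short sketch. It uses only the LRχ² and ΔLRχ² values quoted in the text for the Oncotype-trained and Prosigna-trained models. The degrees of freedom are an assumption (2, on the reasoning that adding a three-level risk variable adds two parameters), and the helper `chi2_sf` is a hypothetical name that covers only the 1 and 2 degree-of-freedom cases.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (df = 1 or 2 only)."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch supports only df = 1 or df = 2")


# LR chi-square of each univariate model, as quoted in the text.
lr_univariate = {"oncotype": 118.0, "prosigna": 146.9}

# Reported gain when the other test is added to each univariate model.
delta_vs = {"oncotype": 60.0, "prosigna": 31.0}

# LR chi-square of the combined model implied by each starting point.
combined = {name: lr_univariate[name] + delta_vs[name] for name in lr_univariate}

# ASSUMPTION: the added three-level risk variable costs 2 degrees of freedom.
p_values = {name: chi2_sf(delta_vs[name], df=2) for name in delta_vs}

for name in lr_univariate:
    print(f"{name}: combined LR chi2 = {combined[name]:.1f}, p = {p_values[name]:.2e}")
```

Both starting points imply a combined LRχ² of about 178, which is a simple internal consistency check on the reported values, and under the stated assumption both gains fall well below p = 0.0001.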

When Oncotype-trained and Prosigna-trained results were dichotomized, there were less marked differences in univariate models between these tests and MammaPrint-trained results (Table 1). Again, the largest increase in LRχ² was observed when comparing combined Oncotype-trained and Prosigna-trained classification versus Oncotype-trained alone. All other bivariate models outperformed univariate models to a lesser, but still statistically significant, degree (p < 0.0001; Table 1).

Analysis of test performance by outcome in reclassified patients

We analyzed agreement between tests by investigating the extent to which re-classifying results for individual patients by performing tests in sequence affected predicted outcome. For example, we estimated the effects of performing a Prosigna-trained test on tumors previously classified as intermediate risk by the Oncotype-trained test.

Entire ER+ve/HER2−ve population

Oncotype-trained

Of 3284 ER+ve/HER2−ve breast cancers with results for the Oncotype-trained risk classification, 48.9% were classified low risk (DMFS10 = 87.9%), 35.8% intermediate risk (DMFS10 = 78.6%) and 15.3% high risk (DMFS10 = 67.5%) (Table 2; Figs. 1a, 2).

Table 2 Oncotype-trained results stratified by other test results, trinary classification.

**Fig. 2: Forest plot of Oncotype-trained test results re-stratified by other tests, all ER+ve/HER2−ve cases.**

Oncotype-trained stratified by Prosigna-trained

When Oncotype-trained results were further stratified by Prosigna-trained results a significant proportion (56.5%) of cases changed risk category (Supplementary Table 2). In Oncotype-trained low-risk cases, 279 (17.4%) were re-classified as high risk by Prosigna-trained results and 9 Oncotype-trained high-risk cases (1.8%) were re-classified as low risk by Prosigna-trained results. Oncotype-trained low risk/Prosigna-trained high-risk cases exhibited a significantly reduced DMFS10 (75.4%) relative to cases low risk by both signatures (HR = 3.19; 95%CI 2.12–4.82; p < 0.001; Table 2; Fig. 2). For Oncotype-trained intermediate-risk cases, 174 (14.8%) were classified as Prosigna-trained low risk with a DMFS10 = 91.5% (p < 0.001; Table 2; Fig. 2), and 618 (52.6%) were classified as Prosigna-trained high risk (DMFS10 = 73.3%; Table 2; Fig. 2). Few Oncotype-trained high-risk tumors were low risk by Prosigna-trained scores and no events were observed in these cases.
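The reclassification percentages in this section come from cross-tabulating two risk classifications of the same tumors. The sketch below illustrates that bookkeeping. The function names and the eight toy labels are hypothetical and are not TEAM data.

```python
from collections import Counter


def reclassification_table(first, second):
    """Count cases for every (first-test, second-test) risk-category pair."""
    if len(first) != len(second):
        raise ValueError("both tests must be reported for the same cases")
    return Counter(zip(first, second))


def percent_changed(table):
    """Percentage of all cases whose category differs between the two tests."""
    total = sum(table.values())
    if total == 0:
        raise ValueError("empty table")
    changed = sum(n for (a, b), n in table.items() if a != b)
    return 100.0 * changed / total


def percent_within(table, first_category, second_category):
    """Percentage of one first-test category assigned to a second-test category."""
    row_total = sum(n for (a, _), n in table.items() if a == first_category)
    if row_total == 0:
        raise ValueError(f"no cases in first-test category {first_category!r}")
    return 100.0 * table[(first_category, second_category)] / row_total


# Hypothetical toy labels for eight tumours (NOT TEAM data).
first_test = ["low", "low", "low", "low",
              "intermediate", "intermediate", "high", "high"]
second_test = ["low", "low", "intermediate", "high",
               "low", "high", "high", "high"]

table = reclassification_table(first_test, second_test)
overall = percent_changed(table)                    # share changing category
low_to_high = percent_within(table, "low", "high")  # share of "low" re-labelled "high"
print(f"changed: {overall:.1f}%, low -> high: {low_to_high:.1f}%")
```

Row percentages (within a first-test category) and the overall percentage answer different questions, which is why the text reports both a cohort-wide figure and per-category figures.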

Oncotype-trained stratified by MammaPrint-trained

Of the Oncotype-trained low-risk cases, 124 (8%) were high risk by MammaPrint-trained results (DMFS10 = 72.1%; p < 0.001; Table 2; Fig. 2). Of the Oncotype-trained high-risk cases, 52 (10%) were low risk by MammaPrint-trained results (DMFS10 = 70.4%; p = 0.465; Table 2; Fig. 2). Finally, 528 (45%) Oncotype-trained intermediate-risk cases were MammaPrint-trained high risk (DMFS10 = 73.2%; p < 0.001; Table 2; Fig. 2).

Prosigna-trained results

Of 3284 ER+ve/HER2−ve cases with results for Prosigna-trained risk available 25.2% were low risk (DMFS10 = 92.1%, 95%CI 89.8–94.0%), 35.2% intermediate risk (DMFS10 = 84.9%, 95%CI 82.3–87.1%) and 39.7% high risk (DMFS10 = 71.4%, 95%CI 68.6–74.1%; Table 3; Figs. 1b, 3).

Table 3 Prosigna-trained results stratified by other test results, trinary classification.

**Fig. 3: Forest plot of Prosigna-trained test results re-stratified by other tests, all ER+ve/HER2-ve cases.**

Prosigna-trained results stratified by Oncotype-trained results

In Prosigna-trained low-risk cases there were no significant differences in outcome across Oncotype-trained risk groups; all Prosigna-trained low-risk cases experienced DMFS10 > 90% (Table 3; Fig. 3a). Similarly, all Prosigna-trained high-risk cases experienced a DMFS10 ≤ 80%; those that were also Oncotype-trained high risk experienced significantly poorer outcome (DMFS10 = 65.7%, 95%CI 60.4–70.5%, p < 0.001) than those that were low or intermediate risk by Oncotype-trained results (Table 3; Fig. 3c). Of 1155 Prosigna-trained intermediate-risk cases, 685 (59%) were classified low risk by the Oncotype-trained test (DMFS10 = 88.5%; p < 0.001) and 89 (8%) were Oncotype-trained high risk (DMFS10 = 72.6%; p < 0.001; Table 3; Fig. 3b).

Prosigna-trained stratified by MammaPrint-trained

Excluding Prosigna-trained intermediate-risk cases the majority of results (79.7%) remained in the same risk category (Supplementary Table 2). No stratification of Prosigna-trained low-risk cases occurred using MammaPrint-trained results (Table 3; Fig. 3a). All Prosigna-trained high-risk cases had DMFS10 < 80%, 32% were MammaPrint-trained low risk (Table 3; Fig. 3c). For Prosigna-trained intermediate-risk cases 18% were MammaPrint-trained high risk (DMFS10 = 79.4%; p = 0.005; Table 3, Fig. 3b).

MammaPrint-trained

Of 3284 ER+ve/HER2−ve breast cancers with MammaPrint-trained risk classification, 66.3% were low risk (DMFS10 = 86.9%) and 33.7% high risk (DMFS10 = 70.7%; Table 4, Figs. 1c, 4).

Table 4 Mammaprint-trained results stratified by other test results, trinary classification.

**Fig. 4: Forest plot of Mammaprint-trained test results re-stratified by other tests, all ER+ve/HER2-ve cases.**

MammaPrint-trained stratified by Oncotype-trained

Of 2180 MammaPrint-trained low-risk cases, 68% were low risk by Oncotype-trained results (DMFS10 = 89.1%; Table 4; Fig. 4a). MammaPrint-trained low risk, Oncotype-trained intermediate-risk cases (30%) exhibited DMFS10 = 83.2% (Table 4, p < 0.001) and Oncotype-trained high-risk cases exhibited DMFS10 = 70.4% (Table 4, p < 0.001; Fig. 4a). In MammaPrint-trained high-risk cases DMFS10 ranged from 73.2% to 67.3% across Oncotype-trained subgroups and there were marked differences in outcome across Oncotype-trained categories (Table 4, Fig. 4b).

MammaPrint-trained results stratified by Prosigna-trained results

In MammaPrint-trained low-risk cases 20% were Prosigna-trained high risk (DMFS10 = 78.1%; Table 4, p < 0.001) and 43% intermediate risk (DMFS10 = 86.1% Table 4; p < 0.001, Fig. 4a). Amongst MammaPrint-trained high-risk cases, only a small (n = 12) subgroup of Mammaprint-trained high, Prosigna trained low results exhibited DMFS10 = 90% (p = 0.006, Fig. 4b).

Sub-group analysis ER+ve/HER2-ve, Node-ve patients not treated with chemotherapy

Oncotype-trained

Of 970 cases in this subgroup, 47.2% were Oncotype-trained low (DMFS10 = 92.5%), 36.0% intermediate (DMFS10 = 86.3%) and 16.8% high risk (DMFS10 = 76.7%, Table 2; Figs. 1d; 5) respectively.

Oncotype-trained results stratified by Prosigna-trained results

When Oncotype-trained results were stratified by Prosigna-trained results, 57.3% changed risk category (Supplementary Table 3). Among Oncotype-trained low-risk cases, 95 (21%) were Prosigna-trained high risk with DMFS10 = 83.8% (p = 0.006, Table 2; Fig. 5). In Oncotype-trained intermediate-risk cases 12% were Prosigna-trained low risk (DMFS10 = 94.1%; Table 2, p = 0.090; Fig. 5). The 57% of Oncotype-trained intermediate-risk cases classified as Prosigna-trained high risk exhibited DMFS10 = 83.7% (Table 2; p = 0.076, Fig. 5). Only three Oncotype-trained high-risk cases were Prosigna-trained low risk, and no events were observed in these cases.

Oncotype-trained stratified by MammaPrint-trained

Of the Oncotype-trained low-risk cases, 11% were MammaPrint-trained high risk (DMFS10 = 80.8%, p = 0.004; Table 2, Fig. 5a). In Oncotype-trained intermediate-risk patients 50% were MammaPrint-trained low risk (DMFS10 = 92.2%, p = 0.002; Table 2, Fig. 5b). In Oncotype-trained high-risk cases 11% were MammaPrint-trained low risk, and no events were observed in these 18 cases (Table 2, Fig. 5c). MammaPrint-trained scores identified 37.5% of Oncotype-trained cases (intermediate or high) as low risk (DMFS10 > 90%).

Prosigna-trained stratified by Oncotype-trained

Neither Prosigna-trained low nor moderate risk cases showed statistically significant sub-stratification for outcome by Oncotype-trained risk scores (Table 3, Fig. 6a, b). Within Prosigna-trained high-risk cases 22% were Oncotype-trained low risk, however, DMFS10 for this group was 83.8% (Table 3, Fig. 6c).

**Fig. 6: Forest plot of Prosigna-trained test results re-stratified by other tests, Node-ve ER+ve/HER2-ve cases treated without chemotherapy.**

Prosigna-trained stratified by MammaPrint-trained

No impact of MammaPrint-trained scores was observed in the Prosigna-trained low-risk group (Table 3, Fig. 6a), with only three discordant results. For both moderate and high risk Prosigna-trained results a group of MammaPrint-trained low-risk cases were identified (DMFS10 = 93.1% and 89.6%, respectively, Table 3; Fig. 6b, c).

MammaPrint-trained results

No impact of Oncotype-trained results on MammaPrint-trained scores was observed (Fig. 7; Table 4). In MammaPrint-trained low-risk cases 22% were categorized as Prosigna-trained high risk, with a modestly reduced DMFS10 of 89.6% (p = 0.027, Table 4).

**Fig. 7: Forest plot of Mammaprint-trained test results re-stratified by other tests, Node-ve ER+ve/HER2−ve cases treated without chemotherapy.**

Discussion

Our analysis of 3284 ER+ve/HER2−ve cases using trained signatures demonstrates that the Prosigna-trained signature provides potentially more prognostic information than either the Oncotype-trained or MammaPrint-trained signatures (Table 1). This result is consistent with results in the smaller TransATAC cohort²⁰ using original vendor methodology.

Critical to our study is the close correlation between the computationally derived “signature trained” scores and true results as shown by us previously¹⁸. For ROR-PT results the correlation coefficient between “trained” and true assay results was 0.93, comparing true to “trained” results showed 90% of cases within the same risk category (low, intermediate, high—see ref. ¹⁸). Similarly for “Oncotype-Dx trained” results the correlation coefficient between true and “trained” results was 0.87 with 75% of results giving the same risk category (see ref. ¹⁸) and only 1% of cases disagreeing by more than 1 risk category. For Mammaprint trained results, which were calculated only as categorical high versus low risk groups, over 90% of cases were classified in the same risk group by “trained” and true results¹⁸. Full details of these results are reported elsewhere¹⁸.
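The two agreement measures described in this paragraph (a correlation coefficient on continuous scores, and the share of cases placed in the same risk category) can be sketched as follows. The function names and the six toy cases are hypothetical; the sketch does not reproduce the validation reported in ref. 18.

```python
import math

# Ordinal position of each risk category, used to measure the size of a disagreement.
RISK_LEVEL = {"low": 0, "intermediate": 1, "high": 2}


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def category_agreement(true_labels, trained_labels):
    """Return (% in the same category, % differing by more than one category)."""
    if len(true_labels) != len(trained_labels) or not true_labels:
        raise ValueError("need two equal-length, non-empty label lists")
    gaps = [abs(RISK_LEVEL[a] - RISK_LEVEL[b])
            for a, b in zip(true_labels, trained_labels)]
    same = 100.0 * sum(g == 0 for g in gaps) / len(gaps)
    far = 100.0 * sum(g > 1 for g in gaps) / len(gaps)
    return same, far


# Hypothetical toy data for six tumours (NOT the OPTIMA prelim or TEAM data).
true_scores = [20, 35, 45, 58, 62, 80]
trained_scores = [22, 43, 47, 55, 66, 75]
true_labels = ["low", "low", "intermediate", "intermediate", "high", "high"]
trained_labels = ["low", "intermediate", "intermediate", "intermediate", "high", "low"]

r = pearson_r(true_scores, trained_scores)
same_pct, far_pct = category_agreement(true_labels, trained_labels)
print(f"r = {r:.2f}, same category = {same_pct:.1f}%, off by >1 = {far_pct:.1f}%")
```

Reporting the two measures together matters because a high correlation can coexist with category changes for scores that sit close to a cut point.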

We also show when two trained tests are combined the overall amount of information is always greater than a single test alone. In this study, adding stratification by Prosigna-trained results to Oncotype-trained results provided the greatest LRχ², and the improvement was greater for this combined model versus Oncotype-trained results alone than for Prosigna-trained results alone. Collectively these results suggest that, in this study, Prosigna-trained results, either alone or combined with other test results, provide potentially greater prognostic information. However, most critically, all test combinations (where two tests were used for patient stratification) outperformed models with only one test to a highly statistically significant degree. This both confirms earlier reports²⁰ and suggests that differences between tests reflect quantitative and qualitative differences in the degree of prognostic information collected. This conclusion is supported by recent comparisons by the ATAC group, showing the impact of different signaling modules in ER+ve/HER2−ve cases²¹ across different signatures. The conclusion from this work is that different tests capture different aspects of prognostic drivers and therefore that future improvements in prognostic testing remain achievable.

Critically, we dissected the effect of applying a second test to risk-stratified subgroups defined by the initial result; e.g. we examined the effect of applying the Prosigna-trained signature to the “intermediate risk” group identified by the Oncotype-trained signature. When combining tests, Prosigna-trained results added value to both Oncotype-trained and MammaPrint-trained results (Table 1). The improved prognostic impact of Prosigna-trained results applied across all ER+ve/HER2−ve cases after Oncotype-trained results was reflected by Prosigna-trained results sub-stratifying patients across both low and intermediate risk Oncotype-trained groups (Fig. 2a, b). Even within the node negative ER+ve/HER2−ve population not treated with chemotherapy (Table 2; Fig. 5a, b), Oncotype-trained low and intermediate-risk groups were also further stratified by Prosigna-trained results, and 20.7% of Oncotype-trained low-risk cases were identified as high risk by Prosigna-trained results, with DMFS10 of 83.8%, which is important as results from prospective trials suggest these cases may benefit from chemotherapy^2,6. This difference was more striking when Oncotype-trained results were dichotomized using cut-points applied in the TAILORx trial. In ER+ve/HER2−ve, node negative patients treated without chemotherapy, 17–24% of cases with Oncotype-trained results ≥25 were low risk (DMFS10 > 90%) when stratified by MammaPrint-trained or Prosigna-trained results respectively (Supplementary Table 4; Supplementary Fig. 2). Conversely, 18–30% of Oncotype-trained low risk cases (<25) were high risk when stratified by MammaPrint-trained or Prosigna-trained results and exhibited DMFS < 90% (Supplementary Table 4; Supplementary Fig. 2).

Conversely, only in Prosigna-trained intermediate risk cases did Oncotype-trained results provide additional stratification by risk (Fig. 3; Table 3). However this stratification was not observed in the sub-group of node negative cases treated without chemotherapy (Fig. 6). No stratification of Prosigna-trained low or high risk cases was observed using either Oncotype-trained or Mammaprint trained results (Fig. 3; Table 3). When using dichotomized risk scores for Prosigna-trained ER+ve/HER2−ve node-negative cases treated without chemotherapy no further stratification using dichotomized Oncotype-trained results was seen (Supplementary Table 5; Supplementary Fig. 5) and all Prosigna-high risk cases exhibited DMFS10 < 85% regardless of dichotomized Oncotype-trained results (Supplementary Table 5; Supplementary Fig. 5). These results are illustrative of and highlight the potential clinical impact of disagreements between tests at an individual patient level previously demonstrated in the OPTIMA-prelim cohort⁴.

A number of conclusions can be drawn from our analyses. Firstly, as in previous analyses²⁰, there is additional prognostic value to be gained from combining multiple molecular tests in the research setting. The corollary is that no single existing assay captures the sum of prognostic information available at the transcriptomic level. This confirms earlier findings²² that improvements in prognostic assays remain possible. Such improvements may, however, require integration of additional molecular features beyond transcriptomics^23,24. Secondly, there was evidence, albeit from sub-group analyses, that the known interaction between clinical risk, treatment, and molecular risk profiling may differ depending on the test chosen. If taken at face value, this might provide support for the use of different testing strategies in different patient risk strata.

Our analysis has some potentially important limitations. In particular, we have used a computational approach to generate test scores for the different tests described herein. At an individual tumor level, the trained score may not be identical to the equivalent generated using original methodology. We trained our signatures in an independent cohort using the same signatures measured using original methodology¹⁸, achieving extremely high correlations with commercial test results. Additionally, the broad agreement between our analysis and the (more limited) analysis of Sestak et al.²⁰, which used original methodology and a slightly different statistical approach, is highly reassuring.

Additionally, although our cohort is exclusively postmenopausal ER-positive, 30% of cases were treated with adjuvant chemotherapy. All patients in the TEAM trial were postmenopausal, with a median age of 64 years, so results presented here may not be representative of the premenopausal population. We included chemotherapy-treated patients to maximize the power of our main analysis. However, the conclusions of our analysis performed on the node-negative subgroup who were not chemotherapy-treated are broadly similar to those in the analysis of the entire cohort, suggesting that these findings are robust both in this clinically critical node negative sub-group and indeed across all patients in the TEAM cohort.

The goal of our study was to provide robust information on the impact of discordant risk classification by different molecular prognostic signatures in postmenopausal, ER+ve early breast cancer. Existing evidence highlights discordance between tests^4,25, which is reiterated here. There is clear evidence that adding clinical information to test results provides additional prognostic information^{15,26,27,28,29}, which is supported by sub-group analyses performed here, and that information provided by any individual assay is relatively modest¹⁷. To date comparisons between tests have been limited either by relatively small sample sizes or by a lack of evidence that signatures extracted from global expression data reflect actual test performance and can therefore inform patients and clinicians on the impact of discordant test results on outcome in the real-world setting. This study provides data on a large clinical trial cohort (the TEAM trial) using test signatures trained in a second cohort (OPTIMA-prelim⁴) to match actual commercial test performance.

In summary, our study provides novel evidence for the potential clinical impact of discordant molecular test results in a large population. Further improvements in test performance are potentially within reach and would be of benefit to patients. Evidence presented here suggests the differences in test performance are more nuanced than previously reported and that careful consideration to test selection, in the context of treatment and clinical risk may be appropriate.

Methods

Study design

Our primary analyses explored the impact of signature-trained prognostic scores, categorized in accordance with published cut-points for each assay, for patients with centrally confirmed estrogen receptor positive (ER+ve) HER2 negative (HER2−ve) disease^30,31,32. HER2 positive (HER2+ve) cases were excluded since during recruitment of the TEAM trial HER2 targeted therapies were not used in this setting. We performed a secondary analysis using dichotomized scores for Oncotype Dx and Prosigna to reflect the results of the TAILORx study. We also report a complete cohort analysis, including HER2+ve cases (see Supplementary Information), since no assay used was trained on samples treated with HER2-targeted therapies. Supplementary analyses further sub-divide patient groups into node negative cases treated with endocrine therapy (but not chemotherapy), node positive cases treated with endocrine therapy (but not chemotherapy) and cases treated with chemotherapy and endocrine therapy (both node negative and node positive; see supplementary methods, data and figures).

Patient samples

Patient samples were derived from the Tamoxifen Exemestane Adjuvant Multicenter (TEAM) Trial pathology study (Supplementary Table 1; NCT00279448/NCT0032126/NCT0036270, NTR267, UMIN C000000057)^19,33 and included only hormone receptor positive, post-menopausal cancers. Patients provided informed consent and this study was approved by the University of Toronto REB (protocol number 29021).

RNA profiling using NanoString

Profiling of all samples was performed using mRNA previously extracted and analyzed using a custom NanoString codeset as described previously²². Five 4 μm formalin-fixed paraffin-embedded (FFPE) sections per case were deparaffinised, tumor areas were macro-dissected and RNA extracted using the Ambion® Recoverall™ Total Nucleic Acid Isolation Kit-RNA extraction protocol (Life Technologies™, ON, Canada). RNA aliquots were quantified using a Nanodrop-8000 spectrophotometer (Delaware, USA). All 3825 RNAs extracted from the TEAM pathology cohort were successfully assayed. Probes for each gene were designed and synthesized at NanoString® Technologies (Seattle, WA, USA), and 250 ng of RNA for each sample was hybridized, processed and analyzed using the NanoString® nCounter® Analysis System, according to NanoString® Technologies protocols.

Signature-trained Risk Stratification Scores from candidate assays

We compared two different approaches to the generation of simulated risk scores¹⁸, and selected a training and validation approach using results obtained from the OPTIMA prelim study⁴ to fit risk stratification scores generated for this study to those derived from the relevant commercial assay. For all tests, we used the suffix “-trained” to discriminate the computationally derived assay scores from the commercially derived scores, e.g. Oncotype-trained vs. Oncotype-DX™.

Methods for cross comparisons between Tests

Results were available for 3811 subjects. Cases were grouped into the pre-defined risk categories for each test as follows: Oncotype DX—low risk < 18, intermediate risk 18–31 (supplementary methods), high risk ≥ 31; Prosigna-ROR-PT—low risk < 41, intermediate risk 41–60, high risk ≥ 61^3,20,34; MammaPrint—low risk and high risk¹⁸. We also performed a dichotomized risk analysis for Oncotype Dx using low/intermediate risk 0–25 and high risk > 25, in line with the TAILORx study², and for Prosigna RT using low/intermediate risk < 61 and high risk ≥ 61. Grouped analyses were performed as follows: (1) ER+ve/HER2−ve (n = 3284); and (2) hormone-receptor positive (HR+ve) regardless of HER2 status (n = 3811). Subjects were considered HR+ve if ER and/or progesterone receptor (PR) was reported as positive³³. Differences in distant metastasis free survival (DMFS; i.e. time to first distant recurrence or death, excluding ipsilateral breast cancer recurrences but including distant metastasis, contralateral breast cancer and death from breast cancer) were evaluated using the Kaplan–Meier method, with equality of survivor functions tested by log-rank; graphs with risk tables were generated. The 10-year survival function, with 95% confidence intervals (95%CI), was calculated as DMFS10. Hazard ratios (HRs) were calculated using Cox proportional hazards regression models, with appropriate adjustments to obtain HRs for each risk level, with low risk set as reference. To assess the prognostic information of each signature, we evaluated the likelihood ratio χ² (LRχ²) statistics based on the Cox models, and the difference in LRχ² (ΔLRχ²) was calculated to assess prognostic improvement. All analyses were performed using Stata 14.2 (StataCorp, College Station, TX) and R 4.0.2. Reported p-values were two-sided with p < 0.05 considered statistically significant.
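As an illustration of two steps described above (grouping scores by cut point, then estimating DMFS10 with the Kaplan–Meier method), the following minimal sketch uses plain Python. It is not the authors' Stata or R code. The Prosigna ROR-PT cut points (low < 41, high ≥ 61) are taken from the text, while the function names and the eleven toy cases are hypothetical. Confidence intervals, log-rank tests and Cox models are outside its scope.

```python
def risk_group(score, low_below, high_from):
    """Assign a trinary risk group from a score and two cut points."""
    if score < low_below:
        return "low"
    if score >= high_from:
        return "high"
    return "intermediate"


def kaplan_meier_at(times, events, horizon):
    """Kaplan-Meier probability of remaining event-free at `horizon`.

    times  -- follow-up in years for each case
    events -- 1 if a DMFS event occurred at that time, 0 if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)  # still under follow-up at t
        failures = sum(1 for u, e in zip(times, events) if e and u == t)
        survival *= 1.0 - failures / at_risk
    return survival


# Hypothetical toy cases: (score, years of follow-up, event flag). NOT TEAM data.
cases = [
    (30, 12.0, 0), (35, 11.0, 0), (38, 4.0, 1), (39, 10.5, 0),
    (50, 3.0, 0), (55, 8.0, 1), (60, 11.0, 0),
    (70, 2.0, 1), (75, 5.0, 0), (80, 6.0, 1), (90, 12.0, 0),
]

# Cut points for Prosigna ROR-PT as stated in the text: low < 41, high >= 61.
grouped = {}
for score, years, event in cases:
    grouped.setdefault(risk_group(score, 41, 61), []).append((years, event))

dmfs10 = {
    group: kaplan_meier_at([t for t, _ in rows], [e for _, e in rows], horizon=10.0)
    for group, rows in grouped.items()
}
for group in ("low", "intermediate", "high"):
    print(f"{group}: DMFS10 = {100 * dmfs10[group]:.1f}%")
```

Censored cases stay in the risk set until their last follow-up time, which is why the estimate differs from the simple fraction of cases without an event.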

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The data generated and analyzed during this study are described in the following data record: https://doi.org/10.6084/m9.figshare.14617113 (ref. ³⁵). The data generated and analyzed as part of this study take the form of 3811 individual Nanostring data files (one per sample). These data represent part of a clinical trial and were used under license for the current study, therefore restrictions apply to their availability. The data are housed in institutional storage at The Ontario Institute for Cancer Research (OICR) and are not publicly available, but can be made available upon request subject to approval from the TEAM steering committee and after appropriate data sharing agreements have been completed. Requests for data access should be directed to the senior author (J.M.S.B.).

Code availability

The code supporting these findings is subject to patent applications and to restrictions related to licenses. It is available from the author J.M.S.B. upon reasonable request and with the permission of the Ontario Institute for Cancer Research (OICR).

References

1. Vieira, A. F. & Schmitt, F. An update on breast cancer multigene prognostic tests—emergent clinical biomarkers. Front. Med. 5, https://doi.org/10.3389/fmed.2018.00248 (2018).
2. Sparano, J. A. et al. Adjuvant chemotherapy guided by a 21-gene expression assay in breast cancer. N. Engl. J. Med. 379, 111–121 (2018).
3. Sestak, I. et al. Prediction of chemotherapy benefit by EndoPredict in patients with breast cancer who received adjuvant endocrine therapy plus chemotherapy or endocrine therapy alone. Breast Cancer Res. Treat. 176, 377–386 (2019).
4. Bartlett, J. M. et al. Comparing breast cancer multiparameter tests in the OPTIMA Prelim trial: no test is more equal than the others. J. Natl Cancer Inst. 108, djw050 (2016).
5. Cardoso, F. et al. 70-Gene signature as an aid to treatment decisions in early-stage breast cancer. N. Engl. J. Med. 375, 717–729 (2016).
6. Sparano, J. A. et al. Clinical and genomic risk to guide the use of adjuvant therapy for breast cancer. N. Engl. J. Med. 380, 2395–2405 (2019).
7. Bartlett, J. et al. Selecting breast cancer patients for chemotherapy: the opening of the UK OPTIMA trial. Clin. Oncol. (R. Coll. Radiol.) 25, 109–116 (2013).
8. Ramsey, S. D. et al. Integrating comparative effectiveness design elements and endpoints into a phase III, randomized clinical trial (SWOG S1007) evaluating OncotypeDX-guided management for women with breast cancer involving lymph nodes. Contemp. Clin. Trials 34, 1–9 (2013).
9. Pan, H. et al. 20-Year risks of breast-cancer recurrence after stopping endocrine therapy at 5 years. N. Engl. J. Med. 377, 1836–1846 (2017).
10. Prat, A., Ellis, M. J. & Perou, C. M. Practical implications of gene-expression-based assays for breast oncologists. Nat. Rev. Clin. Oncol. 9, 48–57 (2012).
11. Fan, C. et al. Concordance among gene-expression-based predictors for breast cancer. N. Engl. J. Med. 355, 560–569 (2006).
12. Kelly, C. M. et al. Agreement in risk prediction between the 21-gene recurrence score assay (Oncotype DX®) and the PAM50 breast cancer intrinsic Classifier in early-stage estrogen receptor-positive breast cancer. Oncologist 17, 492–498 (2012).
13. Mackay, A. et al. Microarray-based class discovery for molecular classification of breast cancer: analysis of interobserver agreement. J. Natl Cancer Inst. 103, 662–673 (2011).
14. Weigelt, B. et al. Breast cancer molecular profiling with single sample predictors: a retrospective analysis. Lancet Oncol. 11, 339–349 (2010).
15. Dowsett, M. et al. Comparison of PAM50 risk of recurrence score with Oncotype DX and IHC4 for predicting risk of distant recurrence after endocrine therapy. J. Clin. Oncol. 31, 2783–2790 (2013).
16. Sgroi, D. C. et al. Prediction of late distant recurrence in patients with oestrogen-receptor-positive breast cancer: a prospective comparison of the breast-cancer index (BCI) assay, 21-gene recurrence score, and IHC4 in the TransATAC study population. Lancet Oncol. 14, 1067–1076 (2013).
17. Bayani, J. et al. Molecular stratification of early breast cancer identifies drug targets to drive stratified medicine. npj Breast Cancer 3, 3 (2017).
18. Bartlett, J. M. S. et al. Computational approaches to support comparative analysis of multiparametric tests: modelling versus training. PLoS ONE 15, e0238593 (2020).
19. van de Velde, C. J. H. et al. Adjuvant tamoxifen and exemestane in early breast cancer (TEAM): a randomised phase 3 trial. Lancet 377, 321–331 (2011).
20. Sestak, I. et al. Comparison of the performance of 6 prognostic signatures for estrogen receptor–positive breast cancer: a secondary analysis of a randomized clinical trial. JAMA Oncol. 4, 545–553 (2018).
21. Buus, R. et al. Molecular drivers of Oncotype DX, Prosigna, EndoPredict, and the Breast Cancer Index: a TransATAC study. J. Clin. Oncol. 20, 00853 (2020).
22. Bayani, J. et al. Molecular stratification of early breast cancer identifies drug targets to drive stratified medicine. npj Breast Cancer 3, 3 (2017).
23. Bayani, J. et al. Identification of distinct prognostic groups: implications for patient selection to targeted therapies among anti-endocrine therapy-resistant early breast cancers. JCO Precis. Oncol. 3, 1–13 (2019).
24. Pereira, B. et al. The somatic mutation profiles of 2,433 breast cancers refines their genomic and transcriptomic landscapes. Nat. Commun. 7, 11479 (2016).
25. Vallon-Christersson, J. et al. Cross comparison and prognostic assessment of breast cancer multigene signatures in a large population-based contemporary clinical series. Sci. Rep. 9, 12184 (2019).
26. Cuzick, J. et al. Prognostic value of a combined ER, PgR, Ki67, HER2 immunohistochemical (IHC4) score and comparison with the GHI recurrence score - results from TransATAC. Cancer Res. 69, 503S–503S (2009).
27. Dowsett, M. et al. Prediction of risk of distant recurrence using the 21-gene recurrence score in node-negative and node-positive postmenopausal patients with breast cancer treated with anastrozole or tamoxifen: a TransATAC study. J. Clin. Oncol. 28, 1829–1834 (2010).
28. Cuzick, J. et al. Prognostic value of a combined estrogen receptor, progesterone receptor, Ki-67, and human epidermal growth factor receptor 2 immunohistochemical score and comparison with the genomic health recurrence score in early breast cancer. J. Clin. Oncol. 29, 4273–4278 (2011).
29. Sestak, I. et al. Factors predicting late recurrence for estrogen receptor-positive breast cancer. J. Natl Cancer Inst. 105, 1504–1511 (2013).
30. Bartlett, J. M., Rea, D. & Rimm, D. L. Quantification of hormone receptors to guide adjuvant therapy choice in early breast cancer: better methods required for improved utility. J. Clin. Oncol. 29, 3715–3716 (2011).
31. Bartlett, J. M. et al. Mammostrat as an immunohistochemical multigene assay for prediction of early relapse risk in the tamoxifen versus exemestane adjuvant multicenter trial pathology study. J. Clin. Oncol. 30, 4477–4484 (2012).
32. Bartlett, J. M. et al. Do type 1 receptor tyrosine kinases inform treatment choice? A prospectively planned analysis of the TEAM trial. Br. J. Cancer 109, 2453–2461 (2013).
33. Bartlett, J. M. S. et al. Estrogen receptor and progesterone receptor as predictive biomarkers of response to endocrine therapy: a prospectively powered pathology study in the tamoxifen and exemestane adjuvant multinational trial. J. Clin. Oncol. 29, 1531–1538 (2011).
34. Sestak, I. et al. Abstract P5-06-05: discordant classification and outcomes between Prosigna and Oncotype Dx Recurrence Score for ER-positive, HER2-negative, node-negative breast cancer. Cancer Res. 80, P5-06-05 (2020).
35. Bartlett, J. M. et al. Metadata Record for the Article: Comparative Survival Analysis of Multiparametric Tests—when Molecular Tests Disagree—A TEAM Pathology Study https://doi.org/10.6084/m9.figshare.14617113 (2021).

Acknowledgements

Research at the Ontario Institute for Cancer Research is supported by the Government of Ontario. P.C.B. was supported by Genome Canada, by a CIHR New Investigator Award and by a Terry Fox Research Institute New Investigator Award. R.C.S. was supported by the National Institute for Health Research University College London Hospitals Biomedical Research Centre. The funder had no role in the analysis or reporting of results.

Author information

These authors contributed equally: John M.S. Bartlett, Jane Bayani, Elizabeth Kornaga.

Authors and Affiliations

Diagnostic Development, Ontario Institute for Cancer Research, Toronto, ON, Canada
John M. S. Bartlett, Jane Bayani, Elizabeth Kornaga & Keying Xu
Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
John M. S. Bartlett
Edinburgh Cancer Research Centre, Edinburgh, UK
John M. S. Bartlett & Tammy Piper
Translational Laboratories, Tom Baker Cancer Centre, Calgary, AB, Canada
Elizabeth Kornaga
Department of Oncology, McMaster University, Kingston, ON, Canada
Greg R. Pond
Department of Pathology, Glasgow, UK
Elizabeth Mallon
Informatics & Computational Biology, Ontario Institute for Cancer Research, Toronto, ON, Canada
Cindy Q. Yao & Paul C. Boutros
Department of Medical Biophysics, University of Toronto, Toronto, Canada
Paul C. Boutros
Department of Pharmacology & Toxicology, University of Toronto, Toronto, Canada
Paul C. Boutros
Jonsson Comprehensive Cancer Center, University of California, Los Angeles, USA
Paul C. Boutros
Dept of Gynecology and Obstetrics, University Center Mainz, Mainz, Germany
Annette Hasenburg
University of Warwick, Coventry, UK
J. A. Dunn
National and Kapodistrian University of Athens, Medical School, Athens, Greece
Christos Markopoulos
St. Augustinus Hospital, Antwerp, Belgium
Luc Dirix
Erasmus MC Cancer Institute, Rotterdam, the Netherlands
Caroline Seynaeve
Leiden University Medical Center, Leiden, the Netherlands
Cornelis J. H. van de Velde
National Institute for Health Research University College London Hospitals Biomedical Research Centre, London, UK
Robert C. Stein
Cancer Research UK Clinical Trials Unit, University of Birmingham, Birmingham, UK
Daniel Rea

Contributions

J.M.S.B., J.B., R.C.S., and D.R. contributed to the conception and design of the work, the acquisition, analysis, and interpretation of data, and were involved in drafting, critical review and approval of the final submitted version. They have agreed to be personally accountable for their contributions. E.K., K.X., G.R.P., C.Q.Y., and P.C.B. contributed to the acquisition, analysis, and interpretation of data, and were involved in critical review and approval of the final submitted version. They have agreed to be personally accountable for their contributions. T.P., E.M., A.H., J.A.D., C.M., L.D., C.S. and C.J.H.v.d.V. contributed to the acquisition and interpretation of data, and were involved in critical review and approval of the final submitted version. They have agreed to be personally accountable for their contributions. All authors contributed to the conception or design of the work or to the acquisition, analysis or interpretation of data. All authors were involved in the drafting and/or critical review.

Corresponding author

Correspondence to John M. S. Bartlett.

Ethics declarations

Competing interests

J.M.S.B. has received consultancy or honoraria from Insight Genetics Inc., BioNTech AG, Biotheranostics Inc., RNA Diagnostics Inc., oncoXchange, and NanoString Technologies Inc., and research funding from ThermoFisher Scientific, Genoptics, Agendia, NanoString Technologies Inc., and Biotheranostics Inc. J.B. has received honoraria from ThermoFisher Scientific. G.R.P. has received consulting fees from Merck, Astra-Zeneca, and Profound Medical, and an honorarium for DSMB membership from Takeda, all outside of the submitted work. A.H. has received honoraria from MedConcept GmbH, Med Update GmbH, Pfizer, Roche Pharma AG, Streamed up GmbH, and Tesaro Bio Germany GmbH, and serves on Advisory Boards for PharmaMar, Roche Pharma AG, and Tesaro Bio Germany GmbH. C.M. has received consultancy from Genomic Health. All other authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Bartlett, J.M.S., Bayani, J., Kornaga, E. et al. Comparative survival analysis of multiparametric tests—when molecular tests disagree—A TEAM Pathology study. npj Breast Cancer 7, 90 (2021). https://doi.org/10.1038/s41523-021-00297-7

Received: 28 January 2021
Accepted: 27 May 2021
Published: 08 July 2021
DOI: https://doi.org/10.1038/s41523-021-00297-7
