The healthcare value of the Magee Decision Algorithm™: use of Magee Equations™ and mitosis score to safely forgo molecular testing in breast cancer

Magee Equations™ are multivariable models that can estimate the oncotype DX® Recurrence Score, and Magee Equation 3 has been shown to have chemopredictive value in the neoadjuvant setting as a standalone test. The current study tests the accuracy of the Magee Decision Algorithm™ using a large in-house database. According to the algorithm, if all Magee Equation scores are <18, or 18–25 with a mitosis score of 1, then oncotype testing is not required as the actual oncotype recurrence score is expected to be ≤25 (labeled “do not send”). If all Magee Equation scores are 31 or higher, then oncotype testing is likewise not required as the actual score is expected to be >25 (also “do not send”). All other cases could be considered for testing (labeled “send”). Of the 2196 ER+, HER2-negative cases sent for oncotype testing, 1538 (70%) were classified as “do not send” and 658 (30%) as “send”. The classification accuracy in the “do not send” group was 95.1%. Of the 75 (4.9%) discordant cases (expected score ≤25 by decision algorithm but actual oncotype score >25), 26 received endocrine therapy alone. None of these 26 patients experienced distant recurrence (average follow-up of 73 months). The Magee Decision Algorithm accurately identifies cases that will not benefit from oncotype testing. Such cases constitute ~70% of routine clinical oncotype requests, an estimated saving of $300,000 per 100 test requests. The occasional discordant cases (expected ≤25, but actual oncotype score >25) appear to have an excellent outcome on endocrine therapy alone.


Introduction
Several molecular tests are now regularly used in the management of breast cancer in routine clinical practice.
Although developed mostly as prognostic assays, the majority of the testing is performed to make therapy decisions in hormone receptor-positive breast cancers. The most commonly used assay in the United States is oncotype DX®. Based on earlier studies that utilized tissue blocks from National Surgical Adjuvant Breast and Bowel Project B-14 and B-20 clinical trials, oncotype clinical risk (of recurrence) categories were defined as low-risk (score 0 to <18, average risk of 7% assuming patient receives tamoxifen for 5 years), intermediate-risk (score 18-30, average risk 14%), and high-risk (score 31 or higher, average risk approaching 30%) [1,2]. These retrospective studies showed the benefit of chemotherapy only in the high-risk group, with no benefit in the low-risk group and negligible benefit in the intermediate-risk group [1,2]. However, instead of using these predefined group scores, the prospective clinical trial (Trial Assigning IndividuaLized Options for Treatment or TAILORx) designed to assess the usefulness of oncotype testing redefined the intermediate-risk group as score 11-25. Consequently, patients with scores 0-10 received only endocrine therapy, and patients with scores >25 received both endocrine therapy and chemotherapy. Patients with oncotype recurrence score 11-25 were randomized to receive either endocrine therapy alone or both endocrine therapy and chemotherapy. After 9 years of average follow-up, the recurrence rate and survival were similar between the endocrine only group and the chemoendocrine group, leading to the conclusion that there is a lack of chemotherapy benefit in patients with recurrence score 11-25 [3]. Although practice changing, the results are not entirely unexpected. The earlier oncotype validation studies and a recent retrospective study of the Surveillance Epidemiology End Result database showed similar results [4].
We have previously designed multivariable models called Magee Equations™ (ME) that can estimate oncotype score [5,6]. These models use routinely reported histopathology and breast cancer biomarker data to provide a score similar to oncotype. One of the equations (ME3) has been shown to predict for a pathologic complete response to neoadjuvant chemotherapy in ER+/HER2-negative tumors [7]. With the new oncotype recurrence score cut-off value of 25, we recently described a decision algorithm using MEs and tumor mitotic activity score to safely forgo oncotype testing [8].
The primary goal of the current study was to evaluate the accuracy of the Magee Decision Algorithm™ within a large database. The secondary goal was to determine the clinical outcome for cases where the results are deemed discordant.

Methods
The current study tests the accuracy of the Magee Decision Algorithm™ using a large in-house database. According to the algorithm (Fig. 1), if all ME scores are <18, or 18-25 with a mitosis score of 1, then oncotype testing is not required as the actual oncotype recurrence score will be ≤25 (these cases were labeled as "do not send-expect low risk"). If all ME scores are 31 or higher, then oncotype testing is likewise not required as the actual score will be >25 (labeled as "do not send-expect high risk"). All other cases, i.e., any or all ME scores 18-25 with a mitosis score >1, and any or all ME scores >25 to <31 regardless of the mitosis score, could be considered for testing (labeled as "send"). The triage of cases as "send" or "do not send" was compared with actual oncotype recurrence score results. We analyzed all ER+, HER2-negative cases sent for oncotype testing with available pathology parameters for calculation of all MEs. These included HER2 immunohistochemical score 2+ cases with HER2 copies of 4 to <6; such cases are classified as equivocal for ME score calculation. The cases included in the study are from two in-house databases: a "retrospective" cohort (cases sent for clinical oncotype testing from 2007 to 2015; 1824 cases) and the cohort of cases used for the "prospective" value study (cases sent for clinical oncotype testing in the last 3 years; 372 cases), the partial results for which were recently published [8]. This resulted in a total of 2196 cases that formed the basis of this study.
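The triage rule described above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the authors' implementation: the function name `magee_triage`, the label constants, and the handling of mixed score patterns not spelled out in the text (for example, one score below 18 and another above 31, which falls through to "send" here) are assumptions.

```python
from typing import Sequence

# Labels taken from the Methods text.
LOW = "do not send-expect low risk"
HIGH = "do not send-expect high risk"
SEND = "send"


def magee_triage(me_scores: Sequence[float], mitosis_score: int) -> str:
    """Triage one ER+/HER2-negative case from its ME scores and mitosis score.

    Assumed reading of the algorithm:
      - all scores < 18                       -> expect low risk
      - all scores <= 25 and mitosis score 1  -> expect low risk
      - all scores >= 31                      -> expect high risk
      - anything else                         -> send for testing
    """
    if not me_scores:
        raise ValueError("at least one Magee Equation score is required")
    if mitosis_score not in (1, 2, 3):
        raise ValueError("mitosis score must be 1, 2, or 3")

    highest, lowest = max(me_scores), min(me_scores)
    if highest < 18:
        return LOW
    if highest <= 25 and mitosis_score == 1:
        return LOW
    if lowest >= 31:
        return HIGH
    return SEND


# Hypothetical case: three scores between 18 and 25 with mitosis score 1.
print(magee_triage([21.5, 21.7, 20.6], mitosis_score=1))
```

Checking the highest and lowest score is equivalent to testing "all scores" against each threshold, which keeps the rule independent of how many equations are supplied.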
Other details regarding variables required for calculation of ME scores are provided within Supplementary information (Supplementary data-methods).
For comparison of means, independent sample t-tests were performed. Univariable analysis was performed using χ² and Fisher exact tests to compare the differences in percentages between groups. A p value < 0.05 was considered significant. Kaplan-Meier survival curves for distant recurrence-free survival were analyzed for "discordant" cases (i.e., expected score ≤25 but actual oncotype score >25) and the p values were obtained using the log-rank test (GraphPad Prism software, version 8.3.0, San Diego, CA).

Results
The age of patients ranged from 26 to 87 years, with a median age of 59 years. Most were early-stage breast cancers. The median tumor size was 1.6 cm. Of the 2196 cases, 1879 (86%) were lymph node negative. The 2196 cases included 503 grade 1 (23%), 1352 grade 2 (61%), and 341 grade 3 (16%) tumors. The higher number of grade 2 tumors indicates the selection bias for requesting clinical oncotype testing. All cases were estrogen receptor (ER) positive and 2018 (92%) were progesterone receptor (PR) positive. All cases were HER2 negative, including the 53 (2%) cases with HER2 immunohistochemical score 2+ and HER2 copies of 4 to <6 per cell by fluorescence in situ hybridization.
Of the 2196 cases, 1538 (70.1%) were classified as "do not send" and 658 (29.9%) as "send". The classification accuracy in the "do not send" group was 95.1% (see Table 1). Of the 75 discordant cases (expected ≤25, but actual oncotype >25, see Table 1), 41 received chemo-endocrine therapy, 2 received chemotherapy only, 26 had endocrine therapy alone (mostly an aromatase inhibitor), and 6 did not receive any systemic therapy (Fig. 2). The average follow-up was 71 months. The follow-up duration was similar for the chemo-endocrine therapy group (average: 71.9 months; interquartile range of 49.8-98.5 months) and the endocrine therapy alone group (average: 72.9 months; interquartile range of 52.2-95.4 months). There were three distant recurrences, two in patients who received chemo-endocrine therapy and one in a patient who did not receive any systemic therapy. No distant recurrences were recorded in the group that received endocrine therapy alone. Two of the patients with recurrence died of disease (one patient who received chemo-endocrine therapy and one patient who did not receive any systemic therapy). There were two other deaths in the cohort but the cause was unrelated to breast cancer. The average age was 61 years for these 75 "discordant" cases, with 9 patients being age 50 or below. Of these nine young patients, four received chemo-endocrine therapy, four received endocrine therapy alone, and one received no systemic therapy. As mentioned above, no recurrences or deaths were noted in patients who received endocrine therapy alone. The clinical-pathologic features of these 75 "discordant" cases were compared with 1443 "concordant" (expected ≤25 and actual oncotype ≤25) cases. The only parameter that showed a statistically significant difference was PR expression (Table 2).
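The counts reported above can be cross-checked with simple arithmetic. The sketch below is only a consistency check on the figures in the text; the variable names are illustrative, and it assumes that the 75 discordant cases are the only misclassified cases among the 1538 "do not send" cases.

```python
# Counts reported in the Results text.
do_not_send = 1538
discordant = 75
concordant_low = 1443
treatments = {
    "chemo-endocrine": 41,
    "chemotherapy only": 2,
    "endocrine alone": 26,
    "no systemic therapy": 6,
}

# The treatment breakdown should account for every discordant case.
assert sum(treatments.values()) == discordant

# Cases triaged as "do not send-expect low" (discordant plus concordant).
expect_low = discordant + concordant_low

# Assumption: no misclassification among "do not send-expect high" cases.
discordance_pct = round(100 * discordant / do_not_send, 1)
accuracy_pct = round(100 * (do_not_send - discordant) / do_not_send, 1)

print(expect_low, discordance_pct, accuracy_pct)
```

Under that assumption the computed discordance and accuracy match the reported 4.9% and 95.1%.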
Of the 2196 total cases, 513 patients were age 50 or less. The results in this cohort were similar to the overall results. Of the 513 cases, 333 (65%) were classified as "do not send" based on the Magee Decision Algorithm and the classification accuracy of the "do not send" group was 97%. Within this "do not send" group, the percentage of cases with actual oncotype score <21 was 87% and the percentage of cases with actual oncotype score <16 was 59%.
With regard to the cases classified as "send" (n = 658; 30% of the entire cohort), 191 cases (29%) had an actual oncotype score of >25. The clinical-pathologic data of the cases labeled as "send" were compared with the cases labeled "do not send-expect low" (Table 3). As expected, the cases labeled as "send" showed more aggressive histopathologic features.
In addition, we examined the results with respect to individual equations and the mean equation score within the Magee Decision Algorithm™ (Table 4). The detailed result tables are provided within Supplementary data (Supplementary data-entire dataset).
The data were also analyzed separately for the retrospective cohort and the cases from the prospective value study. The results were similar to the combined dataset and the details are provided in the Supplementary data (Supplementary data-cases from retrospective dataset and Supplementary data-cases from prospective value study).

Discussion
In recent decades, breast medical oncologists in the United States have been trying to de-escalate the use of chemotherapy in ER+, HER2-negative early-stage breast cancer. This approach seems to be taking hold but there is still a lot of variability in chemotherapy use. However, it is important to understand why chemotherapy was overused in the first place. It appears that the National Institutes of Health consensus statement in the year 2000 was partly responsible for chemotherapy overuse [9]. This statement basically recommended the use of chemotherapy in any breast cancer >1 cm (both lymph node positive and lymph node negative). No consideration was given to tumor grade despite ample data regarding breast cancer grade and prognosis at the time [10]. Subsequent use of breast cancer biomarkers in routine practice and molecular characterization of breast cancer confirmed different prognostic groups of ER+ breast cancers and heterogeneous benefit from cytotoxic chemotherapy [11,12]. Non-pathologists have been critical of the subjective nature of breast cancer grading, but the observed variability in grading among pathologists is no worse than categorization of tumors by current molecular assays. In a comparative study, all molecular assays categorized a comparable number of cases as low or high risk, but at the individual tumor level there was significant variability [13,14]. Nevertheless, medical oncologists continue to use molecular assays for making therapy decisions. The chemotherapy recommendation can be different for the same patient depending on the molecular assay utilized to make such a recommendation [13]. Oncotype DX® remains the most frequently used test in the United States for making breast cancer systemic therapy decisions. Initially described as a three-tiered test (low [0 to <18], intermediate [18-30], and high risk [≥31]), the cutoffs are now changed to low risk (≤25) or high risk (>25) based on the results of the TAILORx prospective clinical trial [3]. The recently published results from the TAILORx study showed similar survival of patients with oncotype scores of 11-25 receiving either chemo-endocrine therapy or endocrine therapy alone. There was some benefit in disease-free survival (but not in overall survival) with chemo-endocrine therapy in premenopausal patients with scores 16-25. However, this slight additional benefit of chemotherapy in premenopausal patients could have been due to ovarian suppression. It is to be noted that most patients in the TAILORx trial received tamoxifen alone as the endocrine therapy, while the Suppression of Ovarian Function Trial and the Tamoxifen and Exemestane Trial have shown that an aromatase inhibitor with ovarian suppression is a superior form of endocrine therapy [15][16][17]. It is questionable how much additional benefit one can derive from chemotherapy after being treated with an aromatase inhibitor and ovarian suppression. After the publication of these prospective trial results, it is generally accepted that postmenopausal patients with oncotype scores of 25 or less do not require chemotherapy. These results suggest that if routine pathologic examination can confidently predict an oncotype score of 25 or less in early-stage breast cancer, then it can be of significant clinical value and provide extraordinary healthcare value for patients. Our group has previously published multivariable models to estimate the oncotype score, first as proof of principle and later as a clinically useful tool to decide if a particular tumor needs oncotype testing [5,6].
Table 3 Clinical-pathologic characteristics of cases classified as "send" compared with cases labeled as "do not send-expect low".
The models, now commonly known as MEs, have been shown to be strongly chemopredictive in the neoadjuvant setting and also appear to have prognostic value [7]. In light of the TAILORx results, we recently described an algorithmic approach to safely forgo oncotype testing. In the previously published prospective value study, cases with all ME scores of <18 and cases with scores 18-25 but a mitosis score of 1 almost always showed an actual oncotype score of 25 or less [8]. We also showed that in rare discordant cases, there are generally noninvasive tumor factors that appear to alter the actual score [8]. The current study is a large-scale validation of this Magee Decision Algorithm™.
For the current study, we used ER+/HER2-negative cases that were sent for clinical oncotype testing and had pathology data for calculation of all three ME scores. Using this large database of over 2000 cases, we unequivocally show the clinical usefulness of the Magee Decision Algorithm™. When cases are classified as "do not send (expect low)", the likelihood of the actual oncotype score coming back as >25 is <5%. Interestingly, even in those rare cases where results are deemed discordant (estimated ≤25, actual >25), chemotherapy use in such patients did not show any survival benefit compared with patients who received only endocrine therapy (Fig. 2). Comparison of clinical-pathologic features of the cases deemed "discordant" with the "concordant" cases (i.e., estimated ≤25 and actual also ≤25) showed only progesterone receptor expression to be significantly different (Table 2). It is well known that the oncotype score is inversely related to PR expression levels [18][19][20][21][22]. However, when PR expression is the only variable driving up the oncotype score (the "discordant" cases in the current study), it may not affect patient outcome when patients are treated with endocrine therapy alone. This underscores the importance of using a multivariable model, such as the MEs, over a single variable.
In addition to the decision algorithm that utilizes all three equations, we also analyzed the data using individual equations and the average ME score (see Supplementary data). Using individual equations in the decision algorithm slightly increased the percentage of cases classified as "do not send", but the accuracy of the "do not send" classification also decreased slightly, particularly impacting the ability to predict the "do not send-expect high risk" category (Table 4). The results for the use of the average ME score are nearly identical to those obtained using all equations. Although results for each of the equations are comparable, the use of all equations slightly increases the accuracy of results and should increase the user's confidence to safely forgo oncotype testing.
Our group was the first to suggest that routine histopathologic data can estimate the oncotype score and also defined a multivariable model in 2008, which was revised in 2013 [5,6]. Since then, there have been several publications that have either validated MEs or defined other similar models [18,19,21-39]. However, the accuracy and simplicity of MEs make them easier to use in routine practice to make confident clinical decisions. MEs require and also provide more granular data to make therapy decisions compared with other published models. This can be explained by taking a hypothetical example comparing MEs with the University of Tennessee Medical Center (UTMC) Nomogram, which was recently updated after the TAILORx trial results [33,34]. The example is of a common type of ER+ breast cancer, i.e., a 55-year-old patient with a 2.0 cm, grade II (Nottingham score 6, with mitosis score of 1), lymph node negative, ER+ (H-score of 300), PR negative (H-score 0), HER2 negative invasive ductal carcinoma with a Ki-67 labeling index of 15% (Fig. 3). Using the University of Tennessee Nomogram (https://utgsm.shinyapps.io/OncotypeDXCalculator/), the probability of a low-risk oncotype is 53% and the probability of a high-risk oncotype is 47%. However, the estimated ME scores (https://path.upmc.edu/onlineTools/mageeequations.html) on this case are 21.5 (equation 1), 21.7 (equation 2), and 20.6 (equation 3). Using the Magee Decision Algorithm (equation results between 18 and 25 and mitosis score of 1), this case will result in an actual oncotype score of 25 or less with over 95% certainty. In such cases, one could forgo oncotype testing using the Magee Decision Algorithm, but this decision cannot be made based on UTMC Nomogram results.
Our study has enormous cost-saving implications. The cases included in this study are the cases sent for clinical oncotype testing, mostly requested by breast medical oncologists. Medical oncologists at our institution generally follow guidelines set forth by national societies (American Society of Clinical Oncology and National Comprehensive Cancer Network), but there is individual variation in ordering oncotype based on individual patient factors. There was no bias in case selection except that there was a preponderance of Grade II cases, which are considered "borderline" for treatment purposes. Based on the Magee Decision Algorithm™, oncotype testing could have been avoided in 70% of the cases without any negative clinical impact. The Magee Decision Algorithm utilizes morphoimmunohistologic variables from a routine pathology report, for which there is no additional cost. The calculator for MEs is available online on the department website for anyone to use for free (https://path.upmc.edu/onlineTools/mageeequations.html). Login information is not required. In contrast, oncotype testing costs over $4000 per test. For every 100 tests, the institution/insurance could have saved ~$300,000 without impacting patient care. The counter-argument that savings from avoiding chemotherapy based on oncotype far outweigh the cost of the assay is not valid, as similar savings can be attained using MEs/Magee Decision Algorithm. Additional savings come from safely forgoing oncotype testing. Others have also reported significant cost savings with the use of MEs [37,38,40]. This should alarm integrated health systems (provider and insurer) that want to move toward a value-based system [41].
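The savings estimate follows from two numbers in the text: about 70% of requests are triaged as "do not send", and each test costs over $4000. A minimal sketch of that arithmetic is shown below; the function name `estimated_savings` is hypothetical, and the $4000 input is the lower bound stated in the text, so the result is a floor rather than an exact figure.

```python
def estimated_savings(requests: int, do_not_send_fraction: float,
                      cost_per_test: int) -> int:
    """Return the assay cost avoided when 'do not send' cases skip testing."""
    avoided_tests = round(requests * do_not_send_fraction)
    return avoided_tests * cost_per_test


# 100 requests, 70% avoided, at the stated lower-bound price of $4000 per test.
print(estimated_savings(100, 0.70, 4000))
```

At the lower-bound price the avoided cost is $280,000 per 100 requests; because the actual price is described only as "over $4000", this is compatible with the approximate $300,000 quoted in the text.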
The strength of the study is that it utilized a large database to test the validity of the decision algorithm. In addition, the pathology slides were not reviewed for this study and the results from the report were taken as-is for calculation of ME scores. This is what is expected in routine practice. There has been some concern regarding MEs or similar models that require semi-quantitative results with respect to standardization and reproducibility [42,43]. However, we have shown good interobserver concordance for H-scores [44]. For Ki-67 evaluation, the pathologists at our institution have often used a more pragmatic approach rather than actual counting of 500 or 1000 tumor cells [45,46]. We first estimate the Ki-67 labeling index. If the estimate falls below 10 or above 50, then the estimate stands as the final Ki-67 labeling index. If the estimate is between 10 and 50, then 50-100 cells are counted in a representative area based on the pathologist's discretion to arrive at the labeling index. This approach seems to have worked as seen in this study and our prior neoadjuvant study, where the Ki-67 labeling index has been used in a multivariable model to predict chemotherapy benefit [7]. One potential weakness of the study is that all cases are from one institution where pathology reports are signed out by breast pathologists. There are published studies on the usefulness of MEs from other institutions, but it is unclear how the Magee Decision Algorithm™ will perform at other academic and nonacademic institutions. This study can be used as a springboard for studies at other institutions or a multi-institutional study.
Fig. 3 Example of a common type of ER positive breast cancer. Hematoxylin and eosin stained section of an invasive ductal carcinoma (a), grade II with Nottingham score of 6 (tubule formation score: 3; nuclear pleomorphism score: 2; mitotic activity score: 1). The tumor is diffusely and strongly positive for estrogen receptor with an H-score of 300 (b), but is negative for progesterone receptor with H-score of 0 (c). The tumor is negative for HER2 (not shown) with a Ki-67 labeling index of 15% (d).
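The pragmatic Ki-67 approach described in the Discussion (accept the visual estimate when it is below 10 or above 50, otherwise count 50-100 cells) can be written as a small decision rule. The following is a sketch only: the function name, its parameters, and the treatment of the boundary values 10 and 50 as requiring a count are assumptions, since the text does not specify them.

```python
from typing import Optional


def ki67_labeling_index(estimate_pct: float,
                        positive_cells: Optional[int] = None,
                        counted_cells: Optional[int] = None) -> float:
    """Return the final Ki-67 labeling index (%) under the two-step approach."""
    if not 0 <= estimate_pct <= 100:
        raise ValueError("estimate must be a percentage between 0 and 100")

    # Clearly low or clearly high estimates stand as the final index.
    if estimate_pct < 10 or estimate_pct > 50:
        return float(estimate_pct)

    # Estimates of 10-50% require a count of 50-100 cells in a representative area.
    if positive_cells is None or counted_cells is None:
        raise ValueError("an estimate of 10-50% requires a cell count")
    if not 50 <= counted_cells <= 100:
        raise ValueError("count 50-100 cells in a representative area")
    if not 0 <= positive_cells <= counted_cells:
        raise ValueError("positive cells must be between 0 and the cells counted")
    return 100.0 * positive_cells / counted_cells


# Hypothetical tumor: visual estimate of 15%, then 12 positive of 80 counted cells.
print(ki67_labeling_index(15, positive_cells=12, counted_cells=80))
```

Raising an error when a mid-range estimate arrives without a count makes the second step explicit instead of silently falling back to the estimate.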
After years of criticism regarding tumor grading and subjective reporting by pathologists, this study clearly shows the value of semi-quantitative scoring and using pathology-derived information in a cohesive manner that clinicians can understand. The data presented in this study provide a strong argument in favor of including MEs™ for stratifying patients in clinical trials. Magee Decision Algorithm™ provides an effective method to safely forgo oncotype DX® testing. This approach will save both time and valuable resources. This is particularly valuable for large institutions and/or integrated health systems.
Acknowledgements We thank the Magee pathologists for meticulously reporting the semi-quantitative receptor results on each breast cancer case, without which this study would not have been possible. We also thank the patients who came to Magee-Womens Hospital of UPMC for their breast cancer care.

Compliance with ethical standards
Conflict of interest None of the authors have any conflict of interest related to the manuscript. However, DD is an independent contractor breast pathologist at PreludeDx (Laguna Hills, CA).
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.