Introduction

Breast cancer is the most common malignancy among women in the United States, accounting for 30% of female cancers, with an estimated 281,550 new cases and 43,600 deaths expected in 2021 alone1. Thus, accurately predicting breast cancer outcomes is important for patients, treating physicians, and healthcare policy.

The American Joint Committee on Cancer (AJCC) staging system is arguably the most powerful tool in predicting the clinical outcomes for many cancer types, including breast cancer. The essential aims for cancer staging are to determine the extent of the disease and predict the patient’s prognostic outcome, thus assisting physicians in making individualized treatment plans. This historic anatomic framework takes into consideration the size of the primary tumor (T), the presence or absence of regional lymph node involvement (N), and distant metastasis (M), thus labeled as clinical or TNM stage, or anatomic stage groups2.

However, the evolving evidence over the last few decades has established the prognostic value of the histologic and biologic markers for this highly heterogeneous disease, including histologic grade, estrogen receptor (ER), progesterone receptor (PR), and human epidermal growth factor receptor 2 (HER2) status3,4. To that end, the most recent (8th edition) AJCC Cancer Staging Manual, developed using data from 238,265 patients identified in the National Cancer Database (NCDB) between 2010 and 2015, incorporated these four biomarkers into the newly established Prognostic Stage Groups. Furthermore, the Oncotype DX recurrence score, one of the most frequently used genomic tests and the only one with level 1 data available to date, has also been integrated into the prognostic staging2. The marked improvement of this staging scheme was further validated using the MD Anderson Cancer Center cohort and the California Cancer Registry database5. This staging system was subsequently revised because it could not assign a stage to a significant proportion of breast cancer cases5. The revised prognostic staging provides two breast cancer prognostic stage tables. The Clinical Prognostic Stage is used to assign the stage for all patients according to the history, physical examination, imaging studies, and relevant biopsy findings, while the Pathologic Prognostic Stage is used only for patients who have undergone surgical resection as the initial treatment of their cancer before any systemic or radiation therapy. Genomic profile information is not utilized in the Clinical Prognostic Stage as this information is derived from the surgical specimen.

While the revised AJCC Clinical Prognostic Stage has been shown to provide significantly superior discriminatory power in predicting breast cancer outcomes when compared to the TNM staging system6,7, this long and comprehensive table is cumbersome to use and is thus prone to erroneous stage assignment. In this study, we sought to construct a simple prognostic score to predict clinical outcomes in breast cancer patients based on the U.S. population-based Surveillance, Epidemiology, and End Results (SEER) database, and to validate the model using a prospective institutional tumor registry.

Materials and methods

Patients

After the approval of the institutional review board of the University of Alabama at Birmingham (UAB), a search of the UAB tumor registry was performed to identify female patients diagnosed with invasive breast cancer between 1998 and 2018. A search to identify breast cancer patients was also conducted in the SEER database, which retrospectively collects data from population-based cancer registries covering ~34.6% of the US population. SEER program statistical analysis software packages were used to identify the patients. Patients meeting the following criteria were included in the study: female patients older than 18 years; histologically confirmed primary breast cancer; diagnosed between 2010 and 2015. The patients’ demographic information (age at diagnosis and race), the pathologic characteristics of the primary tumor (histologic grade, ER, PR, HER2, and TNM status), and clinical outcomes were recorded from both cohorts. The assessment of ER, PR, and HER2 was performed as previously described8,9. Those missing any clinicopathologic factor required for the Clinical Prognostic Stage table in the two cohorts were excluded from the study. A subset of patients with non-Stage IV disease (n = 126) received neoadjuvant therapy in the UAB cohort while this information is not available in the SEER database. The median follow-up time was 53 and 63 months for SEER and UAB cohorts, respectively. Clinical Prognostic Stage Groups were assigned according to the revised 8th edition of AJCC breast cancer staging (available at: https://cancerstaging.org/references-tools/deskreferences/Pages/Breast-Cancer-Staging.aspx).

Construction and validation of the models

Multivariate analyses of the clinicopathologic factors were performed using the SEER database to identify factors significantly associated with disease-specific survival (DSS). A point value of 0–2 was assigned to each independent factor for DSS (P < 0.05) based on the magnitude of the hazard ratio. Variables with a hazard ratio between 1.01 and 4 were assigned 1 point, while those with a hazard ratio between 4.01 and 8 were assigned 2 points10. The final prognostic score was calculated by summing the points for all independent factors.
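The point-assignment rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in the study: the function names, the factor names, and the example hazard ratios are hypothetical, and two details are assumptions because the text does not state them (a hazard ratio of 1 or below earns 0 points, and a hazard ratio above 8 is treated as outside the rule).

```python
def points_from_hazard_ratio(hazard_ratio, p_value, alpha=0.05):
    """Map one factor level's hazard ratio to 0, 1, or 2 points."""
    # Assumption: non-significant levels and hazard ratios <= 1 earn no points.
    if p_value >= alpha or hazard_ratio <= 1.0:
        return 0
    if hazard_ratio <= 4.0:
        return 1
    if hazard_ratio <= 8.0:
        return 2
    # The text does not say how hazard ratios above 8 are scored.
    raise ValueError("hazard ratio above 8 is not covered by the stated rule")


def total_score(points_per_factor):
    """Sum the points assigned to each factor for one patient."""
    return sum(points_per_factor.values())


# Hypothetical hazard ratios and P values, for illustration only (not Table 2).
example = {
    "factor_a": points_from_hazard_ratio(1.8, 0.001),
    "factor_b": points_from_hazard_ratio(5.2, 0.001),
    "factor_c": points_from_hazard_ratio(1.3, 0.40),
}
print(total_score(example))
```

In this sketch, a factor whose hazard ratio is not statistically significant contributes nothing to the sum, mirroring the restriction to independent factors with P < 0.05.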

Two models were built to assess the discriminating power for DSS. Model 1 was based on all clinicopathologic factors, including age, race, histologic grade, ER, PR, HER2, and TNM status, whereas Model 2 incorporated all the aforementioned factors except age and race. The SEER database was used as the training dataset, while the UAB cohort was used as the validation dataset. The performance and goodness of fit of the models were assessed using the Harrell concordance index (C-index) and the Akaike information criterion (AIC), respectively, as described hereafter.

Statistical analysis

The C-index, a measure of how well a model's predicted risks rank patients by their observed survival times (analogous to the area under the receiver operating characteristic curve in logistic regression), was calculated to examine each model’s predictive performance, with a higher C-index indicating better discriminative performance for a staging scheme. The AIC, calculated based on the Cox proportional hazards model, was used to determine how well the model fits the data from which it was generated. AIC provides a means for model selection, with a lower AIC indicating a more effective model in predicting survival outcomes. Categorical data were analyzed by Chi-square testing, while continuous data were evaluated using an independent t-test. DSS (from the date of diagnosis to the date of death from the disease) was calculated by Kaplan–Meier analysis. Patients who survived or were lost to follow-up were treated as censored in the analysis. The Cox proportional hazards regression model was used to identify significant factors for DSS. A P value of <0.05 was considered statistically significant. Data analyses were performed using either SAS (version 9.1) software (SAS Institute Inc, Cary, NC) or IBM SPSS Statistics (Version 26) predictive analytics software.
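To make the two performance measures concrete, the following Python sketch computes Harrell's C-index from survival times, event indicators, and risk scores, and the AIC from a model's log-likelihood. It is an illustrative, simplified version and not the SAS or SPSS routines used in the study: the function names are hypothetical, pairs with tied survival times are simply skipped, and the log-likelihood passed to `aic` is assumed to be the Cox partial log-likelihood.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs ranked correctly by risk score.

    A pair is comparable when the patient with the shorter follow-up time
    had the event; tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def aic(log_likelihood, n_parameters):
    """Akaike information criterion: 2k - 2 * log-likelihood."""
    return 2 * n_parameters - 2 * log_likelihood


# Toy data: months of follow-up, 1 = death from disease, 0 = censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
scores = [4, 3, 2, 1]  # higher score = higher predicted risk
print(harrell_c_index(times, events, scores))
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Because the AIC penalizes each additional parameter, a model with fewer factors can have a lower AIC even when its log-likelihood is slightly worse.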

Results

Clinicopathologic characteristics of breast cancer patients

A total of 36,152 patients from the SEER database and 4982 patients from the UAB Tumor Registry, with all clinicopathologic factors and outcome information available, were identified and included in our study for further analyses. The median ages at diagnosis were 60 and 58 years from the SEER and UAB cohorts, respectively. The majority of the patients were Caucasians in both cohorts (78.4% and 76.6%, respectively). However, the proportion of African American patients was significantly higher in our institutional cohort (22.0% vs. 12.0%), thus reflecting the racial and ethnic differences in the Southern United States. The distribution of other clinicopathologic factors was similar when comparing the two cohorts, except that the proportion of patients with metastatic disease at the initial presentation (M1) was significantly higher in the SEER database (9.2% vs. 1.9%). The clinicopathologic features of the breast cancer patients from the two cohorts are summarized in Table 1.

Table 1 Clinicopathologic characteristics of breast cancer.

Factors associated with DSS from the SEER database

A multivariate analysis was performed to identify clinicopathologic factors associated with DSS using the SEER database. To that end, clinical factors including an older age and being African American, and pathologic characteristics including higher histologic grade, larger tumor size, negative ER, PR or HER2 status, regional lymph node involvement, and distant organ metastasis were all independently associated with DSS, as shown in Table 2. Point values for significant predictive factors were assigned by using the criteria described in “Materials and methods.”

Table 2 Multivariate analysis for factors associated with disease-specific survival from the SEER database.

Construction and validation of the prognostic models

Two prognostic models were constructed based on the independent prognostic factors for DSS. Model 1 included all the aforementioned independent clinicopathologic factors, whereas age and race were not included in Model 2. The Kaplan–Meier analyses for DSS were conducted using both models as well as the AJCC Clinical Prognostic Stage (Fig. 1A–C). Among the three prognostic models, the AJCC Clinical Prognostic Stage had the highest C-index (0.841) and the lowest AIC (96,719). When compared with Model 1, Model 2 had a higher C-index (0.737 vs. 0.969) and a lower AIC (97,096 vs. 97,056), indicating superior stratification by Model 2 with respect to DSS (Table 3).

Fig. 1: Kaplan–Meier curves for disease-specific survival in the SEER database.

A Model 1 (age, race, ER, PR, HER2, and TNM status); B Model 2 (ER, PR, HER2, and TNM status); C 8th AJCC clinical prognostic stage groups; D prognostic score category.

Table 3 Five-year disease-specific survival stratified by Clinical Prognostic Stage and Prognostic Score Category in the two cohorts.

We next validated the three prognostic models using the UAB cohort. Similar to the observations in the SEER database, the AJCC Clinical Prognostic Stage again had the highest C-index (0.845) and the lowest AIC (4575), as shown in Fig. 2A–C. Moreover, Model 2 had a higher C-index (0.833) and a lower AIC (4592) than Model 1 (C-index 0.813; AIC 4678), indicating a more effective model in predicting DSS. Thus, the addition of age and race did not improve stratification in either the SEER database or the UAB cohort.

Fig. 2: Kaplan–Meier curves for disease-specific survival in the UAB cohort.

A Model 1 (age, race, ER, PR, HER2, and TNM status); B Model 2 (ER, PR, HER2, and TNM status); C 8th AJCC clinical prognostic stage groups; D prognostic score category.

To that end, Model 2 was defined as our Prognostic Score model. Given the overlap among scores of 0, 1, and 2 on the Kaplan–Meier curves in both cohorts, a Prognostic Score Category was established based on Model 2 as follows: Category I (score 0–2), Category II (score 3), Category III (score 4), Category IV (score 5), Category V (score 6), Category VI (score 7), Category VII (score 8), and Category VIII (score 9). The Kaplan–Meier curves based on the Prognostic Score Category for the SEER database (C-index 0.832; AIC 97,082) and the UAB cohort (C-index 0.832; AIC 4590) are shown in Figs. 1D and 2D, respectively. Overall, the AJCC Clinical Prognostic Stage remained optimal, while the performance of the Prognostic Score Category was similar to that of the AJCC Clinical Prognostic Stage in the SEER database. Model 2, the Prognostic Score Category, and the AJCC Clinical Prognostic Stage performed similarly in the UAB cohort (Table 3).
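The mapping from prognostic score to Prognostic Score Category listed above is simple enough to express as a lookup. The sketch below is illustrative only; the function name is hypothetical, and rejecting scores outside 0–9 is an assumption based on the score range given in the category definitions.

```python
ROMAN_CATEGORIES = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII"]


def prognostic_score_category(score):
    """Return the Prognostic Score Category (I to VIII) for a score of 0 to 9."""
    if isinstance(score, bool) or not isinstance(score, int):
        raise TypeError("score must be an integer")
    if not 0 <= score <= 9:
        raise ValueError("score must be between 0 and 9")
    # Scores 0, 1, and 2 are merged into Category I; each higher score
    # maps to the next category in order (3 -> II, ..., 9 -> VIII).
    if score <= 2:
        return "I"
    return ROMAN_CATEGORIES[score - 2]


for s in range(10):
    print(s, prognostic_score_category(s))
```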

Five-year DSS by prognostic models and the pairwise hazard ratio in the two cohorts

We next examined and compared the 5-year DSS rates for the two cohorts determined by the AJCC Clinical Prognostic Stage and Prognostic Score Category. As summarized in Table 3, the former distinctly stratified the 5-year DSS, while the Prognostic Score Category showed a similar performance in stratifying prognostic outcomes in both cohorts.

Further comparison of the two prognostic schemes was performed by calculating the pairwise hazard ratios in the two cohorts. In the SEER database, a significant difference was seen between every category and its adjacent group in both prognostic models, reflecting their excellent stratification of survival outcomes. In the UAB cohort, no significant difference was identified when comparing IIA vs. IB and IIIB vs. IIIA using the AJCC Clinical Prognostic Stage, whereas significance was achieved in all pairwise comparisons using the Prognostic Score Category (Table 4). Thus, the latter demonstrated a similar, if not superior, discriminatory power when compared to the AJCC Clinical Prognostic Stage.

Table 4 Pairwise hazard ratio for disease-specific survival stratified by Clinical Prognostic Stage and Prognostic Score Category in the two cohorts.

Discussion

Over 50 prognostic models for breast cancer have been proposed since the 1980s, including Adjuvant! Online11, PREDICT-Plus12,13, the Nottingham prognostic index (NPI)14, and CancerMath15. These models varied in terms of methods of development (developed independently or derived from the existing models) and validation (internal vs. external validation), predictors (inclusion of clinical and/or pathologic factors), outcomes (predicting mortality, recurrence, or both), and patients included (European, Asian, or North American populations). Most models performed well in their internal validation cohorts but were less accurate in some independent cohorts, and very few models have been validated widely in different settings. Interestingly, NPI, a prognostic index for patients with primary, operable breast cancer utilizing tumor size, histologic grade, and nodal status, remains the strongest in predictive ability in most independent cohorts16. Moreover, very few prognostic models have been applied to breast cancer patients of all stages receiving standard-of-care treatment, including neoadjuvant therapy. While C-index and AIC were most commonly used to assess the discriminatory power of the models, Kaplan–Meier curves and log-rank tests were used in some studies16.

Judging the performance and acceptability of a risk model is complex. In an effort to increasingly incorporate molecular biomarkers, in addition to anatomic stage, for accurate risk stratification that will help cancer staging retain its fundamental role in defining a patient’s prognosis to guide precision therapy, the AJCC Precision Medicine Core published, in 2016, the inclusion and exclusion criteria that a risk model must satisfy to potentially be endorsed by the AJCC17. The committee defined a “prognostic model” as a multivariable model in which factors predict a clinical outcome that will occur in the future. To that end, only two of the 30 prognostication models identified for breast cancer, Adjuvant! Online and PREDICT-Plus, were found to have met all predefined AJCC inclusion criteria and none of the exclusion criteria2.

Adjuvant! Online is primarily a tool to predict relapse and survival in early-stage breast cancer, developed based on SEER data. The factors included in the model are age, menopausal status, comorbidity, tumor size, number of positive lymph nodes, and ER status11. Similarly, PREDICT-Plus is positioned to estimate DSS following surgery for early breast cancer. The factors utilized in the model include age, mode of detection (clinical/screening), hormonal therapy, chemotherapy, tumor size, grade, number of positive lymph nodes, ER, HER2, and Ki-6712,13. Notably, both models were developed to assess survival outcomes for women with early-stage breast cancer, and neither race nor PR status was included in these tools. Over the last decade, neoadjuvant systemic therapies have been increasingly applied in the management of patients with early and advanced breast cancer18,19,20,21.

The relationship between age at diagnosis and adjusted risk of breast cancer-specific survival is complex and the data are conflicting. A historical cohort study using the SEER database from 2000 to 2015 indicates a quadratic function22. On the other hand, data from Cancer Statistics have suggested that African American patients have lower stage-specific survival than Caucasians for most cancer types, including breast cancer1. However, another recent population-based study using the SEER database from 2004 to 2011 has indicated that race and ethnicity are important factors for cancer-specific survival after stage I diagnosis among American women, although much of the difference could be accounted for by intrinsic biological differences, such as lymph node/distant metastasis, and the triple-negative nature of the disease in a subset of individuals23. The fact that the addition of age and race did not facilitate stratification of survival outcomes using the SEER database or the UAB cohort in the current study further supports this notion.

The prognostic significance of PR status has long been controversial. In an early study of 3728 patients who underwent surgery as the first intervention between 1997 and 2006, pathologic stage combined with nuclear grade and ER status was most precise in predicting DSS, while adding lymphovascular invasion, PR and HER2 status did not provide further discriminating power. These results were subsequently validated using the SEER data24. Because this model was developed prior to the routine use of HER2-targeted therapy, the same group updated it using a cohort of 3327 patients treated with surgery as an initial intervention from 2007 to 2013, and found that models incorporating nuclear grade, ER and HER2 status were most precise when validated using comparable cases from the California Cancer Registry diagnosed between 2010 and 2015 (n = 67,944). Thus, the authors proposed a Neo-Bioscore incorporating nuclear grade, ER and HER2 status with AJCC pathologic stage to provide a more refined stratification of breast cancer patients undergoing surgery as an initial intervention with respect to DSS25. HER2 status was later incorporated into the model for patients treated with neoadjuvant chemotherapy26. This risk score system has also demonstrated added benefit in stratifying the prognostic outcomes of stage IV patients; however, patients treated with neoadjuvant chemotherapy or with unknown sequencing of systemic therapy were excluded from the study27. This is in contrast to the present study, in which patients of all stages who received standard-of-care therapy were included, thus further strengthening the proposed model. Notably, PR status was not identified as an independent factor for DSS in the training cohort and thus was not included in the Neo-Bioscore25. It is also noteworthy that the Nottingham histologic grade (also termed the modified Scarff–Bloom–Richardson grade), which takes into account three histologic components (tubule formation, nuclear grade, and mitotic index), is the universally utilized breast cancer grading system, whereas nuclear grade itself was not readily available in the pathology reports for biopsies or resected specimens prior to the synoptic reporting era in most practices.

The revised 8th edition of AJCC used the National Cancer Database (NCDB) to establish Clinical and Pathological Prognostic Stage Groups. The inclusion of the Nottingham histologic grade, ER, PR, and HER2 status along with conventional TNM variables resulted in stage reassignment in over 40% of non-stage IV breast cancer cases. More importantly, the incorporation of these prognostic groups provides a marked improvement in categorizing survival outcomes when compared to anatomic stage6,7. It is important to note that PR status was utilized in the newly established Prognostic Stage Groups to further stratify ER-positive and ER-negative cases into different prognostic categories, further accentuating the prognostic value of this receptor8,28. The added value of PR in prognostic stratification has been further demonstrated in the current study using both SEER data and our institutional cohort.

The current study has some inherent limitations. First, the models were built based on the retrospective SEER database and validated using our institutional cohort, for which the sample size may be suboptimal for model validation. Thus, further validation in large patient populations from different registries is imperative. However, the fact that the prognostic score demonstrated similar, if not superior, discriminative power when compared with the AJCC Clinical Prognostic Stage in the validation set further argues for the prognostic significance of the former. Second, the median follow-up in the SEER cohort was 53 months due to the lack of HER2 status prior to 2010. Thus, further validation with longer follow-up is necessary, especially in patients with early-stage/low prognostic score disease.

In conclusion, we proposed a simple breast cancer prognostic score based on the SEER database and validated it using an institutional tumor registry. This prognostic model demonstrated at least similar predictive power for survival outcomes in all patients, including those with stage IV disease and those who underwent neoadjuvant chemotherapy, when compared to that of the Clinical Prognostic Stage Groups in the 8th edition of AJCC, the foremost prognostic model for breast cancer. While further validation is essential, this prognostic model provides a simple alternative for use in clinical practice.