Validation of the CTS5 model in a large-scale breast cancer population and the impact of menopausal and HER2 status on its prognostic value

Clinical Treatment Score post-5 years (CTS5) is a promising prognostic tool for evaluating the risk of late recurrence in breast cancer. Our study aimed to validate its prognostic value in a large-scale population and to explore the impact of menopausal and HER2 status on the CTS5 model. We performed a retrospective cohort study using the Surveillance, Epidemiology, and End Results (SEER) database. Survival analyses were conducted to assess the prognostic value of CTS5 in different breast cancer subgroups in terms of overall survival (OS) and breast cancer-specific survival (BCSS) after five years. A total of 23,168 hormone receptor-positive (HoR+) breast cancer patients were enrolled, of whom 13,686 were postmenopausal and 9,482 were premenopausal. As a continuous variable, the CTS5 score was significantly and positively associated with poor prognosis beyond five years in both the postmenopausal and premenopausal subgroups. Nevertheless, for HER2+ postmenopausal patients, the model had less prognostic value for long-term BCSS [HR 1.177 (95% CI 0.960–1.443), p = 0.117]. As a categorical variable, a high-risk CTS5 level was associated with significantly poorer BCSS and OS in HER2− patients, irrespective of menopausal status. Our study showed that the CTS5 model could be a useful prognostic tool for predicting long-term survival in HoR+/HER2− patients. Further large-scale studies are warranted to assess its prognostic value for HER2+ patients and to develop novel prediction models for late recurrence risk estimation.

CTS5 = 0.438 × nodes + 0.988 × (0.093 × size − 0.001 × size² + 0.375 × grade + 0.017 × age). The model divided the 5- to 10-year risk of late recurrence into three groups based on the calculated score: low risk, <5%; intermediate risk, 5% to 10%; and high risk, >10%. However, because the CTS5 model was developed from retrospective data, its calibration and discrimination needed further external evaluation in large-scale breast cancer datasets. Given that the CTS5 model was built on postmenopausal, HER2− patients, its prognostic value for other subgroups remained undetermined. Therefore, the present study used large-scale data from the SEER database to assess the power of CTS5 as a prediction model and the impact of menopausal and HER2 status on its performance.
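The score and its risk grouping can be sketched in a few lines of Python, using the full equation and the score cutoffs (3.13 and 3.86) given in the Methods. The variable codings below (nodes as a five-level category, size in millimetres capped at 30 mm) are assumptions taken from the original CTS5 publication and are not stated in this text; the function names are illustrative only.

```python
def cts5_score(nodes: int, size_mm: float, grade: int, age: float) -> float:
    """CTS5 score following the equation given in the Methods.

    Assumed codings (not stated in this paper):
      nodes: 0, 1, 2, 3, 4 for 0, 1, 2-3, 4-9, >9 positive lymph nodes
      size_mm: tumor size in millimetres, capped at 30 mm
      grade: tumor grade 1, 2 or 3
      age: age in years
    """
    size = min(size_mm, 30.0)  # assumption: size cap of 30 mm
    return 0.438 * nodes + 0.988 * (
        0.093 * size - 0.001 * size ** 2 + 0.375 * grade + 0.017 * age
    )


def cts5_risk_group(score: float) -> str:
    """Map a CTS5 score to the three risk groups used in the Methods."""
    if score < 3.13:
        return "low"            # predicted late recurrence risk < 5%
    if score <= 3.86:
        return "intermediate"   # predicted risk 5% to 10%
    return "high"               # predicted risk > 10%


# Hypothetical patient: node-negative, 15 mm, grade 2, 60 years old
example = cts5_score(nodes=0, size_mm=15, grade=2, age=60)
print(round(example, 3), cts5_risk_group(example))
```

Under these assumed codings, the hypothetical node-negative patient scores about 2.905 and falls in the low-risk group, while the same patient with one positive node (nodes = 1) scores about 3.343 and moves to the intermediate group, which illustrates how strongly nodal status drives the score.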
Late recurrence risk distribution of study population. Patients were divided into three subgroups based on the CTS5 risk score: low risk (<5%), intermediate risk (5–10%), and high risk (>10%). The distribution of risk categories in the SEER cohort is shown in Table 2 ("postmenopausal" was defined as age ≥ 55). For premenopausal patients, it was noteworthy that nearly all small tumors (<10 mm) fell in the low-risk group (97.4%). The majority of node-negative patients (76.9%) were also classified as low risk, while 25.6% of patients with only one positive lymph node were classified as high risk, and 31.8% of patients with 2–3 positive lymph nodes were classified as intermediate or low risk. For tumor grade, only 38.3% of patients with grade III tumors were at high risk of recurrence. For postmenopausal patients, 71.2% of node-negative patients were classified as low risk, and almost all patients with ≥4 positive lymph nodes were at high risk of recurrence. And 59.1% of patients
Survival analysis for different subgroups. All included patients were followed up for more than 60 months (median, 65 months). The overall survival (OS) rate and breast cancer-specific survival (BCSS) rate were 99.2% and 99.3%, respectively. The survival curves for the different CTS5 risk groups are shown in Figs. 1 and 2 ("postmenopausal" was defined as age ≥ 55) and Supplementary Figs. S1 and S2 ("postmenopausal" was defined as age ≥ 60).
All the results above used age 55 years to define menopausal status. When 60 years was used as the cutoff, CTS5 also proved to be a powerful prognostic tool (Supplementary Tables S3 and S4).

Discussion
Prolonging endocrine therapy may reduce recurrence and metastasis beyond five years for HoR+ patients, but it may also increase the risk of endometrial complications (including cancer), thromboembolic events, fractures, cardiovascular disease and other adverse effects [12][13][14][15]. The CTS5 model is a promising prediction tool for assessing long-term recurrence risk, which could help select appropriate patients for extended endocrine therapy. However, its prognostic value had not been verified in large-scale populations or subgroups. We carried out this study to validate the model in 23,168 patients from the SEER database and to evaluate the impact of HER2 status and menopausal status on its predictive performance.
The present study confirmed that CTS5 had good discriminating power for the long-term recurrence risk of HER2− patients, irrespective of menopausal status. This was concordant with the study by Richman et al. 16, which found that CTS5 could be used for both pre- and postmenopausal patients. However, HER2 status had a great impact on its performance. The high-risk patients generally had large tumors, poor tumor differentiation and lymph node metastasis. Age, tumor size, tumor grade, and lymph node status are all included in the CTS5 model, and among them lymph node involvement played an important role. In this study, almost all patients with ≥4 metastatic lymph nodes were classified as high risk. Lymph node metastasis is therefore a strong predictor of late recurrence, which is consistent with previous studies 17,18. Unlike the original model population, the present study also included premenopausal patients. Univariate Cox analysis showed that the model distinguished low, intermediate and high risk of long-term recurrence well in premenopausal HoR+/HER2− patients, even better than in postmenopausal patients. Menopausal status therefore did not significantly affect the model's performance, indicating that its application could be extended to premenopausal women. For patients who received chemotherapy, our data showed that the CTS5 model did not have strong discriminating power for postmenopausal women in terms of OS. This could probably be attributed to the absence of other independent prognostic indicators, such as Ki67, from the original model 19,20. Thus, novel models that integrate immunohistochemical features and gene signatures (such as the IHC4 model) would help determine late recurrence risk and deliver personalized medicine 21.
Menopausal status has a great impact on clinical decision making, especially for HoR+ patients who need endocrine therapy. However, the definition of menopause remains controversial. Generally, age ≥ 55 is considered postmenopausal. By this standard, the present study showed that the CTS5 model was effective for both pre- and postmenopausal HoR+/HER2− patients. To further validate these results, a second cutoff of age ≥ 60, in line with the NCCN guideline, was adopted 22. This criterion has higher specificity for the diagnosis of menopause than the previous one, meaning that the postmenopausal subgroup was more homogeneous and could provide a more reliable conclusion. With either age ≥ 55 or age ≥ 60 as the cutoff, CTS5 performed well in evaluating late recurrence risk, irrespective of menopausal status. Additionally, it is of note that menopause is a complex biological phenomenon, and age as a single indicator may not be sufficient to define menopausal status clearly. This may inevitably introduce bias into the final conclusion, and a prospective large-scale population study may be needed for further validation.
www.nature.com/scientificreports
HER2 is generally considered an important prognostic indicator. The present study also evaluated the impact of HER2 status on the CTS5 model and found no significant correlation between CTS5 score and survival in postmenopausal HER2+ patients. Although CTS5 as a categorical variable had some prognostic value in HER2+ premenopausal women, caution is needed in applying this finding to clinical practice. HER2 itself is an independent prognostic risk factor, indicating a high degree of malignancy and a worse prognosis [23][24][25]. It may mask the effects of the other parameters, especially for late recurrence, where many confounding factors exist. Meanwhile, advances in HER2-targeted therapy, such as trastuzumab and even dual-target therapy, have largely improved the prognosis of HER2+ patients. Thus, risk prediction models that include only common parameters, such as age, tumor size, lymph node status and tumor grade, may not be sufficient to serve as effective prediction models for HER2+ patients. Further studies may need more homogeneous patient cohorts and more clinically meaningful parameters to develop powerful prediction models for HER2+ patients.
For determining the optimal duration of endocrine therapy, genetic testing may provide additional valuable prognostic information on late recurrence. For instance, the Oncotype DX recurrence score uses reverse transcription PCR (RT-PCR) assays and incorporates 21 genes (including six housekeeping genes) related to proliferation, survival, invasion and estrogen receptor signaling 26. Although it is widely used for early recurrence evaluation, a retrospective study using TransATAC data showed that the performance of Oncotype DX alone for predicting late recurrence risk was not satisfactory 18,27,28. Another useful genetic tool, PAM50, combines the expression levels of 50 genes with tumor size to define the breast cancer intrinsic subtype and provides proliferation information from proliferation-related genes 29. Studies showed that PAM50 had the potential to predict both early and late recurrence risk up to 10 years 28,30. However, this result was limited to postmenopausal, HER2− patients. Sestak et al. 31 demonstrated the prognostic value of PAM50 for late recurrence in a combined analysis of the ATAC and ABCSG 8 trial populations. The prognostic performance of PAM50 plus CTS5 was superior to that of CTS5 alone. Future studies should integrate prognostic information from multiple dimensions, such as demographics, clinicopathological parameters, gene signatures and epigenetics, to develop comprehensive and precise prediction models.
Our study also has several limitations. First, due to the retrospective nature of the present study, selection bias could not be totally eliminated. The imbalance in baseline characteristics could not be fully adjusted for by multivariate analyses. Second, the SEER database does not include detailed information on adjuvant or neoadjuvant therapies and their duration. Differences in treatment are an important confounding factor, especially ovarian function suppression. Third, the definition of menopause remains controversial and the SEER registry does not record patient menstrual status, which may introduce bias. We therefore used two cutoffs (55 and 60 years) to evaluate the impact of menopausal status on CTS5 performance, and the final conclusions of the two grouping methods were comparable. Finally, the median follow-up time was 65 months, which was not long enough to evaluate late recurrence or to validate the benefit of extending endocrine treatment to 10 years. HER2 status was recorded only for patients diagnosed from 2010 onward, which is one reason why the present study included only patients diagnosed between 2010 and 2013, ensuring at least five years of follow-up. Future studies with longer observation periods may provide more robust evidence for late recurrence.
In conclusion, our study showed that the CTS5 model is a useful tool to evaluate late recurrence risk in HoR+/HER2− patients, irrespective of menopausal status. Further large-scale studies are warranted to assess its prognostic value for HER2+ patients and to develop novel prediction models for late recurrence risk estimation.

Methods
Study population. Female patients with invasive breast cancer in the SEER database who were diagnosed from 2010 to 2013, had no distant metastasis and had been followed up for ≥5 years were included. The SEER database includes morbidity and survival data routinely collected from multiple population-based cancer registries 28. SEER*Stat version 8.3.5 was used to generate the data sheet, which included individual cancer records, patient characteristics, and the following variables: patient identification number, year of diagnosis, age, ethnicity, tumor size, nuclear grade, lymph node metastasis status, TNM staging, estrogen receptor (ER) status, progesterone receptor (PR) status, HER2 status, cause-specific death classification, other causes of death, survival months, marital status, radiotherapy and chemotherapy. Patient records lacking lymph node involvement data, tumor size or tumor grade were excluded (Fig. 3). Subgroup analyses were carried out based on menopausal status (postmenopausal was defined as age ≥ 55 years) and HER2 status.
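The selection and subgrouping rules above can be sketched as a simple record filter. This is a minimal illustration, not the authors' SEER*Stat workflow: the `Record` class and all of its field names are hypothetical, and the hormone receptor-positive requirement is taken from the abstract rather than from this paragraph.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Record:
    # Hypothetical field names; real SEER exports use different column labels.
    age: int
    year_of_diagnosis: int
    distant_metastasis: bool
    followup_months: int
    hormone_receptor_positive: bool
    her2_positive: bool
    positive_nodes: Optional[int]    # None when nodal data are missing
    tumor_size_mm: Optional[float]   # None when size is missing
    grade: Optional[int]             # None when grade is missing


def is_eligible(r: Record) -> bool:
    """Apply the inclusion and exclusion criteria described in the text."""
    complete = None not in (r.positive_nodes, r.tumor_size_mm, r.grade)
    return (
        2010 <= r.year_of_diagnosis <= 2013
        and not r.distant_metastasis
        and r.followup_months >= 60          # followed up for >= 5 years
        and r.hormone_receptor_positive      # HoR+ cohort, per the abstract
        and complete
    )


def subgroup(r: Record, menopause_cutoff: int = 55) -> Tuple[str, str]:
    """Return (menopausal status, HER2 status) using an age-based proxy."""
    menopausal = "postmenopausal" if r.age >= menopause_cutoff else "premenopausal"
    her2 = "HER2+" if r.her2_positive else "HER2-"
    return menopausal, her2
```

Passing `menopause_cutoff=60` reproduces the alternative definition used in the sensitivity analysis, so a 57-year-old patient would be grouped as postmenopausal under the default cutoff and premenopausal under the stricter one.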

Outcomes of interest. We calculated the CTS5 score for each patient using the equation CTS5 = 0.438 × nodes + 0.988 × (0.093 × size − 0.001 × size² + 0.375 × grade + 0.017 × age) and then assigned patients to three groups using the cutoff points from this model: low risk (CTS5 < 3.13, risk < 5%), intermediate risk (CTS5 3.13 to 3.86, risk 5%–10%), and high risk (CTS5 > 3.86, risk > 10%) 9. The primary endpoint was breast cancer-specific death. SEER defines mortality data on the basis of the International Classification of Diseases Revisions 8–10 32–34. The SEER cause of death recode was used to categorize each death as breast cancer-specific death, other cancer death, death as a result of heart disease, or noncancer cause of death. OS and BCSS were calculated as the time period from the date of diagnosis until the last date for which complete vital status data were available. The data regarding deaths were ascertained from death certificates that are coded by state health departments and/or state vital records for each SEER region 35.
Statistical analysis. Demographic and clinicopathological variables such as age, histological grade, tumor size, N-stage, chemotherapy and radiation therapy were compared using the t-test for continuous data and the Pearson chi-square test for categorical data. The Kaplan-Meier method and Cox proportional hazards regression were used to perform survival analysis. BCSS was defined as the time between breast cancer diagnosis and death due to breast cancer, while OS was the period between diagnosis and death from any cause (including breast cancer). Statistical analyses were performed using SPSS statistical software version 22 and R (3.6.0) software. All statistical tests were two-sided, and statistical significance was defined as a p value < 0.05.
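To make the Kaplan-Meier step concrete, the product-limit estimator can be sketched in plain Python. The study itself used SPSS and R, so this is only an illustration; the function name and the toy data are hypothetical. For BCSS, the sketch assumes that deaths from other causes are treated as censored observations, which is a common convention but is not stated explicitly in the text.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate.

    times:  follow-up time for each patient (e.g. survival months)
    events: 1 if the event of interest was observed, 0 if censored
    Returns (time, survival probability) pairs at each distinct event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)          # events plus censorings at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving time t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]         # drop everyone who left the risk set at t
    return curve


# Toy data: five hypothetical patients, follow-up in months
months = [61, 62, 62, 65, 70]
died_of_breast_cancer = [1, 0, 1, 0, 1]   # 0 = censored (alive or other-cause death)
for t, s in kaplan_meier(months, died_of_breast_cancer):
    print(t, round(s, 3))
```

Running the same function with an all-cause death indicator instead of the breast cancer-specific one would give the OS curve. Comparing curves across CTS5 risk groups, or fitting a Cox model, is better left to an established package such as the R software the authors used.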