A comparison of Chinese multicenter breast cancer database and SEER database

The characteristics of breast cancer (BC) differ between developing and developed countries. We aimed to study the factors that influence the survival and prognosis of BC patients in southern China and the United States. (a) We studied two groups of BC patients: those diagnosed in southern China from 2001 to 2016 and those in the SEER database from 1975 to 2016. (b) We registered, collected and analyzed clinicopathological features and treatment information. We found significant differences in tumor size, positive lymph node status and KI-67 between the southern China and SEER cohorts (P < 0.001). Positive lymph node status may be one of the causes of the difference in morbidity and mortality of BC patients in China. Furthermore, differences in treatment methods may also account for the differences between the China and SEER databases.

www.nature.com/scientificreports/

Researchers around the world have compared BC in China with BC in other regions and found differences between them. However, a comparison of BC patients between southern China and the United States has not been reported. This study aims to investigate the differences between BC patients in China and those in the population-based Surveillance, Epidemiology, and End Results (SEER) cohort. We examine age, tumor stage and grade, ER, PR, HER2, KI-67 and treatment methods in order to analyze and compare the age distribution, clinical characteristics, treatment and prognosis of BC patients in the Chinese multicenter breast cancer database and the SEER database.

Methods
Patients and ethics. We conducted a retrospective analysis and comparison of patients diagnosed with primary breast cancer in southern China (2001-2016) and in the SEER database. In total, 525 breast cancer patients were diagnosed in southern China. Of these, 15 patients were excluded because age information was lacking, 129 were removed because tumor stage, ER, PR, HER2, KI-67 and treatment information was missing, and about 95 were lost. Finally, 286 patients were included in the study (Fig. 1). The study was approved by the Institutional Review Boards of Yunnan Cancer Hospital, the Cancer Hospital Affiliated to Guangxi Medical University and Foshan First People's Hospital. Informed consent was obtained from all individual participants included in the study. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. The SEER cohort was derived from the SEER database (November 2018 submission) using SEER*Stat software provided by the National Cancer Institute (NCI). There were 65,535 breast cancer patients, of whom 26,277 were lost. After excluding about 38,662 patients without complete information (age, tumor stage and grade, ER, PR, HER2, KI-67 and treatment), 596 patients were included in this study (Fig. 1).
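The patient flow described above can be checked with simple arithmetic. The following is a minimal sketch, not the authors' procedure: the function name and the exclusion labels are illustrative assumptions, and only the counts are taken from the text.

```python
def remaining_after_exclusions(total, exclusions):
    """Subtract each exclusion count from the starting total and return what is left.

    `exclusions` is a list of (reason, count) pairs; the reasons are free-text
    labels used only for error messages.
    """
    remaining = total
    for reason, count in exclusions:
        if count > remaining:
            raise ValueError(f"exclusion '{reason}' exceeds remaining patients")
        remaining -= count
    return remaining


# Counts as reported in the text; the labels are illustrative.
china_included = remaining_after_exclusions(
    525,
    [("no age information", 15),
     ("missing stage, receptor, KI-67 or treatment data", 129),
     ("lost", 95)],
)
seer_included = remaining_after_exclusions(
    65535,
    [("lost", 26277),
     ("incomplete information", 38662)],
)

print(china_included, seer_included)
```

Both totals agree with the numbers of included patients stated in the text (286 and 596).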
Clinical data collection. A retrospective review of medical records and pathology reports was conducted.
Staging was performed according to the American Joint Committee on Cancer (AJCC) guidelines [22]. Patients were classified by age into a young adult group (< 40), a middle-aged group (40-70) and an aged group (> 70); the median age of each group was calculated (35, 48 and 75, respectively), and statistical analysis was carried out for each group separately. A cutoff of 14% for KI-67 was used, as recommended by the 2011 St Gallen consensus panel [23], and the KI-67 ≥ 14% group was further divided into two subgroups at its median (51.7% in southern China). Patients in southern China were advised to undergo examination and treatment according to the guidelines of the breast cancer center and were followed up by telephone to collect information on survival and treatment, including the date of progression or metastasis, the date of relapse, and the date and cause of death.
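The grouping rules above can be written as two small functions. This is a sketch under stated assumptions: the function names and group labels are hypothetical, and assigning a KI-67 value of exactly 51.7% to the lower subgroup is an assumption, since the text does not say which side the median falls on.

```python
def age_group(age):
    """Classify age in years using the cutoffs in the text: < 40, 40-70, > 70."""
    if age < 40:
        return "young adult"
    if age <= 70:
        return "middle aged"
    return "aged"


def ki67_group(ki67_percent, high_median=51.7):
    """Classify KI-67 (%) with the 14% cutoff, then split the >= 14% group
    at the reported median (51.7% in southern China).

    Assumption: a value equal to the median goes to the lower subgroup.
    """
    if ki67_percent < 14:
        return "KI-67 < 14%"
    if ki67_percent <= high_median:
        return "KI-67 14% to median"
    return "KI-67 above median"


for age, ki67 in [(35, 10.0), (48, 30.0), (75, 60.0)]:
    print(age, age_group(age), ki67, ki67_group(ki67))
```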
Statistical methods. IBM SPSS Statistics (Version 21.0; IBM Corp., New York, USA) and GraphPad Prism (Version 6.0; GraphPad Software, Inc., La Jolla, CA, USA) were used for statistical analysis. Disease-free survival (DFS) was measured from the date of operation to the first recurrence or metastasis of the tumor or death from any cause (for patients lost to follow-up, the last follow-up time was used; for patients still alive at the end of the study, the end of follow-up was used). Overall survival (OS) was measured from the date of operation to death from any cause. Univariate and multivariate analyses were performed by Cox regression, comparing age (< 40 and ≥ 40), tumor size (≤ 2 cm and > 2 cm), node status, ER, PR, HER2, KI-67, surgery and radiation. The Kaplan-Meier method was used to estimate DFS and OS, and the log-rank test was used to compare patients with different clinicopathologic characteristics. Count data were compared with the χ² test; Fisher's exact test was used when the number of cases was less than 6. Statistical significance was set at P < 0.05, and P < 0.01 was considered highly significant.
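To illustrate how the Kaplan-Meier method turns follow-up times into a survival curve, the sketch below implements the product-limit estimator in plain Python. It is not the SPSS or GraphPad Prism procedure used in the study, and the follow-up times and event indicators are made-up toy values, not patient data.

```python
def kaplan_meier(times, events):
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    times  : follow-up time for each patient (e.g. months)
    events : 1 if the event (death or recurrence) was observed, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set after time t.
        at_risk -= n_leaving
    return curve


# Toy data for illustration only.
toy_times = [6, 12, 12, 20, 30, 45]
toy_events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(toy_times, toy_events)
for t, s in curve:
    print(t, round(s, 4))
```

Censored patients contribute to the number at risk up to their last follow-up time but do not lower the curve, which is why the handling of patients lost to follow-up matters for DFS and OS.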
Ethics approval and consent to participate. All (Table 1). According to TNM stage, BC patients were divided into Tx, Tis and invasive groups (the invasive group comprising ≤ 2 cm, > 2 to ≤ 5 cm, and > 5 cm). The invasive group was the largest, with 448 (89.96%) patients in southern China and 1364 (2.99%) patients in the SEER cohort (P = 0.000). In southern China, tumors of 2-5 cm were the most common, found in 323 (64.86%) patients. In the SEER cohort, however, tumor size data were missing for 64,071 (97.8%) cases. Each cohort had 1 case in the Tis subgroup (Table 1). There were 257 (50.39%) patients with node metastasis in southern China and 296 (0.45%) cases in the SEER cohort. In both groups, no lymph node metastasis was the most common node status, with about 206 (40.39%) patients and 1145 (1.75%) patients, respectively. However, lymph node status was missing for many patients in the SEER cohort, 29,269 (44.7%) (Table 1). Comparing southern China with the SEER database, tumor stage differed significantly (P = 0.000). Stage 2 had the highest proportion, with 262 (51.37%) cases and 14,936 (22.81%) cases, respectively, followed by stage 3, with 133 (26.08%) cases and 12,993 (19.75%) cases. However, tumor stage data were missing for about 29,269 (44.7%) cases in the SEER database (Table 1). ER expression was recorded in both southern China and the SEER cohort (P = 0.000). ER (+) accounted for the higher proportion, with 290 (56.86%) cases and 20,233 (65.34%) cases, respectively, while ER (−) accounted for 178 (34.9%) cases and 5563 (17.96%) cases, respectively (Table 1). Similarly, the difference in PR expression between southern China and the SEER cohort was statistically significant (P = 0.000). PR (+) was more common in both cohorts, with 270 (52.94%) cases and 179,194 (55.55%) cases, while PR (−) accounted for 182 (35.69%) cases and 8110 (26.2%) cases (Table 1). HER2 expression also differed between the two groups (P = 0.000).
HER2 (+) was the most common status in southern China, with 283 (55.49%) cases, but there were only 169 (9.46%) cases in the SEER cohort, where the proportion of PR (−) was high, at 1530 (85.62%) cases (Table 1). Additionally, KI-67 expression differed between the two groups: there were 365 (71.57%) KI-67 (+) cases in southern China, and the largest subgroup was KI-67 between 14 and 51.7%, with 170 (16.58%) cases. However, there were no data on KI-67 expression in the SEER cohort (Table 1). (Fig. 2A,B).
Secondly, since the southern China data covered only 2001 to 2016, a stratified analysis was used to compare DFS or CSS and OS in southern China and the SEER database from 2001 to 2016. In the first 70 months of follow-up, DFS in southern China was higher than CSS in the SEER cohort; thereafter, CSS in the SEER cohort was significantly higher than that in southern China (P = 0.035). OS in this period also differed significantly (P = 0.000), with the SEER cohort significantly higher than southern China (Fig. 2C,D). Finally, the SEER cohort was analyzed by period, divided into two subgroups, 1975-2000 and 2001-2016, and the CSS and OS of each subgroup were calculated. In the SEER cohort, the DFS or CSS and OS of 2001-2016 were significantly higher than those of 1975-2000 (P = 0.000) (Fig. 2E,F).
We analyzed the influence of different clinicopathological features on the survival and prognosis of BC patients in southern China. By analyzing the effects of age, tumor size, lymph node status, ER, PR, HER2, KI-67, surgery and radiotherapy on the prognosis of breast cancer, we found that tumor size, positive lymph node status and KI-67 expression significantly affected the OS of BC patients in southern China (P = 0.018, P = 0.000 and P = 0.034, respectively) (Supplementary Fig. 1).
We further analyzed and compared the effect of tumor size on survival in the two BC cohorts. DFS or CSS and OS differed significantly between the SEER cohort and southern China when tumor size (T) was > 2 cm (P = 0.01 and P = 0.04), but not when T was ≤ 2 cm (P = 0.188 and P = 0.604) (Fig. 3A,D). Secondly, the effects of tumor size on the survival of BC patients were analyzed in each cohort separately. In southern China, tumor size had little effect on DFS (P = 0.487), but OS was significantly lower in the T > 2 cm group than in the T ≤ 2 cm group (P = 0.012) (Fig. 3E,F). In the SEER cohort, CSS and OS in the T > 2 cm group were slightly lower than in the T ≤ 2 cm group, but the differences were not statistically significant (P = 0.738 and P = 0.299) (Fig. 3G,H).
We analyzed and compared the effect of node stage on survival in the two BC cohorts. Positive node status affected DFS or CSS and OS in both southern China and the SEER cohort (P = 0.000 and P = 0.044), as did negative node status (P = 0.000 and P = 0.000). For each lymph node status, OS in the SEER cohort was higher than in southern China (Fig. 4A-D). When southern China and the SEER cohort were analyzed separately, DFS or CSS and OS of node-positive patients were lower than those of node-negative patients; the difference in OS by lymph node status was statistically significant (P = 0.000), whereas the difference in DFS or CSS was not (P = 0.448) (Fig. 4E,F). For the SEER cohort, however, CSS and OS of node-positive patients were slightly higher than those of node-negative patients, although the differences were not statistically significant (P = 0.226 and P = 0.087) (Fig. 4G,H).
We analyzed and compared the effect of KI-67 expression on survival in southern China.
Among the subjects included in this study, DFS and OS were both higher for KI-67 < 14% than for KI-67 ≥ 14%, and the differences were statistically significant (P = 0.05 and P = 0.034) (Fig. 5A,B).
Univariate and multivariate analyses of southern China and the SEER cohort were performed by Cox regression. In the univariate analysis of DFS, T > 2 cm, positive node status, ER (+), PR (+), HER2 (+), surgery and radiation had no significant influence on the risk of death. The hazard ratio (HR) of the KI-67 high expression group was 1.376 (95% CI 1.000-1.894, P = 0.050). However, in the multivariate analysis of DFS, all the included clinicopathological features were statistically significant (Table 2).
Univariate and multivariate analyses of OS in southern China were also performed by Cox regression. In the univariate analysis of OS, T > 2 cm (HR 3.406, 95% CI 1.232-9.417, P = 0.018), positive node status (HR 0.308, 95% CI 0.169-0.564, P = 0.000) and KI-67 high expression (HR 2.128, 95% CI 1.057-4.285, P = 0.034) were significantly associated with the risk of death. In the multivariate analysis of OS, positive node status (HR 0.226, 95% CI 0.098-0.519, P = 0.000) was significantly associated with OS (Table 3).
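A reported hazard ratio, its 95% CI and its P value are tied together through the standard error of the log hazard ratio. The sketch below back-calculates an approximate Wald P value from an HR and its confidence limits. It assumes the interval is a symmetric Wald interval on the log scale, which is the usual Cox regression output but is not stated in the text; the function name is hypothetical.

```python
import math


def wald_p_from_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate two-sided Wald P value from a hazard ratio and its 95% CI.

    Assumes the CI was computed as exp(log(HR) +/- z_crit * SE).
    Returns (standard error of log HR, z statistic, P value).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


# Hazard ratios and confidence limits as reported in the text.
reported = {
    "KI-67 high, DFS (univariate)": (1.376, 1.000, 1.894),
    "T > 2 cm, OS (univariate)": (3.406, 1.232, 9.417),
    "KI-67 high, OS (univariate)": (2.128, 1.057, 4.285),
}
results = {name: wald_p_from_ci(*values) for name, values in reported.items()}
for name, (se, z, p) in results.items():
    print(f"{name}: SE={se:.3f}, z={z:.2f}, P={p:.3f}")
```

Under this assumption the back-calculated values fall close to the reported P = 0.050, 0.018 and 0.034. A lower confidence limit of exactly 1.000 corresponds to a P value of about 0.05, which matches the borderline result for KI-67 and DFS.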

Comparison of treatment.
Treatments differed between southern China and the SEER cohort (Table 4). We analyzed both databases from 2001 to 2016. In southern China, 389 (97.01%) patients received chemotherapy, compared with 1574 (26.64%) patients in the SEER cohort (P = 0.000). Surgical treatment, however, was similar: 387 (95.09%) patients in the southern China cohort underwent mastectomy (including simple resection and modified radical operation), and 5524 (93.50%) patients in the SEER cohort had breast surgery (the specific operation method is not clear). Receipt of radiotherapy differed significantly between the two cohorts (P < 0.001): 351 (72.82%) patients in southern China had not received radiotherapy, compared with 2223 (37.63%) patients in the SEER cohort.
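The chemotherapy comparison can be illustrated with a Pearson χ² test on a 2 × 2 table. This is a sketch only. The cohort denominators are not given in the text and are back-calculated here from the reported counts and percentages, which is an assumption; the test is computed without continuity correction and may not match the SPSS output exactly. The function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi-square statistic, P value for 1 degree of freedom).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Assumption: denominators inferred from the reported counts and percentages.
china_total = round(389 / 0.9701)
seer_total = round(1574 / 0.2664)

chi2, p = chi_square_2x2(
    389, china_total - 389,   # southern China: chemotherapy yes / no
    1574, seer_total - 1574,  # SEER: chemotherapy yes / no
)
print(china_total, seer_total, round(chi2, 1), p < 0.001)
```

With these inferred denominators the difference in chemotherapy use is far beyond the P < 0.001 threshold, in line with the P = 0.000 reported in the text.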

Changes in morbidity with years.
We further analyzed and compared the age distribution of BC patients in different decades in the SEER cohort. Except for the 1990s, the proportion of BC patients in the young adult group (< 40 years old) showed an increasing trend over time. The proportions were: 1970s: 6%, 1980s: 6.31%, 1990s:   (Fig. 6).

Discussion
Our study is the first to analyze and compare the factors related to the survival and prognosis of BC patients in southern China and the SEER cohort. We analyzed multiple factors that may influence the survival and prognosis of BC, including age, tumor stage and grade, ER, PR, HER2, KI-67, surgery and radiotherapy. It is a multi-regional, big data clinical study. By comparing the ages of patients in southern China and the SEER cohort, we found that about 19.8% of BC patients in southern China from 2001 to 2016 were under 40 years old, similar to the result of Wang, who reported that the incidence of young BC patients in China is about 21.97% [24]. The morbidity of young BC patients in China is significantly higher than that in western countries (about 4-6%) [25][26][27][28], which is similar to the figure in our study: 6.14% in the SEER cohort from 1975 to 2016. These findings suggest that BC patients in China are younger than those in western countries, which indicates that age may be a factor affecting the survival and prognosis of BC patients in southern China and the United States.
To further study DFS and OS, we focused on T stage, positive lymph node status, and ER, PR, HER2 and KI-67 expression in BC patients, and concluded that T stage, positive lymph node status and KI-67 expression could all be regarded as factors affecting the survival and prognosis of BC patients. Other scholars have also studied tumor stage and found that about 60-70% of BC patients were diagnosed at stage 1, which was higher than in Asian countries, while only about 10% of women were at stage 4 [29]. This was similar to our results: most BC patients in southern China from 2001 to 2016 were diagnosed at a relatively early stage, with about 64.86% of patients having T2 BC, 40.39% having N0 disease and 51.37% having stage 2 disease. However, the early diagnosis rate of BC in China is far lower than that in the United States: in the SEER cohort, about 65.96% of patients had T1, 79.46% had N0 and 41.18% had stage 2 disease. Cox regression analysis showed that T stage and positive lymph node status were important factors affecting the OS of BC. Our results therefore further support the view that tumor stage and grade have a significant impact on the survival and prognosis of BC. China should further strengthen the early diagnosis and treatment of BC to improve the prognosis of BC patients in China.
We further explored the effects of ER, PR, HER2 and KI-67 expression on the survival and prognosis of BC. The proportion of ER (+) BC patients was similar in southern China and the SEER cohort: 56.86% in southern China and 65.34% in the SEER cohort, slightly lower than previously reported (about 70%) [30]. This may be related to the large amount of missing data in the SEER cohort, where about 16.7% of ER data were missing in this study.
Other studies have shown that the most important factors affecting the prognosis of BC are tumor grade and ER status [31]. In our study, however, ER was not an indicator of the survival and prognosis of BC. Treatment of BC also differs according to hormone receptor (HR) status. Endocrine therapy can be used for ER- or PR-positive patients, but the effect of chemotherapy is not as good as in receptor-negative patients, and these different treatment methods can significantly affect the prognosis of BC. PR is also an important factor affecting the prognosis of BC. In our study, 52.94% of patients in southern China and 55.55% in the SEER cohort were PR positive, a statistically significant difference between the two groups (P = 0.000). Cox regression analysis likewise showed that PR was not an indicator of the survival and prognosis of BC, which may be related to the large amount of missing data. In addition, Ding et al. found that BC with HER2 and KI-67 overexpression had a higher lymph node metastasis rate and a higher AJCC tumor stage [32,33], which was similar to our results. In this study, HER2 positivity was 55.49% in southern China and 9.46% in the SEER cohort (P = 0.000), consistent with the literature showing that BC cells from young patients are more likely to show HER2-positive expression [24]. KI-67 positivity was 70.01% in southern China, and high expression was an important factor affecting OS in BC patients. For DFS, the χ² test gave P = 0.05 when KI-67 was regarded as an independent factor. We believe this was mainly due to the small sample size and that the trend remains valid, since the P value may gradually decrease as the sample size increases.
In summary, our results showed that positive node status was an important factor affecting the prognosis of BC, which also suggests that BC patients in southern China and the United States have different biological behaviors and pathogenesis.
Additionally, treatment methods are also important factors affecting the prognosis of BC. At present, the main treatments for BC are surgery, radiotherapy, chemotherapy, targeted therapy and hormone therapy [34,35]. Surgery, which can significantly reduce the mortality rate, is the most critical step in the treatment of breast cancer. There are five common surgical methods: breast conserving surgery (BCS), simple mastectomy (SM), modified radical mastectomy (MRM), radical mastectomy (RM) and extensive radical mastectomy (ERM) [36]. Bartelink et al. reported that BCS is equivalent to mastectomy [37]. However, a comparison of treatment methods between southern China and the United States has not yet been reported, and we studied this for the first time. In this study, the BC patients enrolled in southern China all underwent surgery, either BCS or mastectomy (including SM and MRM). About 95.09% of these patients underwent mastectomy and 4.91% underwent BCS. The BCS rate was significantly lower than that of developed countries, which is similar to the result of Gupta A, who reported that the highest BCS rate in China is only 8.6% [38]. In the SEER cohort, however, only 18.24% of BC patients underwent mastectomy and 81.76% did not. Whether surgery was performed had no significant effect on DFS and OS, suggesting that surgery had little effect on the survival and prognosis of BC. Because the SEER cohort did not include details of surgical procedures, their effects on the survival and prognosis of BC, which may be a potential factor, were not analyzed. In recent years, a number of large randomized trials have shown that radiotherapy can significantly reduce the local recurrence of BC, thereby improving the breast preservation rate and achieving a good survival rate [38].
However, in this study, the radiotherapy rates of BC patients in southern China and the SEER cohort were both low, at 27.18% and 34.22% respectively, and the difference between the two was statistically significant (P = 0.001). Nevertheless, Cox regression analysis showed that radiotherapy had no significant effect on either DFS or OS, which may be related to the low radiotherapy rate and should be further studied. Chemotherapy is also an important treatment for BC. We found that most BC patients in southern China, approximately 97.01%, were treated with chemotherapy, whereas only about 24.08% of BC patients in the SEER cohort had received chemotherapy (P = 0.000), which indicates that there are great differences in BC treatment between China and the United States. Additionally, over time (from the 1970s to the 1990s), the proportion of young BC patients in the SEER cohort increased gradually, while the proportion of elderly patients decreased gradually. Furthermore, the median age at diagnosis was relatively unchanged, at 36, 56 and 77 years respectively, which may be related to the early diagnosis of BC.
There are some limitations to this study. The southern China cohort data come from Yunnan Cancer Hospital, the Affiliated Tumor Hospital of Guangxi Medical University and The First People's Hospital of Foshan, so these data may differ slightly from those of the National Cancer Registry System. In addition, the presence of missing data and the limited follow-up time can be considered weaknesses of this study. Other limitations include the lack of data on surgical procedures, KI-67 status and endocrine therapy in the SEER cohort, which limited the analysis of their influence on improvements in patient survival.

Conclusion
In conclusion, our study suggests that positive lymph node status may cause the difference in morbidity and mortality of BC patients in China. Furthermore, differences in treatment methods may be the main reason for the differences between the China and SEER databases.

Data availability
The raw data required to reproduce these findings cannot be shared at this time as the data also forms part of an ongoing study.