Introduction

Medullary carcinoma of the breast (MCB) is a rare histology type of invasive breast cancer, accounting for 3–5% of all breast cancers. The histopathologic features of MCB include lymphoplasmacytic infiltration, noninvasive microscopic circumscription, syncytial growth pattern >75%, and grade 2 or 3 nuclei1. Immunohistochemical results showed that MCB presented with more absence of estrogen (ER), progesterone (PR), and human epidermal growth factor receptor 2 (HER2)2. Compared with invasive ductal carcinomas of breast cancer (IDC), MCB also exhibited with more BRCA1/2 mutations, greater T stage and tumor size3.

Despite the aggressive features, the prognosis of MCB was conflicting. Studies found MCB showed a better survival than IDC1,3,4,5,6, while other researches showed MCB (IDC) had a similar outcomes with IDC7,8,9. However, since MCB is a rare type of breast cancer, most of these studies were involved with limited samples. Surveillance, Epidemiology, and End Results (SEER) database of the National Cancer Institute is a national collaboration program of Unite State, covering nearly 26% American population’s cancer incidence and survival data, which could provide relatively large sample size for survival study. There were two SEER-based studies associated with MCB prognosis7,10. One study7 only focused on short-term survival status. The inadequate follow-up time might lead to skewed results. The other study10 evaluated the prognostic factors of long-term MCB overall survival (OS) by Cox proportional hazards model. However, OS might unable to accurately describe long-term survival of disease such MCB, as the deaths could be caused by other competing risk events. In this case, the cumulative incidence function (CIF) model11 and Fine-Gray regression for proportional hazards modeling of the subdistribution hazard (SH) model12 should be used.

Duo to the limitation of previous studies, we designed this competing risk model-based study, which involved with SEER data, to evaluate whether MCB has a better prognosis than with IDC in early resectable breast cancer patients, and to evaluate the long-term prognostic value of different clinical factors in MCB. Furthermore, since there was no nomogram drawing for the prognosis of MCB patients, the current study also constructed a competing risk nomogram for MCB patients.

Results

Cohort selection

After selection, we involved 3,580 MCB patients and 319,566 IDC patients (Table 1). There were significant differences between MCB and IDC among age, race, location, grade, tumor size, tumor stage, number of positive regional nodes, ER or PR status, and chemotherapy experience (p < 0.001, Table 1). Compared with IDC, the MCB patients presented with younger age (age < 50, 47.4% vs. 28.8%), more African Americans (21% vs. 9.3%), higher grade (grade III and IV, 63.8% vs. 39.9%), larger tumor size (>2 cm, 51.2% vs. 34.9%), higher proportion-Adjusted AJCC 6th Stage II (54.8% vs. 36.8%), less positive regional nodes (no positive node, 71.9% vs. 65.3%), more ER negative (78.5% vs. 24.8%) and PR negative (81.7% vs. 34.4%) rate, and more experience of chemotherapy (61% vs. 46.2%). The median follow-time were 143 months (interquartile range [IQR], 94 to 204 months) for MCB and 107 months (IQR, 73 to 155 months) for IDC. The median age of diagnosis was 50 years (IQR, 43 to 60 years) for MCB and 57 years (IQR, 48 to 67 years) for IDC. The rate of breast cancer specific mortality (BCSM) and other causes of death were 11.06% and 14.19% for MCB and 13.14% and 15.8% for IDC, respectively. The proportion of deaths due to cancer and other causes in each variable was listed in Supplemental Table 1, from which we observed the deaths caused by other reasons increased rapidly as the age raised.

Table 1 The characteristic of each involved variable in MCB and IDC.

MCB showed better outcomes than IDC

As shown in Table 2, We found IDC had a higher 5-years cumulative incidences of death (CID) of both breast cancer specific survival (BCSS) (p < 0.001) and other causes of deaths (p < 0.001) than the MCB (Fig. 1). The multivariate SH model found IDC had worse BCSS than MCB (Hazard ratio (HR) > 1). Furthermore, since statistical matching methods could lower the differences between groups on confounding variables and made cancer observation studies in somehow be considered as a quasi-experimental study, we performed a coarsened exact matching (CEM) between MCB and IDC by matching all the included variables. The standardized difference in means of all variables between MCB and IDC was eliminated and the histograms of IDC and MCB looked much more similar after CEM (see Supplementary Fig. S1-2 online), indicating our matching was successful. Matched 2,307 MCB patients and 24,398 IDC patients were further analyzed by CIF test and multivariate SH model. The new results remained as pre-matched analyses (Table 2). These multi-face results showed IDC had worse BCSS than MCB in resectable breast cancer patients.

Table 2 The results of CIF test and multivariate SH model between MCB and IDC prognosis before and after CEM.
Figure 1
figure 1

Cumulative incidence plots of competing risk of breast cancer specific deaths and other causes of deaths for cohort of MCB and IDC.

Multivariate SH Analysis of BCSS in IDC and MCB

The prognostic value of each involved variable for IDC and MCB BCSS was listed in Table 3. Larger tumor size, a greater number of positive regional lymph nodes and unmarried status were found to be risk factors for BCSS of both IDC and MCB (HR > 1, P < 0.05). Meanwhile there were prognostic discrepancies among other variables between IDC and MCB. We found the location of nipple was a risk factor for MCB (Upper-outer quadrant vs. nipple, HR < 1, p = 0.043) but was a protective factor for IDC (Upper-inner quadrant or lower-inner quadrant vs. nipple, HR > 1, p < 0.05). Higher grade showed a worse BCSS of IDC (HR > 1, p < 0.001). However, higher grade showed no association with MCB BCSS (III/II vs. I, p > 0.05) or even had a better BCSS (IV vs. I, HR < 1, p = 0.013). In addition, we found age, race, Breast-Adjusted AJCC 6th Stage, ER and PR status and treatments of radiotherapy or chemotherapy were significantly correlated with IDC (p < 0.05) but not MCB (p > 0.05). Specifically, we found older age (40–69 vs. 20–29) was a protective factor (HR < 1, p < 0.05) of IDC. Compared with Caucasians, the African American and American Indian/Alaska Native had worse outcomes (IDC, HR > 1, p < 0.05). The Asian or Pacific Islander showed a better BCSS than Caucasians (IDC, HR < 1, p < 0.001). Higher stage increased the BCSS of IDC (HR > 1, p < 0.001). Positive ER or PR status and radiotherapy or chemotherapy lower the BCSS of IDC (HR < 1, p < 0.05). There was no association between laterality and BCSS in both IDC and MCB (p > 0.05).

Table 3 The results of multivariate SH model of each variable for MCB and IDC.

Nomogram development and validation

A penalized variable selection was performed for our SH model. As shown in Table 4, the LASSO, SCAD, and MCP analyses all identified tumor size, tumor stage, regional positive nodes number and marital status as the key variables in our SH model. The multivariate SH based nomogram was then constructed with these selected variables. A weighted total score calculated from each variable was used to estimate the 5-year and 10-year cause-specific death (Fig. 2). Time dependent the area under the curve of receiver operating characteristic curve (AUC, which was also referred as C-statistics) plots showed moderate discrimination of our nomogram (Fig. 3a). The AUC were 0.724 (95%confidence index (95%CI) = 0.671–0.767) and 0.698 (95%CI = 0.654–0.741) for 5-year and 10-year predictions, respectively. The brier score plot showed our nomogram model had lower score against the null model (Fig. 3b). The brier score was 0.0605 (95%CI = 0.0519–0.0698) and 0.0805 (95%CI = 0.0710–0.0907) for 5-year and 10-year predictions, respectively, indicating that our nomogram had good prediction. The calibration plots showed our nomogram predicted well in patients estimated at lower risk of deaths but not in patients estimated at high risk of deaths, which might come from that few patients were estimated with higher risk of death (Fig. 3c,d).

Table 4 Variable selection: Estimated coefficients (SEs) for the current SH model.
Figure 2
figure 2

SH model-based nomogram for predicting 5- and 10-year risk of death of MCB patients. The nomogram is used by summing the points identified on the top scale for each independent variable and drawing a vertical line from the total points scale to the 5- and 10-year (60 and 120 months) risk of death. The total points projected to the bottom scale indicate the % probability of the 5- and 10-year risk of death. Tumor stage, 1 = Stage I, 2 = Stage II, 3 = Stage III and 4 = Stage IV; NO. Nodes, the number of positive regional lymph nodes; Marital status: 0 = married; 1 = widowed or single (never married or having a domestic partner) or divorced or separated.

Figure 3
figure 3

The discrimination and calibration of the nomogram of current study. (a) the time dependent AUC plot; (b) the time dependent brier score plot; c, the 5-year calibration plot; d, the 10-year calibration plot.

Discussion

Most clinical cancer survival studies used Kaplan-Meier and Cox proportional hazards models to evaluate the prognostic value of their interests. Both models consider that there is a single cause to the event of interest, such as death. However, there were actually more causes (which are known as competing events) of deaths during the disease management, especially for cancers with long survival time such as breast cancer. Hence, the competing risk should be considered in the survival analysis. The current study was the first one to use competing risk models (CIF and SH models) to analyze the prognostic value of clinical variables to the BCSS of MCB and IDC. Our results showed IDC had a worse BCSS than MCB, which was consistent with previous studies4,13,14,15. Larger tumor size, a greater number of positive regional nodes and unmarried status were found to promote the progression of both MCB and IDC. We also found there were many discrepancies between IDC and MCB prognosis in variables included age, race, tumor location, tumor grade, tumor stage, ER and PR status, and records of radiotherapy or chemotherapy. This is currently the largest sample size study on MCB prognosis. Furthermore, we constructed the first competing model-based nomogram for estimating the 5-year and 10-year risk of death of MCB patients.

MCB patients were found to have younger age than IDC3. Our study with larger scale sample size confirmed this result. Younger patients exhibited a worse BCSS in IDC but there was no association between younger age and MCB BCSS.

Our results showed race played important parts in IDC. African American showed a relative worse outcome while patients from Asia exhibited better prognosis. However, there was limited influence of races in MCB. There was a study showed African American had a worse OS than Caucasians in MCB10. This study did not exclude severe patients that had no surgery chance. In the other hand, we only included early diagnosed patients who were resectable. We speculated African American might had more severe MCB patients which caused this difference.

Previous studies found MCB patients had higher grade, higher tumor stage and greater tumor size but meanwhile had favorable long-term distant relapse-free survival2,3. Our study showed similar results that MCB patients exhibited higher grade, higher tumor stage and larger tumor size than IDC. However, there was limited association of grade or tumor stage and MCB prognosis. Moreover, only tumor size over 30 millimetres had a significant worse outcome compared with tumors less than 10 millimetres in MCB. Increasing number of positive lymph nodes associated with worse OS and BCSS of early diagnosed breast cancer patients16. Our study also proved more positive lymph nodes would promote the cancer progression of both MCB and IDC patients.

Previous studies showed marriage was associated with the improvements in cardiovascular, endocrine, immune function, and cancer prognosis17,18,19. A study found that even after adjusting for known confounders, unmarried patients are at significantly higher risk of presentation with metastatic cancer, undertreatment, and death resulting from their cancer17. Having high levels of perceived social support, larger social network, and being married were found to be associated with decreases in relative risk for cancer mortality of 25%, 20%, and 12%, respectively20. We found the married status would improve the prognosis of MCB and IDC, which provide new evidences to the association between social support and cancer outcomes.

Positive ER or PR status were rarely found in MCB9,10,15,21,22. Our study found MCB had fewer ER or PR positive status than IDC. ER or PR positive status was often considered as good prognostic factors for breast cancer as the use of hormone treatment might help for certain patients23. Indeed, ER or PR positive status was found to be associated with better outcomes in IDC in our study. However, we found there was no association between ER or PR status and MCB prognosis. A previous study24 stratified breast cancer patients into 4 gene based clusters using hierarchical cluster analysis, among which Cluster B and Cluster A were both ER/PR positive and HER2 negative breast cancers. However, it found that Cluster B exhibited a worse cancer-specific survival than Cluster A. Moreover, previous study also found there were groups of ER positive breast cancers that were resistant to hormone study25. Furthermore, a study found no improvement of OS in MCB with adjuvant hormonal therapy26. The survival of ER/PR positive breast cancer might be influenced by other unknown biologic determinants. We speculated that ER/PR positive MCB might have cross gene expression profiling with these poor survival ER/PR positive breast cancer, which needed future genomic analysis to confirm.

Radiotherapy and chemotherapy were common adjuvant therapies for invasive breast cancers. However, it was often suggested that MCB had good prognosis and therefore may not benefit from systemic therapy. There was a study found chemotherapy would improve 5 and 10-year OS. However, the p value was 0.08, which might not be solid26. Previous study also showed radiotherapy had no association with MCB OS10. We found both radiotherapy and chemotherapy were not associated with MCB prognosis, which might be from the good prognostic feature of MCB. It is better to perform prognostic study among more severe MCB, such as patients with tumor size > 30 millimetres or had positive lymph nodes. In addition, according to NCCN27, the typical MCB is uncommon while many cases classified as MCB do not have all the pathologic features on subsequent pathologic review. High grade IDC patients might be mistakenly diagnosed as MCB. Therefore, the guideline recommends that cases diagnosed as MCB be treated as other IDC based on tumor size, grade and lymph node status. However, our study found the grade was limited associated with MCB prognosis, which might prove new hint to the concern of MCB treatments.

Nomogram could generate an individual probability of a clinical event by integrating various clinical variables, which is a valuable quantitative tool for personalized medicine28. Nomograms have been found to compare favorably to traditional TNM staging systems in many cancers29,30. To our best knowledge, our study constructed the first competing risk nomogram for MCB patients. The variables involved in current nomogram were easy to be obtained in clinical.

There were some limitations of current study. First, we only included patients with complete information of involved variables, which might cause selection bias. The excluding of patients with missing data would also reduce the sample size and statistical power of our study, especially for the patients with MCB, which is a rare histological type of breast cancer. Nevertheless, this is a study of MCB with the largest sample size. Second, despite the use of statistical matching, our study was based on retrospective cohort, which presented relative low level of clinical evidence. However, since the low incidence of MCB, it is hard to perform prospective study. Third, to obtain enough follow-up time, we excluded the data after 2010, which contained the information of HER2 status that was important for breast cancer prognosis. There was a study found hormone receptor positive/HER2 positive MCB had a better BCSS and OS than hormone receptor positive/HER2 negative MCB7. However, it needed to be further confirmed by larger simple scale study. Fourth, the population-based cancer registry (PBCR) data was limited by its absence of detailed information, and treatment information in PBCR data represent only the first course of treatment planned at diagnosis. For example, potential bias might exist in the data of radiotherapy and chemotherapy as many factors involved in determining the course of treatment will not be captured in the registry data. In addition, there was no information of hormone therapy in SEER, therefore the association between ER positive MCB and the resistant of hormone therapy needed further randomized controlled trials to confirm. Fifth, there was limited pathology information of MCB. Survival rates of atypical MCB was found to be worse than typical MCB1,5,31. The changes of diagnostic criteria in different time might cause bias32. Finally, to be note, the larger sample size could improve the statistical power. Although we involved with the largest sample size of MCB patients, it is much smaller than the IDC cohort. The differences between MCB and IDC might be caused by the difference of sample size. Therefore, the results of current study need to be further validated by study with larger sample size of MCB.

Conclusion

In conclusion, we found MCB had a better BCSS than IDC. Larger tumor size, increasing number of positive lymph nodes and unmarried status were identified to promote the progression of MCB. ER or PR status and the use of radiotherapy or chemotherapy had no association with MCB prognosis. The competing risk nomogram of current study would be good clinical tool for prognostic estimation of MCB patients. Future larger sample studies are required to validate our findings.

Methods

Cohort selection

The cohort was obtained from 18 registries of SEER using SEER*Stat 8.3.5 software. As ER and PR status was registered since 1990 in SEER database, we involved patients diagnosed equal to or after 1990. To ensure adequate follow-up time, patients diagnosed after 2009 were excluded. Patients met the following inclusion criteria would be included: (1) it should be female primary MCB (8510/3: Medullary carcinoma, NOS) or IDC (ICD-O-3 Histology/behavior-8500/3) patients diagnosed between age 20 to79 who had surgery; (2) it should be unilateral invasive ductal carcinoma with location record; (3) it should include clinicopathological information for age at diagnosis, race, laterality, tumor location, grade, tumor size, information of Breast-Adjusted AJCC 6th Stage, number of positive regional nodes, ER and PR status, marital status, and radiotherapy or chemotherapy experience; (4) The survival time should over 3 months, and the vital status should be recoded for survival analyses; Any patient did not meet these criteria or lack of information for certain clinicopathological information would be excluded.

Study variables

The following variables were extracted for the selected cohorts that included age at diagnosis, race (Caucasian, African American, American Indian/Alaska Native, Asian or Pacific Islander), laterality (right or left side), tumor location, grade (well-differentiated, moderately differentiated, poorly differentiated, undifferentiated or anaplastic), tumor size, information of Breast-Adjusted AJCC 6th Stage, number of positive regional nodes, ER and PR status (positive, borderline and negative), marital status, and radiotherapy and chemotherapy experience. The tumor location was defined through the SEER Site Specific Coding Modules (https://seer.cancer.gov/manuals/2016/appendixc.html), which comprised nipple, central portion, upper-inner quadrant, lower-inner quadrant, upper-outer quadrant, lower-outer quadrant, axillary tail, overlapping lesion and breast that is not otherwise specified. The Breast-Adjusted AJCC 6th Stage was roughly considered as I, II, III and IV. The widowed or single (never married or having a domestic partner) or divorced or separated patients was classified as unmarried. The value of age at diagnosis, tumor size and number of positive regional nodes were transformed into small categorical variables to fit the linear assumption. The median follow-up was estimated as the median observed survival time.

Statistical analyses

The difference of each variable between MCB and IDC was analyzed by Chi-Squared tests. The CID were assessed for BCSM and other causes of death. Multivariate SH model was used to assess the BCSS. All the variables were involved in the multivariate SH analysis. HR and 95%CI were calculated. Statistical matching of all variables of MCB and IDC was performed by CEM, which was identified to have the ability to achieve lower levels of imbalance, model dependence, and bias than Propensity Score matching33,34. The standardized difference in means of all variables before and after CEM was calculated. Histograms of the distance measure before and after matching were plotted to estimate the efficacy of CEM. There were 4 histograms provided: the original treated and control groups and the matched treated and control groups. The increase of similarity of matched treated and control groups were considered the success of statistical matching35. The matched data was then analyzed by CIF test and multivariate SH model.

Multivariate SH model-based nomogram was constructed from the multivariate logistic regression model to predict the 5-year and 10-year cause-specific death of MCB patients. Over-fitting can become a serious problem when there are many potential variables in one predict model36. The variable selection was required to improve the prediction, and interpretation of our nomogram. Therefore, penalized variable selection was performed for our SH model by using techniques of least absolute shrinkage and selection operator (LASSO), smoothly clipped absolute deviation (SCAD) and measure–correlate-predict (MCP). The selected variables would be involved in the SH model based nomogram. Internal validation of nomogram was estimated by discrimination and calibration. Discrimination is the ability of a model to distinguish between patients who have an event from those who do not. Calibration assesses how far the predictions are from the actual outcomes28. AUC was evaluated for the discrimination of nomograms. The AUC ranges from 0.5–1.0, with 0.5 indicates the outcomes is completely random and 1.0 indicates the perfect discrimination. Calibration curve was used to assess the calibration by comparing how close the nomogram estimated risk line was to the observed risk line in an axis. The brier score for an event at a time is defined as the expected squared distance between the observed status at that time and the predicted probability. Hence, a smaller value of brier score suggests a better model37. The brier score could account the discrimination and calibration at the same time38. These validation methods of nomogram were performed with 1,000 bootstraps to avoid the bias from overfitting. The bootstraps methodology is commonly used in the internal validation of nomograms whereby the model is iteratively applied to randomly selected sample sets of the original cohort29.

All the statistical analyses were performed using R version 3.4.2. R package “cmprsk” was used to perform the CIF test and multivariate SH analysis. CEM analysis was performed by R package “MatchIt”. The variable selection of SH model was performed by R package “crrp”. The competing risk nomogram was plotted by R packages “mstate” and “regplot”. The AUC and brier score were calculated by R package “riskRegression” and plotted by “ggplot2”. The calibration curve was drawn by R package “riskRegression”. A p-value less than 0.05 was considered statistically significant.