Competing Risk Analyses of Medullary Carcinoma of Breast in Comparison to Infiltrating Ductal Carcinoma

The aim of current study was to use competing risk model to assess whether medullary carcinoma of the breast (MCB) has a better prognosis than invasive ductal carcinomas of breast cancer (IDC), and to build a competing risk nomogram for predicting the risk of death of MCB. We involved 3,580 MCB patients and 319,566 IDC patients from Surveillance, Epidemiology, and End Results (SEER) database. IDC was found to have a worse BCSS than MCB (Hazard ratio (HR) > 1, p < 0.001). The 5-year cumulative incidences of death (CID) was higher in IDC than MCB (p < 0.001). Larger tumor size, increasing number of positive lymph nodes and unmarried status were found to worsen the BCSS of MCB (HR > 1, p < 0.001). We found no association between ER, PR, radiotherapy or chemotherapy and MCB prognosis (p > 0.05). After a penalized variable selection process, the SH model-based nomogram showed moderate accuracy of prediction by internal validation of discrimination and calibration with 1,000 bootstraps. In summary, MCB patients had a better prognosis than IDC patients. Interestingly, unmarried status in addition to expected risk factors such as larger tumor size and increasing number of positive lymph nodes were found to worsen the BCSS of MCB. We also established a competing risk nomogram as an easy-to-use tool for prognostic estimation of MCB patients.


Results
cohort selection. After selection, we involved 3,580 MCB patients and 319,566 IDC patients (Table 1). There were significant differences between MCB and IDC among age, race, location, grade, tumor size, tumor stage, number of positive regional nodes, ER or PR status, and chemotherapy experience (p < 0.001, Table 1). Compared with IDC, the MCB patients presented with younger age (age < 50, 47.4% vs. 28.8%), more African Americans (21% vs. 9.3%), higher grade (grade III and IV, 63.8% vs. 39.9%), larger tumor size (>2 cm, 51.2% vs. 34.9%), higher proportion-Adjusted AJCC 6th Stage II (54.8% vs. 36.8%), less positive regional nodes (no positive node, 71.9% vs. 65.3%), more ER negative (78.5% vs. 24.8%) and PR negative (81.7% vs. 34.4%) rate, and more experience of chemotherapy (61% vs. 46.2%). The median follow-time were 143 months (interquartile range [IQR], 94 to 204 months) for MCB and 107 months (IQR, 73 to 155 months) for IDC. The median age of diagnosis was 50 years (IQR, 43 to 60 years) for MCB and 57 years (IQR, 48 to 67 years) for IDC. The rate of breast cancer specific mortality (BCSM) and other causes of death were 11.06% and 14.19% for MCB and 13.14% and 15.8% for IDC, respectively. The proportion of deaths due to cancer and other causes in each variable was listed in Supplemental Table 1, from which we observed the deaths caused by other reasons increased rapidly as the age raised.
MCB showed better outcomes than IDC. As shown in Table 2, We found IDC had a higher 5-years cumulative incidences of death (CID) of both breast cancer specific survival (BCSS) (p < 0.001) and other causes of deaths (p < 0.001) than the MCB (Fig. 1). The multivariate SH model found IDC had worse BCSS than MCB (Hazard ratio (HR) > 1). Furthermore, since statistical matching methods could lower the differences between groups on confounding variables and made cancer observation studies in somehow be considered as a quasi-experimental study, we performed a coarsened exact matching (CEM) between MCB and IDC by matching all the included variables. The standardized difference in means of all variables between MCB and IDC was eliminated and the histograms of IDC and MCB looked much more similar after CEM (see Supplementary Fig. S1-2 online), indicating our matching was successful. Matched 2,307 MCB patients and 24,398 IDC patients were further analyzed by CIF test and multivariate SH model. The new results remained as pre-matched analyses ( Table 2). These multi-face results showed IDC had worse BCSS than MCB in resectable breast cancer patients.

Multivariate SH Analysis of BCSS in IDC and MCB.
The prognostic value of each involved variable for IDC and MCB BCSS was listed in Table 3. Larger tumor size, a greater number of positive regional lymph nodes and unmarried status were found to be risk factors for BCSS of both IDC and MCB (HR > 1, P < 0.05). Meanwhile there were prognostic discrepancies among other variables between IDC and MCB. We found the location of nipple was a risk factor for MCB (Upper-outer quadrant vs. nipple, HR < 1, p = 0.043) but was a protective factor for IDC (Upper-inner quadrant or lower-inner quadrant vs. nipple, HR > 1, p < 0.05). Higher grade showed a worse BCSS of IDC (HR > 1, p < 0.001). However, higher grade showed no association with MCB BCSS (III/II vs. I, p > 0.05) or even had a better BCSS (IV vs. I, HR < 1, p = 0.013). In addition, we found age, race, Breast-Adjusted AJCC 6th Stage, ER and PR status and treatments of radiotherapy or chemotherapy were significantly correlated with IDC (p < 0.05) but not MCB (p > 0.05). Specifically, we found older age (40-69 vs. 20-29) was a protective factor (HR < 1, p < 0.05) of IDC. Compared with Caucasians, the African American and American Indian/Alaska Native had worse outcomes (IDC, HR > 1, p < 0.05). The Asian or Pacific Islander showed a better BCSS than Caucasians (IDC, HR < 1, p < 0.001). Higher stage increased the BCSS of IDC (HR > 1, p < 0.001). Positive ER or PR status and radiotherapy or chemotherapy lower the BCSS of IDC (HR < 1, p < 0.05). There was no association between laterality and BCSS in both IDC and MCB (p > 0.05). nomogram development and validation. A penalized variable selection was performed for our SH model. As shown in Table 4, the LASSO, SCAD, and MCP analyses all identified tumor size, tumor stage, regional positive nodes number and marital status as the key variables in our SH model. The multivariate SH based nomogram was then constructed with these selected variables. A weighted total score calculated from each variable was used to estimate the 5-year and 10-year cause-specific death (Fig. 2). Time dependent the area under the curve of receiver operating characteristic curve (AUC, which was also referred as C-statistics) plots showed moderate discrimination of our nomogram (Fig. 3a). The AUC were 0.724 (95%confidence index (95%CI) = 0.671-0.767) and 0.698 (95%CI = 0.654-0.741) for 5-year and 10-year predictions, respectively. The brier score plot showed our nomogram model had lower score against the null model (Fig. 3b). The brier score was 0.0605 (95%CI = 0.0519-0.0698) and 0.0805 (95%CI = 0.0710-0.0907) for 5-year and 10-year predictions, respectively, indicating that our nomogram had good prediction. The calibration plots showed our nomogram predicted well in patients estimated at lower risk of deaths but not in patients estimated at high risk of deaths, which might come from that few patients were estimated with higher risk of death (Fig. 3c,d).

Discussion
Most clinical cancer survival studies used Kaplan-Meier and Cox proportional hazards models to evaluate the prognostic value of their interests. Both models consider that there is a single cause to the event of interest, such as death. However, there were actually more causes (which are known as competing events) of deaths during the disease management, especially for cancers with long survival time such as breast cancer. Hence, the competing risk should be considered in the survival analysis. The current study was the first one to use competing risk models (CIF and SH models) to analyze the prognostic value of clinical variables to the BCSS of MCB and IDC. Our results showed IDC had a worse BCSS than MCB, which was consistent with previous studies 4, [13][14][15] . Larger tumor size, a greater number of positive regional nodes and unmarried status were found to promote the progression of www.nature.com/scientificreports www.nature.com/scientificreports/ Continued both MCB and IDC. We also found there were many discrepancies between IDC and MCB prognosis in variables included age, race, tumor location, tumor grade, tumor stage, ER and PR status, and records of radiotherapy or chemotherapy. This is currently the largest sample size study on MCB prognosis. Furthermore, we constructed the first competing model-based nomogram for estimating the 5-year and 10-year risk of death of MCB patients. MCB patients were found to have younger age than IDC 3 . Our study with larger scale sample size confirmed this result. Younger patients exhibited a worse BCSS in IDC but there was no association between younger age and MCB BCSS.
Our results showed race played important parts in IDC. African American showed a relative worse outcome while patients from Asia exhibited better prognosis. However, there was limited influence of races in MCB. There was a study showed African American had a worse OS than Caucasians in MCB 10 . This study did not exclude severe patients that had no surgery chance. In the other hand, we only included early diagnosed patients who were resectable. We speculated African American might had more severe MCB patients which caused this difference.  www.nature.com/scientificreports www.nature.com/scientificreports/ Previous studies found MCB patients had higher grade, higher tumor stage and greater tumor size but meanwhile had favorable long-term distant relapse-free survival 2,3 . Our study showed similar results that MCB patients exhibited higher grade, higher tumor stage and larger tumor size than IDC. However, there was limited association of grade or tumor stage and MCB prognosis. Moreover, only tumor size over 30 millimetres had a significant worse outcome compared with tumors less than 10 millimetres in MCB. Increasing number of positive lymph nodes associated with worse OS and BCSS of early diagnosed breast cancer patients 16 . Our study also proved more positive lymph nodes would promote the cancer progression of both MCB and IDC patients.
Previous studies showed marriage was associated with the improvements in cardiovascular, endocrine, immune function, and cancer prognosis [17][18][19] . A study found that even after adjusting for known confounders, unmarried patients are at significantly higher risk of presentation with metastatic cancer, undertreatment, and death resulting from their cancer 17 . Having high levels of perceived social support, larger social network, and being married were found to be associated with decreases in relative risk for cancer mortality of 25%, 20%, and 12%, respectively 20 . We found the married status would improve the prognosis of MCB and IDC, which provide new evidences to the association between social support and cancer outcomes.
Positive ER or PR status were rarely found in MCB 9,10,15,21,22 . Our study found MCB had fewer ER or PR positive status than IDC. ER or PR positive status was often considered as good prognostic factors for breast cancer as the use of hormone treatment might help for certain patients 23 . Indeed, ER or PR positive status was found to be associated with better outcomes in IDC in our study. However, we found there was no association between ER or PR status and MCB prognosis. A previous study 24 stratified breast cancer patients into 4 gene based clusters using hierarchical cluster analysis, among which Cluster B and Cluster A were both ER/PR positive and HER2 negative breast cancers. However, it found that Cluster B exhibited a worse cancer-specific survival than Cluster A. Moreover, previous study also found there were groups of ER positive breast cancers that were resistant to hormone study 25 . Furthermore, a study found no improvement of OS in MCB with adjuvant hormonal therapy 26 . The survival of ER/PR positive breast cancer might be influenced by other unknown biologic determinants. We speculated that ER/PR positive MCB might have cross gene expression profiling with these poor survival ER/PR positive breast cancer, which needed future genomic analysis to confirm.
Radiotherapy and chemotherapy were common adjuvant therapies for invasive breast cancers. However, it was often suggested that MCB had good prognosis and therefore may not benefit from systemic therapy. There was a study found chemotherapy would improve 5 and 10-year OS. However, the p value was 0.08, which might not be solid 26 . Previous study also showed radiotherapy had no association with MCB OS 10 . We found both radiotherapy and chemotherapy were not associated with MCB prognosis, which might be from the good prognostic feature of MCB. It is better to perform prognostic study among more severe MCB, such as patients with tumor size > 30 millimetres or had positive lymph nodes. In addition, according to NCCN 27 , the typical MCB is uncommon while many cases classified as MCB do not have all the pathologic features on subsequent pathologic review. High grade IDC patients might be mistakenly diagnosed as MCB. Therefore, the guideline recommends that cases diagnosed as MCB be treated as other IDC based on tumor size, grade and lymph node status. However, our study found the grade was limited associated with MCB prognosis, which might prove new hint to the concern of MCB treatments.
Nomogram could generate an individual probability of a clinical event by integrating various clinical variables, which is a valuable quantitative tool for personalized medicine 28 . Nomograms have been found to compare favorably to traditional TNM staging systems in many cancers 29,30 . To our best knowledge, our study constructed the first competing risk nomogram for MCB patients. The variables involved in current nomogram were easy to be obtained in clinical.  www.nature.com/scientificreports www.nature.com/scientificreports/ There were some limitations of current study. First, we only included patients with complete information of involved variables, which might cause selection bias. The excluding of patients with missing data would also reduce the sample size and statistical power of our study, especially for the patients with MCB, which is a rare histological type of breast cancer. Nevertheless, this is a study of MCB with the largest sample size. Second, despite the use of statistical matching, our study was based on retrospective cohort, which presented relative low level of clinical evidence. However, since the low incidence of MCB, it is hard to perform prospective study. Third, to  www.nature.com/scientificreports www.nature.com/scientificreports/ obtain enough follow-up time, we excluded the data after 2010, which contained the information of HER2 status that was important for breast cancer prognosis. There was a study found hormone receptor positive/HER2 positive MCB had a better BCSS and OS than hormone receptor positive/HER2 negative MCB 7 . However, it needed to be further confirmed by larger simple scale study. Fourth, the population-based cancer registry (PBCR) data was limited by its absence of detailed information, and treatment information in PBCR data represent only the first course of treatment planned at diagnosis. For example, potential bias might exist in the data of radiotherapy and chemotherapy as many factors involved in determining the course of treatment will not be captured in the registry data. In addition, there was no information of hormone therapy in SEER, therefore the association between ER positive MCB and the resistant of hormone therapy needed further randomized controlled trials to confirm. Fifth, there was limited pathology information of MCB. Survival rates of atypical MCB was found to be worse than typical MCB 1,5,31 . The changes of diagnostic criteria in different time might cause bias 32 . Finally, to be note, the larger sample size could improve the statistical power. Although we involved with the largest sample size of MCB patients, it is much smaller than the IDC cohort. The differences between MCB and IDC might be caused by the difference of sample size. Therefore, the results of current study need to be further validated by study with larger sample size of MCB.

conclusion
In conclusion, we found MCB had a better BCSS than IDC. Larger tumor size, increasing number of positive lymph nodes and unmarried status were identified to promote the progression of MCB. ER or PR status and the use of radiotherapy or chemotherapy had no association with MCB prognosis. The competing risk nomogram of current study would be good clinical tool for prognostic estimation of MCB patients. Future larger sample studies are required to validate our findings.  www.nature.com/scientificreports www.nature.com/scientificreports/ clinicopathological information for age at diagnosis, race, laterality, tumor location, grade, tumor size, information of Breast-Adjusted AJCC 6th Stage, number of positive regional nodes, ER and PR status, marital status, and radiotherapy or chemotherapy experience; (4) The survival time should over 3 months, and the vital status should be recoded for survival analyses; Any patient did not meet these criteria or lack of information for certain clinicopathological information would be excluded.

Study variables.
The following variables were extracted for the selected cohorts that included age at diagnosis, race (Caucasian, African American, American Indian/Alaska Native, Asian or Pacific Islander), laterality (right or left side), tumor location, grade (well-differentiated, moderately differentiated, poorly differentiated, undifferentiated or anaplastic), tumor size, information of Breast-Adjusted AJCC 6th Stage, number of positive regional nodes, ER and PR status (positive, borderline and negative), marital status, and radiotherapy and chemotherapy experience. The tumor location was defined through the SEER Site Specific Coding Modules (https:// seer.cancer.gov/manuals/2016/appendixc.html), which comprised nipple, central portion, upper-inner quadrant, lower-inner quadrant, upper-outer quadrant, lower-outer quadrant, axillary tail, overlapping lesion and breast that is not otherwise specified. The Breast-Adjusted AJCC 6th Stage was roughly considered as I, II, III and IV. The widowed or single (never married or having a domestic partner) or divorced or separated patients was classified as unmarried. The value of age at diagnosis, tumor size and number of positive regional nodes were transformed into small categorical variables to fit the linear assumption. The median follow-up was estimated as the median observed survival time.
Statistical analyses. The difference of each variable between MCB and IDC was analyzed by Chi-Squared tests. The CID were assessed for BCSM and other causes of death. Multivariate SH model was used to assess the BCSS. All the variables were involved in the multivariate SH analysis. HR and 95%CI were calculated. Statistical matching of all variables of MCB and IDC was performed by CEM, which was identified to have the ability to achieve lower levels of imbalance, model dependence, and bias than Propensity Score matching 33,34 . The standardized difference in means of all variables before and after CEM was calculated. Histograms of the distance measure before and after matching were plotted to estimate the efficacy of CEM. There were 4 histograms provided: the original treated and control groups and the matched treated and control groups. The increase of similarity of matched treated and control groups were considered the success of statistical matching 35 . The matched data was then analyzed by CIF test and multivariate SH model.
Multivariate SH model-based nomogram was constructed from the multivariate logistic regression model to predict the 5-year and 10-year cause-specific death of MCB patients. Over-fitting can become a serious problem when there are many potential variables in one predict model 36 . The variable selection was required to improve the prediction, and interpretation of our nomogram. Therefore, penalized variable selection was performed for our SH model by using techniques of least absolute shrinkage and selection operator (LASSO), smoothly clipped absolute deviation (SCAD) and measure-correlate-predict (MCP). The selected variables would be involved in the SH model based nomogram. Internal validation of nomogram was estimated by discrimination and calibration. Discrimination is the ability of a model to distinguish between patients who have an event from those who do not. Calibration assesses how far the predictions are from the actual outcomes 28 . AUC was evaluated for the discrimination of nomograms. The AUC ranges from 0.5-1.0, with 0.5 indicates the outcomes is completely random and 1.0 indicates the perfect discrimination. Calibration curve was used to assess the calibration by comparing how close the nomogram estimated risk line was to the observed risk line in an axis. The brier score for an event at a time is defined as the expected squared distance between the observed status at that time and the predicted probability. Hence, a smaller value of brier score suggests a better model 37 . The brier score could account the discrimination and calibration at the same time 38 . These validation methods of nomogram were performed with 1,000 bootstraps to avoid the bias from overfitting. The bootstraps methodology is commonly used in the internal validation of nomograms whereby the model is iteratively applied to randomly selected sample sets of the original cohort 29 .
All the statistical analyses were performed using R version 3.4.2. R package "cmprsk" was used to perform the CIF test and multivariate SH analysis. CEM analysis was performed by R package "MatchIt". The variable selection of SH model was performed by R package "crrp". The competing risk nomogram was plotted by R packages "mstate" and "regplot". The AUC and brier score were calculated by R package "riskRegression" and plotted by "ggplot2". The calibration curve was drawn by R package "riskRegression". A p-value less than 0.05 was considered statistically significant.