Development of a prediction model based on LASSO regression to evaluate the risk of non-sentinel lymph node metastasis in Chinese breast cancer patients with 1–2 positive sentinel lymph nodes

This study aimed to develop an intraoperative prediction model to evaluate the risk of non-sentinel lymph node (NSLN) metastasis in Chinese breast cancer patients with 1–2 positive sentinel lymph nodes (SLNs). The clinicopathologic data of 714 patients with 1–2 positive SLNs were investigated. Univariate and multivariate analyses were performed to identify the risk factors of NSLN metastasis. A new mathematical prediction model was developed based on LASSO and validated in an independent cohort of 131 patients. The area under the receiver operating characteristic curve (AUC) was used to quantify performance of the model. Patients with NSLN metastasis accounted for 37.3% (266/714) and 34.3% (45/131) of the training and validation cohorts, respectively. A LASSO regression-based prediction model was developed and included the 13 most powerful factors (age group, clinical tumour stage, histologic type, number of positive SLNs, number of negative SLNs, number of SLNs dissected, SLN metastasis ratio, ER status, PR status, HER2 status, Ki67 staining percentage, molecular subtype and P53 status). The AUCs of training and validation cohorts were 0.764 (95% CI 0.729–0.798) and 0.777 (95% CI 0.692–0.862), respectively. We presented a new prediction model with excellent clinical applicability and diagnostic performance for use by clinicians as an intraoperative clinical tool to predict risk of NSLN metastasis in Chinese breast cancer patients with 1–2 positive SLNs and make the final decisions regarding axillary lymph node dissection.

www.nature.com/scientificreports/

[…] dissected during surgery were routinely submitted for postoperative pathological sections and immunohistochemical (IHC) staining.
Diagnostic criteria. The tumours were classified by size as T1, T2 and T3 according to the 8th edition of the American Joint Committee on Cancer (AJCC) staging guidelines 14 . IHC staining was performed to determine the estrogen receptor (ER), progesterone receptor (PR) and Ki67 status. The HER2 status was determined by IHC staining combined with fluorescence in situ hybridization (FISH). Samples that were IHC (−) or IHC (1+) were considered HER2-negative, and samples that were IHC (3+) and FISH (+) were considered HER2-positive. All cases in the study were classified as the luminal A, luminal B, HER2 overexpression or triple negative subtype according to the 2013 St Gallen International Expert Consensus 15 .
Statistical analysis. The χ² test and logistic regression were used for the univariate and multivariate analyses, respectively; these analyses were performed using SPSS 23.0 software. LASSO regression was performed using the glmnet package in R version 3.6.2 to establish a mathematical prediction model for calculating the risk score (RS) of each patient. SPSS 23.0 was also used to generate the receiver operating characteristic (ROC) curve. The performance of the prediction model was assessed by the area under the ROC curve (AUC) in the training cohort and the validation cohort. P < 0.05 was considered significant.
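The study fitted its model with glmnet in R; the sketch below is not that code. It is a minimal pure-Python illustration of L1-penalised (LASSO) logistic regression, fitted here by proximal gradient descent rather than the coordinate descent used by glmnet. The function names, step size, penalty values and toy data are all assumptions made for illustration and are unrelated to the study data.

```python
import math

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal operator of the L1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_lasso_logistic(X, y, lam, step=0.1, n_iter=2000):
    """Minimise mean log-loss + lam * sum(|beta_j|); the intercept is not penalised.

    Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad0, grad = 0.0, [0.0] * p
        for row, label in zip(X, y):
            eta = intercept + sum(b * v for b, v in zip(beta, row))
            err = sigmoid(eta) - label
            grad0 += err / n
            for j in range(p):
                grad[j] += err * row[j] / n
        # Plain gradient step for the intercept, gradient step plus shrinkage for beta.
        intercept -= step * grad0
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return intercept, beta

# Hypothetical toy data: column 0 separates the classes, column 1 does not.
toy_X = [[0.1, 1.0], [0.2, 0.0], [0.3, 1.0], [0.4, 0.0],
         [0.6, 1.0], [0.7, 0.0], [0.8, 1.0], [0.9, 0.0]]
toy_y = [0, 0, 0, 0, 1, 1, 1, 1]

_, strong = fit_lasso_logistic(toy_X, toy_y, lam=10.0)
_, weak = fit_lasso_logistic(toy_X, toy_y, lam=0.01)
print("lam=10.0:", strong)
print("lam=0.01:", weak)
```

With a very large penalty every coefficient is held at exactly zero, while a small penalty lets the informative predictor enter the model. In practice the penalty would be chosen by cross-validation, a step this sketch omits.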
Ethical approval and consent to participate. This study was approved by the ethics committee of the First Affiliated Hospital of Chongqing Medical University (2020-309), and each participating patient provided written informed consent. All methods were performed in accordance with the relevant guidelines and regulations.

Results
General demographics and characteristics. In total, 845 patients with 1-2 positive SLNs were enrolled in our study; 714 patients were included in the training cohort and 131 patients in the validation cohort. All patients were female and ranged in age from 22 to 88 years. The median age was 49 years in both cohorts. In the training cohort, 266 patients (37.3%) were shown to have NSLN metastasis on postoperative pathological sections, indicating that the other 448 patients (62.7%) underwent unnecessary ALND. The patients in the training cohort were then divided into a low-RS group and a high-RS group, in which 26 patients (10.8%) and 240 patients (50.6%), respectively, were found to have NSLN metastasis. Additional clinicopathological parameters in the training cohort are shown in Table 1.
The clinicopathological characteristics of patients in the validation cohort are shown in Table 2.
Univariate and multivariate analyses in the training cohort. The univariate analysis showed that the histologic grade (P = 0.010), number of positive SLNs (P < 0.001), number of negative SLNs (P < 0.001), number of SLNs dissected (P < 0.001), SLN metastasis ratio (P < 0.001), LVI status (P < 0.001), ER status (P = 0.011), HER2 status (P = 0.005), molecular subtype (P = 0.001), and RS (P < 0.001) were associated with NSLN involvement in breast cancer patients with 1-2 positive SLNs (Table 1). In the subsequent multivariate logistic regression analysis, the histologic grade (P = 0.026), LVI status (P = 0.005), number of positive SLNs (P = 0.001), number of negative SLNs (P = 0.005), SLN metastasis ratio (P = 0.005), and molecular subtype (P = 0.007) were identified as independent predictive factors for NSLN metastasis.

[…] (Fig. 1). Among these factors, the SLN metastasis ratio was the most influential factor in the RS, with the maximum absolute value of the coefficient. The regression coefficients of each factor are shown in Table 4. The RS of each patient in the study was calculated using the model equation.

Performance of the prediction model. Subsequently, we generated the ROC curve of the prediction model. The AUC was 0.764 (95% CI 0.729-0.798), highlighting the excellent diagnostic performance of this model (Fig. 2). The RS value of 1.87239924305 was identified as the cut-off value with the highest Youden index. At this cut-off, the sensitivity, specificity and overall accuracy of the model were 74.1%, 69.6% and 71.3%, respectively. More than 30% of the patients in the study would avoid unnecessary ALND with the prediction model. In the validation cohort of 131 patients, the AUC was 0.777 (95% CI 0.692-0.862) (Fig. 3), showing a satisfying predictive value.
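The performance measures reported above (AUC, Youden-index cut-off, sensitivity, specificity and accuracy) can be sketched as follows. This is a minimal illustration in plain Python, not the SPSS procedure used in the study. The function names and the toy scores are hypothetical, and the rule that a score at or above the cut-off counts as positive is an assumption chosen to match the statement that ALND may be omitted when the RS is below the cut-off.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confusion_at(scores, labels, cutoff):
    """Sensitivity, specificity and accuracy when score >= cutoff is called positive."""
    tp = sum(1 for s, l in zip(scores, labels) if s >= cutoff and l == 1)
    tn = sum(1 for s, l in zip(scores, labels) if s < cutoff and l == 0)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg, (tp + tn) / len(labels)

def youden_cutoff(scores, labels):
    """Return (cutoff, J) for the observed score maximising J = sens + spec - 1."""
    best_cutoff, best_j = None, -1.0
    for c in sorted(set(scores)):
        sens, spec, _ = confusion_at(scores, labels, c)
        j = sens + spec - 1.0
        if j > best_j:
            best_cutoff, best_j = c, j
    return best_cutoff, best_j

# Hypothetical risk scores and NSLN status (1 = metastasis); not study data.
toy_scores = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
toy_labels = [0, 0, 0, 1, 1, 0, 1, 1]

cutoff, j = youden_cutoff(toy_scores, toy_labels)
print("AUC:", roc_auc(toy_scores, toy_labels))
print("Youden cut-off:", cutoff, "J:", j)
print("sens, spec, acc:", confusion_at(toy_scores, toy_labels, cutoff))
```

The Youden index weights sensitivity and specificity equally. A clinical tool meant to avoid missing NSLN metastasis could reasonably prefer a cut-off with higher sensitivity, which is a design choice outside this sketch.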

Discussion
In recent years, the results of the ACOSOG Z0011 and IBCSG 23-01 trials showed that neither disease-free survival (DFS) nor overall survival (OS) differed significantly between the SLNB-only and ALND groups among breast cancer patients with limited SLN involvement. Based on these results, the latest NCCN guidelines also recommend omitting ALND in patients with 1-2 involved SLNs who plan to undergo breast-conserving surgery and subsequent radiotherapy 16 . However, in most developing countries, such as China, the breast-conserving rate (BCR) is only approximately 20%, whereas that in Western countries is 50-80% 6,17,18 . Even in some leading centres in China, the BCR is only 30% 19 . Because of the low BCR in developing countries and the absence of evidence supporting ALND omission in Eastern populations, most clinicians in developing countries such as China still hold a conservative view and recommend ALND for patients with positive SLNs 17 . In addition, the ALN status remains one of the most important prognostic factors.

In the present study, only 266 (37.3%) of the patients with 1-2 SLN metastases in the training cohort were shown to have NSLN metastasis after ALND, which is consistent with the results of previous studies 8,20 . Thus, more than 60% of the patients received unnecessary ALND. It is therefore of great importance to accurately predict NSLN metastasis either intraoperatively or preoperatively. Our study retrospectively analysed the clinicopathological data of the 714 breast cancer patients with 1-2 positive SLNs in the training cohort to determine the factors associated with axillary involvement. We further developed a new mathematical prediction model based on this Chinese population to evaluate the risk of NSLN metastasis.

Previous studies have shown that LVI is a feature associated with a poor prognosis that promotes local recurrence and distant metastasis of tumours 21 . Several recent studies have recognized LVI as an independent predictor of NSLN metastasis in patients with 1-2 positive SLNs 22,23 . We drew the same conclusion. In our study, 76.0% and 35.8% of the patients with and without LVI, respectively, were found to have NSLN involvement, and this difference was significant. Whether histologic grade is associated with NSLN metastasis remains controversial, and the conclusions reported by Maimaitiaili A and Wang XY are inconsistent 24,25 . Our univariate analysis showed that the patients with higher histologic grades were more likely to have at least one positive ALN, and the histologic grade remained an independent predictor of NSLN metastasis in the subsequent multivariate analysis (OR 1.630; 95% CI 1.061-2.505; P = 0.026).
Moreover, we divided all patients in the training cohort into the luminal A, luminal B, HER2 overexpression and triple negative subtypes according to the 2013 St Gallen International Expert Consensus 15 . In these molecular subtype groups, 32.0%, 35.1%, 56.5% and 37.8% of the patients, respectively, exhibited NSLN involvement. Compared to the triple negative subtype, the HER2 overexpression subtype, but not the luminal A or luminal B subtype, was associated with a significantly higher risk of positive NSLNs. Whether NSLN metastasis is associated with the molecular subtype remains controversial. The results of a recent single-centre study involving 291 patients demonstrated that patients with luminal B and HER2 overexpression breast cancer were significantly more likely to have at least one positive NSLN than patients with luminal A breast cancer 26 . However, in another retrospective study, investigators failed to identify the molecular subtype as an independent predictor of NSLN metastasis; the patients with positive SLNs had the same risk of axillary involvement regardless of their molecular subtype 22 .
The number of positive SLNs, number of negative SLNs, number of SLNs dissected and SLN metastasis ratio were important predictors of NSLN metastasis in breast cancer patients with 1-2 positive SLNs. These factors rely heavily on the assessment of intraoperative frozen sections and are therefore unknown before surgery. Two publications considered the numbers of positive and negative SLNs to be independent risk factors and included them in their prediction models 27,28 . The SLN metastasis ratio was incorporated into the prediction model in another clinical study 29 . However, the value of the number of positive SLNs, number of negative SLNs, number of SLNs dissected and SLN metastasis ratio in predicting NSLN metastasis has not been fully clarified because of the collinearity among these factors. In our study, LASSO regression was used to construct the mathematical model, thereby effectively addressing the problem of collinearity among these factors.

The MSKCC nomogram, which was the first model to predict NSLN metastasis in patients with positive SLNs, performed well in the original population, with an AUC of 0.76 9 . Previous studies showed that the AUC of the MSKCC nomogram was less than 0.7, indicating that the MSKCC nomogram performed worse than other models in other populations 30-32 . In addition, the MSKCC nomogram and other previous models were developed based on Western populations in developed countries and thus hardly apply to Eastern populations. In the present retrospective analysis, we developed a new, LASSO algorithm-based […] Table 4. The higher the absolute value of a regression coefficient, the greater its influence on the model. In our prediction model, the most powerful predictor was the SLN metastasis ratio, with a coefficient of 0.7672377992. This factor was also included in the Cambridge model and another recent model 33,34 .

Table 2. Clinicopathological characteristics of the 131 patients in the validation cohort.

Previous evidence showed that the absolute agreement rates of histologic grade and LVI between specimens obtained by core needle biopsy (CNB) and those obtained by surgical excision were only 75% and 69%, respectively 35,36 . The small amount of tissue obtained by CNB and intratumoural heterogeneity are possible reasons for the low concordance of the histologic grade and LVI status between the preoperative CNB and postoperative pathology results. Because the aim of our study was to develop an intraoperative prediction model, the histologic grade and LVI status, which are not reliably available from preoperative or intraoperative evaluation, were excluded from our prediction model, although these factors were identified as independent risk factors in the retrospective multivariate analysis.

Furthermore, we calculated the RS of each patient in the training cohort according to the model equation. The ROC curve of the prediction model was then generated and shown to have an AUC of 0.764 (95% CI 0.729-0.798), which is comparable to that of the MSKCC nomogram 9 . Thus, the predictive power of the model is acceptable. Finally, the ROC curve analysis identified the cut-off value of the RS as 1.87239924305. The sensitivity, specificity and total accuracy were 74.1%, 69.6% and 71.3%, respectively. More than 30% of the patients in the study would avoid unnecessary ALND with the prediction model. Therefore, we believe that ALND may be safely omitted when the RS of a patient, as calculated by the model equation, is less than the cut-off value of 1.87239924305. We divided the patients in the training cohort into a low-RS group and a high-RS group according to the cut-off value. Significantly more patients had positive NSLNs in the high-RS group than in the low-RS group, further confirming the predictive power of our model.
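The collinearity among the SLN count variables discussed above, and the form of the penalty used to handle it, can be written out explicitly. The block below is a sketch under stated assumptions: it assumes that every dissected SLN is classed as either positive or negative and that the SLN metastasis ratio is the fraction of dissected SLNs that are positive. The symbols $n_+$, $n_-$, $n_d$, $r$, $\beta$ and $\lambda$ are introduced here for illustration and do not appear in the source. The objective is the standard LASSO logistic regression objective, not a formula quoted from the study.

```latex
% Assumed definitions: n_+ positive SLNs, n_- negative SLNs,
% n_d SLNs dissected, r SLN metastasis ratio.
n_d = n_+ + n_-, \qquad r = \frac{n_+}{n_d}

% Standard L1-penalised logistic regression for outcomes y_i \in \{0, 1\}
% with linear predictor \eta_i; the intercept \beta_0 is not penalised.
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\sum_{i=1}^{N}\Bigl[\, y_i\,\eta_i - \log\bigl(1 + e^{\eta_i}\bigr) \Bigr]
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert,
\qquad
\eta_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}
```

Under the first assumption, $n_d$ is an exact linear combination of $n_+$ and $n_-$, so an unpenalised logistic regression cannot assign unique coefficients to all three variables. The L1 term shrinks the coefficients and can set some of them exactly to zero, which is why a penalised fit remains usable when the predictors are this strongly related.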
To evaluate the clinical applicability of the prediction model, a subsequent independent cohort of 131 patients was used for validation. The model still showed impressive performance, with an AUC of 0.777 (95% CI 0.692-0.862). Thus, the present prediction model can be considered an intraoperative clinical tool for clinicians to predict the risk of NSLN metastasis in Chinese breast cancer patients with 1-2 positive SLNs and make decisions regarding ALND.
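The text does not state how the 95% confidence intervals of the AUCs were computed. As a rough plausibility check only, the sketch below applies the Hanley-McNeil (1982) approximation for the standard error of an AUC to the reported AUCs and cohort sizes. The choice of this approximation and the function name are assumptions and are not the authors' method.

```python
import math

def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate confidence interval for an AUC (Hanley-McNeil standard error)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (auc * (1.0 - auc)
                + (n_pos - 1) * (q1 - auc ** 2)
                + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    half_width = z * math.sqrt(variance)
    # Clamp to the valid AUC range [0, 1].
    return max(0.0, auc - half_width), min(1.0, auc + half_width)

# AUCs and cohort sizes as reported in the text:
# 266 of 714 (training) and 45 of 131 (validation) patients had NSLN metastasis.
training_ci = hanley_mcneil_ci(0.764, n_pos=266, n_neg=714 - 266)
validation_ci = hanley_mcneil_ci(0.777, n_pos=45, n_neg=131 - 45)
print("training:", training_ci)
print("validation:", validation_ci)
```

Under this approximation the intervals come out at roughly 0.73 to 0.80 for the training cohort and 0.69 to 0.87 for the validation cohort. These are close to, but not identical with, the reported intervals, as would be expected if the statistical software used a different variance estimator. The wider validation interval reflects the smaller cohort.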
To the best of our knowledge, this study is the first to use the LASSO algorithm to develop a prediction model based on an Eastern population. As 13 factors were included in our model, it offers a more personalized assessment for breast cancer patients. However, our study has a few limitations. First, this was a retrospective study, and a prospective clinical trial is needed. Second, this was a single-centre study, and our prediction model should be validated with population data from other centres.

Conclusion
In conclusion, we developed a new intraoperative mathematical prediction model with 13 predictors based on the LASSO algorithm to evaluate the risk of NSLN metastasis in Chinese breast cancer patients with 1-2 positive SLNs. The model performed well in both the training cohort and validation cohort and has good clinical applicability.

Data availability
The datasets supporting the findings of this study will be available from the corresponding author upon reasonable request.