Deep learning radiomics can predict axillary lymph node status in early-stage breast cancer

Zheng, Xueyi; Yao, Zhao; Huang, Yini; Yu, Yanyan; Wang, Yun; Liu, Yubo; Mao, Rushuang; Li, Fei; Xiao, Yang; Wang, Yuanyuan; Hu, Yixin; Yu, Jinhua; Zhou, Jianhua

doi:10.1038/s41467-020-15027-z


Article
Open access
Published: 06 March 2020


Nature Communications volume 11, Article number: 1236 (2020)


An Author Correction to this article was published on 12 July 2021

This article has been updated

Abstract

Accurate identification of axillary lymph node (ALN) involvement in patients with early-stage breast cancer is important for determining appropriate axillary treatment options and therefore avoiding unnecessary axillary surgery and complications. Here, we report deep learning radiomics (DLR) of conventional ultrasound and shear wave elastography of breast cancer for predicting ALN status preoperatively in patients with early-stage breast cancer. Clinical parameter combined DLR yields the best diagnostic performance in predicting ALN status between disease-free axilla and any axillary metastasis with areas under the receiver operating characteristic curve (AUC) of 0.902 (95% confidence interval [CI]: 0.843, 0.961) in the test cohort. This clinical parameter combined DLR can also discriminate between low and heavy metastatic burden of axillary disease with AUC of 0.905 (95% CI: 0.814, 0.996) in the test cohort. Our study offers a noninvasive imaging biomarker to predict the metastatic extent of ALN for patients with early-stage breast cancer.


Introduction

Breast cancer is the most commonly diagnosed cancer among women worldwide and is the second leading cause of cancer-related death¹. Accurate identification of axillary lymph-node (ALN) involvement in patients with breast cancer is important for prognosis and therapy decisions². The sentinel lymph node (SLN) is the first node draining the primary cancer. SLN dissection (SLND) is recommended to predict ALN status, especially for those with clinically negative nodes³. The American College of Surgeons Oncology Group Z0011 (ACOSOG Z0011) trial showed that among patients with clinical T1/T2 breast cancer, if there were two or fewer SLN metastases, the use of SLND alone would not lead to inferior survival compared with ALN dissection (ALND)^4,5. Compared with ALND, SLND has fewer complications, but it is not risk-free surgery and still has some significant limitations, including considerably increased anesthesia time and expense, and complications such as arm numbness or upper limb edema in 3.5–10.9% of patients^6,7. Studies have shown that 43–65% of patients who had positive SLNs underwent unnecessary axillary surgery because they had no additional non-SLN metastasis, resulting in high morbidity^8,9. In fact, SLN biopsy could be avoided if there were a reliable preoperative evaluation of ALN status, because most patients with early-stage breast cancer have disease-free axilla¹⁰.

Ultrasound (US) has been widely used to preoperatively characterize breast lesions and determine ALN status¹¹. A study showed that clinical T stage and preoperative axillary US results were associated with the ALN status in patients with early-stage breast cancer¹⁰, but the diagnostic performance of axillary US in determining the ALN status was poor, with an area under the receiver operating characteristic curve (AUC) of 0.585–0.719 (ref. ¹²). Several studies attempted to predict the ALN status from clinicopathological data, such as tumor grade, histological tumor size, lymphovascular invasion, Ki-67 proliferation index, and hormone receptor status^13,14. However, clinicopathological data alone are not accurate enough, with an AUC of 0.66–0.74 in previous studies¹⁵. In addition, some data such as lymphovascular invasion and histological tumor size are not available preoperatively, whereas preoperative knowledge of ALN status is important for determining appropriate axillary treatment options⁶.

Two-dimensional (2D) shear wave elastography (SWE), a new US technology for measuring tissue stiffness, integrates the B-mode image with a color-coded map that shows the distribution of shear wave velocity (SWV)¹⁶. This technique has shown promise in distinguishing malignant from benign breast lesions¹⁷. Some studies suggested that the stiffness of breast cancer was a predictor of ALN status: a higher SWV of breast cancer was associated with a higher probability of ALN metastasis^18,19. However, the performance of 2D-SWE depends on the placement of regions of interest (ROI), and the AUC was only 0.759 for the prediction of ALN status^17,20. Therefore, SWE images of breast cancer alone might be insufficient to evaluate ALN status accurately.

Radiomics can automatically provide a large number of quantitative image features from medical images, which tend to be hard for the naked eye to recognize^21,22. This method was first demonstrated to be useful in analyzing CT or MRI images in clinical oncology^23,24. Recently, radiomics based on analysis of US images showed better performance than other routine methods²⁵. However, analyzing US images by radiomics has some limitations, including object segmentation and extraction of hard-coded features²². Deep learning radiomics (DLR), a newly developed method, can provide quantitative and high-throughput features from medical images by supervised learning^21,22. A recent study demonstrated that DLR was useful in analyzing SWE images and showed excellent performance in predicting the stages of liver fibrosis²². When applied to medical images, DLR usually confronts small-sample learning problems. Clinical parameter combined DLR, which integrates clinical information with network characteristics, can provide information complementary to image features and use clinical information and US image features collaboratively to build the model, thus improving model performance²⁶. Our hypothesis is that clinical parameter combined DLR might be able to extract more valuable information from images of breast conventional US and SWE and thus provide better prediction and stratification of ALN status according to the cutoffs for axillary surgery of the ACOSOG Z0011 trial.

Hence, the purpose of this study is to evaluate the diagnostic performance of clinical parameter combined DLR on conventional US images and SWE images of breast cancer in predicting the extent of ALN involvement in patients with early-stage breast cancer. Our results reveal that clinical parameter combined DLR yields the best diagnostic performance in predicting ALN status between disease-free axilla (N0) and any axillary metastasis (N₊(≥1)) with an AUC of 0.902 in the test cohort, which was significantly higher than that of axillary US (P < 0.001, Hanley & McNeil), classification by clinicopathological data (P = 0.002, Delong et al.) and DLR on images only (P = 0.004, Delong et al.). Clinical parameter combined DLR on breast conventional US and SWE images provides a noninvasive imaging biomarker for predicting the extent of ALN involvement preoperatively and has the potential to determine appropriate axillary treatment options for patients with early-stage breast cancer.

Results

Baseline characteristics

Between January 2016 and April 2019, a total of 1342 women with 1342 breast lesions were studied, and 584 women (mean age, 50 years; range, 26–83 years) with 584 malignant breast lesions were finally enrolled for analysis. Figure 1 shows the patient recruitment workflow. According to the results of SLND or ALN dissection, 337 had disease-free axilla (N0), 150 had low metastatic burden of axillary disease (N₊(1–2)) and 97 had heavy metastatic burden of axillary disease (N₊(≥3)).
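The three nodal groups used throughout the study follow directly from the number of positive nodes found at SLND or ALN dissection. A minimal sketch of that mapping is given here; the function name `nodal_category` and the string labels are illustrative choices and are not taken from the study software.

```python
def nodal_category(positive_nodes: int) -> str:
    """Map a count of pathologically positive axillary nodes to a study group."""
    if positive_nodes < 0:
        raise ValueError("positive_nodes must be non-negative")
    if positive_nodes == 0:
        return "N0"        # disease-free axilla
    if positive_nodes <= 2:
        return "N+(1-2)"   # low metastatic burden
    return "N+(>=3)"       # heavy metastatic burden


# Group sizes reported in the text; they should add up to the enrolled cohort.
reported = {"N0": 337, "N+(1-2)": 150, "N+(>=3)": 97}
total = sum(reported.values())
```

The two binary tasks reported below reuse the same groups: N0 against all node-positive patients, and N₊(1–2) against N₊(≥3) within the node-positive patients.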

**Fig. 1: Patient recruitment workflow.**

Base model selection and clinical information integration

The base model acted as a feature encoder, which had a significant impact on classification. To find the most suitable base model for the ALN prediction task, the performances of ResNet50, ResNet101, Inception V3, and VGG19 in predicting ALN status between N0 and N₊(≥1) were compared. After ResNet50 was selected as the best-performing base model, clinical information was further added to the diagnostic model. The model incorporating clinical information within the network was called ResNet50 + C, where C stands for clinical information. In this variant, clinical information was input directly into the penultimate fully connected (FC) layer of ResNet50 by increasing the number of neurons. The detailed results are summarized in Table 1. ResNet50 with deep features and clinical information integrated offline proved to be the best in terms of performance and memory usage.

Table 1 The performance comparison of different models.


Prediction of ALN status between N0 and N₊(≥1)

Adopting N0 as the negative reference standard, 466 lesions were randomly assigned to the training cohort and the other 118 lesions to the independent test cohort. The detailed characteristics, including patient age, US size, Breast Imaging-Reporting and Data System (BI-RADS) category, tumor type, estrogen receptor (ER) status, progesterone receptor (PR) status, human epidermal growth factor receptor 2 (HER-2) status, and Ki-67 proliferation index, are shown in Table 2. There was no significant difference between the detailed characteristics of the two cohorts (all P > 0.05, t-test or Mann-Whitney U test). Axillary US findings evaluated by an experienced radiologist had an AUC of 0.735, accuracy of 0.635, sensitivity of 0.721 and specificity of 0.573. The Kappa values for axillary US were 0.933 for inter-observer agreement and 1 for intra-observer agreement (both P < 0.001, Kappa test).
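Accuracy, sensitivity and specificity of a binary test such as axillary US all derive from the four cells of a 2 × 2 table against the surgical reference standard. The sketch below shows the standard definitions; the function name and the counts are invented for illustration and are not the study's data.

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary diagnostic metrics from confusion-matrix counts.

    tp/fn: node-positive patients called positive/negative by the test.
    tn/fp: node-negative patients called negative/positive by the test.
    Assumes every denominator is non-zero.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Made-up counts for illustration only.
metrics = diagnostic_metrics(tp=40, fp=20, tn=60, fn=10)
```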

Table 2 Patient and tumor characteristics.


In the training cohort, clinical parameter combined DLR achieved the highest AUC of 0.936, while DLR based on images only and classification by clinicopathologic data only achieved AUCs of 0.850 and 0.771, respectively. In the independent test cohort, AUCs for predicting ALN metastasis dropped slightly and were consistent with the performance in the training cohort. The clinical parameter combined DLR still achieved the highest AUC of 0.902, which was significantly higher than the AUC of other methods including axillary US findings (AUC: 0.735, P < 0.001, Hanley & McNeil test), DLR based on images only (AUC: 0.796, P = 0.004, Delong et al.) and classification by clinicopathologic data (AUC: 0.727, P = 0.002, Delong et al.). The accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of clinical parameter combined DLR were also uniformly better than those of the other methods. The detailed statistical results are summarized in Table 3 and the corresponding ROC curves are shown in Fig. 2. The prediction results of DLR using only US or only SWE images combined with clinical parameters were poorer than those of DLR using both US and SWE images combined with clinical parameters (P = 0.006 and P = 0.002, respectively, Delong et al.) (Supplementary Note 1 and Supplementary Table 2).
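For readers less familiar with the AUC, it can be read as the probability that a randomly chosen node-positive patient receives a higher model score than a randomly chosen node-negative patient, with ties counted as one half. The sketch below computes it by direct pairwise comparison; it is a generic illustration with made-up scores, not the software used in the study.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 ground-truth values.
    scores: iterable of model outputs, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ranked correctly.
toy_auc = auc_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise loop is quadratic in cohort size, which is acceptable for cohorts of a few hundred patients; rank-based formulas give the same value more efficiently.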

Table 3 The prediction of ALN status results (N0 vs. N₊(≥1)).


**Fig. 2: Comparison of receiver operating characteristic (ROC) curves between different models for predicting disease-free axilla (N0) and any axillary metastasis (N₊(≥1)).**

Prediction of ALN status between N₊(1–2) and N₊(≥3)

Adopting N₊(1–2) as the negative reference standard, this experiment assigned 197 lesions to the training cohort and 50 to the independent test cohort. In the training cohort, DLR based on images only and classification by clinicopathologic data achieved AUCs of 0.874 and 0.756, respectively, while clinical parameter combined DLR achieved an AUC of 0.956. In the independent test cohort, the AUC of clinical parameter combined DLR dropped slightly but still reached 0.905, which was significantly higher than the AUC of DLR based on images only (AUC: 0.777, P = 0.04, Delong et al.) and the AUC of classification by clinicopathologic data (AUC: 0.686, P = 0.03, Delong et al.). The detailed statistical results are summarized in Table 4, and the corresponding ROC curves are compared in Fig. 3.

Table 4 The prediction of ALN status results (N₊(1–2) vs. N₊(≥3)).


**Fig. 3: Receiver operating characteristic (ROC) curves comparison between different models for predicting low metastatic burden of axillary disease (N₊(1–2)) and heavy metastatic burden of axillary disease (N₊(≥3)).**

Prediction of ALN status among N0, N₊(1–2) and N₊(≥3)

The model was extended to a three-group task for predicting ALN status. As described above, the clinical endpoints were categorized into three groups: N0, N₊(1–2), and N₊(≥3), containing 337, 150, and 97 lesions, respectively. The DLR model was built on breast conventional US and SWE images and combined with axillary US findings and clinicopathologic data. The overall accuracy of differentiating the three groups was 0.805 and the confusion matrix is shown in Fig. 4. The model performed well in differentiating the N0 group but showed poorer results in the other two groups.
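Overall accuracy and per-group performance can both be read off a three-class confusion matrix. The sketch below assumes rows hold the true group and columns the predicted group, in the order N0, N₊(1–2), N₊(≥3); the matrix values are invented for illustration and are not those of Fig. 4.

```python
def overall_accuracy(cm):
    """Fraction of all cases that fall on the diagonal of the matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_class_recall(cm):
    """For each true group, the fraction of its cases predicted correctly."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


# Hypothetical matrix: rows = true group, columns = predicted group.
toy_cm = [
    [50, 3, 2],   # true N0
    [6, 12, 2],   # true N+(1-2)
    [2, 3, 10],   # true N+(>=3)
]
toy_accuracy = overall_accuracy(toy_cm)
toy_recall = per_class_recall(toy_cm)
```

In this toy matrix the largest group dominates the overall accuracy, so a high overall figure can coexist with noticeably lower recall in the two smaller groups.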

**Fig. 4: The confusion matrix of predicting metastasis among disease-free axilla (N0), low metastatic burden of axillary disease (N₊(1–2)) and heavy metastatic burden of axillary disease (N₊(≥3)).**

Interpretability of DLR model

To investigate the interpretability of the DLR, the network was visualized by applying Gradient-weighted Class Activation Mapping (Grad-CAM), which produces a coarse localization map highlighting the regions important for the classification target²⁷. The activations of the last convolutional layer of the last res-block were visualized with respect to the prediction of ALN status, as shown in Fig. 5. We found that there were usually two locations valuable for predicting ALN status with the DLR model: one is the boundary of the tumor and the other is the low echo area inside the tumor. To some extent, this supports the effectiveness of the model.
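The core Grad-CAM computation is short: each channel of the chosen convolutional layer is weighted by the spatial average of the class-score gradient for that channel, the weighted channels are summed, and negative values are clipped to zero. The sketch below shows only this step on plain nested lists. In a real network the activations and gradients would come from the framework's automatic differentiation; here they are passed in as toy values, and the function name is illustrative.

```python
def grad_cam(activations, gradients):
    """Coarse localization map from one layer's activations and gradients.

    Both arguments are nested lists shaped [channels][height][width];
    gradients are d(class score)/d(activation) at the same positions.
    """
    n_ch = len(activations)
    h = len(activations[0])
    w = len(activations[0][0])
    # Channel weight: global average of the gradient over spatial positions.
    weights = [
        sum(sum(row) for row in gradients[k]) / (h * w) for k in range(n_ch)
    ]
    # Weighted sum over channels followed by ReLU.
    return [
        [
            max(0.0, sum(weights[k] * activations[k][i][j] for k in range(n_ch)))
            for j in range(w)
        ]
        for i in range(h)
    ]


# Toy 2-channel, 2x2 example: channel 0 supports the class, channel 1 opposes it.
acts = [[[1, 0], [0, 2]], [[0, 3], [1, 0]]]
grads = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
cam = grad_cam(acts, grads)
```

The resulting map has the spatial resolution of the convolutional layer and is normally upsampled to the input size before being overlaid on the image.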

**Fig. 5: Visualization of two patient examples.**

Discussion

According to the ACOSOG Z0011 trial, patients with early-stage breast cancer and two or fewer SLN metastases had no inferior survival if they underwent SLND only rather than ALN dissection^4,5. Based on the results of the ACOSOG Z0011 trial, all patients should undergo SLND to predict ALN status whether or not they have clinically positive nodes²⁸. However, SLND has some limitations, including causing complications⁶, having false-negative rates ranging from 7.8% to 27.3%^29,30,31 and resulting in unnecessary axillary surgery⁹. Hence, there is an increasing need to predict the metastatic extent of ALN accurately in a noninvasive way.

In this study, we developed and validated a clinical parameter combined DLR method based on breast conventional US and SWE images for preoperative prediction of ALN status in patients with clinical T1 or T2 breast cancer. This method showed significantly better diagnostic performance in distinguishing patients with a negative axilla (N0) from patients with any axillary metastasis (N₊(≥1)) than any single method. Encouragingly, our model showed favorable discriminating ability between patients with low metastatic burden of axillary disease (N₊(1–2)) and patients with heavy metastatic burden of axillary disease (N₊(≥3)). With a false-negative rate similar to that of SLND, this clinical parameter combined DLR might have the potential to serve as a noninvasive imaging biomarker to replace SLND for patients with early-stage breast cancer. The clinical parameter combined DLR showed the possibility of assisting breast clinicians in making decisions on appropriate axillary treatment: no need for SLN biopsy or ALN dissection in patients with N0, SLND only for patients with N₊(1–2) and ALN dissection for patients with N₊(≥3)⁴.

Some studies argued that compared with SLND, axillary US combined with fine needle aspiration or core needle biopsy could be cost saving for patients with positive nodal status¹¹. However, axillary US was not accurate enough to predict ALN status with an AUC of 0.585–0.719 (refs. ^12,32). Some studies demonstrated that the number and morphology of abnormal lymph nodes detected by axillary US were predictors of nodal burden²⁸, but the diagnostic performances were poor with AUC of 0.725–0.747 (refs. ^33,34). In our study, the overall diagnostic performance of preoperative axillary US results was low with an AUC of 0.735, which was concordant with previous studies^35,36. Compared with axillary US results alone, the clinical parameter combined DLR method makes use of all available data including findings of axillary US, clinicopathologic data, breast conventional US and SWE images and therefore, showed significantly better diagnostic performances in predicting ALN metastasis than the routine axillary US evaluated by an experienced radiologist.

Some studies reported that histopathological data such as tumor grade, lymphovascular invasion, histological tumor size, and hormone receptor status could be predictors of ALN metastasis^10,37. However, some histopathological data, such as histological tumor size and lymphovascular invasion, can only be evaluated after surgical resection and cannot guide decisions on axillary surgery preoperatively. Although tumor grade can be estimated from core biopsy samples preoperatively, low concordance rates ranging from 67% to 75% were found between core needle biopsy and surgical excision in previous studies^38,39. Unlike previous studies, the current study adopts all histopathological data available after biopsy of the primary breast tumor⁴⁰, which is a standard preoperative procedure. Therefore, clinicopathologic data available preoperatively were kept as candidate factors in developing the predictive model, which could serve as a noninvasive predictive method to assess ALN status.

Clinical parameter combined DLR was established by analyzing images of breast conventional US and SWE with the DLR concept²² and combining them with axillary US findings and clinicopathologic data. This DLR method has shown great promise in analyzing SWE images for staging liver fibrosis²². Radiomics has also been applied to other imaging modalities, such as CT or MRI images of primary cancers including bladder and colon cancer, to predict regional lymph-node metastasis, demonstrating that it is a useful way to predict lymph-node metastasis^41,42. Compared with the previous studies, our study yielded a better diagnostic performance by concentrating on the clinical parameter combined DLR method, which can complement image features with more information and make the model more robust by constraining the features extracted from images²⁶. In addition, for patients with suspected breast lesions, breast and axillary US is routinely used to characterize breast lesions and axillary lymph-node status, and has the advantages of being cost-effective and free of ionizing radiation compared with other imaging modalities¹⁰.

Compared with those studies using SWE values as a single parameter to predict ALN status^12,18, our study showed a better diagnostic performance by applying DLR on breast conventional US and SWE images. Instead of measuring the stiffness of breast cancer inside the shear wave ROI based on several parameters of SWE, the whole shear wave ROI was analyzed and a large number of features were quantified automatically by DLR²².

Some limitations of this study have to be addressed. First, this is a single-center study. More evidence from multiple centers is needed to validate this model before clinical application. Second, patients with multifocal breast lesions and bilateral disease were excluded because it is difficult to determine which lesion would lead to ALN metastases and should be input into the model. Therefore, the current clinical parameter combined DLR model could only be used to predict the extent of ALN involvement for patients with a single breast cancer lesion. Further study is needed to build other models to predict ALN status for patients with multifocal breast lesions and bilateral disease. Third, gene markers of breast cancer such as BRCA1 and BRCA2 are used to stratify patients based on the risk for disease⁴³. However, radiogenomics, which focuses on the relationship between genomics and imaging phenotypes, is not currently available, although it is an interesting direction.

Conclusions

Clinical parameter combined DLR on breast conventional US and SWE images provides a noninvasive and practical way of predicting the extent of ALN involvement preoperatively and has the potential to determine appropriate axillary treatment options for patients with early-stage breast cancer. Prospective multicenter validation in subsequent studies is expected to provide high-level evidence for clinical use.

Methods

Patients

This prospective study was approved by the Institutional Review Board of Sun Yat-sen University Cancer Center. The inclusion criteria were as follows: (a) women with US-suspected breast masses; (b) availability of clinical data; (c) patients who underwent breast surgery and sentinel lymph-node biopsy or ALN dissection with curative intent. The exclusion criteria were as follows: (a) preoperative therapy (resection biopsy, neoadjuvant radiotherapy or chemotherapy); (b) patients with multifocal lesions or bilateral disease; (c) masses deeper than 3 cm due to the attenuation of SWE or larger than 3.5 cm in diameter due to the limited width of the US probe; (d) unqualified 2D-SWE measurements, meaning that little or no shear wave signal was acquired in the ROI of SWE; (e) benign breast lesions or carcinoma in situ; (f) missing important histopathological results (immunohistochemical results or lymph-node results); (g) incomplete information or images. Verbal informed consent was obtained from all patients.

Conventional US examinations

One of five radiologists, who had 18, 6, 3, 2, and 2 years of experience in breast ultrasound, respectively, performed preoperative breast and axillary US with a Siemens S2000 ultrasound scanner (Siemens Healthineers, Mountain View, CA, USA) equipped with a 4–9 MHz linear array transducer. The target breast mass was measured at the maximal-diameter plane to determine US size and classified by using US BI-RADS¹⁷. After performing whole-breast US, the same radiologist performed axillary US routinely and recorded suspicious US features of ALN. Suspicious US features of ALN include a ratio of long axis diameter to short axis diameter <2, diffuse cortical thickening >3 mm, focal cortical bulge >3 mm, eccentric cortical thickening >3 mm, rounded hypoechoic node, complete or partial effacement of the fatty hilum, nonhilar cortical blood flow on color Doppler images, complete or partial replacement of the node with an ill-defined or irregular mass, and microcalcifications in the node³². The result of axillary US evaluated by the experienced radiologist was regarded as positive as long as at least one suspicious US finding was found, and as negative when no suspicious findings of ALN were found¹⁰. To evaluate the intra-observer agreement for axillary US, one radiologist repeated the evaluation of the same 30 ALNs after an interval of 1 week. Inter-observer agreement was tested by two radiologists independently evaluating another 30 ALNs.
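Agreement between two readings of the same nodes is commonly summarized with Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. The sketch below assumes that this is the statistic behind the Kappa values reported in the Results; the function name and ratings are invented for illustration.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long lists of categorical ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed agreement: fraction of items given the same rating.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal frequencies per category.
    p_expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    if p_expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Made-up readings of 10 nodes (1 = suspicious, 0 = not suspicious).
reader_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
reader_2 = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
kappa = cohen_kappa(reader_1, reader_2)
```

In this toy case the readers agree on 8 of 10 nodes, chance agreement is 0.5, and kappa is therefore 0.6; identical readings give a kappa of 1.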

SWE

After performing conventional US, SWE was performed thrice at the maximal-diameter plane of the breast lesion. The ROI of SWE was adjusted to include subcutaneous fat layer and superficial pectoral muscle layer, with at least 5 mm of distance from the boundary of the lesion to the lateral borders⁴⁴. With sufficient coupling material filling between probe and skin, the radiologist applied extremely slight pressure to minimize pre-compressions. When acoustic radiation force impulse was generated, patients were asked to suspend respiration for several seconds. The quality map, displayed in red-yellow-green representing low-intermediate-high quality respectively, was obtained to evaluate the quality of the SWE first. Then the velocity map of SWE was obtained. Guided by the quality map, the velocity map of SWE with fewest artifacts and the best quality was chosen and stored as the SWE image for analysis.

Data analysis

Clinical and histopathologic data were obtained from the medical records. Histopathologic results of the breast cancer included tumor type, ER status, PR status, HER-2 status, and Ki-67 proliferation index. Clinical data included patient age, US size, tumor location, and BI-RADS category. Histopathologic results of SLND and ALN dissection, including the total number of resected lymph nodes and the total number of positive nodes, were recorded.

Deep learning radiomics model

The enrolled patients were randomly divided into the training cohort and the independent test cohort at a ratio of 4:1, and the training cohort was then used to optimize the model parameters. We also randomly chose 25% of the training images to form a validation cohort to guide the choice of hyperparameters. The whole pipeline of our model is shown in Fig. 6. ResNet, pre-trained on ImageNet, was adopted as the base model^45,46. In particular, the last 1000-node FC layer was replaced with three specifically designed FC layers with Xavier-initialized weights⁴⁷. The detailed architecture of the network is shown in Supplementary Table 1.
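The cohort split can be sketched as a seeded shuffle followed by two cuts. The version below makes several assumptions that the text does not specify: the split is not stratified by nodal status, the validation cases are held out from the remaining training cases, and the function name, seed and rounding rule are illustrative.

```python
import random


def split_cohorts(ids, test_fraction=0.2, val_fraction=0.25, seed=0):
    """Randomly split ids into (train, validation, test) lists.

    test_fraction is taken from all ids (a 4:1 train:test ratio by default);
    val_fraction is then taken from the remaining training ids.
    """
    rng = random.Random(seed)      # fixed seed keeps the split reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    test = shuffled[:n_test]
    train_all = shuffled[n_test:]
    n_val = round(len(train_all) * val_fraction)
    validation = train_all[:n_val]
    train = train_all[n_val:]
    return train, validation, test


# Toy cohort of 100 lesion ids.
train_ids, val_ids, test_ids = split_cohorts(range(100))
```

Splitting by patient or lesion id rather than by image keeps the US and SWE images of one lesion in the same cohort.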

**Fig. 6: The overall pipeline of the model.**

The entire process comprised two steps, forward computation and backward propagation⁴⁸. Beforehand, rectangular ROIs were cropped from the raw US images according to the tumor segmentation mask, resized to 224 × 224 pixels and normalized. The pathology type was one-hot encoded and used as the label. In the training stage, rectangular ROIs were fed into the network to update the model parameters by backward propagation. The outputs of the network were used as the classification results, and the cross-entropy between the outputs and the labels was used as the loss function. To alleviate the influence of overfitting and sample imbalance, online data augmentation was used: input images were randomly flipped horizontally and vertically, every ROI image was randomly cropped from four directions in steps of 2 pixels, and images of each category were fed into the network with equal probability. We set the learning rate to 1e-4 and applied the Adam optimizer to update the model parameters with a batch size of 32. The maximum number of iteration steps was set to 5000, and the learning rate was halved at 2000 and 4000 steps. After training, we replaced the last FC layer with an SVM as the classifier and fused the clinical information and network features to collaboratively make a decision⁴⁹.
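The learning-rate schedule described above is a step decay, which can be written as a small function of the training step. The sketch assumes steps are counted from 0 and that each decay takes effect at the milestone step itself; that boundary convention, like the function name, is an assumption rather than something stated in the text.

```python
def learning_rate(step, base_lr=1e-4, milestones=(2000, 4000), factor=0.5):
    """Learning rate in effect at a given training step under step decay."""
    # Count how many milestones have been reached and decay once per milestone.
    n_decays = sum(1 for m in milestones if step >= m)
    return base_lr * factor ** n_decays


# Rates at the start of training and just before and after each milestone.
schedule = {s: learning_rate(s) for s in (0, 1999, 2000, 3999, 4000, 4999)}
```

With the values from the text, training runs at 1e-4 for the first 2000 steps, 5e-5 for the next 2000, and 2.5e-5 for the final 1000.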

Network feature extraction

In contrast to hand-crafted and engineered features designed according to previous medical experience, DLR learnt high-throughput image features in a supervised manner, which could make full use of all the information embedded in US images^50,51. The convolutional layers encoded the input rectangular ROIs and adaptively learnt the semantic features, and the FC layers then selected the relevant features and reduced the feature dimensions. Supervised by the labels of the input images, the model updated its parameters and finally arrived at the most relevant features in the FC layer. The output of the penultimate FC layer was used as the network features, which proved efficient and effective. For comparison, clinicopathological information and network features were each used to train an SVM directly, and their performance in predicting ALN status was compared.

Multi-modal, multi-source feature fusion in DLR

As described earlier, a single-modality image could be encoded into network features. An additional imaging modality might provide more useful information. Hence, the model was extended to accept bimodal image inputs. A parallel model structure was designed, which contained two ResNet50 base networks and FC layers. The two ResNet50 base networks shared parameters and accepted US and SWE images, respectively, as bimodal image inputs. The final convolutional layer outputs of the parallel network were concatenated and fused by the following FC layers. Network features were extracted from the same layers as in the single-modality model. We argue that deep models applied to medical image analysis, which usually face small-sample learning problems, should be combined with clinical information²⁶. Finally, the deep features combined with clinical features were used to collaboratively train an SVM classifier for predicting ALN status. In the control model ResNet50 + C, neurons equal in number to the clinical features were added to the penultimate FC layer of ResNet50. During training of this model, the network received dual-modal images as the input, and the clinical features were input directly into the penultimate FC layer instead of extracting deep features and combining them offline. To use the proposed model, a rectangular ROI that covers the tumor should be manually selected as the network input.
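The offline fusion step amounts to joining, for each patient, the network feature vector with the clinical feature vector before the SVM is trained. The sketch below shows only that joining step. Standardizing the clinical columns first is an assumption added here because SVMs are sensitive to feature scale; the article does not describe its preprocessing, and all names and values are illustrative.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Z-score each column of a list of equal-length numeric rows."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    # Constant columns get a divisor of 1.0 to avoid division by zero.
    stds = [pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def fuse_features(deep_features, clinical_features):
    """Concatenate per-patient deep features with scaled clinical features."""
    if len(deep_features) != len(clinical_features):
        raise ValueError("both inputs must describe the same patients")
    scaled = standardize_columns(clinical_features)
    return [list(d) + c for d, c in zip(deep_features, scaled)]


# Toy data: 3 patients, 3 deep features each, plus age and US size.
deep = [[0.2, 0.9, 0.1], [0.7, 0.3, 0.5], [0.4, 0.4, 0.8]]
clinical = [[40, 1.0], [50, 2.0], [60, 3.0]]
fused = fuse_features(deep, clinical)
```

In practice the scaling statistics would be estimated on the training cohort only and then applied unchanged to the test cohort, and the fused rows would be passed to an SVM implementation of choice.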

Statistical analysis

By using SLN biopsy or ALN dissection as the reference standard, the extent of ALN metastasis was divided into three groups: disease-free axilla (N0), low metastatic burden of axillary disease (N₊(1–2)) and heavy metastatic burden of axillary disease (N₊(≥3)). The detailed clinicopathological differences between N0 and N₊(≥1) were compared by t-test or Mann-Whitney U test. AUC was used to estimate the performance of axillary US, DLR based on images only, classification by clinicopathological data and clinical parameter combined DLR, and was compared by using the methods of Delong et al.⁵² or Hanley & McNeil^53,54. Other measurements, namely accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), were also used to estimate the model performance. The calculation method is shown in Supplementary Methods. The Kappa test was used to assess the intra-observer agreement and inter-observer agreement. All statistical tests were two-sided and a P-value less than 0.05 was considered statistically significant. All statistical analyses were performed using MedCalc software (V.11.2; 2011 MedCalc Software bvba, Mariakerke, Belgium), Python 3.5, and MATLAB R2015b.
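As background for the AUC statistics, the sketch below implements the standard error of a single AUC proposed by Hanley and McNeil (1982) and the normal-approximation confidence interval that follows from it. It does not implement the comparison of two correlated AUCs, which additionally requires the correlation between them, and the article does not state that its confidence intervals were obtained this way. The group sizes in the example are hypothetical.

```python
import math


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Standard error of an AUC (Hanley and McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_pos - 1) * (q1 - auc ** 2)
        + (n_neg - 1) * (q2 - auc ** 2)
    ) / (n_pos * n_neg)
    return math.sqrt(variance)


def auc_confidence_interval(auc, n_pos, n_neg, z=1.96):
    """Normal-approximation interval, clipped to the valid range [0, 1]."""
    se = hanley_mcneil_se(auc, n_pos, n_neg)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Hypothetical cohort: 50 node-positive and 50 node-negative patients.
se_example = hanley_mcneil_se(0.9, 50, 50)
ci_example = auc_confidence_interval(0.9, 50, 50)
```

The standard error shrinks as either group grows, which is why small test cohorts such as the 50-lesion cohort in the second task give wide intervals.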

Statistics and reproducibility

Models were verified and replicated using regular machine learning metrics on independent test cohorts. We released the software of the model for replication on new data.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

Excel files containing raw data included in the main figures and tables can be found in the Source Data File in the article. All other data are available in the Article and Supplementary Information. All other data including the imaging data can be provided upon reasonable request to the corresponding author.

Code availability

The software and code of the proposed method are available as Supplementary Software.

Change history

12 July 2021
A Correction to this paper has been published: https://doi.org/10.1038/s41467-021-24605-8

References

Siegel, R. L., Miller, K. D. & Jemal, A. Cancer statistics, 2018. CA Cancer J. Clin. 68, 7–30 (2018).
Ahmed, M., Purushotham, A. D. & Douek, M. Novel techniques for sentinel lymph node biopsy in breast cancer: a systematic review. Lancet Oncol. 15, e351–e362 (2014).
Lyman, G. H. et al. Sentinel lymph node biopsy for patients with early-stage breast cancer: American Society of Clinical Oncology clinical practice guideline update. J. Clin. Oncol. 35, 561–564 (2017).
Giuliano, A. E. et al. Effect of axillary dissection vs no axillary dissection on 10-year overall survival among women with invasive breast cancer and sentinel node metastasis: The ACOSOG Z0011 (Alliance) Randomized Clinical Trial. JAMA 318, 918–926 (2017).
Giuliano, A. E. et al. Axillary dissection vs no axillary dissection in women with invasive breast cancer and sentinel node metastasis: a randomized clinical trial. JAMA 305, 569–575 (2011).
Boughey, J. C. et al. Cost modeling of preoperative axillary ultrasound and fine-needle aspiration to guide surgery for invasive breast cancer. Ann. Surg. Oncol. 17, 953–958 (2010).
Langer, I. et al. Morbidity of sentinel lymph node biopsy (SLN) alone versus SLN and completion axillary lymph node dissection after breast cancer surgery: a prospective Swiss multicenter study on 659 patients. Ann. Surg. 245, 452–461 (2007).
Chu, K. U., Turner, R. R., Hansen, N. M., Brennan, M. B. & Giuliano, A. E. Sentinel node metastasis in patients with breast carcinoma accurately predicts immunohistochemically detectable nonsentinel node metastasis. Ann. Surg. Oncol. 6, 756–761 (1999).
Kamath, V. J. et al. Characteristics of the sentinel lymph node in breast cancer predict further involvement of higher-echelon nodes in the axilla: a study to evaluate the need for complete axillary lymph node dissection. Arch. Surg. 136, 688–692 (2001).
Kim, G. R. et al. Preoperative axillary US in early-stage breast cancer: potential to prevent unnecessary axillary lymph node dissection. Radiology 288, 55–63 (2018).
Cools-Lartigue, J. & Meterissian, S. Accuracy of axillary ultrasound in the diagnosis of nodal metastasis in invasive breast cancer: a review. World J. Surg. 36, 46–54 (2012).
Youk, J. H., Son, E. J., Kim, J. A. & Gweon, H. M. Pre-operative evaluation of axillary lymph node status in patients with suspected breast cancer using shear wave elastography. Ultrasound Med. Biol. 43, 1581–1586 (2017).
Viale, G. et al. Predicting the status of axillary sentinel lymph nodes in 4351 patients with invasive breast carcinoma treated in a single institution. Cancer 103, 492–500 (2005).
La Verde, N. et al. Role of patient and tumor characteristics in sentinel lymph node metastasis in patients with luminal early breast cancer: an observational study. Springerplus 5, 114 (2016).
Tapia, G. et al. Predicting non-sentinel lymph node metastasis in Australian breast cancer patients: are the nomograms still useful in the post-Z0011 era? ANZ J. Surg. 89, 712–717 (2019).
Shiina, T. et al. WFUMB guidelines and recommendations for clinical use of ultrasound elastography: Part 1: basic principles and terminology. Ultrasound Med. Biol. 41, 1126–1147 (2015).
Berg, W. A. et al. Shear-wave elastography improves the specificity of breast US: the BE1 multinational study of 939 masses. Radiology 262, 435–449 (2012).
Evans, A. et al. Does shear wave ultrasound independently predict axillary lymph node metastasis in women with invasive breast cancer? Breast Cancer Res. Treat. 143, 153–157 (2014).
Zhao, Q. et al. Pre-operative conventional ultrasound and sonoelastography evaluation for predicting axillary lymph node metastasis in patients with malignant breast lesions. Ultrasound Med. Biol. 44, 2587–2595 (2018).
Kilic, F. et al. Ex vivo assessment of sentinel lymph nodes in breast cancer using shear wave elastography. J. Ultrasound Med. 35, 271–277 (2016).
Gillies, R. J., Kinahan, P. E. & Hricak, H. Radiomics: images are more than pictures, they are data. Radiology 278, 563–577 (2016).
Wang, K. et al. Deep learning Radiomics of shear wave elastography significantly improved diagnostic performance for assessing liver fibrosis in chronic hepatitis B: a prospective multicentre study. Gut 68, 729–741 (2019).
Huang, Y. Q. et al. Development and validation of a radiomics nomogram for preoperative prediction of lymph node metastasis in colorectal cancer. J. Clin. Oncol. 34, 2157–2164 (2016).
Wu, S. et al. Development and validation of an MRI-based radiomics signature for the preoperative prediction of lymph node metastasis in bladder cancer. EBioMedicine 34, 76–84 (2018).
Liang, J. et al. Predicting malignancy in thyroid nodules: radiomics score versus 2017 American College of Radiology thyroid imaging, reporting and data system. Thyroid 28, 1024–1033 (2018).
Xie, Y. T., Zhang, J. P., Xia, Y., Fulham, M. & Zhang, Y. N. Fusing texture, shape and deep model-learned information at decision level for automated classification of lung nodules on chest CT. Inf. Fusion 42, 102–110 (2018).
Selvaraju, R. R. et al. Grad-CAM: Visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 128, 336–359 (2020).
Farrell, T. P. et al. The Z0011 Trial: is this the end of axillary ultrasound in the pre-operative assessment of breast cancer patients? Eur. Radiol. 25, 2682–2687 (2015).
Krag, D. et al. The sentinel node in breast cancer–a multicenter validation study. N. Engl. J. Med. 339, 941–946 (1998).
Krag, D. N. et al. Technical outcomes of sentinel-lymph-node resection and conventional axillary-lymph-node dissection in patients with clinically node-negative breast cancer: results from the NSABP B-32 randomised phase III trial. Lancet Oncol. 8, 881–888 (2007).
Pesek, S., Ashikaga, T., Krag, L. E. & Krag, D. The false-negative rate of sentinel node biopsy in patients with breast cancer: a meta-analysis. World J. Surg. 36, 2239–2251 (2012).
Ecanow, J. S., Abe, H., Newstead, G. M., Ecanow, D. B. & Jeske, J. M. Axillary staging of breast cancer: what the radiologist should know. Radiographics 33, 1589–1612 (2013).
Wang, X., Chen, L., Sun, Y. & Zhang, B. Evaluation of axillary lymph node metastasis burden by preoperative ultrasound in early-stage breast cancer with needle biopsy-proven metastasis. Clin. Transl. Oncol. https://doi.org/10.1007/s12094-019-02162-3 (2019).
Lim, G. H. et al. Preoperative predictors of high and low axillary nodal burden in Z0011 eligible breast cancer patients with a positive lymph node needle biopsy result. Eur. J. Surg. Oncol. 44, 945–950 (2018).
Bedi, D. G. et al. Cortical morphologic features of axillary lymph nodes as a predictor of metastasis in breast cancer: in vitro sonographic study. AJR Am. J. Roentgenol. 191, 646–652 (2008).
Neal, C. H., Daly, C. P., Nees, A. V. & Helvie, M. A. Can preoperative axillary US help exclude N2 and N3 metastatic breast cancer? Radiology 257, 335–341 (2010).
Yajima, R. et al. Prognostic value of extracapsular invasion of axillary lymph nodes combined with peritumoral vascular invasion in patients with breast cancer. Ann. Surg. Oncol. 22, 52–58 (2015).
Sharifi, S., Peterson, M. K., Baum, J. K., Raza, S. & Schnitt, S. J. Assessment of pathologic prognostic factors in breast core needle biopsies. Mod. Pathol. 12, 941–945 (1999).
Harris, G. C. et al. Correlation of histologic prognostic factors in core biopsies and therapeutic excisions of invasive breast carcinoma. Am. J. Surg. Pathol. 27, 11–15 (2003).
Cahill, R. A., Walsh, D., Landers, R. J. & Watson, R. G. Preoperative profiling of symptomatic breast cancer by diagnostic core biopsy. Ann. Surg. Oncol. 13, 45–51 (2006).
Zhong, Y. et al. Radiomics approach to prediction of occult mediastinal lymph node metastasis of lung adenocarcinoma. AJR Am. J. Roentgenol. 211, 109–113 (2018).
Dong, Y. et al. Preoperative prediction of sentinel lymph node metastasis in breast cancer based on radiomics of T2-weighted fat-suppression and diffusion-weighted MRI. Eur. Radiol. 28, 582–591 (2018).
Pinker, K., Chin, J., Melsaether, A. N., Morris, E. A. & Moy, L. Precision medicine and radiogenomics in breast cancer: new approaches toward diagnosis and treatment. Radiology 287, 732–747 (2018).
Itoh, A. et al. Breast disease: clinical application of US elastography for diagnosis. Radiology 239, 341–350 (2006).
He, K. M., Zhang, X. Y., Ren, S. Q. & Sun, J. in IEEE Conf. Comput. Vis. Pattern Recognit. 770–778 (IEEE, 2016).
Deng, J. et al. in IEEE Conf. Comput. Vis. Pattern Recognit. 248–255 (2009).
Shen, H. Towards a mathematical understanding of the difficulty in learning with feedforward neural networks. IEEE Conf. Comput. Vis. Pattern Recognit. https://doi.org/10.1109/CVPR.2018.00091 (2018).
Shin, H. C. et al. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans. Med. Imaging 35, 1285–1298 (2016).
Chang, C. C. & Lin, C. J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 1–27 (2011).
Nie, D. et al. Multi-channel 3D deep feature learning for survival time prediction of brain tumor patients using multi-modal neuroimages. Sci. Rep. 9, 1103 (2019).
Li, Z. J., Wang, Y. Y., Yu, J. H., Guo, Y. & Cao, W. Deep Learning based Radiomics (DLR) and its usage in noninvasive IDH1 prediction for low grade glioma. Sci. Rep. 7, 5467 (2017).
DeLong, E. R., DeLong, D. M. & Clarke-Pearson, D. L. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44, 837–845 (1988).
Hanley, J. A. & McNeil, B. J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143, 29–36 (1982).
Hanley, J. A. & McNeil, B. J. A method of comparing the areas under receiver operating characteristic curves derived from the same cases. Radiology 148, 839–843 (1983).

Acknowledgements

The work was supported by the National Key R&D Program of China [No. 2019YFC0118300], the National Natural Science Foundation of China [No. 81971631], the Major Research Plan of the National Natural Science Foundation of China [No. 91959127], and the Shanghai Municipal Science and Technology Major Project [No. 2018SHZDZX01].

Author information

These authors contributed equally: Xueyi Zheng, Zhao Yao, Yini Huang, Yanyan Yu.

Authors and Affiliations

Department of Ultrasound, Sun Yat-Sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Guangzhou, China
Xueyi Zheng, Yini Huang, Yun Wang, Yubo Liu, Rushuang Mao, Fei Li, Yixin Hu & Jianhua Zhou
Department of Electronic Engineering, Fudan University, Shanghai, China
Zhao Yao, Yuanyuan Wang & Jinhua Yu
Paul C. Lauterbur Research Center for Biomedical Imaging, Institute of Biomedical and Health Engineering, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Yanyan Yu & Yang Xiao
Key Laboratory of Medical Imaging Computing and Computer Assisted Intervention of Shanghai, Shanghai, China
Yuanyuan Wang & Jinhua Yu

Contributions

J.H.Z. and J.H.Y. conceived and designed the project; Y.B.L., Y.W., Y.N.H., F.L., R.S.M., Y.X.H., and X.Y.Z. performed the research and collected the data; Z.Y., Y.Y.Y., Y.X., and Y.Y.W. analyzed the data; Z.Y., Y.Y.Y., and J.H.Y. proposed the model; X.Y.Z., Z.Y., J.H.Y., and J.H.Z. wrote the paper. All authors read and approved the final version of the article.

Corresponding authors

Correspondence to Jinhua Yu or Jianhua Zhou.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Communications thanks Andrew Evans and the other anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Peer Review File

Description of Additional Supplementary Files

Supplementary Software 1

Reporting Summary

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Zheng, X., Yao, Z., Huang, Y. et al. Deep learning radiomics can predict axillary lymph node status in early-stage breast cancer. Nat Commun 11, 1236 (2020). https://doi.org/10.1038/s41467-020-15027-z

Received: 22 July 2019
Accepted: 14 February 2020
Published: 06 March 2020
DOI: https://doi.org/10.1038/s41467-020-15027-z
