Automatic identification of triple negative breast cancer in ultrasonography using a deep convolutional neural network

Triple negative (TN) breast cancer is a subtype of breast cancer which is difficult for early detection and the prognosis is poor. In this paper, 910 benign and 934 malignant (110 TN and 824 NTN) B-mode breast ultrasound images were collected. A Resnet50 deep convolutional neural network was fine-tuned. The results showed that the averaged area under the receiver operating characteristic curve (AUC) of discriminating malignant from benign ones were 0.9789 (benign vs. TN), 0.9689 (benign vs. NTN). To discriminate TN from NTN breast cancer, the AUC was 0.9000, the accuracy was 88.89%, the sensitivity was 87.5%, and the specificity was 90.00%. It showed that the computer-aided system based on DCNN is expected to be a promising noninvasive clinical tool for ultrasound diagnosis of TN breast cancer.

Previously, in order to perform TN breast cancer evaluation, some studies based on magnetic resonance (MR) images were reported. Agner et al. 23 used linear discriminant analysis combined with support vector machine on dynamic contrast material-enhanced MR images, and found good discrimination of TN cancers from NTN cancers. In some ultrasound image related studies, machine learning had been used to evaluate TN breast cancers. Wu et al. 24 analyzed 140 breast masses ultrasound images (including 23 cases of TN breast cancer and 117 cases of NTN breast cancer), and concluded that machine learning can distinguish ultrasound images between TN and NTN breast cancers. Compared with traditional machine learning methods with limited accuracy, deep convolution neural network (DCNN) [25][26][27][28] with the advantage of automatic feature extraction and accurate classification has been widely used in medical field in recent years, i.e. ophthalmology 29 , dermatology 30 , orthopedics 31 , etc. [32][33][34] , which has shown similar performance to doctors. In order to test whether the application of DCNN can improve the accuracy of computer aided diagnosis of breast masses, 1844 breast mass greyscale ultrasound images were retrospectively analyzed in this work and a deep learning-based algorithm was developed for automated detection of benign or malignant breast mass. Furthermore, the ability of DCNN to distinguish TN breast cancers from NTN breast cancer was also discussed.

Results
Patients. This retrospective study reviewed the patients underwent breast ultrasound examination between February 2018 and March 2019 in the First Affiliated Hospital of Nanjing Medical University, China. Ultrasound data were also obtained from Chinese medicine Hospital of Jiangsu Province as external test. The study was approved by the institutional review committee of the First Affiliated Hospital of Nanjing Medical University and Chinese medicine Hospital of Jiangsu Province. The informed consent was obtained from all patients. All research methods were conducted in accordance with the ethical guidelines of the Helsinki Declaration. In the present study, the B mode ultrasound images were used. The criteria of inclusion were as follows: (a) older than eighteen years old; (b) patients underwent biopsy or operation with determinate histopathologically or immunohistochemically; (c) no preoperative treatment or intervention (radiotherapy, chemotherapy, ablation) before ultrasound examination. In total, 1618 images of 1261 patients comprised the benign and malignant training cohort and a test cohort of 226 images from 185 patients was screened with the same criteria from the First Affiliated Hospital of Nanjing Medical University. Patients were grouped into training and test cohorts randomly. All the data, including age, sex, pathologic findings, and ultrasound reports, were derived from the Picture Archiving and Communication Systems (PACS) system. For the TN and NTN training cohort, we randomly selected 102 images of TN breast cancers diagnosed by immunohistochemistry, and randomly selected 102 images of NTN breast cancers at the same time. We randomly selected 8 images of TN breast cancers and 10 images of NTN breast cancers to form the test set. External test cohort includes 29 TN and 28 NTN images were from Chinese medicine Hospital of Jiangsu Province. Figure 1 shows the inclusion criteria for the cohort in this study and the experiment procedure.
Examination technique. The enrolled ultrasound images were mainly performed with three commercially available ultrasound machines: (1) Esaote (Genova, Italy; twice MyLab). (2) Siemens (Buffalo, USA; S3000). (3) Philips (Amsterdam, Netherlands; EPIQ 7), and a small numbers data were from other manufacturers. A linear array probe with a frequency bandwidth of 6-15 MHz was used for this study. In the ultrasound examination, the patient was placed supine, exposing the bilateral breasts, and then scanned laterally, longitudinally, and obliquely. All the US examinations were performed by three radiologists (J.H., J.C., and W.X.Z) with fifteen, ten and six years of experience in breast ultrasound respectively. Image analysis. Two radiologists (J.H., J.C.) performed independent interpretations blinded to pathology results. Interpretations were performed by the evaluation of the primary breast cancer ultrasound images according to the BI-RADS with typical characteristics, including the mass size, shape, orientation, margin, echo pattern, the presence of calcifications. If the classification of the two radiologists was the same, it would be regarded as the final result of the radiologist's interpretation. If the two radiologists (J.H., J.C.)' result did not agree with each other, extra interpretation needed to be performed by the third radiologist (X.H.Y) with twenty five years of experience in breast ultrasound.
The histological and immunohistochemical results were used as gold standard in this study. The core needle biopsy (CNB) was performed on solid breast masses categorized as probably malignant (BI-RADS classes 4A, 4B, 4C and 5). A 14G core needle was used under ultrasound guidance. After the operation, radiotherapy or chemotherapy were performed on malignant breast mass. Follow-up was performed on breast masses with BI-RADS classes 2 and 3, which was recommended in clinical routine 4 . And the breast masses which remained unchanged for more than 2-year follow-up were taken as benign images in the study.
Demographics and histopathologic findings. In this study cohort, a total number of 1844 images extracted from 1446 patients were enrolled in this retrospective study. All the pathological results were summarized in Table 1. The first training data consisted of 1618 images from 1261 patients, including 820 images for benign cohort (598 female, mean age 39.65 ± 10.36 ys) and 798 images for malignant cohort (3 male, mean age 48.30 ± 6.70 ys and 660 female, mean age 52.80 ± 11.50 ys). The 226 images in the first test set were acquired from 185 patients, including 90 benign images (79 female, mean age 40.54 ± 10.23 ys) and 136 malignant images (106 female, mean age 51.02 ± 11.97 ys). The malignant images in the first test set included 40 TN images and 96 NTN images. All the basic information of selected images were collected in Table 2. Figure 2 shows images for a 52-year-old woman with TN breast cancer, where Fig. 2a  Benign and malignant. In this study, an artificial neural network were developed, in which the boundaries of the masses were not required to be drawn by the radiologist. The mass packing by surrounding the calipers used in the clinic for mass measurement was used. A DCNN was trained and used to classify benign and malignant breast masses, and the results were compared with those reported by the radiologist based on BI-RADS. First, a nine fold cross validation was used to verify the validity and performance of this algorithm. Table 2 lists basic information of the images. Those imaging data were used to train the network and determine the hyper parameter of the network. The training data were divided into nine sub-datasets, and ninefold validation   (Table 3). Then, differentiating benign and malignant masses for the test set (226 images from 185 patients) was performed using the trained algorithm. There were 90 benign images, 40 TN breast cancer images and 96 NTN breast cancer images. The results showed that the AUC of discriminating TN breast cancer from benign ones was 0.9789, the accuracy was 94.62%, the sensitivity was 92.50%, and the specificity was 95.56%. The AUC of discriminating NTN breast cancer from benign ones of the proposed method was 0.9689, the accuracy was 93.01%, the sensitivity was 90.62%, and the specificity was 95.56% (Fig. 3).   www.nature.com/scientificreports/ For comparison, the test set were also evaluated by the radiologist according to BI-RADS. The AUC of radiologist's discriminating TN breast cancer from benign ones was 0.9857, the accuracy was 94.44%, the sensitivity was 92.73%, and the specificity was 95.51%. The AUC of radiologist's discriminating NTN breast cancer from benign ones of the proposed method was 0.9598, the accuracy was 89.02%, the sensitivity was 82.14%, and the specificity was 95.51%. Statistical analysis showed that there was no significant difference between the algorithm and that of expert radiologist (p > 0.5) 35,36 . TN and NTN. A 18 fold cross validation was used to verify the validity and performance of this algorithm. 204 images (102 NTN breast cancer, and 102 TN breast cancer) were used to train the network and determine the hyper parameter of the network. The training data were divided into 18 sub-datasets, and cross validation was performed. The AUC of the algorithm was 0.8746 (95% CI 0.8347, 0.9145) (Fig. 3). The averaged accuracy was 88.75% (95% CI 86.31%, 91.19%), the averaged sensitivity was 87.35% (95% CI 83.27%, 91.44%), and the averaged specificity was 90.32% (95% CI 85.83%, 94.80%). The AUC of radiologist's discriminating triple negative breast cancer from non-triple ones was 0.4461, the accuracy was 53.80%, the sensitivity was 71.91%, and the specificity was 34.15% (Table 4). 10 NTN breast cancers, and 8 TN cancers were tested. The AUC of discriminating TN breast cancer from NTN ones was 0.9000, the accuracy was 88.89%, the sensitivity was 87.50%, and the specificity was 90.00%. 29 TN samples and 28 NTN samples obtained from Chinese medicine Hospital of Jiangsu Province were used for the external test, the AUC is 0.8817, the accuracy is 89.94%, the sensitivity is 86.67%, and the specificity is 93.33%.

Discussion
Benign and malignant. In recent years, due to an increased need for efficient and objective evaluation of ultrasound images, artificial intelligence ultrasound diagnosis has been widely studied. The computer-aided system based on deep learning, due to its self-learning ability, has achieved good results in diagnosing breast masses. Fujioka et al. 37 retrospectively analyzed 480 benign and 467 malignant breast mass ultrasound images, and constructed deep learning model of differentiating breast masses by convolution neutral network architec-  38 introduced ImageNet-pretrained VGG19 with finetuning and matching layer at input based on a set of 882 breast mass ultrasound images to classify breast mass, and reached an AUC of 0.936. Though these studies achieved high accuracy and proved to be a useful tool for breast mass classification, fewer images were used, especially for malignant masses. In this study, a larger cohort including 1844 images were retrospectively analyzed to develop an artificial neural network, and the AUC of the algorithm reached 0.9504. The result of the algorithm was consistent with those reported by previous studies, and demonstrated that deep learning technology could help radiologists to classify breast masses in ultrasound images.

TN and NTN. While molecular markers can be evaluated from tissues obtained in biopsy or surgery, it is
invasive and subject to the tissue sampling bias problem. TN breast cancer is associated with aggressive histology, poor clinical prognosis, unresponsiveness to usual endocrine therapies and shorter survival. If it is possible to predict the presence of TN breast cancer based on noninvasive ultrasound features, this information will be beneficial for both pretreatment planning and prognosis, and will add to the understanding of the biological behavior of this disease. Nowadays, much attention focusing on deep learning methods has been paid to TN breast cancer using MR images. Agner et al. 24 used linear discriminant analysis combined with support vector machine classifier, achieved an AUC of 0.73 (95% CI 0.59, 0.87) for TN breast cancer versus NTN breast cancer. Koo et al. 39 found the usefulness of computer-aided breast MR diagnosis in predicting the level of tumor-infiltrating lymphocytes in TN breast cancers. However, there were few studies focused on intelligent recognition of breast cancer ultrasonic images at the genetic and cellular levels. Guo et al. 40 reported an automatic radiomics approach to assess the associations between quantitative ultrasound features and biological characteristics. They used a support vector machine classifier with three-fold-cross-validation to evaluate a strong correlationship between ultrasound features and biological characteristics. Wu et al. 24 discriminated 140 cases with logistic regression including 23 TN and 117 NTN, and the specificity was 82.05%, the sensitivity was 78.26%. However, the enrolled images were extracted from a manually drawn region of interest in those studies. In comparison, the boundaries of the masses were not required to be drawn in this work, which could overcome the defect of operator dependence and avoid to add additional work to radiologist. The results of our study showed that the AUC of discriminating TN breast cancer from NTN ones was 0.9000, the accuracy was 88.89%, the sensitivity was 87.50%, and the specificity was 90.00%, which showed an improvement of accuracy.
The low accuracy of radiologists distinguishing between TN vs NTN. The AUC of radiologist's discriminating TN breast cancer from non-triple ones was 0.4461, the accuracy, sensitivity and specificity were also very low. Because we used the BI-RADS score of the radiologists during the usual B-mode ultrasound diagnostic routine process, and it did not differentiate molecular subtypes. This was the reason for the low diagnosis efficiency of radiologists in distinguishing between TN vs NTN.
Originality & clinical significance. Restnet50 was proposed in Ref 24, in this paper we did some finetuned work to adapt to our specific task. www.nature.com/scientificreports/ The accuracy of discriminating benign and malignant breast mass ultrasound images was a little higher than the previous work. One reason is that we collected more training data. The other reason is the power of deeper layers of Resnet and our fine-tuned techniques help to achieve more characteristic of the B-mode breast mass ultrasound images.
The main clinical significance of this paper is discriminating TN breast cancer from non-triple ones. To predict the presence of TN breast cancer with B-mode ultrasound features, it will be beneficial for pretreatment planning and prognosis. Because it is noninvasive, it can reduce the suffering of patients. And it will add to the understanding of the biological behavior of this disease.

Bias of Selection of data.
There is no bias of selection of data in the paper.
Step 1: The 820 benign and 798 malignant ultrasound images for training, 90 benign and 136 malignant images for testing were randomly selected from the records between February 2018 and March 2019 in hospital. Further, we did ninefold cross validation, which showed the robustness of the algorithm.
Step 2:102 for training and 8 for testing TN ultrasound images were the total patients in the hospital between February 2018 and March 2019. 102 for training and 10 for testing NTN ultrasound images were randomly selected from the malignant images used in Step 1. Then we enrolled 8 images of TN breast cancers and randomly selected 10 images of NTN breast cancers to form the test set. Patients were grouped into training and test cohorts randomly. Further, we did 18 fold cross validation, which showed the robustness of the algorithm.
There are different types of ultrasound machines in different hospitals, even in one hospital. To make sure the proposed method could be applied in different equipment, the original data from the different ultrasound machines were used. Tables 5 and 6 list the data distribution collected from different machines. It suggests that the proposed method could be applied to different ultrasound machines.
Limitations. This study still had some limitations. The sample size of this study was small and the data were derived from only a single center. The specificity of discriminating NTN from TN breast cancer using the proposed DCNN is around 90%, which means around 10% of tumors could be misdiagnosed. Future studies involving larger number of multi-center patients are warranted to demonstrate the viability of the proposed approach in clinical settings. If add more patients for verification, the accuracy of the model could be further verified and improved.

Conclusion.
In this paper, a deep learning-based algorithm was developed for the automated diagnosis of benign and malignant breast masses, and for further identification of TN breast cancer. The results showed the computer-aided system based on DCNN is expected to be a promising noninvasive clinical tool for ultrasound diagnosis of TN breast cancer.

Fine-tuned DCNN. Convolutional neural network consists of a set of convolution and pooling operations
applied to obtain complex features from the input image. These features are flattened into a vector. The output of the model is a collection of continuous variables that represented the predicted probabilities for each category.
Deeper neural networks are more difficult to train. Resnet 28 , residual learning framework,was presented to ease the training of networks that are substantially deeper than those used previously. Resnet explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. Comprehensive empirical evidence showed that Resnet are easier to optimize, and can gain accuracy from considerably increased depth. In this work, the Resnet50 DCNN is fine-tuned for the diagnosis of breast masses. The detailed mechanism is as below.
(i) The building block is defined as where x and y are the input and output vectors of the layers considered. The function F(x, {W i }) represents the residual mapping to be learned. The operation F + x is performed by a shortcut connection. (ii) The image density resolution is normalized to 8 bit density and 224 × 224 size. (iii) To apply the network in this study, the last fully connected layer is changed for a global average pool layer, followed by a full connection layer and Softmax. (iv) The weights are optimized by the adaptive learning rate (adadelta) optimization algorithm as: It is optimized in 12 mini-batch size. The parameters, i.e. the learning rate and the maximum epoch number are set at 0.0001 and 148 respectively. The loss function is identified as binary cross entropy. Figure 4 shows the structure of the network. Tensorflow 2.0 and tensorboard are used to implement the DCNN. The training is conducted on computers equipped with Intel Core i5-8300h CPU and 8 GB memory, NVIDIA geforce GTX 1060 GPU and 6 GB memory. Figure 4. The network structure of the method.