Discrimination of Breast Cancer with Microcalcifications on Mammography by Deep Learning

Microcalcification is an effective indicator of early breast cancer. To improve the diagnostic accuracy of microcalcifications, this study evaluates the performance of deep learning-based models trained on large datasets for discriminating them. A semi-automated segmentation method was used to characterize all microcalcifications. A discrimination classifier model was constructed to assess the accuracy of microcalcifications and breast masses, either in isolation or in combination, for classifying breast lesions. Performance was compared with that of benchmark models. Our deep learning model achieved a discriminative accuracy of 87.3% when microcalcifications were characterized alone, compared with 85.8% for a support vector machine. The accuracies were 61.3% for both methods with masses alone and improved to 89.7% and 85.8%, respectively, after combined analysis with microcalcifications. Image segmentation with our deep learning model yielded 15, 26 and 41 features for the three scenarios, respectively. Overall, deep learning based on large datasets was superior to standard methods for the discrimination of microcalcifications. Accuracy was increased by adopting a combinatorial approach to detect microcalcifications and masses simultaneously. This may have clinical value for the early detection and treatment of breast cancer.

Scientific Reports | 6:27327 | DOI: 10.1038/srep27327

to demonstrate an explicit gradient for feature complexity in the ventral pathway of the human brain 26 ; deep learning was applied to determine the sequence specificities of DNA- and RNA-binding proteins for identifying causal disease variants 27 ; superpixels and deep learning were used for automatic vaginal bacteria segmentation and classification 28 ; and deep learning-based latent feature representations, such as the stacked auto-encoder and the deep Boltzmann machine, have been proposed for the diagnosis of Alzheimer's disease and its prodromal stage, mild cognitive impairment (MCI) 32,33 . However, only a few works have explored deep learning methods for the automatic classification of identified lesions on mammography. A learning framework for breast cancer diagnosis in mammography using convolutional neural networks has been reported 34 , although the data tested in that work were preprocessed images. A convolutional sparse autoencoder has also been proposed for mammographic texture scoring 35 .
Deep learning uses neural networks with multiple hidden layers to enhance the recognition accuracy of images, audio and other data types, which increases its versatility for capturing representative features. Deep learning outperforms other state-of-the-art methods in many areas and has solved complicated pattern recognition problems, especially in big data situations [36][37][38][39] . The stacked denoising autoencoder is one of the most successful deep learning strategies. Its deep architecture can efficiently discover latent (hidden) representations inherent in the low-level features of the input modalities and ultimately enhance classification accuracy. In this study, an innovative deep learning-based model built on a stacked denoising autoencoder was employed to retrospectively analyze a large sample of microcalcifications, with or without masses, on mammography. Its performance and accuracy in classifying and discriminating breast lesions were compared with those of benchmark models.
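The text names a stacked denoising autoencoder but does not state how the inputs were corrupted. As a minimal sketch, assuming masking noise (each input feature independently set to zero with a fixed probability, one common choice for denoising autoencoders), the corruption step could look like this; the function name `mask_corrupt` and the 0.2 corruption rate are hypothetical.

```python
import numpy as np

def mask_corrupt(x, corruption_rate, rng):
    """Masking noise for a denoising autoencoder (assumed corruption scheme).

    Each entry of x is independently set to zero with probability
    `corruption_rate`; the autoencoder is then trained to reconstruct
    the clean x from this corrupted copy.
    """
    keep = rng.random(x.shape) >= corruption_rate  # True where the value survives
    return x * keep

rng = np.random.default_rng(0)
features = rng.random((4, 15))             # 4 hypothetical lesions x 15 microcalcification features
noisy = mask_corrupt(features, 0.2, rng)   # hypothetical corruption rate of 20%
```

Only the corrupted copy is fed to the encoder, while the reconstruction loss is still measured against the clean features; this is what forces the learned representation to capture dependencies between features rather than copying them.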

Results
The training group consisted of 1000 images, including 677 benign and 323 malignant lesions. The test group consisted of 204 images, including 97 benign and 107 malignant lesions. Table 1 shows the histopathological distributions of the lesions in both groups. Data about microcalcifications and suspicious breast masses were extracted through image segmentation. Both statistical and textural features were computed to obtain a comprehensive characterization of the microcalcifications and masses. A total of 41 quantitative measurements were recorded for each patient; detailed information is provided in Appendix File S1. The fifteen microcalcification features and twenty-six breast mass features were fed into the comparative classifiers, SVM, LDA and KNN. These features were selected because they have been shown to improve the performance of standard machine learning classifiers in earlier research on breast lesions 18,19,23,24,34,40,41 . Figure 1 illustrates an automatic detection and segmentation pipeline used to identify suspicious microcalcifications and masses in the left breast of a 60-year-old patient with invasive ductal carcinoma. The microcalcifications were extracted from the raw data to delineate the image characteristics (Fig. 1(b)). Figure 2 shows that the method accurately detected and extracted suspicious microcalcifications from the background of a low-density image of the left breast of a 56-year-old patient with ductal carcinoma in situ, illustrating the accuracy and robustness of the image segmentation pipeline. Figure 3 shows the right breast of a 49-year-old patient with fibrocystic changes, in which the focal microcalcifications have low contrast against the high-density background.
Extraction of suspicious microcalcifications is a challenging task; however, these results demonstrate that our segmentation model was able to accurately identify and extract microcalcifications from the images to facilitate characterization.

To evaluate the performance and discriminative power of the deep learning model (DL), the overall classification accuracy (acc), sensitivity, specificity and the area under the receiver operating characteristic (ROC) curve (AUC) were calculated, with the first three defined as follows:

$$\mathrm{acc} = \frac{TP + TN}{TP + FP + FN + TN} \times 100\%$$

$$\mathrm{sensitivity} = \frac{TP}{TP + FN} \times 100\%$$

$$\mathrm{specificity} = \frac{TN}{FP + TN} \times 100\%$$

where TP, FN, TN and FP represent the true positives, false negatives, true negatives and false positives, respectively.
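These definitions translate directly into code. The sketch below uses hypothetical function names and assumes the label coding malignant = 1 and benign = 0. It also computes AUC through its rank-statistic (Mann-Whitney) form, i.e. the probability that a randomly chosen malignant case receives a higher classifier score than a randomly chosen benign one, with ties counted as one half; the paper does not state how its AUC values were computed.

```python
import numpy as np

def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity in percent (malignant = 1, benign = 0)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    acc = (tp + tn) / (tp + fp + fn + tn) * 100
    sensitivity = tp / (tp + fn) * 100
    specificity = tn / (fp + tn) * 100
    return acc, sensitivity, specificity

def auc_rank(y_true, scores):
    """AUC as P(score of a malignant case > score of a benign case); ties count 1/2."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    greater = np.sum(pos[:, None] > neg[None, :])   # compare every malignant/benign pair
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

For example, `confusion_metrics([1, 1, 0, 0, 1, 0], [1, 1, 0, 1, 1, 0])` gives an accuracy of about 83.3%, a sensitivity of 100% and a specificity of about 66.7%. The pairwise comparison is quadratic in the number of cases, which is negligible for a 204-image test group.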
Previous reports have suggested that the discriminative performances of classifiers can be increased through comprehensive characterization of microcalcifications as opposed to characterization of individual features. In agreement with these reports, our deep learning-based model achieved similar outcomes, as demonstrated by the ROC curves in Fig. 4. Therefore, this approach was used in the following experiments.
Three scenarios for discriminating between malignant and benign lesions were examined: microcalcifications alone, breast masses alone, and microcalcifications and breast masses in combination. The aim was to investigate the discriminative power of microcalcifications, masses or their combination in differentiating lesion types. The results were compared with those of SVM, KNN and LDA benchmark classifiers.
The structure of a stacked autoencoder (SAE) network is determined by the size of the input layer, the number of hidden layers and the number of hidden units in each hidden layer. In the experiments, the microcalcification data alone served as the input in the first scenario, the breast mass data alone served as the input in the second scenario, and the combined microcalcification and breast mass data served as the input in the third scenario. The SAE model was used to classify malignant and benign lesions in all three scenarios. The optimal hyperparameters for each scenario were estimated by 10-fold cross-validation on the training group. For the first and second scenarios, the trained architecture consisted of two hidden layers with 200 hidden units each ([200, 200]); for the third scenario, it consisted of two hidden layers with 400 hidden units each ([400, 400]).
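The hyperparameter search can be sketched as a plain 10-fold cross-validation loop over candidate hidden-layer configurations. This is an illustrative sketch only: `select_hidden_sizes` and the `fit_and_score` callback are hypothetical, the candidate list is an assumption (the text reports only the selected architectures), and the toy data are random numbers shaped like the 1000-image training group. The combined-scenario input is formed by concatenating the 15 microcalcification and 26 breast mass features into 41 columns.

```python
import numpy as np

def kfold_indices(n_samples, n_folds, rng):
    """Shuffle the sample indices once and split them into n_folds nearly equal folds."""
    return np.array_split(rng.permutation(n_samples), n_folds)

def select_hidden_sizes(X, y, candidates, fit_and_score, n_folds=10, seed=0):
    """Return the candidate architecture with the best mean validation score.

    fit_and_score(X_tr, y_tr, X_val, y_val, hidden_sizes) is a hypothetical
    callback that trains a classifier (e.g. an SAE) and returns a validation score.
    """
    rng = np.random.default_rng(seed)
    folds = kfold_indices(len(y), n_folds, rng)
    mean_scores = {}
    for hidden_sizes in candidates:
        scores = []
        for k, val_idx in enumerate(folds):
            # all folds except the k-th form the training split
            train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
            scores.append(fit_and_score(X[train_idx], y[train_idx],
                                        X[val_idx], y[val_idx], hidden_sizes))
        mean_scores[tuple(hidden_sizes)] = float(np.mean(scores))
    best = max(mean_scores, key=mean_scores.get)
    return list(best), mean_scores

# Toy usage with random data shaped like the training group (not real features).
rng = np.random.default_rng(1)
X_calc = rng.random((1000, 15))          # microcalcification features (scenario 1)
X_mass = rng.random((1000, 26))          # breast mass features (scenario 2)
X_comb = np.hstack([X_calc, X_mass])     # combined features (scenario 3), 41 columns
y = rng.integers(0, 2, size=1000)        # 0 = benign, 1 = malignant (random placeholders)

def toy_score(X_tr, y_tr, X_val, y_val, hidden_sizes):
    # Stand-in for real SAE training: ignores the data and prefers 200 first-layer units.
    return 1.0 / (1.0 + abs(hidden_sizes[0] - 200))

best, scores = select_hidden_sizes(X_comb, y, [[100, 100], [200, 200], [400, 400]], toy_score)
```

Shuffling once and reusing the same folds for every candidate keeps the comparison between architectures paired. With an imbalanced training group (677 benign versus 323 malignant), stratified folds that preserve the class ratio would be a reasonable refinement.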
In the first scenario, image segmentation yielded 15 features. The overall accuracies were 85.8%, 83.8%, 58.8% and 87.3% for the SVM, KNN, LDA and DL models, respectively. The DL model also achieved the highest specificity and AUC values (0.82 and 0.87, respectively). The results are summarized in Table 2; the ROC curves in Fig. 5(a) provided visual comparisons between the models.
In the second scenario, based on breast masses alone, image segmentation yielded 26 features. The results are summarized in Table 3, and the ROC curves are shown in Fig. 5(b). The overall accuracies were markedly lower for all models: 61.3%, 58.8%, 53.4% and 61.3% for SVM, KNN, LDA and DL, respectively. Furthermore, the performance of the DL model was only marginally higher than that of the SVM model. Despite this finding, the sensitivity of the DL model was approximately 100%, indicating that nearly all malignant lesions were classified as malignant; given the low overall accuracy, this came at the cost of low specificity, with many benign lesions also classified as malignant. As such, this method may facilitate diagnosis in benign cases, because a lesion classified as benign is unlikely to be malignant; however, it may not serve as a valid diagnostic tool in clinical practice.
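The link between near-perfect sensitivity and poor specificity can be checked with the figures reported for the test group (97 benign, 107 malignant) and the DL accuracy of 61.3%. Assuming, for illustration, that sensitivity was exactly 100% (the text says approximately), the implied specificity is only about 18.6%; the actual values are those reported in Table 3.

```python
# Test-group sizes and DL accuracy for masses alone, as reported in the Results.
n_benign, n_malignant = 97, 107
accuracy = 0.613
n_total = n_benign + n_malignant           # 204 test images
correct = round(accuracy * n_total)        # 125 correctly classified images
tp = n_malignant                           # assumption: sensitivity exactly 100%
tn = correct - tp                          # benign lesions correctly classified
specificity = tn / n_benign                # about 0.186
print(n_total, correct, tn, f"{specificity:.1%}")
```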
In the third scenario, based on a combinatorial approach that analyzed microcalcifications and breast masses simultaneously, image segmentation yielded 41 features. The overall accuracies were 85.8%, 84.3%, 74.0% and 89.7% for the SVM, KNN, LDA and DL models, respectively, so the DL model achieved the highest overall accuracy. The results are summarized in Table 4, and the ROC curves are shown in Fig. 5(c).
These findings confirmed that by accessing a large dataset, the deep learning model produced a higher number of representative segmentation features and exhibited greater overall accuracy for discriminating between malignant and benign breast lesions through mammography compared to standard models. Furthermore, the discriminative power of the deep learning model was greatest if a combinatorial approach was applied to characterize microcalcifications and breast masses simultaneously.

Discussion
Mammography is considered the primary imaging modality for early detection and treatment of breast cancer; however, achieving accurate diagnoses through mammography is often challenging for radiologists due to the difficulty of distinguishing the features of malignant symptoms in images [42][43][44] . Consequently, considerable research is being undertaken to develop computer-based applications including various classification models to overcome these challenges 10,12,14,[45][46][47] .
Microcalcifications are highly correlated with breast cancer 2,3,7 ; therefore, the aim of this investigation was to evaluate the performance of an innovative deep learning model for classifying breast lesions. The results demonstrated that deep learning not only enabled accurate segmentation of microcalcifications but also provided an efficient analysis of their characteristics, leading to a marked improvement in discriminating between benign and malignant breast lesions compared with the standard SVM, KNN and LDA methods. This may be particularly significant in cases in which microcalcifications are the only indicator of malignant lesions 4,11,48,49 .
Deep learning-based models employing large sample sets show greater discriminative performance in classifying microcalcifications on mammography than other machine learning methods. Compared with other methods, deep learning-based models provide a higher number of image segmentation features and help enhance diagnostic accuracy through comprehensive characterization of these features. The discriminative power of deep learning can be increased by adopting a combinatorial approach to classify microcalcifications and masses simultaneously. Our results suggest that deep learning-based models trained on large datasets are promising for the earlier detection and treatment of breast cancer by identifying microcalcifications on mammograms.
Breast masses are also known to exhibit distinct features that differ between benign and malignant lesions 2 ; however, machine-based methods are generally based on detecting microcalcifications or breast masses in isolation, and reports on methods that detect microcalcifications and masses simultaneously are scarce. In this study, we carried out a provisional and innovative trial using our deep learning-based model to analyze both features in combination. The results showed that this combinatorial approach enhanced the diagnostic sensitivity of the model in patients presenting with both microcalcifications and masses. This implies that deep learning may offer an advanced statistical method for differentiating mammographic microcalcifications with greater accuracy and sensitivity, both in the presence and in the absence of breast masses. This could not only facilitate earlier and more accurate classification of breast cancer but also improve prognosis through timely treatment in malignant cases. It may also help avoid unnecessary surgical procedures, including total resection, and the associated psychological and physical pain in benign cases.
However, the current study has the following limitations. First, the test dataset should be expanded to include more benign and malignant samples in order to achieve higher statistical power. In addition, increasing the number of cases with breast masses, either alone or with microcalcifications, would allow deeper examination of the combinatorial approach, help establish the optimal diagnostic performance of our model and clarify its potential value in future applications. Second, the features investigated in the present study may not be sufficient to fully characterize microcalcifications; future studies will extract additional features. Selecting the most discriminative subset of these features and optimizing the selection of various features may further improve the performance of deep learning in the classification stage. The current study aimed to employ a powerful deep learning-based classifier to discriminate breast lesions by microcalcifications, with or without combined analysis of masses. Once the limitations above are addressed, the promising performance of deep learning in this trial opens a way to aid radiologists' diagnostic performance and may further facilitate the systematic investigation of breast cancer for early detection, diagnosis and clinical management.

Methods
Imaging and analysis. Images were obtained on a GE Senographe DS mammography system and a Siemens Mammomat Inspiration mammography system. Craniocaudal (CC) and mediolateral oblique (MLO) projections were obtained for each breast. All images were digitized at a resolution of 1024 × 1024 pixels with 8-bit gray levels. Using the raw images directly may introduce a large bias due to image deformation, non-uniform background illumination, and uneven imaging angle and position, all of which may degrade classification performance.
To alleviate these problems, this study used as input data various types of features that have been widely used in research on breast lesions, instead of the original images 34,40,41 . We considered features invariant to rotation as well as features invariant to rotation, scaling and translation. A previously reported computerized segmentation approach 29 was used to extract any suspicious microcalcifications and masses from each image. Both statistical and textural features were computed to obtain a comprehensive characterization of microcalcifications and breast masses. A total of 41 quantitative measurements were recorded for each patient. Fifteen microcalcification features and twenty-six breast mass features, computed from the regions of interest, were selected instead of the original images as the input data for the SAE model. The features extracted from the mammograms were intended to characterize each image as comprehensively as possible; they consisted of intensity, statistical, shape and texture features. These features have been extensively reported and tested in research on breast lesions 18,19,23,24,34,40,41 . The 15 microcalcification features were selected to describe different dimensional aspects of microcalcifications, including one-dimensional shape features (average diameter), two-dimensional morphological features (microcalcification area), fractal dimensional features (microcalcification density, circularity proportion, solidity, sandy microcalcification, spiculation, volume ratio), gray-level intensity statistics features (mean gray value) and statistical features (microcalcification number, circularity, linear microcalcification).
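The exact feature definitions are given in Appendix File S1, which is not reproduced here. As a hedged illustration only, the sketch below computes four of the quantities named above (microcalcification number, area, average diameter and mean gray value) from an assumed binary segmentation mask. The function names are hypothetical, and taking the average diameter as the diameter of a circle of equal area is an assumption, not the paper's definition.

```python
import numpy as np
from collections import deque

def label_components(mask):
    """Label 4-connected foreground regions of a binary mask; returns (labels, count)."""
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=int)
    rows, cols = mask.shape
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r, c] and labels[r, c] == 0:
                count += 1                      # start a new component
                labels[r, c] = count
                queue = deque([(r, c)])
                while queue:                    # breadth-first flood fill
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = count
                            queue.append((ny, nx))
    return labels, count

def calcification_features(image, mask):
    """Illustrative features: number, mean area, mean equivalent diameter, mean gray value."""
    mask = np.asarray(mask, dtype=bool)
    labels, n = label_components(mask)
    if n == 0:
        return {"number": 0, "mean_area": 0.0, "mean_diameter": 0.0, "mean_gray": 0.0}
    areas = np.array([np.sum(labels == k) for k in range(1, n + 1)], dtype=float)
    diameters = 2.0 * np.sqrt(areas / np.pi)    # diameter of a circle with the same area
    return {
        "number": n,
        "mean_area": float(areas.mean()),
        "mean_diameter": float(diameters.mean()),
        "mean_gray": float(image[mask].mean()),
    }
```

Area and equal-area diameter are unchanged by rotating or translating a region, which fits the emphasis above on rotation- and translation-invariant features; scale invariance would additionally require normalizing by pixel spacing or lesion size.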
The 26 breast mass features likewise characterized different aspects of masses, including morphological features (breast mass area), fractal dimensional features (solidity, elongation, axis ratio, heterogeneity, spiculation, volume ratio, convexity) and texture features (mean gray, maximum gray, gray relativity, entropy, inverse difference entropy, difference entropy, correlation, difference variance, sum average, sum variance, energy, mutual information). Detailed information about the features is provided in Appendix File S1. Once each lesion had been characterized, its feature vector was fed into the deep learning model to classify it as benign or malignant.
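Several of the mass texture features listed above (entropy, energy, correlation, sum average, difference entropy) are conventionally derived from a gray-level co-occurrence matrix (GLCM) in the style of Haralick. The sketch below assumes that convention with a single horizontal pixel offset, a symmetric normalized matrix and 0-based gray levels. The offsets, quantization and exact definitions used in the paper are not stated here, so the function names and these choices are hypothetical.

```python
import numpy as np

def glcm(image, levels, dy=0, dx=1):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset (dy, dx >= 0)."""
    image = np.asarray(image, dtype=int)
    rows, cols = image.shape
    ref = image[: rows - dy, : cols - dx]      # reference pixels
    nbr = image[dy:, dx:]                      # neighbours at offset (dy, dx)
    P = np.zeros((levels, levels), dtype=float)
    np.add.at(P, (ref.ravel(), nbr.ravel()), 1.0)
    P = P + P.T                                # count each pair in both directions
    return P / P.sum()

def texture_features(P):
    """Energy, entropy, correlation, sum average, difference entropy (0-based gray levels)."""
    levels = P.shape[0]
    i, j = np.indices(P.shape)
    nz = P > 0
    mu_i, mu_j = np.sum(i * P), np.sum(j * P)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * P))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * P))
    # Correlation is undefined for a constant region (zero variance).
    corr = np.sum((i - mu_i) * (j - mu_j) * P) / (sd_i * sd_j) if sd_i > 0 and sd_j > 0 else np.nan
    p_sum = np.bincount((i + j).ravel(), weights=P.ravel(), minlength=2 * levels - 1)
    p_diff = np.bincount(np.abs(i - j).ravel(), weights=P.ravel(), minlength=levels)
    p_diff = p_diff[p_diff > 0]
    return {
        "energy": float(np.sum(P ** 2)),
        "entropy": float(-np.sum(P[nz] * np.log2(P[nz]))),
        "correlation": float(corr),
        "sum_average": float(np.sum(np.arange(2 * levels - 1) * p_sum)),
        "difference_entropy": float(-np.sum(p_diff * np.log2(p_diff))),
    }
```

In practice, features of this kind are often averaged over several offsets (for example 0°, 45°, 90° and 135°) to reduce their dependence on direction.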
Deep learning model. Deep learning refers to machine learning models with multiple hidden layers that learn the inherent structure and features of large datasets. A stacked autoencoder (SAE) creates a deep network by stacking multiple autoencoders hierarchically 31,34,35 . Each autoencoder is a neural network (NN) trained to reproduce its input, and the hidden representation produced by each autoencoder is used as the training set for the next autoencoder. More specifically, in an SAE with n layers, the first layer is trained as an autoencoder to obtain the first hidden layer, and the output of the kth hidden layer is used as the input of the (k+1)th hidden layer. In this study, the 15 microcalcification features and the 26 breast mass features, rather than the original images, were used as input data for the SAE model. The SAE model was trained in a greedy layer-wise fashion to learn latent representations of the lesions from these low-level input features, according to the following procedure. Training samples were denoted as $\{x^{(i)}\}$; an autoencoder encoded an input $x^{(i)}$ to a hidden representation $y(x^{(i)})$ through a deterministic mapping function:

$$y\left(x^{(i)}\right) = f\left(W_1 x^{(i)} + b_1\right)$$

Conversely, the autoencoder decoded the representation $y(x^{(i)})$ back into a reconstruction $\hat{x}^{(i)}$ through a second deterministic mapping function:

$$\hat{x}^{(i)} = g\left(W_2\, y\left(x^{(i)}\right) + b_2\right)$$

where $W_1$ is an encoding weight matrix, $W_2$ is a decoding matrix, $b_1$ is an encoding bias vector and $b_2$ is a decoding bias vector. The logistic sigmoid function was used for both mappings in this study:

$$f(x) = g(x) = \frac{1}{1 + e^{-x}}$$
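The single-autoencoder mapping and greedy layer-wise stacking described above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: it assumes a squared reconstruction error, full-batch gradient descent, inputs scaled to [0, 1], and hypothetical values for the learning rate, number of epochs and weight initialization. The denoising corruption is omitted for brevity, and all function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoencoder(X, n_hidden, lr=0.5, epochs=500, seed=0):
    """Train one sigmoid autoencoder by full-batch gradient descent.

    Encoder y = f(W1 x + b1) and decoder x_hat = g(W2 y + b2), with f = g = sigmoid.
    Assumed loss: mean over samples of 0.5 * ||x_hat - x||^2.
    X has shape (n_samples, n_features), values scaled to [0, 1].
    """
    rng = np.random.default_rng(seed)
    n_samples, n_in = X.shape
    W1 = rng.normal(0.0, 0.1, (n_hidden, n_in)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_in, n_hidden)); b2 = np.zeros(n_in)
    for _ in range(epochs):
        Y = sigmoid(X @ W1.T + b1)                            # hidden code, (n, n_hidden)
        X_hat = sigmoid(Y @ W2.T + b2)                        # reconstruction, (n, n_in)
        d2 = (X_hat - X) * X_hat * (1.0 - X_hat) / n_samples  # grad w.r.t. decoder pre-activation
        d1 = (d2 @ W2) * Y * (1.0 - Y)                        # back-propagated to the encoder
        W2 -= lr * d2.T @ Y
        b2 -= lr * d2.sum(axis=0)
        W1 -= lr * d1.T @ X
        b1 -= lr * d1.sum(axis=0)
    return W1, b1, W2, b2

def reconstruction_error(X, W1, b1, W2, b2):
    """Mean squared reconstruction error (with the 1/2 factor) over the samples in X."""
    X_hat = sigmoid(sigmoid(X @ W1.T + b1) @ W2.T + b2)
    return float(np.mean(0.5 * np.sum((X_hat - X) ** 2, axis=1)))

def greedy_pretrain(X, hidden_sizes, **kwargs):
    """Greedy layer-wise pretraining: each layer's hidden code is the next layer's input."""
    layers, A = [], X
    for n_hidden in hidden_sizes:
        W, b, _, _ = train_autoencoder(A, n_hidden, **kwargs)
        layers.append((W, b))                                 # keep only the encoding half
        A = sigmoid(A @ W.T + b)
    return layers, A
```

For instance, `greedy_pretrain(X_calc, [200, 200])` would mirror the two-layer, 200-unit architecture reported for the first scenario, where `X_calc` is a hypothetical (n_samples, 15) array of scaled microcalcification features.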
The objective of each autoencoder was to minimize the reconstruction error between each input $x^{(i)}$ and its reconstruction $\hat{x}^{(i)}$:

$$\min_{W_1, W_2, b_1, b_2} \sum_{i} \left\lVert x^{(i)} - \hat{x}^{(i)} \right\rVert^2$$

The encoding procedure was carried out from the first layer to the last layer by the following formulas:

$$a^{(k)} = f\left(z^{(k)}\right), \qquad z^{(k+1)} = W^{(k,1)} a^{(k)} + b^{(k,1)}$$

The decoding procedure was calculated from the last layer to the first layer by the following formulas:

$$a^{(n+k)} = f\left(z^{(n+k)}\right), \qquad z^{(n+k+1)} = W^{(n-k,2)} a^{(n+k)} + b^{(n-k,2)}$$

where $W^{(k,1)}$ is the encoding weight matrix of the kth autoencoder, $W^{(k,2)}$ is its decoding matrix, $b^{(k,1)}$ is its encoding bias vector and $b^{(k,2)}$ is its decoding bias vector; $f(\cdot)$ is the sigmoid function, $z^{(k)}$ is the weighted input to layer k, and $a^{(k)}$ is the sigmoid activation of layer k, with $a^{(1)}$ equal to the input x. We added a softmax classifier on the top layer of the SAE network to create the deep learning model for analyzing breast lesions [50][51][52][53][54][55] .
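The encoding pass and the softmax classifier on the top layer can be sketched as follows. It is a hedged illustration: the pretrained layers are assumed to be (W, b) pairs of encoding weights, the softmax layer is trained here by plain gradient descent on cross-entropy with hypothetical settings, the function names are hypothetical, and supervised fine-tuning of the whole stack, which SAE pipelines commonly add, is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(X, layers):
    """Encoding pass a^(k+1) = f(W^(k,1) a^(k) + b^(k,1)), starting from a^(1) = x."""
    A = X
    for W, b in layers:                      # each (W, b) is one pretrained encoding layer
        A = sigmoid(A @ W.T + b)
    return A

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)     # shift logits for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def train_softmax(H, y, n_classes=2, lr=0.5, epochs=500):
    """Softmax (multinomial logistic) layer on the top-level code H; y is 0 = benign, 1 = malignant."""
    n, d = H.shape
    W = np.zeros((n_classes, d))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                 # one-hot targets
    for _ in range(epochs):
        G = (softmax(H @ W.T + b) - Y) / n   # gradient of mean cross-entropy w.r.t. logits
        W -= lr * G.T @ H
        b -= lr * G.sum(axis=0)
    return W, b

def predict(X, layers, W, b):
    """Encode X through the stack, then return the most probable class per sample."""
    return np.argmax(softmax(encode(X, layers) @ W.T + b), axis=1)
```

With two classes, the softmax layer reduces to logistic regression on the top-level code; keeping the general form makes it straightforward to add further lesion categories.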