Introduction

Cancer is a serious disease that endangers people’s lives, and its burden is severe worldwide. In particular, breast cancer is a malignant tumor of the epithelial tissue that seriously threatens women’s lives and strains health care systems, with a high prevalence rate and a long treatment process1. Among women, the incidence of breast cancer is second only to that of lung cancer. Breast cancer has therefore become an essential public health issue of global concern. Early detection and therapy are key to improving the survival rate of breast cancer patients. With the development of medicine, the analysis of histopathological images has become the most widely used method for breast cancer diagnosis. In this method, pathology is diagnosed from biological tissue sections, so the nature and type of the tumor can be identified, providing an important foundation for the surgeon’s subsequent diagnosis.

Since a rigorous and accurate diagnosis is the core of cancer treatment, the analysis of histopathological images has become a hot topic in current research. Pathological image analysis and artificial intelligence-assisted diagnosis play increasingly significant roles in this field and are critical factors in helping physicians enhance the accuracy of cancer diagnosis2. Deep learning3, as a branch of artificial intelligence, has also shown great potential in tumor region identification, tumor microenvironment characterization, prognosis prediction, and other tasks.

According to numerous studies, deep learning has been applied to mineral mapping4, hyperspectral image classification5,6, oral cancer image classification7, breast cancer image classification8, skin cancer prediction9, potato and rice disease prediction10, and many other problems. In the diagnosis of breast cancer, histopathological image analysis is an important technique in the early stages, but its efficiency is not guaranteed in practical applications11. The poor accuracy of image edge detection12, the time-consuming and tedious annotation of images11, expensive data tagging13, and unbalanced dataset selection14 have seriously hindered the development of effective computerized diagnostic methods. Among the many technologies that address related issues, convolutional neural networks and recurrent neural networks15 in deep learning have shown excellent performance in various vision tasks16. However, in some cases, traditional neural networks still have drawbacks. When the network depth is increased excessively, problems such as accuracy degradation and gradient disappearance may arise17. In practical scenarios, detection and interpretation errors may occur, leading to incorrect diagnoses18. Thus, many researchers have devoted considerable effort to histopathological image analysis. In 2014, Xu19 proposed a comprehensive framework based on weakly supervised learning combined with machine learning to effectively separate cancerous tissues from digital images and classify them into different types. Researchers have also improved convolutional neural networks. In 2018, Zhong20 proposed an end-to-end spectral-spatial residual network (SSRN), which improves the precision of deep learning. Zhu21 proposed a generative adversarial network (GAN) framework that is highly competitive on complex and nonlinear data analysis tasks.

Combining convolutional neural networks in machine learning with feature extraction algorithms can be far more efficient than traditional manual feature extraction. Mei22 proposed an unsupervised spatial-spectral feature learning strategy that uses a 3-dimensional (3D) convolutional autoencoder (CAE) to maximally exploit spatial-spectral structure information and used it for CNN learning. Li23 used a fully convolutional autoencoder to learn the main structural patterns in normal image patches and mined contrast patterns between normal and malignant images under weak supervision, making the output diagnostic solution easier to understand. Wang24 used adversarial networks in deep learning for sample generation, proposed the Caps-TripleGAN framework, and applied it to hyperspectral image classification. Hameed25 implemented the automatic classification of complex cancer cell images by training deep learning models on large amounts of data. By combining the merits of convolutional and recurrent neural networks, Yan26 applied a neural network to breast cancer histopathology image analysis, using a CNN to extract richer multilevel information from case images and then fusing patch features with an RNN to obtain the final image classification.

However, most feature extraction algorithms must process a large amount of information with many redundant features. In 2020, Feng27 proposed a deep manifold-preserving autoencoder, which learns discriminative features directly from unlabelled data and then fine-tunes a neural network with labeled training data; it preserves the structure of the input dataset from the perspective of manifold learning and minimizes the reconstruction error of a large amount of unlabelled data from the perspective of deep learning, thus learning discriminative features. Given the complexity of label relationships in images, graph representations may be difficult to distinguish. Nguyen28 proposed the modular graph transformer network (MGTN), which partitions a computational graph into multiple modular subgraphs through a preprocessing procedure so that information can be propagated more effectively within each subgraph. By combining a contrastive learning method and a transformer model, Hu29 proposed a novel unsupervised framework that can effectively extract hyperspectral image features without supervision. By comparing traditional machine learning and deep learning experiments, Boumaraf30 concluded that deep learning provides better explanations for clinical image classification. Dai31 combined the advantages of CNNs and transformers to propose TransMed for multimodal medical image classification, which is able to capture long-distance dependencies between different modalities and shows great potential in medical research. Wang32 combined the advantages of convolutional neural networks and capsule networks and proposed a histopathological image classification method for breast cancer based on deep feature fusion and augmented routing, extracting convolutional features and capsule features simultaneously through a novel two-channel structure. Thilagaraj33 combined an artificial fish swarm algorithm with a deep convolutional neural network and proposed an improved DCNN for the classification of breast cancer images, in which the training data of the deep convolutional neural network are provided directly by the artificial fish swarm algorithm. Hong34 proposed SpectralFormer from the perspective of sequence order in transformers, which overcomes the limitation imposed on traditional convolutional neural networks by their inherent network backbone. Liu35 proposed a hierarchical learning algorithm based on a Bayesian neural network classifier, which builds a visual confusion label tree from the output of a convolutional neural network model, constructs a hierarchical structure for a large number of categories in an image dataset, and automatically determines the hierarchical learning task; in addition, a backtracking algorithm was introduced for reclassification to correct samples that were misclassified during the earlier classification process. Novel evolutionary methods36,37,38 can also be used for complex optimization problems.

The main contributions of this paper are listed as follows:

  1. Texture feature parameter extraction (TFPE) is used to transform images into a texture feature matrix.

  2. An unsupervised dynamic learning mechanism (UDLM) is proposed to classify the matrix generated by TFPE. The UDLM does not require human annotation or pre-parameter learning and achieves unsupervised classification. Moreover, its adaptive evolution capability enables the UDLM to be quickly applied to the classification of different data.

The rest of this paper is organized as follows. “Materials and methods” section introduces the proposed method, “Experiments and results” section introduces the experimental methods and results, and “Conclusion” section concludes the paper.

Materials and methods

Texture feature parameter extraction based on a GLCM

A novel breast cancer image classification model (BCICM) is proposed in this section. Image texture information is one of the important bases for judging whether a breast cancer image is complex. Referring to related results in the field of image complexity analysis, textural features can be obtained using grey-level cooccurrence matrix (GLCM) methods. By computing the GLCM of an image to obtain its feature information, Haralick proposed 14 statistical feature parameters, including energy, entropy, contrast, homogeneity, correlation, and variance. Combining the above methods yields 20 feature parameters that describe the complexity of a breast cancer image from different aspects, but these feature parameters overlap and are redundant. Therefore, in this paper, four textural parameters with low correlation that are easy to calculate, namely the energy, contrast, inverse difference moment, and correlation, are selected for the complexity perception of a breast cancer image. Assume an \(M \times N\) breast cancer image I with \(N_g\) grey levels. \(\left( x_{1}, y_{1}\right)\) and \(\left( x_{2}, y_{2}\right)\) are two pixel points in image I separated by distance d in the direction \(\theta\). Then, the GLCM of this breast cancer image is calculated by Eq. (1).

$$\begin{aligned} \begin{aligned} P(i, j, d, \theta )=\#\{\left( x_{1}, y_{1}\right) ,\left( x_{2}, y_{2}\right) \in M \times N \mid I\left( x_{1}, y_{1}\right) =i, I\left( x_{2}, y_{2}\right) =j \} \end{aligned} \end{aligned}$$
(1)

where \(\#\) denotes the number of elements in the set, and \(i,j=0,1,2,\ldots, N_g-1\) represent the grey levels of the two pixels.
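For illustration, the following minimal NumPy sketch shows how the co-occurrence counts of Eq. (1) can be accumulated for one offset (d = 1, \(\theta = 0^{\circ}\)); the function and variable names are our own, and the snippet is a sketch rather than the authors’ implementation.

```python
import numpy as np

def glcm(image, n_gray, dy=0, dx=1):
    """Unnormalized GLCM P(i, j, d, theta) for the pixel offset (dy, dx)."""
    P = np.zeros((n_gray, n_gray), dtype=np.float64)
    rows, cols = image.shape
    for x in range(rows - dy):
        for y in range(cols - dx):
            i = image[x, y]                  # grey level of the first pixel
            j = image[x + dy, y + dx]        # grey level of the second pixel
            P[i, j] += 1                     # count the pair, as in Eq. (1)
    return P

# tiny example with 4 grey levels, d = 1, theta = 0 degrees
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
P = glcm(img, n_gray=4)
P_norm = P / P.sum()                         # normalized GLCM used by the texture features below
```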

The energy (ASM) describes the uniformity of the grey-level distribution of a breast cancer image. When the elements are concentrated near the diagonal of the GLCM, a smaller ASM value indicates that the greyscale distribution is more uniform and the texture is finer; conversely, a larger value indicates that the greyscale distribution is uneven and the texture is rougher. The ASM is calculated by Eq. (2).

$$\begin{aligned} A S M=\sum _{i=0}^{N_{g}-1} \sum _{j=0}^{N_{g}-1} P(i, j, d, \theta )^{2} \end{aligned}$$
(2)

The contrast (CON) reflects the depth of the texture grooves and the clarity of the image. For a particular breast cancer image, a clearer image texture means a larger CON value, and vice versa. The CON is calculated by Eq. (3).

$$\begin{aligned} C O N=\sum _{i=0}^{N_{g}-1} \sum _{j=0}^{N_{g}-1}(i-j)^{2} P(i, j, d, \theta ) \end{aligned}$$
(3)

The inverse difference moment (IDM) is a statistical feature parameter that reflects the local texture of a breast cancer image. When the IDM value is large, the textures of different regions in the breast cancer image are more homogeneous. The IDM is calculated by Eq. (4).

$$\begin{aligned} I D M=\sum _{i=0}^{N_{g}-1} \sum _{j=0}^{N_{g}-1} \frac{P(i, j, d, \theta )}{1+(i-j)^{2}} \end{aligned}$$
(4)

The correlation (COR) measures the similarity of the GLCM elements in the row or column direction. When the row or column similarity is high, the COR value is larger and the complexity of the scene is smaller; the opposite also holds. The COR is calculated by Eq. (5).

$$\begin{aligned} C O R=\sum _{i=0}^{N_{g}-1} \sum _{j=0}^{N_{g}-1} \frac{\left( i-\mu _{1}\right) \left( j-\mu _{2}\right) P(i, j, d, \theta )}{\delta _{1} \delta _{2}} \end{aligned}$$
(5)

where \(\mu _{1}\) and \(\mu _{2}\) denote the mean values of the elements of the normalized GLCM along the row and column directions, respectively, and \(\delta _{1}\) and \(\delta _{2}\) represent the corresponding standard deviations.

Therefore, for any breast cancer image, we extract four parameters separately and combine them into a texture feature vector. Details are shown in Eq. (6).

$$\begin{aligned} {\textbf {E}}=[A S M, C O N, I D M, C O R] \end{aligned}$$
(6)
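A compact sketch of how the four parameters of Eqs. (2)–(5) can be computed from a normalized GLCM and stacked into the vector E of Eq. (6) is given below; the helper name texture_features is our own and the code is illustrative only.

```python
import numpy as np

def texture_features(P):
    """Return E = [ASM, CON, IDM, COR] from a grey-level co-occurrence matrix P."""
    P = P / P.sum()                                   # normalize the GLCM
    n = P.shape[0]
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")

    asm = np.sum(P ** 2)                              # energy, Eq. (2)
    con = np.sum((i - j) ** 2 * P)                    # contrast, Eq. (3)
    idm = np.sum(P / (1.0 + (i - j) ** 2))            # inverse difference moment, Eq. (4)

    mu1, mu2 = np.sum(i * P), np.sum(j * P)           # row / column means
    d1 = np.sqrt(np.sum((i - mu1) ** 2 * P))          # row standard deviation
    d2 = np.sqrt(np.sum((j - mu2) ** 2 * P))          # column standard deviation
    cor = np.sum((i - mu1) * (j - mu2) * P) / (d1 * d2)   # correlation, Eq. (5)

    return np.array([asm, con, idm, cor])             # feature vector E, Eq. (6)
```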

Unsupervised dynamic learning mechanism for data classification

Breast cancer image data are highly varied and difficult to label. Traditional machine learning methods require a large amount of labeled data during training, which makes training slow and expensive. Motivated by these factors, a novel unsupervised dynamic learning mechanism (UDLM) for data classification is proposed in this section. Compared with traditional classification methods, the UDLM does not require pretraining or human labeling, and it can efficiently and accurately classify data according to their characteristics.

To evaluate the performance of the classification algorithm, the sum distance is defined by Eq. (7):

$$\begin{aligned} SumDis & = \sum \limits _{i = 1}^n {Dis(p_i)} \\ Dis(p_i) & = \min (Euc({p_i},center1),Euc({p_i},center2))\end{aligned}$$
(7)

where Euc(a, b) is a function used to calculate the Euclidean distance between elements a and b. Each element is thus assigned to the closer of the two cluster centers, and its distance to that center is added to the sum distance. The whole problem is now transformed into finding two suitable clustering centers that minimize the sum distance.
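In code, the objective of Eq. (7) reduces to a few NumPy lines; this is an illustrative sketch with our own names, where data holds one feature vector per row.

```python
import numpy as np

def sum_distance(data, center1, center2):
    """SumDis of Eq. (7): each sample is assigned to the nearer of the two centers."""
    d1 = np.linalg.norm(data - center1, axis=1)   # Euc(p_i, center1)
    d2 = np.linalg.norm(data - center2, axis=1)   # Euc(p_i, center2)
    return np.sum(np.minimum(d1, d2))
```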

To achieve unsupervised classification, a metaheuristic algorithm is used in the UDLM, and a dynamic learning strategy is added to enhance the accuracy of the algorithm. As in other metaheuristic algorithms, the historical optimal position (pbest) of each particle and the population optimal position (gbest) of all particles are recorded for subsequent calculations. Specifically, a particle with no mass and volume, only coordinates, is used as the basic search unit. Each particle has two sets of coordinates, one representing the center of the first class and the other representing the center of the second class. Each particle corresponds to a value that is calculated by Eq. (7). First, all particles are scattered randomly in the search space. As the search proceeds, the particles move towards the optimal particles until the end of the iteration. In each iteration, one particle coevolves with a random neighbor particle. To describe this process more precisely, we assume that of the two particles selected, the one with the smaller sum distance is particle j and the one with the larger sum distance is particle i. The next positions of particle i and particle j are calculated by Eq. (8):

$$\begin{aligned} \alpha & = (p_i^t + p_j^t)/2\\ \beta & = abs(p_i^t - p_j^t)\\ \chi & = (p_j^t + gbes{t^t})/2\\ \delta & = abs(p_j^t - gbes{t^t})\\ candidate\_p_i^{t+1} & = Gauss(\alpha ,\beta )\\ candidate\_p_j^{t+1} & = Gauss(\chi ,\delta ) \end{aligned}$$
(8)

where \(p_i^t\) is the personal best position of particle i in the tth generation, \(p_j^t\) is the personal best position of particle j in the tth generation, \(gbes{t^t}\) is the best position of all particles in the tth generation, \(abs(\cdot )\) returns the elementwise absolute value, \(Gauss(a, b)\) draws a sample from a Gaussian distribution with mean a and standard deviation b, \(candidate\_p_i^{t+1}\) is the candidate position of particle i in the \((t+1)\)th generation, and \(candidate\_p_j^{t+1}\) is the candidate position of particle j in the \((t+1)\)th generation. To fully describe the evolution of all particles, the pseudo-code of the UDLM is shown in Algorithm 1.

Algorithm 1. UDLM.
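The following Python sketch shows one possible reading of Algorithm 1 and Eq. (8): each particle encodes a pair of candidate cluster centers, candidate positions are drawn from the two Gaussian distributions, and improved candidates replace the old positions. The greedy acceptance rule, the per-generation visiting order, and all names are our assumptions, not the authors’ released implementation; the sketch reuses the sum_distance helper defined above.

```python
import numpy as np

def udlm(data, n_particles=30, n_iter=200, seed=0):
    """Unsupervised two-cluster search; `data` holds one texture feature vector per row."""
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    lo, hi = data.min(axis=0), data.max(axis=0)
    # each particle concatenates two candidate centers: [center1, center2]
    pos = rng.uniform(np.tile(lo, 2), np.tile(hi, 2), size=(n_particles, 2 * dim))

    def fitness(p):                       # sum distance of Eq. (7)
        return sum_distance(data, p[:dim], p[dim:])

    fit = np.array([fitness(p) for p in pos])
    gbest = pos[fit.argmin()].copy()

    for _ in range(n_iter):
        for idx in range(n_particles):    # every particle takes a turn (our assumption)
            nbr = rng.integers(n_particles)                     # random neighbour
            worse, better = (idx, nbr) if fit[idx] >= fit[nbr] else (nbr, idx)
            # Eq. (8): Gaussian co-evolution around the pair and around gbest
            cand_w = rng.normal((pos[worse] + pos[better]) / 2,
                                np.abs(pos[worse] - pos[better]) + 1e-12)
            cand_b = rng.normal((pos[better] + gbest) / 2,
                                np.abs(pos[better] - gbest) + 1e-12)
            for k, cand in ((worse, cand_w), (better, cand_b)):
                f = fitness(cand)
                if f < fit[k]:            # greedy acceptance of improved candidates (assumption)
                    pos[k], fit[k] = cand, f
        gbest = pos[fit.argmin()].copy()

    c1, c2 = gbest[:dim], gbest[dim:]
    labels = np.where(np.linalg.norm(data - c1, axis=1)
                      <= np.linalg.norm(data - c2, axis=1), 0, 1)
    return c1, c2, labels
```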

Novel breast cancer image classification model

The novel breast cancer image classification model (BCICM) is a tool designed to reduce the workload of healthcare workers. First, TFPE extracts the texture features of an image and converts the image into a feature matrix. Then, the UDLM randomly generates a large number of candidate clustering centers and moves these candidates through dynamic learning until the iteration reaches the preset upper limit. Finally, the BCICM outputs the classification results to the medical staff for their reference. To better present this process, the flow chart of the BCICM is shown in Fig. 1. TFPE and the UDLM were chosen because they effectively handle multi-scale texture features and classify the resulting feature vectors with an unsupervised dynamic learning mechanism. The combination of these two methods addresses the problems inherent in traditional machine learning approaches to medical image classification, which typically require extensive labeled data and long training periods, thereby reducing training costs and enhancing applicability. The code of the BCICM can be found at https://github.com/GuoJia-Lab-AI/breast-cancer-image-classification.
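For orientation only, the sketch below strings the two stages together in the order described above, reusing the illustrative glcm, texture_features, and udlm helpers from the previous sections; it is not the code released in the repository above, and the images are assumed to be already quantized to integer grey levels in [0, n_gray).

```python
import numpy as np

def bcicm(images, n_gray=16):
    """Unsupervised classification of a list of quantized greyscale images."""
    # TFPE: one texture feature vector E per image (Eq. (6))
    features = np.array([texture_features(glcm(img, n_gray)) for img in images])
    # UDLM: split the feature vectors into two clusters without labels
    _, _, labels = udlm(features)
    return labels                         # one 0/1 cluster index per image
```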

Figure 1. The flowchart of the novel breast cancer image classification model.

Experiments and results

Datasets

The BreakHis14 database contains microscopic biopsy images of benign and malignant breast tumors. Two well-known traditional methods, random forest (RF) and support vector machine (SVM), were used as the control group.

Benign breast tumors are mainly composed of adenosis, fibroadenoma, phyllodes tumor, and tubular adenoma. Malignant breast tumors are mainly composed of ductal carcinoma, lobular carcinoma in situ, mucinous carcinoma, and papillary carcinoma. All RGB values and corresponding labels were shuffled, and the dataset was split into an 80% training set and a 20% test set using the train_test_split function.
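Assuming the text refers to scikit-learn’s train_test_split, the split could look like the snippet below, where X is the array of extracted feature vectors and y the corresponding benign/malignant labels (placeholder names of ours).

```python
from sklearn.model_selection import train_test_split

# X: feature vectors extracted from the images, y: benign/malignant labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42)
```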

Evaluation parameters

The classification of benign and malignant breast tumors is a routine binary classification task and is usually assessed using the accuracy (ACC), sensitivity (SEN), precision (PRE), specificity (SPE), receiver operating characteristic (ROC) curve, and area under the ROC curve (AUC); these indicators were used to determine the model’s classification performance. Because the data are unbalanced, the F1 score was also used as an evaluation metric to compensate for the negative impact of class imbalance on the other metrics. In this study, a malignant sample is a positive (P) result, and a benign sample is a negative (N) result. A true positive (TP) occurs when the model correctly predicts a malignant sample, a false negative (FN) occurs when the model incorrectly predicts a malignant sample as benign, a true negative (TN) occurs when the model correctly predicts a benign sample, and a false positive (FP) occurs when the model incorrectly predicts a benign sample as malignant. The evaluation indicators can then be expressed as follows. The accuracy (ACC) reflects the overall accuracy of the model’s predictions and is the ratio of the number of correctly predicted samples to the total sample size. The ACC is defined in Eq. (9).

$$\begin{aligned} \textrm{ACC}=\frac{(\textrm{TP}+\textrm{TN})}{(P+N)} \times 100 \% \end{aligned}$$
(9)

The sensitivity (SEN) reflects the proportion of malignant tumors that are correctly detected; a higher value indicates that the model finds as many malignant tumors as possible and has a lower underdiagnosis rate. The SEN is defined in Eq. (10).

$$\begin{aligned} \textrm{SEN}=\frac{\textrm{TP}}{(\textrm{TP}+\textrm{FN})} \times 100 \% \end{aligned}$$
(10)

The precision (PRE) rate reflects the ratio of malignancy samples correctly detected by the model to all predicted malignancy samples. The PRE is defined in Eq. (11).

$$\begin{aligned} \textrm{PRE}=\frac{\textrm{TP}}{(\textrm{TP}+\textrm{FP})} \times 100 \% \end{aligned}$$
(11)

The specificity (SPE) reflects the ability of the model to detect benign tumors, with higher values indicating a lower rate of misdiagnosis. The SPE is defined in Eq. (12).

$$\begin{aligned} \textrm{SPE}=\frac{\textrm{TN}}{(\textrm{TN}+\textrm{FP})} \times 100 \% \end{aligned}$$
(12)

The F1 score is a more balanced reflection of the classification performance of the model when the categories are not balanced. The F1 score is defined in Eq. (13).

$$\begin{aligned} \textrm{F} 1=\frac{2 \times (\textrm{PRE} \times \textrm{SEN})}{(\textrm{PRE}+\textrm{SEN})} \times 100 \% \end{aligned}$$
(13)
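The five metrics of Eqs. (9)–(13) follow directly from the confusion-matrix counts; the short sketch below (names ours) treats malignant as the positive class.

```python
def evaluate(tp, fn, tn, fp):
    """ACC, SEN, PRE, SPE and F1 (all in %) from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fn + tn + fp) * 100    # Eq. (9)
    sen = tp / (tp + fn) * 100                     # Eq. (10)
    pre = tp / (tp + fp) * 100                     # Eq. (11)
    spe = tn / (tn + fp) * 100                     # Eq. (12)
    f1 = 2 * pre * sen / (pre + sen)               # Eq. (13)
    return acc, sen, pre, spe, f1
```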

Comparative experimental analysis

Effect of different resolution images on feature extraction results

The effect of images of different resolutions on the feature extraction results is shown in Table 1. As shown in Table 1, the texture feature values extracted at different resolutions differ. Specifically, the energy (ASM) reflects the uniformity of the grey distribution of the image, and its change becomes smaller as the resolution increases. The contrast (CON) reflects the clarity of the image texture, and its value becomes smaller as the resolution increases. The inverse difference moment (IDM) reflects the magnitude of the change in the local texture of the image, and its value decreases as the resolution increases. The entropy (ENT) reflects the amount of information in the image, and its change with the resolution is not obvious. In terms of running time, the higher the resolution, the longer the processing time, while the improvement in model accuracy is not obvious.

Table 1 Effect of different resolution images on the feature extraction results.

Comparison of the results of the different methods

To test the performance of the BCICM in all aspects, 4 sets of simulation tests, namely a 40-pixel test, 100-pixel test, 200-pixel test, and 400-pixel test, were implemented. The pre-processed images for the 40-pixel, 100-pixel, 200-pixel, and 400-pixel tests are shown in Fig. 2. Selecting images of different pixel sizes for breast cancer image classification experiments serves several important purposes: (1) Multiscale Feature Analysis: Features in breast cancer images may exist at different scales. By using multiple pixel sizes, we can capture image features at various scales, thereby enhancing the performance of the classification model. (2) Adaptation to Different Resolution Requirements: In practical applications, breast cancer images may vary in resolution. Hence, it is essential to test the performance of the classification model on images with different resolutions. Selecting images of different pixel sizes simulates this scenario, enabling the model to generalize better. (3) Exploration of Optimal Performance at Different Sizes: Experimenting with different pixel sizes helps determine which size is most effective for breast cancer image classification tasks. This facilitates the optimization of model design and image processing procedures, thereby improving classification accuracy. (4) Consideration of Computational Costs and Efficiency: Higher-resolution images typically entail greater computational resources and processing time. Selecting appropriate pixel sizes can balance classification accuracy with computational costs and processing time, making the algorithm more practical. Thus, conducting experiments on images with pixel sizes ranging from 40 to 400 pixels enables a comprehensive evaluation of the algorithm’s performance across different resolutions, providing a more holistic solution for breast cancer image classification.
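As a concrete illustration of how such multi-resolution inputs could be prepared before GLCM extraction, the following sketch resizes each image to the target side length, converts it to greyscale, and quantizes it to a small number of grey levels; the library choice (Pillow), the quantization parameters, and the filename are our assumptions rather than the authors’ exact pipeline.

```python
import numpy as np
from PIL import Image

def preprocess(path, size, n_gray=16):
    """Load an image, resize to size x size, and quantize to n_gray grey levels."""
    img = Image.open(path).convert("L").resize((size, size))
    arr = np.asarray(img, dtype=np.float64)
    return np.floor(arr / 256.0 * n_gray).astype(int)   # grey levels 0 .. n_gray-1

# e.g. the four resolutions used in the tests ("slide_0001.png" is a placeholder name)
patches = [preprocess("slide_0001.png", s) for s in (40, 100, 200, 400)]
```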

Figure 2. The pre-processing results.

The experimental results are given in both tables and figures. The results of the 40-pixel test are shown in Table 2 and Fig. 3, the results of the 100-pixel test in Table 3 and Fig. 4, the results of the 200-pixel test in Table 4 and Fig. 5, and the results of the 400-pixel test in Table 5 and Fig. 6. To classify benign and malignant breast tumors, we selected a series of evaluation metrics in addition to the accuracy, namely the SEN, PRE, SPE, F1 score, and ROC curve. Unlike the ACC, which reflects the overall accuracy of the model’s classification and prediction, the SEN reflects the missed-diagnosis rate of the model, and a higher value represents a lower probability that malignant tumors are missed. A higher PRE indicates a higher probability of correctly diagnosing malignant tumors. The SPE represents the ability of the model to detect benign tumors, and a higher value indicates a lower probability of misdiagnosing benign cases. The F1 score combines the detection rate and specificity of the model, and a higher value indicates better overall performance. In addition, the AUC value is the area under the receiver operating characteristic (ROC) curve, and the closer the value is to 1, the better the classification performance of the model. At all resolutions, the proposed method achieves the best values of the selected evaluation metrics compared with the benchmark methods.

Table 2 Experimental results of the 40-pixel test.
Table 3 Experimental results of the 100-pixel test.
Table 4 Experimental results of the 200-pixel test.
Table 5 Experimental results of the 400-pixel test.
Figure 3. The experimental results of the 40-pixel test.

Figure 4. The experimental results of the 100-pixel test.

Figure 5. The experimental results of the 200-pixel test.

Figure 6. The experimental results of the 400-pixel test.

To better demonstrate the performance of the BCICM, we plotted the receiver operating characteristic (ROC) curves for the four sets of experiments. The curves clearly demonstrate that the proposed algorithm has a clear advantage in all experiments. The ROC curves of the 40-pixel, 100-pixel, 200-pixel, and 400-pixel tests are shown in Figs. 7, 8, 9 and 10, respectively.
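For reproducibility, ROC curves of this kind are commonly drawn with scikit-learn and matplotlib, assuming each test image has a ground-truth label and a continuous decision score; one possible score for the BCICM is the margin between the distances to the two UDLM cluster centers (our assumption). A minimal sketch:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

def plot_roc(y_true, scores, title="ROC curve"):
    """Plot one ROC curve and report its AUC."""
    fpr, tpr, _ = roc_curve(y_true, scores)
    plt.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.3f}")
    plt.plot([0, 1], [0, 1], linestyle="--", color="grey")   # chance level
    plt.xlabel("False positive rate")
    plt.ylabel("True positive rate")
    plt.title(title)
    plt.legend()
    plt.show()
```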

Figure 7. The ROC of the 40-pixel test.

Figure 8. The ROC of the 100-pixel test.

Figure 9. The ROC of the 200-pixel test.

Figure 10. The ROC of the 400-pixel test.

From the above results, it can be seen that the proposed model is more sensitive to the tumor region during classification and responds less to information in the background. This shows that the model can learn richer information about the tumor region. In the malignant tumor images, the feedback from the calcified regions and the nonsmooth edges of the tumor is also more obvious.

However, as the image resolution increases, the main reason for the decrease in model accuracy may be attributed to the fact that high-resolution images contain more details and noise, posing more complex challenges for the model during processing. High-resolution images may encompass finer structures and textures, rendering it difficult for the model to accurately differentiate and classify various features. Moreover, high-resolution images may increase the dimensionality of the feature space, making the model more prone to overfitting or encountering greater computational complexity during training.

Addressing this issue, future work could explore the following avenues. Firstly, designing and employing more effective feature extraction methods tailored to high-resolution images to extract the most discriminative and robust features for the classification task. Secondly, researching and optimizing model structures to better adapt to and handle the characteristics of high-resolution images, including increasing model capacity or introducing more complex network architectures. Additionally, data augmentation techniques could be used to expand the training dataset and alleviate the overfitting issues associated with high-resolution images. Lastly, approaches such as transfer learning or semi-supervised learning could be employed to leverage pre-trained models or a small amount of labeled data to enhance the model’s generalization ability on high-resolution images.

In conclusion, addressing the decrease in model accuracy with increasing image resolution requires future work to focus on improving feature extraction methods, optimizing model structures, expanding datasets, and employing strategies such as transfer learning, aiming to enhance model performance and generalization ability on high-resolution images.

Conclusion

This work introduces a novel breast cancer image classification model (BCICM) that achieves unsupervised classification of breast cancer images through the collaboration between TFPE and the UDLM. Across all four different size tests, the BCICM demonstrates the highest accuracy results. These experimental findings underscore the BCICM’s capacity to offer efficient and highly accurate support to medical professionals.

Given the unsupervised nature of the BCICM, it holds theoretical applicability to various types of image classification tasks. Therefore, leveraging the BCICM for the classification of other cancer images, such as lung cancer images and skin cancer images, appears to be a viable future direction. Moreover, there exists room for improvement in the adaptive evolution strategy of the BCICM. Hence, our future endeavors aim to propose enhanced dynamic learning methods to enhance the algorithm’s efficiency and accuracy.

Furthermore, it is observed that the accuracy of the BCICM diminishes as the image resolution increases. Thus, we emphasize the importance of enhancing the classification accuracy of the BCICM in high-resolution images for future investigations. Additionally, augmenting the diversity of datasets and elucidating their fidelity to real-world contexts are deemed crucial for bolstering the resilience of research outcomes. This avenue represents a critical aspect for future inquiry.