Introduction

Pneumonia is an infection of the lungs that may be caused by bacteria, viruses, or fungi. The infection inflames the air sacs, which may fill with fluid. Pneumonia is a leading cause of death worldwide, particularly among infants. Overcrowding, pollution, and unhygienic environments lead to pneumonia in underdeveloped and developing nations with few medical resources. Our work can benefit millions of people worldwide by providing a promising approach to accurately and efficiently detect pneumonia from chest X-ray images, which facilitates early diagnosis and treatment and ultimately improves patient outcomes. Early detection and treatment are key to averting a fatal condition. X-rays, CT, and MRI are used to diagnose lung disorders, among which the X-ray is most commonly used for the diagnosis of pneumonia. The proposed architecture will help radiologists accurately analyze X-ray, CT, and MRI scans, which could also support the diagnosis of other respiratory diseases, bone fractures, and tumors. The advantage of QCSA lies in its ability to effectively capture the complex spatial and channel-wise correlations in chest X-ray images, which is crucial for accurately detecting pneumonia. Figure 1 depicts the CXRs of a person with pneumonia and a healthy individual; the white dots on the CXR on the right indicate the presence of pneumonia. Pneumonia CXR interpretation is subjective and depends on the radiologist's experience, and hence computer-aided assistance for detection and diagnosis is required for an accurate result.

Figure 1. A sample of CXR scan (normal and pneumonia).

Deep neural networks have exhibited exceptional image classification potential1. However, most current research on image classification architectures is based on real-valued data. The study2 has argued that real-valued CNNs cannot properly encode the relationships between the channels of a multi-channel image. Quaternion number systems, which are generalizations of complex number systems with remarkable properties that can be exploited to create more robust designs, were utilized to address this issue. In our study, we have built quaternion convolutional neural networks (QCNNs), which are extensions of CNNs. The QCNN3 can extract the most representative features from multi-dimensional input objects because the orientation and spatial position of the color channels within the input images are encoded correctly in the QCNN.
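To illustrate how a color image can be embedded in the quaternion domain, the sketch below (a minimal NumPy example, not the authors' exact implementation) places the R, G, and B channels on the i, j, and k axes and sets the real part to zero, so each pixel becomes a single quaternion.

```python
import numpy as np

def rgb_to_quaternion(image_rgb):
    """Map an H x W x 3 RGB image to an H x W x 4 quaternion tensor.

    The real part is set to zero; the R, G, and B channels are placed on
    the i, j, and k imaginary axes respectively, so the relationship
    between the color channels is kept inside one quaternion per pixel.
    """
    h, w, _ = image_rgb.shape
    q = np.zeros((h, w, 4), dtype=np.float32)
    q[..., 1:] = image_rgb.astype(np.float32)  # i, j, k <- R, G, B
    return q

# Example: a random 224 x 224 color image scaled to [0, 1]
img = np.random.rand(224, 224, 3).astype(np.float32)
q_img = rgb_to_quaternion(img)
print(q_img.shape)  # (224, 224, 4)
```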

The attention mechanism4,5 has attracted significant interest in computer vision systems for object recognition and scene interpretation in the past few years. By focusing only on the relevant parts of a scene, the human visual system enables people to discriminate objects quickly. This capability of the human brain has inspired the use of attention mechanisms in deep neural networks6. The attention mechanism has been used most frequently in natural language processing tasks7, and it has lately also been applied to image classification8 to produce cutting-edge outcomes. Channel and spatial attention9 are the two typical types of attention mechanisms applied in computer vision tasks.

Recently, researchers10,11,12,13 have experimented with quaternion extensions of CNNs and produced results that outperform real-valued CNNs. In this experiment, channel and spatial attention modules were added to a quaternion residual network to improve the performance of predicting pneumonia from CXR images. The capability of quaternions to adequately describe spatial transformations and to process multi-channel data makes them an intriguing candidate for computer vision applications.

The novelty of the proposed work is that we have incorporated spatial and channel attention modules between the layers of a quaternion convolutional neural network. This enables the network to learn important regions of the chest X-ray images while attending to complex spatial features, thereby improving the accuracy of pneumonia detection. Our analysis of the feature maps and attention maps shows that the QCSA network is able to effectively learn features from the important regions of the chest X-ray images, which leads to better performance of the proposed framework in detecting pneumonia.

Major contributions

The following are our contributions in this experiment:

  1. We first built a residual quaternion architecture and evaluated its pneumonia detection performance on the CXR dataset.

  2. We then incorporated spatial and channel attention modules into the architecture of (1), keeping all hyper-parameter values the same for both architectures, and evaluated the performance of this attention-augmented architecture.

  3. We compared the performance of both architectures to quantify the influence of incorporating the spatial and channel attention modules.

The remainder of this work is structured as follows. The background necessary for the proposed work and recent research results in areas connected to the proposed work are presented in "Background and similar works" section. "Materials and methods" section outlines the properties of the utilized dataset and the proposed design. "Experimental analysis" section presents the hardware infrastructure, performance metrics, and experimentation details, and "Analysis of result" section provides a discussion of the results. The conclusion and future scope of the proposed work are given in "Conclusion and future work" section.

Background and similar works

Here, we give the necessary background ideas for the suggested design as well as a comparative study of the findings of other recent investigations related to the same problem area.

Quaternion convolution neural network (QCNN)

QCNN14 is an extension of the real-valued CNN model. A quaternion is an element of a four-dimensional vector space with basis 1, i, j, and k; this space decomposes into two orthogonal subspaces, a one-dimensional scalar subspace and a three-dimensional pure (imaginary) subspace. Quaternion neural networks (QNNs) are a more recent form of neural network that uses quaternion-valued inputs, activations, and parameters. Quaternions are numbers with one real component and three imaginary components. Each of the three imaginary components may encode a color channel of an RGB image, making quaternions appropriate for image processing. In recent years, numerous proposed quaternion models have outperformed their real-valued equivalents in tasks such as image processing and speech recognition15,16. Moreover, quaternion-valued networks benefit from the parameter sharing induced by the Hamilton product17, resulting in models that require fewer parameters and less storage space and are hence smaller. These benefits can be obtained by substituting quaternion layers for conventional (real-valued) layers, thereby reducing model size without a perceptible decrease in performance.

The inputs and layers of a QNN have quaternion values as opposed to real values. Although work on quaternion representations for deep learning is in its infancy, a few papers analyzing their value have been published. Deep quaternion networks have been used specifically for classification18,19 and segmentation20. According to these studies, quaternions offer superior results for a variety of tasks while requiring fewer parameters. QCNNs were developed to correctly represent color images in the quaternion domain, and QCNN models for color image classification21 and denoising have been found to outperform traditional CNNs. The authors of22 studied the influence of the Hamilton product on the reconstruction of color images from grayscale-only inputs. A quaternion convolutional encoder-decoder architecture was created in12 to reconstruct a unique grayscale image. In contrast to standard convolutional encoder-decoder networks, their method can efficiently learn to reconstruct an image's colors from its grayscale representation. They conclude that quaternion-valued systems capture internal and global dependencies well, making them suited for applications involving image recognition. Quaternion recurrent neural networks (QRNNs) were proposed by the same authors23 for sequential tasks such as speech recognition; their quaternion-based recurrent designs beat non-quaternion-based alternatives despite having two to three times fewer parameters.

Figure 2 shows the building block that illustrates the customization of a conventional CNN into a quaternion CNN.

Figure 2. Building block of a generic quaternion CNN.

Algebra of quaternion numbers

This section describes quaternion numbers, followed by their identities and properties24.

Eq. (1) gives the notation for a quaternion Q.

$$Q = r + xi + yj + zk$$
(1)

Furthermore, the imaginary units of a quaternion satisfy Eq. (2).

$$i^{2} = j^{2} = k^{2} = ijk = - 1$$
(2)

As seen in Eq. (3), the product of two quaternions is not commutative.

$$ij = k = - ji,\quad jk = i = - kj,\quad ki = j = - ik$$
(3)

In the quaternion domain, r represents the scalar component, x, y, and z are the coefficients of the imaginary part xi + yj + zk, and v denotes the vector component, so a quaternion can also be represented as in Eq. (4).

$$Q = (r,v)$$
(4)

The conjugate of Q is denoted by Eq. (5).

$$Q^{*} = r - xi - yj - zk$$
(5)

The magnitude of Q, denoted ||Q||, is described by Eq. (6).

$$||Q|| = \sqrt {r^{2} + x^{2} + y^{2} + z^{2} }$$
(6)

The inverse Q−1 of a quaternion Q is defined by Eq. (7).

$$Q^{ - 1} = \frac{{Q^{*} }}{{||{\text{Q}}||^{2} }}$$
(7)

Just like a complex number, a quaternion number can also be represented as in Eq. (8).

$$Q = \rho e^{\theta s} = \rho (\cos \theta + s\sin \theta )$$
(8)

Here, ρ = ||Q||, θ is a real quantity, and s is a pure imaginary quaternion of unit length.
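The identities above translate directly into code. The following NumPy sketch (an illustrative helper, not part of the proposed architecture) implements the conjugate, magnitude, and inverse of Eqs. (5)-(7) with a quaternion stored as an array [r, x, y, z].

```python
import numpy as np

def conjugate(q):
    """Conjugate per Eq. (5): negate the three imaginary components."""
    r, x, y, z = q
    return np.array([r, -x, -y, -z])

def magnitude(q):
    """Magnitude per Eq. (6): square root of the sum of squared components."""
    return np.sqrt(np.sum(np.square(q)))

def inverse(q):
    """Inverse per Eq. (7): conjugate divided by the squared magnitude."""
    return conjugate(q) / (magnitude(q) ** 2)

q = np.array([1.0, 2.0, -1.0, 0.5])   # r + xi + yj + zk
print(magnitude(q))
print(inverse(q))
```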

A three-dimensional vector Q can be rotated by an angle θ about a rotation axis w to obtain a new vector p. This rotation is shown in Eqs. (9) and (10).

$$\hat{Q} = q_{1} i + q_{2} j + q_{3} k\quad {\text{and}}\quad \hat{p} = p_{1} i + p_{2} j + p_{3} k$$
(9)

\(\hat{p} = \hat{w} \cdot \hat{Q} \cdot \overline{\hat{w}}\), where \(\hat{p}\) and \(\hat{Q}\) are pure quaternions with the real component being zero, and

$$\hat{w} = \cos \frac{\theta }{2} + \sin \frac{\theta }{2}\left( {w_{1} i + w_{2} j + w_{3} k} \right)$$
(10)

The quaternion convolution method applies scaling and rotation to the quaternion input Q using the quaternion filter w.

Here, w is a quaternion filter of size F and Q is a quaternion input matrix of size N. The quaternion convolution can then be written as in Eq. (11), where

S = N − F + 1 and T = N − F + 1

$$\left\{ \begin{array}{l} \hat{Q} \circledast \hat{w} = \left[ \hat{f}_{kk^{\prime}} \right] \in H^{S \times T} \\ \hat{f}_{kk^{\prime}} = \mathop \sum \limits_{i = 1}^{F} \mathop \sum \limits_{j = 1}^{F} \frac{1}{s_{ij}} w_{ij} \, q_{(k + i)(k^{\prime} + j)} \, \overline{w_{ij}} \\ w_{ij} = s_{ij} \left( \cos \frac{\theta_{ij}}{2} + \mu \sin \frac{\theta_{ij}}{2} \right) \end{array} \right.$$
(11)

Here, s stands for the scaling factor, μ is a rotation axis of unit length, and θ lies between −π and π. Owing to this quaternion product, as indicated in Eq. (11), a QNN can represent the local and global dependencies inside the features of a multi-channel input.

Hamiltonian product

In a QCNN, the Hamilton product is utilized in place of the conventional real-valued dot product to carry out transformations between two quaternions

Q1 = r1 + x1i + y1j + z1k and W1 = r2 + x2i + y2j + z2k, where Q1 and W1 are two quaternions.

The ⊗ operator denotes the Hamilton product of the two quaternions Q1 and W1, and it is defined in Eq. (12).

$$Q_{1} \otimes W_{1} = (r_{1} r_{2} - x_{1} x_{2} - y_{1} y_{2} - z_{1} z_{2} ) + (r_{1} x_{2} + x_{1} r_{2} + y_{1} z_{2} - z_{1} y_{2} )i + (r_{1} y_{2} - x_{1} z_{2} + y_{1} r_{2} + z_{1} x_{2} )j + (r_{1} z_{2} + x_{1} y_{2} - y_{1} x_{2} + z_{1} r_{2} )k$$
(12)

The Hamilton product enables QNN to discover latent interactions inside the Quaternion's properties. During the Hamilton product in a QNN, the quaternion-weight components are shared over many quaternion-input sections, hence forming connections between the elements. In a real-valued neural network, the multiple weights necessary to encode latent relations within a feature are evaluated at the same level as learning global dependencies between different features, while the quaternion weight w encodes these interconnections within a unique quaternion Qout during the Hamilton product.
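As a concrete reference for Eq. (12), the sketch below (illustrative only) implements the Hamilton product for quaternions stored as [r, x, y, z]. In a quaternion convolution layer, this same four-way weight sharing is typically realized by combining four real-valued kernels according to this product, which is the source of the parameter savings discussed above.

```python
import numpy as np

def hamilton_product(q1, w1):
    """Hamilton product Q1 (x) W1 per Eq. (12)."""
    r1, x1, y1, z1 = q1
    r2, x2, y2, z2 = w1
    return np.array([
        r1*r2 - x1*x2 - y1*y2 - z1*z2,   # real part
        r1*x2 + x1*r2 + y1*z2 - z1*y2,   # i part
        r1*y2 - x1*z2 + y1*r2 + z1*x2,   # j part
        r1*z2 + x1*y2 - y1*x2 + z1*r2,   # k part
    ])

q = np.array([1.0, 0.0, 1.0, 0.0])
w = np.array([0.5, 1.0, 0.0, 2.0])
print(hamilton_product(q, w))
```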

Attention mechanism

Image attention involves finding a target region as the eye rapidly scans the image. When small activation values are merged into the associated feature map, a substantial quantity of feature-map information is discarded; therefore, combining spatial and channel attention in the quaternion residual network produces superior results. Moreover, regions of interest are highlighted rather than whole feature maps: whereas channel attention compresses the information in individual feature maps, spatial attention can highlight numerous significant regions of each feature map by employing the attention mask of a different branch. In the last phase, the output feature maps of the two attention processes are combined; features of interest are amplified in the fused feature maps, while redundant features are suppressed. To collect the most accurate target information while reducing unnecessary data, this target region is weighted. Soft attention25,26 is the most popular form since it is differentiable and allows CNN models to be trained end to end. Most soft attention models employ an attention template to locate distinctive aspects for aligning the weights of discrete sequences or image segments. Hard attention, in contrast to soft attention, is a stochastic, non-differentiable procedure that samples distinct regions rather than weighting the image's primary characteristics. An attention network for image classification determines the attention weights over the regions of an image, gathering image-based attention in a manner analogous to attention in natural language processing.

Because it learns features directly from data, a deep neural network can classify images pixel-wise. The attention mechanism27 mimics human vision and helps identify significant characteristics quickly and precisely. CNNs process all image information and details in every convolution layer. Multiple convolution layers followed by global average pooling in the last layer average the image's characteristics and attributes, and the network's final fully connected layer determines the image classification. As image size decreases, background and other non-essential information have a greater impact on the categorization result, so large quantities of data and a network that learns to suppress background information are needed to prevent inaccurate outcomes.

One way to generate an attention map from two or more convolution layers is to branch the output of one layer. A sigmoid activation is applied to the convolution output of the branch so that each pixel receives a value between zero and one; the sigmoid keeps values within the range 0 to 1. The result of this branch then multiplies the original output element-wise. Values near zero mark unimportant locations, so this configuration effectively discards regions whose sigmoid response approaches zero from the downstream recognition process. Configuring a neural network to estimate the area of focus in this way is the most common approach to using attention for image classification.

The literature27 describes two visual-system-inspired attention strategies. The first is a top-down method that iteratively selects the relevant region from a pool of scene records. The bottom-up approach, in contrast, highlights the most salient locations along the visual pathway. Top-down iteration is slower than bottom-up processing. The bottom-up technique progressively selects the most relevant regions from incoming data, although its sequential processing accumulates errors with depth.

The attention mechanism is a prominent research topic for several reasons. First, adding an attention mechanism to a model typically improves performance over baseline techniques. Second, the attention model can be trained together with a base recurrent neural network using backpropagation. The introduction of the transformer model28 was widely adopted in image processing, video processing, and recommendation systems, improving the attention model and avoiding the parallelization issues of recurrent neural networks.

Classification neural networks typically model data as a numeric vector of low-level features that are weighted equally regardless of their usefulness. The attention model instead assigns weights to features based on their relevance: it computes a weight distribution over the input features and assigns larger values to highly ranked features.

The attention mechanism has three stages: alignment, attention weighting, and context-vector computation. The attention layer calculates the alignment score between the encoded vectors h = {h1, h2, …, hn} and a vector v. As stated in Eqs. (13) and (14), the softmax computes the probability distribution αi by normalizing over all n elements of h, where i = 1, 2, …, n.

$$\alpha_{i} = \frac{\exp \left( {h_{i}^{\prime } v} \right)}{\mathop \sum \nolimits_{j = 1}^{n} \exp \left( {h_{j}^{\prime } v} \right)}$$
(13)
$$O = \mathop \sum \limits_{i = 1}^{n} \alpha_{i} h_{i}$$
(14)

In the equations above, each hi contributes information relevant to the vector v, and the attention mechanism output O is a weighted sum of the encoded vectors hi.
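The soft attention computation of Eqs. (13) and (14) can be written compactly as follows (a minimal NumPy sketch with illustrative shapes).

```python
import numpy as np

def soft_attention(h, v):
    """Eqs. (13)-(14): softmax alignment weights and weighted context vector.

    h : (n, d) matrix of encoded vectors h_1..h_n
    v : (d,)   query vector
    """
    scores = h @ v                                    # alignment scores h_i' v
    alpha = np.exp(scores) / np.sum(np.exp(scores))   # Eq. (13): softmax weights
    output = alpha @ h                                # Eq. (14): weighted sum of h_i
    return alpha, output

h = np.random.rand(5, 8)
v = np.random.rand(8)
alpha, o = soft_attention(h, v)
print(alpha.sum(), o.shape)   # weights sum to 1, context vector of size 8
```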

In the proposed work, we have combined channel attention and spatial attention mechanism in quaternion residual networks.

Channel attention

Using the inter-channel relationship between features, a channel attention26,29,30 map is created. As each channel of a feature map is regarded as a feature detector, channel attention focuses on global features. It reduces the spatial dimension of the input feature map in order to compute channel attention efficiently. The channel attention method generates a sigmoid-activated one-dimensional (1-D) tensor for the given feature maps. Along the channel axis, the activation values of this 1-D tensor are expected to be larger for the feature maps of interest and smaller for others, which suppresses redundant feature maps. We generate two spatial context descriptors, Fcavg and Fcmax, which stand for average-pooled features and max-pooled features, respectively.
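A channel attention block of this kind can be sketched in Keras as follows. The shared two-layer MLP and the reduction ratio are CBAM-style assumptions for illustration, not the exact configuration used in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def channel_attention(feature_map, ratio=8):
    """Sigmoid-activated 1-D channel weights from avg- and max-pooled descriptors."""
    channels = feature_map.shape[-1]
    # Shared MLP applied to both pooled descriptors
    dense_1 = layers.Dense(channels // ratio, activation='relu')
    dense_2 = layers.Dense(channels)

    avg_pool = layers.GlobalAveragePooling2D()(feature_map)  # F_avg^c
    max_pool = layers.GlobalMaxPooling2D()(feature_map)      # F_max^c

    attention = layers.Add()([dense_2(dense_1(avg_pool)),
                              dense_2(dense_1(max_pool))])
    attention = layers.Activation('sigmoid')(attention)      # 1-D attention tensor
    attention = layers.Reshape((1, 1, channels))(attention)
    return layers.Multiply()([feature_map, attention])

# Usage inside a functional model (the input shape is illustrative)
inputs = tf.keras.Input(shape=(224, 224, 64))
refined = channel_attention(inputs)
```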

Spatial attention

On the basis of the inter-spatial relationship between features, a spatial attention map is generated. In contrast to channel attention, which focuses on which channels are informative, the spatial attention module emphasizes where the important features are located. To compute spatial attention, we first apply average-pooling and max-pooling operations along the channel axis, then concatenate the results to produce an efficient feature descriptor. A convolution layer is then applied to the concatenated feature descriptor to build a spatial attention map that encodes which locations to highlight or suppress.
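A matching spatial attention block can be sketched as follows; this is again a CBAM-style sketch, and the 7 × 7 kernel size is an assumption rather than a detail taken from the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def spatial_attention(feature_map, kernel_size=7):
    """Sigmoid spatial mask built from channel-wise average- and max-pooling."""
    # Pool along the channel axis to obtain two H x W x 1 descriptors
    avg_pool = tf.reduce_mean(feature_map, axis=-1, keepdims=True)
    max_pool = tf.reduce_max(feature_map, axis=-1, keepdims=True)
    concat = layers.Concatenate(axis=-1)([avg_pool, max_pool])

    # A single convolution encodes where to highlight or suppress
    mask = layers.Conv2D(1, kernel_size, padding='same',
                         activation='sigmoid')(concat)
    return layers.Multiply()([feature_map, mask])

# Usage inside a functional model (the input shape is illustrative)
inputs = tf.keras.Input(shape=(224, 224, 64))
refined = spatial_attention(inputs)
```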

Figure 3 shows how we placed the channel attention and spatial attention blocks inside the building block of the QCNN. These spatial and channel blocks are compatible with quaternion inputs. Adding the channel and spatial attention blocks does not increase the number of learnable parameters and hence does not add computational cost.

Figure 3. Augmentation of channel and spatial attention modules in building blocks of CNN.

Comparison of recent related studies

Pneumonia detection via CXR has been an unresolved issue for many years, with the lack of publicly available data constituting the primary limitation. Extensive research has been conducted on traditional machine learning algorithms, which require domain expertise for feature extraction. Deep learning research has produced a variety of architectures, such as VGGNet31, ResNet32, and Inception ResNet33, which have been used with transfer learning techniques34 employing pretrained weights. Recent strategies for detecting pneumonia fall into three categories: (1) those in which researchers have prioritized region-of-interest extraction, (2) methods emphasizing feature extraction followed by typical machine learning models or an ensemble of models with average performance, and (3) deep learning architectures based on transfer learning. Table 1 summarizes the recently studied literature.

Table 1 Literature summary of recently related studies.

Materials and methods

Dataset

The dataset42 (https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia) is organized into train, test, and validation directories, with a subdirectory for each image category (Pneumonia/Normal) within each directory. There are 5,856 CXR images in JPEG format, split into the two categories. The CXR images of one- to five-year-old pediatric patients at the Guangzhou Women and Children's Medical Center were chosen retrospectively from cohorts; the CXRs were taken as part of the patients' routine clinical care. Before the images could be used to train an AI system, two expert physicians reviewed them, and a third expert evaluated the assessment set more thoroughly to account for any potential grading problems. The training set comprises 5136 images, whereas the test set has only 700. Table 2 displays the class-wise distribution of the dataset.

Table 2 Class-wise distribution of the dataset.

Table 3 shows the partitioning of the dataset: 75% of the images were allocated to the training set, and the remaining images were split between the test set (80%) and the validation set (20%).

Table 3 Train, test, and validation dataset partitioning.

Proposed framework

The proposed method comprises image preprocessing with an image enhancement technique and image resizing, dataset imbalance handling, augmentation of training images, the transformation of input images into the quaternion domain, training on a Quaternion residual network with spatial and channel attention modules, and evaluation of Pneumonia classification with the proposed model. Figure 6 depicts our suggested design, which augments the structure of quaternion residual network architecture with channel and spatial attention modules.

Data preprocessing

In preparation for image normalization, the images are converted into an array and divided by 255. This scales each pixel value of an image to the range 0.0 to 1.0 and helps reduce abnormalities caused by shadows and illumination.
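This step can be expressed as in the sketch below (a minimal example; the resize target of 224 × 224 is an assumption, not a value stated in the paper).

```python
import cv2
import numpy as np

def preprocess(image_path, target_size=(224, 224)):
    """Load a CXR image, resize it, and scale pixel values to [0.0, 1.0]."""
    img = cv2.imread(image_path)              # image loaded as a uint8 array
    img = cv2.resize(img, target_size)
    img = img.astype(np.float32) / 255.0      # divide by 255 for normalization
    return img
```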

Image enhancement

Image quality affects performance, so image enhancement was also performed to maintain uniformity across all input images in the dataset.

Data augmentation

By applying various types of transformations to the input images, the challenge posed by the smaller dataset size is mitigated.
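A typical Keras augmentation pipeline for this purpose looks like the following; the specific transformation ranges and directory layout are illustrative assumptions, not the exact values used in the experiment.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative augmentation settings for CXR images
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,        # normalization, as in the preprocessing step
    rotation_range=10,        # small rotations
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=False,    # flips are often avoided for chest X-rays
)

train_generator = train_datagen.flow_from_directory(
    'chest_xray/train',       # hypothetical directory layout
    target_size=(224, 224),
    batch_size=16,
    class_mode='binary',
)
```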

Dataset balancing

Balancing is performed to maintain a comparable number of input samples across all dataset classes.

Training of proposed architecture

The preprocessed dataset is projected into quaternion space and used to train the QCSA network.

Evaluation of performance

The trained model is then tested on unseen images to evaluate its performance.

Figure 4 diagrammatically shows the steps carried out in our experiment, which include the preprocessing steps on the selected dataset, the design of the proposed architecture, training of the model on the preprocessed dataset, followed by testing and evaluation of the performance of the proposed architecture.

Figure 4. Workflow in the experiment.

Spatial and channel attention modules focus on the crucial parts of the input and extract features only from those regions. Figure 5 shows the relative positioning of the spatial and channel attention blocks in the proposed architecture.

Figure 5. Building block of a QCSA network.
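The relative placement can be sketched as follows, reusing the channel_attention and spatial_attention helpers sketched earlier. Here quaternion_conv2d is a placeholder for a quaternion convolution layer class (for example from a quaternion deep-learning library), and the exact ordering inside the block is our reading of Fig. 5 rather than a verified implementation detail.

```python
from tensorflow.keras import layers

def qcsa_residual_block(x, quaternion_conv2d, filters, kernel_size=3):
    """Residual block sketch: quaternion convolutions refined by channel and
    spatial attention, with a skip connection around the whole block.

    `quaternion_conv2d` is assumed to be a Keras-compatible quaternion
    convolution layer class; the input `x` is assumed to already have
    `filters` channels so the skip connection shapes match.
    """
    shortcut = x

    y = quaternion_conv2d(filters, kernel_size, padding='same')(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = quaternion_conv2d(filters, kernel_size, padding='same')(y)
    y = layers.BatchNormalization()(y)

    # Attention refinement (see the channel/spatial attention sketches above)
    y = channel_attention(y)
    y = spatial_attention(y)

    y = layers.Add()([shortcut, y])
    return layers.ReLU()(y)
```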

Figure 6 displays the design of the proposed architecture, which shows the detailed structure of the proposed model. In this, we have employed four quaternion residual blocks with attention blocks.

Figure 6. Proposed architecture design.

Experimental analysis

Implementation details and hyper-parameter settings

To showcase our proposed architecture, we experimented with one of the most commonly downloaded benchmark CXR datasets on Kaggle, which we used for binary classification. Python 3.7, Anaconda/3, and CUDA/10 were installed on a Windows server with an i5 CPU, a 2 GB GPU, and 8 GB RAM. In addition, the Python libraries TensorFlow-Keras, OpenCV, matplotlib, os, math, and NumPy were employed. As shown in Table 4, we trained the system for 40 epochs using the listed hyperparameters.

Table 4 Hyperparameter setting used in the experiment.

Performance metrics

Accuracy, precision, recall (or sensitivity), the F1 score, and specificity are used to evaluate the performance of the proposed system on the binary classification problem at hand. True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) must be defined before defining these metrics. Assume that the two classes in a binary classification problem are positive and negative. TP refers to a positive sample that is correctly classified as positive. FP refers to a sample that has been incorrectly categorized as positive when it actually belongs to the negative class. Similarly, TN refers to a sample that has been correctly categorized as a member of the negative class, and FN refers to a sample that is classified as negative despite belonging to the positive class.

Accuracy

It is the proportion of correctly classified samples to the total number of samples.

$$Accuracy = \frac{TP + TN}{{TP + TN + FP + FN}}$$

Precision

Precision is the proportion of correctly identified Positive samples to the total number of samples classified as Positive (whether correctly or incorrectly). It measures the degree to which the model is correct when it labels a sample as positive.

$$Precision = \frac{TP}{{TP + FP}}$$

Recall

The Recall is calculated as the proportion of correctly recognized Positive samples compared to the total number of Positive samples. Recall measures the model's capacity to recognize Positive samples. As recall grows, an increasing number of positive samples are detected.

$$Recall(sensitivity) = \frac{TP}{{TP + FN}}$$

F1-score

The F1-score combines precision and recall into a single measurement; it is the harmonic mean of precision and recall.

$$F1\,Score = \frac{2 \times Precision \times Recall}{Precision + Recall}$$

Sensitivity

It is a test's capacity to appropriately detect diseased patients. It is the same as recall.

Specificity

It is a test's ability to correctly identify healthy individuals.

$$Specificity = \frac{TN}{{TN + FP}}$$

Receiver operator characteristic (ROC)

This curve plots sensitivity against (1 − specificity) and is used to demonstrate the trade-off between sensitivity and specificity.

Area under curve (AUC)

It indicates how successfully the model can differentiate between positive and negative categories.
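All of these metrics can be computed from the model's predictions on the test set, for example with scikit-learn, as in the generic sketch below (independent of the proposed architecture; the sample labels are made up for illustration).

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

def evaluate(y_true, y_prob, threshold=0.5):
    """Compute the metrics defined above from true labels and predicted probabilities."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return {
        'accuracy':    accuracy_score(y_true, y_pred),
        'precision':   precision_score(y_true, y_pred),
        'recall':      recall_score(y_true, y_pred),      # sensitivity
        'f1_score':    f1_score(y_true, y_pred),
        'specificity': tn / (tn + fp),
        'auc':         roc_auc_score(y_true, y_prob),
    }

# Toy example with made-up labels and probabilities
y_true = np.array([1, 0, 1, 1, 0, 1])
y_prob = np.array([0.9, 0.2, 0.7, 0.4, 0.1, 0.8])
print(evaluate(y_true, y_prob))
```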

Model's training

For model training, forty epochs with the Adam optimizer were used. Smaller batch sizes were chosen since they improve the model's test accuracy and speed up the network's learning. Adam was used with a learning rate of 0.001. Adam is used for training the model since it updates the network weights repeatedly based on the training dataset using adaptive moment estimation. The dataset is separated into sections for training, validation, and testing, and the validation loss on the CXR dataset is the condition for epoch termination. The training accuracy is higher than the validation accuracy because the validation data points are unseen data points, and validation performance gives a general idea of how the proposed model will predict unseen samples.
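Under the settings described above, the training loop can be sketched as follows; the early-stopping patience shown here is an illustrative assumption, and the model and data generators are assumed to be defined elsewhere (the actual hyperparameter values are in Table 4).

```python
import tensorflow as tf

def compile_and_train(model, train_data, val_data, epochs=40):
    """Compile the model with Adam (lr = 0.001) and train with
    validation-loss-based stopping, as described above."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss='binary_crossentropy',
        metrics=['accuracy'],
    )
    # Stop training when the validation loss stops improving (patience assumed)
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=5, restore_best_weights=True)
    return model.fit(train_data, validation_data=val_data,
                     epochs=epochs, callbacks=[early_stop])
```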

Performance evaluation of the proposed methodology

In our experiment, we evaluated the performance of pneumonia prediction on two architectures: (i) QCNN without attention blocks and (ii) QCNN with spatial and channel attention blocks. The same set of hyper-parameter values (Table 4) and the same dataset (Table 2) were used to make a comparative analysis. Table 5 presents the performance of both architectures. As shown in Table 5, we observed a rise of about 4% in classification accuracy when the attention modules were added to the QCNN architecture.

Table 5 Performance Comparison between the architectures.

Analysis of result

The ultimate goal of pneumonia detection using deep learning is to minimize false positive and false negative cases, as they can have significant consequences for patient care. False positives can lead to unnecessary treatments, which can be costly and potentially harmful to the patient, while false negatives can result in delayed diagnosis and treatment, which can be life-threatening. Therefore, in the context of pneumonia detection, it is more important to prioritize accuracy over training and prediction time. Table 5 and Figs. 7, 8, 9, 10, 11 and 12 present the performance of the QCNN with spatial and channel attention modules; the performance curves show promising results for pneumonia prediction. Table 5 also shows a significant rise in all performance metrics when the spatial and channel attention modules are added to the QCNN architecture. Figures 13, 14, 15, 16 and 17 compare the performance metrics of pneumonia detection with QCNN and with QCNN plus attention modules, showing that augmenting QCNN with the attention mechanism yields a significant rise in performance and thereby improves pneumonia detection.

Figure 7. Accuracy curve.

Figure 8. Loss curve.

Figure 9. Precision curve.

Figure 10. Recall curve.

Figure 11. F1-score curve.

Figure 12. Confusion matrix.

Figure 13. Validation accuracy plot.

Figure 14. Validation loss plot.

Figure 15. Precision plot.

Figure 16. Recall plot.

Figure 17. F1-score plot.

We have performed the experiment on this dataset with different deep learning architectures; Table 6 presents the results with performance metrics such as accuracy, F1-score, and the numbers of trainable and non-trainable parameters. Figure 18 presents the accuracy of the models as a bar graph, which shows that the proposed method performs better by capturing complex features and attending to the important regions of an image.

Table 6 Performance comparison with other architectures on the same dataset.
Figure 18. Comparison of accuracy of different deep models on the pneumonia dataset.

Conclusion and future work

In this research, we provide a system in which a deep learning architecture is adapted to the quaternion domain and augmented with attention modules, consisting of channel attention and spatial attention, to focus only on the most relevant portions of the image. The quaternion-customized deep neural network architecture shows better classification performance than conventional real-valued DNNs, especially on multi-channel data, because of the way it handles the relationships between channels. This architecture was evaluated on a public Kaggle dataset of CXR images for the detection of pneumonia. We customized the residual network in the quaternion domain. We first evaluated the residual quaternion network on the dataset, and it gave a test accuracy of 90.27%, which is better than the real-valued residual CNN architecture. We then evaluated the quaternion residual network architecture augmented with spatial and channel attention modules, which gave an accuracy of 94.53%. We thus observed a roughly 4% rise in accuracy when the attention mechanism is integrated with the quaternion residual network. The proposed model displays generalization potential when evaluated on distinct data sets. If the proposed architecture is ensembled with the predictions of experienced radiologists, it is expected to offer even better results; this is left as the future scope of the proposed work.