TOPSIS aided ensemble of CNN models for screening COVID-19 in chest X-ray images

The novel coronavirus (COVID-19), has undoubtedly imprinted our lives with its deadly impact. Early testing with isolation of the individual is the best possible way to curb the spread of this deadly virus. Computer aided diagnosis (CAD) provides an alternative and cheap option for screening of the said virus. In this paper, we propose a convolution neural network (CNN)-based CAD method for COVID-19 and pneumonia detection from chest X-ray images. We consider three input types for three identical base classifiers. To capture maximum possible complementary features, we consider the original RGB image, Red channel image and the original image stacked with Robert's edge information. After that we develop an ensemble strategy based on the technique for order preference by similarity to an ideal solution (TOPSIS) to aggregate the outcomes of base classifiers. The overall framework, called TOPCONet, is very light in comparison with standard CNN models in terms of the number of trainable parameters required. TOPCONet achieves state-of-the-art results when evaluated on the three publicly available datasets: (1) IEEE COVID-19 dataset + Kaggle Pneumonia Dataset, (2) Kaggle Radiography dataset and (3) COVIDx.

• We develop a lightweight but efficient CNN model having much fewer trainable parameters as compared to state-of-the-art CNN models used in CAD systems for the purpose of COVID-19 and pneumonia detection. • We design a TOPSIS, an MCDM technique, aided ensemble method which is found to be more effective than standard ensemble methods such as majority voting, product rule and sum rule. • The performance of the proposed model on three public datasets is comparable to state-of-the-art methods that require heavy computational resources.
The remaining parts of the paper are organised in the following manner. In the "Literature review" section, we first briefly review some past works from the literature of COVID-19 and pneumonia diagnosis system and then analyse each of them to describe their advantages and shortcomings. In the "Preliminaries" section, we briefly describe some prerequisites useful in describing the working procedure of the proposed model. Next, in the "Proposed method" section, we describe in detail how our model (i.e., TOPCONet) works. In the "Experimental results" section, we present all the experimental findings related to this work while we analyse the results and explain each of the experimental findings in the "Discussion" section. We also summarise the advantages and limitations of our work in this section. In the "Conclusion" section, we end with concluding remarks and some scopes for future research.

Literature review
Several research attempts were made by researchers to design effective CAD systems for COVID-19. Here, we review some of the existing works employed for early detection of COVID-19 and/or pneumonia from chest X-ray images. Wang et al. 17 designed a CNN architecture, dubbed as COVID-Net, that achieved 80% sensitivity for COVID-19 detection. This is one of the fundamental works that laid the ground-work for further research in this domain with deep learning-based CAD models for detection of the said diseases. The article by Demir 12 presented a long short term memory (LSTM) network-based approach where the Sobel edge detection and marker-controlled watershed segmentation approach were used in the pre-processing stage. The author was able to significantly boost the model performance, but the computational cost was considerably high because of training both CNN and LSTM models. Convolutional LSTM (ConvLSTM) based approach has also been seen in the literature recently 18,19 .
Classical handcrafted feature engineering-based approaches have also proven their competence for COVID-19 detection. Panetta et al. 20 proposed a novel shape-dependent Fibonacci-p patterns-based feature descriptor for COVID-19 detection. In another work, Chandra et al. 21  www.nature.com/scientificreports/ (GLCM) and histogram oriented gradient (HOG) based features and then used the binary grey wolf optimizer (BGWO) based feature selection technique to select the near-optimal feature set. Finally, a bi-stage majority voting-aided ensemble mechanism was utilised with the seven different machine learning-based classifiers. The authors classifies normal and abnormal images quite successfully, but the model performance was not good while classifying infected samples into pneumonia and COVID-19 cases. One of the possible reasons for such a behaviour might be related to a recent medical research findings 22 where the authors state "In some cases, definite discrimination of the two (COVID-19 and pneumonia) entities might be impossible solely based on the imaging, however, some radiologic features may suggest one diagnosis over the other". Oh et al. 23 used a patch-based classification strategy to mitigate the difficulties due to a limited training sample while using a CNN model-based classification strategy. They first segmented the images using deep CNNs and then generated random patches to train the CNNs using the ResNet-18 model. Thus they were able to train the CNN model using limited training datasets. The approach also lowered the training cost by a margin. In the said work, the normalised validation accuracies were used as fuzzy measures for the base classifiers but it may not work with other datasets. Similarly, in a very recent article, Paul et al. 24 proposed to use an inverted bell-shaped weighted average scheme to ensemble the probabilistic outcomes of deep transfer learning models. In another work, Das et al. 25 designed a bi-stage CAD system for the detection of COVID-19 and pneumonia-infected chest X-ray images. In the first stage, the authors segregated the infected chest X-ray images from the normal ones, and in the later stage, the COVID-19 cases were identified from the infected cases. The authors used the pre-trained VGG-19 as the feature extractor, and a shallow learner was employed as the classifier. The performance of the bi-stage model was good but once a sample is classified erroneously in the first stage, it remained unclassified throughout. Moreover, the bi-stage classification scheme for detecting a COVID-19 or pneumonia case needs classification of a sample twice, which lengthens the testing time. A stacked CNN model using sub-models of VGG-19 and XceptionNet was proposed by Gour et al. 26 , but use of such large CNN models like VGG-19 and XceptionNet increases the training and inference time. Hasoon et al. 27 used a combination of image preprocessing techniques and machine learning algorithms to get 6 different models those are local binary pattern (LBP)-kNN, HOG-kNN, Haralick-kNN, LBP-SVM, HOG-SVM, and Haralick-SVM for chest X-ray image classification. The authors showed that the use of machine learning algorithms failed to produce better result over the CNN based models. A new COVID-19 detection model CoWarriorNet was proposed by Roy et al. 28 that consists of two networks a classification network and a confidence network. The proposed architecture is a derivative of U-Net architecture, which results in poor extraction of image-derived information.
Ouchicha et al. 29 provided a CNN architecture with a sort of inter-network skip connections for screening of COVID-19 cases from chest X-ray images. The authors used two parallel CNN models and connected each of the layers of networks with inter-network and intra-network residual connections for feature transfer to encounter the vanishing gradient problem. Additionally, it is interesting to note that downsampling and skip connections have also been proposed in the literature in the past for COVID-19 detection 30 . Khuzani et al. 31 employed multiple feature extraction schemes in which a novel pooling strategy was used to extract only the relevant features. Finally, the authors used a fully connected neural network to classify chest X-ray images using the extracted features. The authors used the machine learning-based approach, instead of the conventional CNN approach which could significantly reduce the computational cost, but the approach failed to tackle the class imbalance issue. In another work, Kenaway et al. 32 first used a popular CNN architecture as a feature extractor and then proposed an advanced version of the squirrel search optimisation algorithm to select the best set of features. They finally used the multi-layer perceptron (MLP) on top of the selected features for classification, leading to a significant increase in training time, as the transfer learning model is fine-tuned. Besides, the feature selection approach is trainable and the classifier also needs time to train. Gour et al. 33 introduced a uncertainty aware CNN model known as the UA-ConvNet model. The problem of this CNN model is that in its initial stage, a pre-trained EfficientNet-B3 model is fine-tuned, which makes it time-taking. Bashar et al. 34 used transfer learning concept while using CNN models like AlexNet, GoogleNet, VGG-16, VGG-19 and DenseNet to classify chest X-ray images. The authors enhanced the input image quality using techniques like anisotropic diffusion filter, Fourier transform, and edge-aware local contrast manipulation. Here, again the use of transfer learning techniques makes the process computationally expensive to train. Similar problem faced by Senan et al. 35 , where ResNet50 and AlexNet models were used to generate features which are combined with GLCM and LBP features.
Khan et al. 36 proposed an XceptionNet-based classification model, which performed well, but the metrics reveal that it somewhat failed to handle imbalanced classes despite using a CNN-based architecture. Hussain et al. 37 proposed a 22-layers deep CNN model. Goel et al. 38 proposed a CNN model whose hyperparameters were optimised using a nature-inspired meta-heuristic approach: BGWO. This method provided a good alternative to the tiring grid search method, but the model has a slightly higher false-positive rate, which might not be useful in practical field. Aslan et al. 39 designed a hybrid approach using a modified version of the AlexNet hybridised with bi-directional LSTM (BiLSTM) providing better results worth-while results in processing the features for classification. Multi-level feature extraction technique was used by Naeem et al. 40 where the authors first extracted features like GIST, scale invariant feature transform (SIFT), and CNN based deep features and then they used LSTM network to perform classification. Similarly, classical classifiers like artificial neural network (ANN), SVM, KNN, and deep learning classifiers like recurrent neural network (RNN), LSTM were used for predicting COVID-19 cases from chest X-ray images by Goyal et al. 41 . The problem with LSTM and RNN architectures are that they are prone to over-fitting and difficult to apply dropout algorithm there.

Motivation
Numerous deep learning-based methods have been explored in the past for designing CAD systems for the diagnosis of COVID-19 and pneumonia from chest X-ray images. Most of them aimed at designing a system that could perform more efficiently than its predecessors, which is the primary objective of designing a new CAD system at its initial stage. Hence, in many cases, these models became computationally heavier. However, an ideal CAD system should be computationally inexpensive and memory-efficient for its wide applicability and portability so that such a model can be easily executable in a resource-constrained environment without compromising its performance. Besides, the increment of learnable parameters in CNN models increases the space and time requirements. Another important issue is that such models at times extract some redundant features which in turn consume additional space and time during execution without much improvement of performance 42 . Additionally, according to the lottery ticket hypothesis 43 , we can prune the learnable parameters of a heavy CNN model to make it much more cost-efficient and portable without compromising its performance. Keeping these facts in mind, we aim at design a lightweight CNN model that can detect COVID-19 and pneumonia-infected chest X-ray images from normal ones. Additionally, it is to be noted that the authors of the works 6,21,44 used a hierarchical approach to design better CAD models for diagnosis of the said diseases, in which they applied various heavier CNN architectures at each stage. The use of multiple pre-trained CNN models along with evaluating an input image several times makes these models inefficient in terms of storage and computation needs. Hence, we evaluate an input image only once to make it more memory and time-efficient. Apart from these, some methods 6,45 used ensemble techniques in which different CNN models were applied as base classifiers to obtain different learned features from input images. However, the study 46 shows that we can gather complementary information by pre-processing the original image to extract features. Hence in our model, we pass two differently pre-processed images along with the original one to the CNN model for extracting three different features from it. We also observe from the literature that the work reported by Paul et al. 45 used a machine learning-based ensemble technique and showed its effectiveness in improving the end performance. In a work by Pramanik et al. 6 , where the authors propose a fuzzy aggregator to ensemble the CNN outcomes which is non-trainable in nature.
In this work, to capture the complementary information provided by the base classifiers, we propose an ensemble with an MCDM-based aggregator known as TOPSIS. The ensemble framework has a few tunable parameters compared to the machine learning-based ensemble techniques. The use of such methods aims at developing a lightweight but effective CAD model for distinguishing COVID-19 and pneumonia-infected chest X-ray images from normal ones.

Preliminaries
Robert's edge detection technique. The Robert's edge detection technique uses 2D spatial gradient calculation to detect edges in an input image. It uses two operators of size 2 × 2 , known as Robert cross operators, which help in calculating the gradient image. The gradient image is used for identifying the edges in the image. Here, our idea is to extract the minute details present in a chest X-ray image and thus use a small-sized mask while calculating gradient image, which would be helpful rather than using larger masks. This is why we chose this edge detection technique. The Robert cross operators are shown in Eqs. (1) and (2).
In this edge detection technique, an input image is convolved with these two operators i.e., δ x and δ y . Let δ x and δ y be convolved with a gray-scale image (say, I g ) and generate two gradient images (say, G x and G y ) using Eqs. (3) and (4).
We obtain the final edge image using a threshold value (say, th). In the present work, we use the mean of the gradient magnitude values appearing in G. The parameter th is calculated using Eq. (6).
In Eq. (6), M and N stand for the height and width of the input image respectively. Finally the edge image (say, I e ) is obtained using Eq. (7). www.nature.com/scientificreports/ Convolutional neural network. This is probably one of the most revolutionary research findings in the domain of document image processing 47 , which later extended to multiple fields such as video processing 48 , speech processing and emotion recognition. CNN is used as a discriminative architecture 49 inspired by the concept of a time delay neural network (TDNN). In TDNN, the weights are shared in a temporal dimension, which ultimately reduces computation. On the other hand, the CNN model does not require weights to be shared, and convolutions are replaced by matrix multiplication as in standard NNs. This replacement significantly decreases number of weights. Moreover images can be fed as raw input to the network, whereas in classical NN models, features, extracted from the images, are fed to the NNs. In other words, CNNs are seen as a combination of a learned feature extractor and a classifier. Weights for both the classifier and the feature extractor model are learned using the back propagation algorithm 47 . Some important parts/characteristics of any CNN models are described in the later parts.
Convolutional layer. One of the most important parts of any CNN model is the convolutional layer which is suggested by its name. In a convolutional layer, a linear operation occurs in which the input image is convolved with a set of learnable weights. For applying the convolutional process on an image, a two-dimensional array (i.e., matrix) of weights is used which is known as the kernel or filter. The filter is smaller in size compared to the input image. To perform convolution element-wise matrix multiplication is performed between the filter matrix and a filter-sized patch of the input image. After this, all the values of the patch are added together to produce a single value. By using a filter smaller than the input size it can then be applied to the various parts of the input during convolution. Thus, the same weights can be applied to the various parts of the input data resulting in translation invariance. The invariance to translation is an important property for knowing whether a feature is present in the input image rather than in the location of the feature in the image.
Activation function. Activation functions play a crucial role during the training of the CNN models. Let z i = g(x, w i ) be the mathematical formulation of any neuron and y i = f (z i ) be the value that is to be transmitted to the next neuron. The use of an activation function is popular because it helps in countering the vanishing gradient problem associated with any CNN model. In standard NNs, activation functions control the flow of information among the neurons present therein. There are several diverse activation functions in literature at present. The choice of activation functions mainly depends on the objective of the work at hand. Probably, one of the most popularly used activation functions is the rectified linear unit (ReLU) which is well used for image recognition-based tasks 47 .
Batch normalisation. Initially suggested by Ioffe et al. 50 , batch normalisation greatly improvises the convergence of the CNN model during training and also helps produce stable results. Batch normalisation helps in coordinating the update of weights in multiple layers in the model. To do so, it scales the output of the layer such as standardising the activations of each input variable or the activations of the nodes of the previous layer.
Here, standardising means rescaling data to have a mean of 0 and a standard deviation of 1. This standardisation stabilises during the training phase of the CNN model and also speeds up the training time.
Pooling layer. A pooling layer is designed to reduce the dimension of feature maps. This layer ensures that the relevant information passes on to the next level of the network while the redundant features are discarded. Thereby pooling serves as a filter mechanism for the extracted features. In this work, we use max-pooling which returns the maximum local value of each pooling window.
Fully connected network. In fully connected network (FCN) layers, the neurons of the current layer remain connected to each neuron of the previous layer. Figure 1 shows how a single neuron of one layer connects with multiple neurons of another layer during the construction of a multi-layered FCN model. In this figure, the node y is represented with the weighted sum of its inputs as shown in Eq. (8). In this equation, the parameters w and b are optimised using the back propagation algorithm. Relevant studies in the past have shown that a neuron having a larger w value means that it is more important. The parameter b is the bias variable used to shift the value of y. The output layer of the FCN part of a CNN model is an N-dimensional vector, where N is the number of classes for a classification problem. A pictorial description of the classical multi-layered FCN model is given in Fig. 2.
The final FCN layer uses a softmax function (see Eq. 9) that determines the class level probabilities of an input for a classification problem.
(7) I e (x, y) = edge pixel : G(x, y) < th background pixel : otherwise TOPSIS algorithm. TOPSIS, proposed by Hwang and Yoon 15 , is a simple (though effective) multi-criteria decision analysis (MCDA) or MCDM method. It is derived from the idea that the geometric distance of the highest-ranked alternative is the nearest to the ideal solution while the lowest-ranked alternative is the nearest to the worst solution. The positive ideal solution capitalises on the beneficial attributes most while minimising on the cost attributes. The TOPSIS method uses all of the information in the form of a decision matrix and criteria weights as a vector and generates a ranking for all the alternative candidate. A detailed procedure is described in Algorithm 1.  www.nature.com/scientificreports/

Proposed method
In this work, we propose TOPCONet, an ensemble-based approach using lightweight CNN based three base learners (hereafter termed Classifier 1, Classifier 2 and Classifier 3) to detect COVID-19 and pneumonia from chest X-ray images. The base classifiers use the same CNN architecture but work with different forms of input images. The image variants are red channel image (i.e., 1-channel red image), original RGB image, and a 4-channel image prepared by combining the edge image generated by the Robert's edge detection method with the original RGB image used in Classifier 1, Classifier 2 and Classifier 3 respectively. To begin with we resize all the images to the spatial dimension of 224 × 224 pixels to feed to the base classifiers. While evaluating we estimate the class probability scores using the base classifiers and feed them to the TOPSIS-based ensemble method. The complete pipeline for our proposed model can be found in Fig. 3.
Customised CNN. It has been described earlier that a CNN architecture consists of two main components viz., a feature extractor and a classifier network. Therefore, while designing a customised CNN architecture for any specific task, both the feature extractor and the classifier should be stressed upon. A very deep network consisting of many layers might be effective for extracting very deep features but on the other hand, optimisation of the model becomes a difficult task which means the optimisation of the very deep network is much difficult when compared to small-sized networks. The same has been empirically and theoretically shown by Choromanska et al. 53 . The major motivations for using CNNs as a feature extractor over handcrafted features is the presence of shared weights, pooling, local connections and many possible layers 47 . Therefore for each of these layers, while designing, one thing that should be considered is the extraction of the best possible features for the classifier to perform efficiently. It must also be kept in mind to minimise the number of redundant features. To efficiently deal with this, nowadays deep feature selection-based approaches are also gaining popularity 42,54 . Filter pruning strategy is one of the appproaches to reducing the number of redundant features; one such work was communicated by Luo et al. 55 . On a similar note, in this work, we aim to extract meaningful and less redundant features. Considering the mentioned facts, the customised CNN designed with five blocks of convolutions, is carefully tuned to extract the best possible features and minimize the redundant extracted features. Each level of convolution is followed by a layer of activation to control the flow of information to the next level of convolution and for, batch normalisation for re-scaling and re-centring of the values to have smoother learning and max-pooling to get rid of redundant features. It the end, an FCN with two hidden layers is coupled with exponential linear unit (ELU) activation function to impart a non-linearity to the input signal. Thus, we ensure that both the feature extractor and the classifier work efficiently. The detailed structure is shown in Fig. 4.

TOPSIS-aided ensemble method.
As already been mentioned, in the present work, we employ a TOP-SIS-aided ensemble method to decide the final class of an input chest X-ray image depending on the class level confidence scores generated by the base classifiers. We use the available classes as the alternatives and the confidence scores from each of the base classifiers as the criteria values for each of the alternatives (i.e., classes). The ijth element of the decision matrix (i.e., r ij ) used in the TOPSIS method is constructed by considering the www.nature.com/scientificreports/ confidence score of the ith base classifier corresponding to the jth class. We employ all the steps as described in Algorithm 1 on the decision matrix to obtain the top most-ranked alternative which here is the final class.
It should be noted that we resort to column normalisation of the decision matrix to comply with the TOPSIS algorithm. During the normalisation process, we divide the input score by the square root of the summation of squares of the observations. In our case, all the criteria are beneficial.
Computational complexity of the TOPSIS aided ensemble model. As the proposed ensemble method is nontrainable, it is important to discuss the computational complexity of the model. In this part, we discuss the computational complexity using the O-notation.   www.nature.com/scientificreports/

Experimental results
In this section, we discuss and also attempt to explain the experimental findings. It should be noted that all the performances are evaluated on machines supported by Nvidia Tesla T4 GPUs. The programming environment used to train and test our model is Python 3.8.

Dataset description.
To evaluate the performance of our proposed method for COVID-19 and pneumonia detection from chest X-ray images we use samples from two datasets. The first dataset (termed as Dataset-1 hereafter) consists of chest X-ray images of pneumonia-affected patients and normal subjects from Guangzhou Medical Centre: Chest-X-ray-Pneumonia 52 and COVID-19 affected patients from IEEE COVID-chest X-raydataset 51  Performance comparison: customised CNN model vs. state-of-the-art CNN models. It has already been mentioned that we propose a customised CNN model that is lightweight in nature (see Fig. 4).
Based on this CNN model, three different classifiers are formed. We compare the performances of the proposed classifiers (including TOPCONet) with state-of-the-art CNN models following the transfer learning concept and www.nature.com/scientificreports/ training from the scratch. In the transfer learning approach, we load the pre-trained weights of the respective CNN models which are obtained after training the models on the ImageNet dataset. Subsequently, using the loaded weights we fine-tune the models on both datasets. To maintain uniformity during performance comparison, the total number of epochs used is 15 for both state-of-the-art CNN models and the proposed classifiers.  Table 3. Performance comparison of the proposed 3 base classifiers and TOPCONet with some state-of-theart CNN models utilising the concept of transfer learning and training from scratch on Dataset-2. For citing recall (Rec), precision (Pre) and F1-score micro-average method is used. *After the CNN models' name means that the models have been trained from scratch. # As the proposed ensemble method is non trainable, hence time required to run on the training split is not applicable (N/A). RA indicates recognition accuracy and bold faced numbers indicate the best scores.  (7,978,856). Moreover, these tables also show that the total number of trainable parameters used in the proposed TOPCONet model is 1,001,324 (trainable parameters of Classifier 1 + Classifier 2 + Classifier 3 + TOPSIS-aided ensemble) which is evidence that although TOPCONet consists of three base classifiers it uses less number of trainable parameters than the state-of-the-art CNN models. Hence, we can safely comment that the proposed TOPCONet is computationally less expensive than the popularly used state-of-the-art CNN models used in literature in terms of trainable parameters.

Performance (in %) in terms of
For Dataset-1, when compared in terms of accuracy, the proposed Classifier 3 among the three base classifiers demonstrates the highest recognition accuracy which is 0.58% more than the second-highest 98.20% obtained using the VGG-19 model which uses transfer learning concept. In the case of the precision score, the MobileNetV2 model trained from scratch on Dataset-1 offers a better result (98.39%) which is 0.05%, more than our Classifier 2. For recall the DenseNet121 (trained from scratch) and MobileNetV2 (following transfer learning concept) give better results over our base classifiers. In the case of the F1-score, the proposed Classifier 3 performs the best. However, TOPCONet outperforms all the CNN models on this dataset in terms of all the parameters. Now comparing our base classifiers with other popularly used CNN models on Dataset-2, it is evident that the proposed Classifier 3 outperforms all the other popular CNN models in terms of recognition accuracy with a difference of 0.63% with respect to the second-best result given by the present Classifier 2. It is also to be noted that apart from the accuracy, the Classifier 3 performs better in terms of precision, recall and F1-score values which are 0.54% , 0.48% and 0.77% better than the second-best method.
Although in some cases the proposed base classifiers are not able to outperform the other CNN models, it is to be kept in mind that it has far fewer of trainable parameters than those other CNN models which makes it easier to train, making it appropriate use in a resource constraint environment. It is evident from the training time column that the proposed base classifiers take far lesser time to train than the other state-of-the-art CNNbased classifiers in the case of both the datasets. In the case of CNN models, training from scratch takes more time than using the transfer learning approach. However, in the case of the proposed classifiers although they are being trained from scratch they take less time than the standard CNN models using transfer learning approach.
In a nutshell, from the experimental results discussed in this subsection and shown in Tables 2 and 3 we can safely comment that (1) the proposed customised CNN model takes much less time to train from scratch as compared to state-of-the-art CNN models (please note Classifier 2 take the same input type (i.e., RGB image) as other state-of-the-art CNN models), (2) our customised CNN model also takes less time as compared to other CNN models when they are trained according to the transfer learning concept, (3) both the customised CNN model-based base classifiers and TOPCONet have less trainable parameters compared to other CNN models, (4) unsurprisingly, TOPCONet outperforms all the models with moderate training and testing time.
Performance comparison: TOPSIS-aided ensemble method vs. standard ensemble methods. We implemented some standard ensemble methods such as product rule, sum rule, weighted average method and majority voting method. After evaluation on the test sets of Dataset-1 and Dataset-2, the performances of the ensemble methods are shown in Table 4. Here we would like to mention that for implementing the TOPSIS-aided ensemble approach, we have first ranked the base classifiers depending on validation accuracy and then assigned the weights (termed as criteria weights in the TOPSIS method) 0.55, 0.35 and 0.1 chronologically from the best to the worst-performing base classifiers. We set these values experimentally. This parameter tuning is conducted only for Dataset-1 while using the same values for evaluating TOPCONet on Dataset-2. In terms of accuracy, the TOPSIS aided ensemble method outperforms all the other standard ensemble methods used here for comparison on Dataset-1. The difference in accuracy is 0.16% when compared to the second best performers: weighted average and sum rule based ensemble methods. Considering the results on Dataset-2, the difference in accuracy between the highest accuracy obtained using the proposed method and the second best result obtained by product rule ensemble method is 0.19%. www.nature.com/scientificreports/ In terms of precision for Dataset-1, the sum rule and weighted average-based ensemble methods exceed that of the proposed method by 0.18% and for Dataset-2, the product rule-based ensemble outperforms the proposed TOPSIS-aided ensemble method by 0.26% . The highest result for performance in terms of the recall value is given by the proposed method which is 98.26% for Dataset-1 but for Dataset-2 it is the product rule based ensemble that exceeds the proposed method by 0.12% . If we compare in terms of F1-score on the Dataset-1, it can be concluded that the majority voting ensemble method gives the best F1-score with 0.05% more than the proposed method while for Dataset-2, it is the product rule based ensemble method that gives the highest F1-score with the difference being 0.19% from the proposed method.
Moreover, the performances of the ensemble methods of weighted average and sum rule are almost the same. It may have happened because, in the case of the weighted average method, we take the normalised validation accuracies as the weights for each classifier which themselves are almost similar and so despite using weighted average ensemble to prioritise the best performing classifiers, the ensemble gives almost equal priority to the three classifiers which in turn become similar to sum rule based-ensemble method.

Analysis of execution time at module level.
In this section, we describe the execution time for different steps in TOPCONet with the machine configuration described previously. The time required to process one batch of samples (i.e., 16 images at a time) at different stages is tabulated in Table 5. From Table 5 it is clear that the total time required to test one sample using the proposed approach is (240 + 12 + 13 + 14 + 4)/16 which is nearly 18 ms. The increase in training time for Classifier 2 and Classifier 3 over Classifier 1 is attributed to the fact that the number of trainable parameters for Classifiers 2 and 3 is higher than Classifier 1. This is because Classifier 2 and Classifier 3 accept 3-channel and 4-channel input images respectively, while Classifier 1 accepts 1-channel input images.

Performance of the TOPCONet model using 5-fold cross validation technique.
To test the effectiveness of the proposed model, we also evaluate our model performance using the standard 5-fold crossvalidation scheme. In this setting, we split the dataset into five separate folds. Each time sample of four splits is considered a training samples while the rest samples are used for the testing. The detailed experimental outcomes are recorded in Table 6. From Table 6 it is evident that for Dataset-1, the TOPCONet model performs the best on Fold 4 experiment with accuracy 98.24%. In the case of Dataset-2, the highest accuracy for the proposed ensemble method is achieved during Fold 3 experimentation which is 97.85%. For the base classifiers in the case of Dataset-1, the highest accuracy for Classifier 1 is achieved in Fold 2, for Classifier 2 it is Fold 4 and for Classifier 3 it is Fold 3 experiment. For Dataset-2, during Fold 3 experimentation we attain the highest accuracy for all the base classifiers. Table 5. Time required to process 16 sample images at a time. The training time reported is recorded as the sum of total time required to train for each epoch. The time reported is the average of five independent runs.

Process Time to train (in ms) Time to test (in ms)
Pre-processing 240 240 Classifier 1  350  12   Classifier 2  375  13   Classifier 3  425  14 TOPSIS ensemble -4 Table 6. Accuracies (in %) while evaluating using 5-fold cross validation scheme.  28 and Senan et al. 35 . To have a fair comparison with these methods include the experimental setups, especially the approach followed to prepare the samples of train and test sets, used by these methods. We found that two major approaches were used by these researchers: (1) partitioning the samples of the entire dataset into train and test sets 18,20,25,[34][35][36]38,39,[57][58][59]61 , as shown in Table 1 (we call this experimental setup as hold-out test set approach) and (2) standard 5-fold cross validation approach 26,27,29,30,33,37,40,41,56,60 as described in subsection "Performance of TOPCONet model using 5-fold cross validation technique". However, the method designed by Paul et al. 24 do not follow any of the mentioned experimental setups and hence we have developed the models at our end and evaluated following our experimental schemes. For the method proposed by Paul et al. 24 we use transfer learning concept. Likewise, we also estimate the performances of the proposed TOPCONet model using both the mentioned approaches. On the other hand, the method proposed by Bashar et al. 34 and Senan et al. 35 designed for 4-class classification problem. Hence, for the comparison with our method we train and test these two models on the present datasets using their experimental setups, and consider the best performance obtained. The comparative results are recorded in Table 7 and Table 8 for Dataset-1 and Dataset-2 respectively. For Dataset-1, among the state-of-the-art methods evaluated following the hold-out test set approach, the methods proposed by Senan et al. 35 and Das et al. 25 stand first and second with a test accuracy of 98.70% and 98.45% respectively. Also, the best recall, F1-score and precision values are given by the optimized ResNet50 model by Senan et al. 35 with values 98..67%, 98.67% and 98.00% respectively. Whereas, the pre-trained Xception model proposed by Jain et al. 57 gives the best precision of 98.00% among the state-of-the-art methods. Now, when comparing TOPCONet with state-of-the-art methods that followed the hold-out test set based experimental setup, results from Table 7 are the evidence that the highest accuracy for the Dataset-1 is given by the proposed method, which is 0.08% more than the optimized ResNet50 model proposed by Senan et al. 35 . In terms of precision score, our method exceeds the pre-trained Xception model by Jain et al. 57 and the optimized ResNet50 model by Senan et al. 35 by 0.67% . However, in terms of F1-score the optimized ResNet50 model outperforms our TOPCONet method by ( 0.33% .) and in terms of recall it is outperformed by Senan et al. 35 by 0.67% . In the case of methods that followed the five-fold cross-validation approach, the proposed method outperforms the state-of-the-art method that employed CoroDet.
For Dataset-2 among the state-of-the-art methods that followed five-fold cross-validation experimental setup, the best performer is the UA-ConvNet model which is proposed by Gour et al. 33 with an accuracy of 98.90% while the second-best accuracy is provided by the ensemble of InceptionV3 and MobileNetV2 proposed by Ahmad et al. 60 with a value of 98.77% . In the case of precision, recall and F1-score the ensemble of InceptionV3 and MobileNetV2 60 outperforms all other methods in the case of 5-fold cross-validation experimental setup. It also outperforms the proposed method with 5-Fold cross-validation by 0.57% , 0.55% , 0.56% and 0.92% in precision, recall, F1-score and recognition accuracy respectively. www.nature.com/scientificreports/ When it comes to the comparison of the methods using the hold-out data splitting method, the best accuracy, precision, recall and F1-score are achieved by the hybrid architecture using the modified AlexNet model and BiLSTM with the value of 98.70% , 98.77% , 98.76% and 98.76% respectively. However, this method outperforms the proposed method in precision, recall, F1-score and accuracy by 0.93% , 0.91% , 0.91% and 0.09% respectively. Although from the comparison, it can be observed that some methods have outperformed the proposed method in Dataset-2 for both the hold-out test set and five-fold cross-validation experimental setups but it is to be kept in mind that the number of parameters used by the proposed method is far less than that of these state-of-the-art methods. The total number of parameters of the combination of three classifiers is around 1 million which is far less than that of AlexNet, MobileNetV2 and InceptionV3 models as shown in Table 2.

Performance of TOPCONet on COVIDx CXR-3 dataset. To investigate scalability and robustness
of the proposed TOPCONet model, we have evaluated its performance on a recently published large daraset, known as COVIDx CXR-3 Dataset 62 . This dataset was made public by Wang et al. 17 . It consists of 30,000 chest X-ray images from over 16,490 subjects (either COVID-19 positive patients or normal subjects). The images in the dataset are already partitioned into train and test sets by the researchers in the work 17 . In the train set of the dataset, there are 16,400 chest X-rays of COVID-19 positive patients and 13,992 chest X-rays belonging to non COVID-19 subjects whereas, the test set contains 200 chest X-ray images of the COVID-19 positive and negative patients each. We have followed the standard dataset divisions to conduct our experiments. The performance of the TOPCONet along with the base classifiers are provided in Table 9. In order to compare the performances of our model on this dataset, we compare it with some state-of-the-art COVID-19 detection methods like Xception model used by Jain et al. 57 , the CheXNet model by Chowdhury et al. 56 , the ResNet50 model used Senan et al. 35 , the CovidConvLSTM model by Dey et al. 19 and the use of VGG-16 as a transfer learning model that was used by Bashar et al. 34 . Performances of these state-of-the-art models are also recorded in Table 9 along with performances of our final model and base classifiers. From Table 9, it can be concluded that the proposed TOPCONet model gives the highest performance in all metrics like accuracy, precision, recall and F1-score. The second and third highest results in all metrics are given by Classifier 1 and Classifier 2 respectively. However, the Classifier 3 is outperformed in terms of accuracy by other state-of-the-art methods. Apart from the performance, present models acquire less trainable parameters than the past methods used here for comparison.

Discussion
It has already been mentioned that in this work, we propose a new CAD system (i.e., TOPCONet) for COVID-19 and pneumonia detection from chest X-ray images. The TOPCONet model is computationally inexpensive and efficient. The performance of the model is evaluated on two publicly available datasets and the obtained results are described in the previous section. However, in this section, we discuss more findings of the TOPCONet model. To start with we discuss the performance and the number of training parameters of all the basic components of the model. It is to be noted here that TOPCONet comprises of four major components: three base classifiers, www.nature.com/scientificreports/ namely Classifier 1, Classifier 2 and Classifier 3 and TOPSIS aided ensemble process. The performances on Dataset-1 and Dataset-2 using the experimental setup are discussed in the subsection "Dataset description" and the number of trainable/tunable parameters present in these major components are provided in Table 2. From the performances recorded in Tables 2 and 3 we can see that Classifier 3 that accepts 4-channel input images performs the best among the base classifiers for both datasets. From this observation, we can safely comment that combining the edge image, generated by Robert's edge detection technique, with the original image helps us to obtain better performance. From these results, we can also say that our model learns the association of features.
In addition to these, we can also observe that after the ensemble of the outputs from the three base classifiers using the TOPSIS-aided ensemble method, it results in an increase of accuracy by 0.25% for Dataset-1 and in the case of Dataset-2 the increase in accuracy is 0.33% . The results establish that the TOPSIS-aided ensemble method can learn the better association of confidence scores provided by the classifiers. For a deeper understanding, we examine the confusion matrices of the base classifiers and the TOPSIS aided ensemble method for all the datasets. The confusion matrices are shown in Fig. 6. From the figure, we can conlcude the TOPSIS aided ensemble works optimally for all the three datasets. Hence we can also claim that the proposed method works well across the datasets. The proposed ensemble method is able to capture multiple complimentary information provided by the base classifiers for COVID-19 and pneumonia detection.
We can also observe from Tables 2 and 3 that a certain number of parameters are trained for each of the base classifiers. This can be explained because the input dimension of each of the base classifiers differs. Moreover, it must be noted that the ensemble approach consists of several criteria (in this case, three). Corresponding to each of the criteria, a criteria weight is assigned. These weights are manually tuned i.e., these are not trained by any learning methods.
For further analysis of the proposed 3-channel CNN model (i.e., Classifier 2), we use grad-cam analysis, which works by using the gradients generated inside the model to create a heat-map to show the regions where the model focuses to perform the intended classification task. In Fig. 7, the original chest X-ray images along with their grad-cam counterparts are shown. In this figures, we observe that the TOPCONet focuses mainly on the lung area. It is noteworthy that a few activation regions can be seen outside the lung. This is probably because the model learns a few redundant features along with the useful ones. These redundant features are the ones that get insignificant importance from the classifier (FCNNs in this case) during the final classification.
Advantages and limitations of the proposed method. The advantages of the proposed model are listed below: • The proposed CNN model can learn effective features from input images of different types.
• It has fewer tuning parameters compared to state-of-the-art CNN models.
• The TOPSIS aided ensemble method can efficiently aggregate the confidence scores estimated by the base classifiers to provide the final class. • The ensemble method used here has very few tunable parameters (same as the number of base classifiers used in ensemble mechanism) and hence very little additional cost is required.
To provide our observations on the proposed work, we also mention some of its limitations which are as follows: • The proposed CNN model might not work well on rotated and arbitrarily cropped images. It might require different augmented training samples like other CNN models. • The TOPSIS aided ensemble method fails to classify a sample if two of the base classifiers incorrectly classify it with high confidence. • As we have used deep learning models as our base learners, it demands resources such as graphics processing unit (GPU), dedicated random access memory (RAM) etc. for training the same.

Conclusion
Keeping in mind the present scenario of COVID-19 cases, the situation needs to be addressed properly to help humanity to come out of this crisis as soon as possible. In the present work, we propose a CAD system, called TOPCONet, for screening COVID-19 and pneumonia which performs well on publicly available three chest X-ray datasets. In TOPCONet, we first pre-process the images to obtain different forms of the original images, which are further used to train and evaluate the model. The skeletal structure of the base classifiers of the ensemble approach is the same, however, the type of input differs for each of the cases. We combine the decisions of these base classifiers using the proposed TOPSIS aided ensemble method. The proposed model can learn deep features better, and it performs at par with the state-of-the-art methods. It is noteworthy to mention that our customised CNN model uses a very less number of trainable parameters compared to the popularly used CNN models in several deep learning-aided CAD systems. It should be noted that these datasets suffer from the problem of imbalanced classes. Hence further works can be focused on handling this issue efficiently. Since our model gives an end-to-end classification result for COVID-19 and pneumonia diseases using chest X-ray images and is lighter in nature, it may be employed in detecting such diseases to ease the burden on radiologists and doctors. However, the proper advice of medical professionals should be considered along with the outcome from this model. Apart from these, further work may be focused on using domain knowledge as it may increase the model performance. Last but not the least, the present model can be used to extract features first and then a suitable feature selection technique may be employed to select a near-optimal subset of features that may further reduce the storage and time requirements.