EpistoNet: an ensemble of Epistocracy-optimized mixture of experts for detecting COVID-19 on chest X-ray images

The coronavirus has spread across the world and infected millions of people, causing devastating damage to public health and global economies. To mitigate its impact, a reliable, fast, and accurate diagnostic system should be promptly implemented. In this study, we propose EpistoNet, a decision tree-based ensemble model using two mixtures of discriminative experts to classify COVID-19 lung infection from chest X-ray images. To optimize the architecture and hyper-parameters of the designed neural networks, we employed the Epistocracy algorithm, a recently proposed hyper-heuristic evolutionary method. Using 2500 chest X-ray images consisting of 1250 COVID-19 and 1250 non-COVID-19 cases, we left out 500 images for testing and partitioned the remaining 2000 images into 5 clusters using the K-means clustering algorithm. We trained multiple deep convolutional neural networks on each cluster to help build a mixture of strong discriminative experts from the top-performing models supervised by a gating network. The final ensemble model obtained 95% accuracy on COVID-19 images and 93% accuracy on non-COVID-19 images. The experimental results show that EpistoNet can be used to detect COVID-19 infection in chest X-ray images accurately and reliably, and that the Epistocracy algorithm can be used effectively to optimize the hyper-parameters of the proposed models.


Scientific Reports
| (2021) 11:21564 | https://doi.org/10.1038/s41598-021-00524-y www.nature.com/scientificreports/
[...] and effective way for making clinical decisions 7. Diagnosis of breast cancer 8, epilepsy 9, cardiovascular disease 10, lung cancer 11, and pneumonia 12 via deep learning models has become a popular technique in the medical field. In this paper, we propose a new approach for detecting COVID-19 infection on chest X-ray images using a decision tree-based ensemble model, called EpistoNet, consisting of two mixtures of discriminative experts (MoE). The Epistocracy algorithm, a recently proposed hyper-heuristic evolutionary method, has been recruited to build and optimize the neural networks used in this work. The main motivation for developing EpistoNet is to employ it as a diagnostic tool that can help healthcare providers detect COVID-19 faster, more cheaply, and more accurately, and accelerate the treatment of those who need it the most. Because other proposed approaches differ in several key respects, such as the size of the dataset used, the pre-processing steps, statistical noise, and hyper-parameter tuning, the highest accuracy we achieved on our testing dataset using those approaches was less than 70%. We therefore decided to develop our own model and algorithms to improve this accuracy. To the best of the authors' knowledge, no similar study has proposed such a model for detecting COVID-19 in chest X-ray images.
The main contributions of this study can be summarized as follows:
1. A new ensemble model called EpistoNet is proposed. EpistoNet is a decision tree-based ensemble model using two mixtures of discriminative experts to classify COVID-19 lung infection from chest X-ray images.
2. A new dataset of 2500 X-ray images is created. All collected images belong to the Henry Ford Health System in Michigan, where this research was conducted. These images have been individually reviewed, interpreted, and labeled by experienced radiologists.
3. In order to accurately classify COVID-19 and non-COVID-19 X-ray images, we created a mixture of experts trained on k clusters of visually similar images.
4. We also recruited the Epistocracy algorithm, a recently developed, multi-population, self-adaptive optimization method, to optimize the architecture and hyper-parameters of the designed neural networks.

Related work
Many researchers have recently proposed methods to detect COVID-19 positive cases from chest X-ray (CXR) and CT imaging using artificial intelligence (AI) and machine learning (ML) techniques. X-ray images are widely used by clinical experts in the diagnosis and evaluation of various diseases, including COVID-19 infections. X-ray radiography is typically less expensive and exposes patients to much less radiation than CT scans 13. However, clinical diagnosis from X-rays is much more difficult than from other imaging modalities 14 and requires significant training and expertise. El Asnaoui et al. 15 conducted a comparative study using various deep learning models (VGG16, VGG19, DenseNet201, InceptionResNetV2, InceptionV3, Resnet50, and MobileNetV2) to detect and classify COVID-19. The experiments were performed using 6087 chest X-ray and CT images of COVID-19 cases. The dataset was randomly split, with 80% of the images used for training and 20% for validation. The highest accuracy was achieved by InceptionResNetV2, with 92.18% overall accuracy and 82.80% accuracy for detecting patients with Coronavirus.
A deep learning-based method called COVID19XrayNet was proposed by Zhang et al. 16 to predict COVID-19 from X-ray images. COVID19XrayNet comprises a two-step transfer learning pipeline based on ResNet32 with two newly integrated layers: a feature smoothing layer (FSL) and a feature extraction layer. COVID19XrayNet achieved 91.92% overall accuracy, outperforming the original version of ResNet32.
Hemdan et al. 17 suggested COVIDX-Net, a deep learning framework based on seven convolutional neural network models, namely MobileNetV2, VGG19, InceptionV3, DenseNet201, InceptionResNetV2, ResNetV2, and Xception, to detect COVID-19 from chest X-ray images. COVIDX-Net was validated on 50 images comprising 25 COVID-19 positive cases and 25 normal cases. In their evaluation, VGG19 and DenseNet showed the best classification results, with F1-scores of 91% and 89% for COVID-19 and normal, respectively.
In order to distinguish COVID-19 from normal or other pneumonia cases, Horry et al. 18 proposed a multimodal classification network based on an optimized VGG19 architecture. Before training their model, they applied histogram equalization to the images, followed by texture and contrast enhancement using the OpenCV library. Their proposed network achieved 86% accuracy on X-ray images, 84% on CT scans, and 100% on ultrasound.
In Wang et al. 3, the authors presented COVID-Net, a deep convolutional neural network consisting of a heterogeneous mix of convolution layers with varying kernel sizes for the detection of COVID-19 cases from chest X-rays. COVID-Net was trained and tested on the COVIDx dataset, comprising 13,975 chest X-ray images. The proposed model achieved an overall test accuracy of 93.3% and 91% accuracy specifically for COVID-19 cases.
Rahimzadeh et al. 19 proposed a deep convolutional network based on the concatenation of Xception and ResNet50V2. They evaluated their model on 11,302 chest X-ray images, consisting of only 31 cases of COVID-19 and 11,271 cases from the other two classes. Their proposed model achieved an average accuracy of 99.50% and a sensitivity of 80.53% for the COVID-19 class, and an overall accuracy of 91.4%.
Kaur et al. 20 proposed a metaheuristic-based deep COVID-19 screening model using a modified AlexNet architecture for feature extraction and classification of the input images. The Strength Pareto evolutionary algorithm-II (SPEA-II) was used to tune the hyper-parameters of the modified AlexNet. The proposed model achieved a validation accuracy of 99.26%.
COVID-CheXNet is another hybrid deep learning framework, developed by Al-Waisy et al. 21 to diagnose COVID-19 infection from X-ray images. The COVID-CheXNet system combines the results obtained from two different pre-trained deep learning models based on ResNet34 and HRNet (high-resolution network model), trained using a large-scale dataset. By enhancing the contrast of the X-ray images and reducing the noise [...]
Finally, Ismael et al. 23 reported another deep learning approach that allows detection of COVID-19 patients. The authors used pretrained deep CNN models (ResNet18, ResNet50, ResNet101, VGG16, and VGG19) for feature extraction and Support Vector Machines (SVM) for classification. Their dataset contained 180 COVID-19 and 200 normal chest X-ray images. The deep features extracted from the ResNet50 model, combined with the SVM classifier, achieved an accuracy of 94.7%.
These approaches generalize poorly to unseen data because their pre-processing steps, modeling assumptions, and hyper-parameter fine-tuning are specific to their own datasets. In this paper, we describe the development and evaluation of a new approach for detecting COVID-19 from chest X-ray images using a minimal pre-processing pipeline and automatic optimization of the hyper-parameters of various models using a recently proposed algorithm.

Methods
In this section, we discuss the architecture design methodology and the key components of EpistoNet, motivated by the need to develop a feasible solution to help combat COVID-19.
As depicted in Fig. 1, we first briefly describe the procedure used to create the training dataset. We then explain the pre-processing steps, the EpistoNet architecture design, and the optimization of the expert networks of the proposed approach.
Dataset description. The dataset utilized in this research comprises 2500 X-ray images, consisting of 1250 COVID-19 and 1250 non-COVID-19 images, provided by Henry Ford Health System (HFHS) in Detroit, Michigan (see Table 1). All images are stored in JPEG format with 3 channels of 8-bit data. The non-COVID-19 images include normal cases as well as non-COVID-19 viral and bacterial pneumonia infections (see Fig. 2). These X-rays depict the front view of a patient's upper torso, with a clear view of the lungs. All images were cropped to frame the entire rib cage with reasonable padding space and downsampled to 224 by 224 pixels when compiled into a single dataset. No other modification or image enhancement was applied to the original images, in order to minimize the time between taking the X-ray and detecting COVID-19. Out of the 2500 images, we left out 500 images (250 COVID-19 and 250 non-COVID-19) for testing. Of the remaining 2000 images, 80% were used for training and 20% for validation.
Table 1. Distribution of X-ray images in training, validation, and testing datasets. The total number of X-ray images in each category and across the dataset is displayed in bold.

             COVID-19   Non-COVID-19   Total
Training     800        800            1600
Validation   200        200            400
Testing      250        250            500
Total        1250       1250           2500

Figure 2 shows examples of the X-ray images that we received, with the first set of images representing COVID-19 and the second set representing non-COVID-19 cases.
In this study, we did not use online datasets, mainly because of the limited number of COVID-19 positive cases available in public datasets and the lack of mechanisms that would allow us to verify the validity and reliability of these datasets.
Ethical issues. First, our study used anonymized X-ray images collected at the Department of Radiology, Henry Ford Health System (HFHS), Detroit, MI. There were no potentially identifying marks/features and no patient identifiers in the images. This study was approved by the IRB committee of HFHS (No. 14030).
Secondly, the IRB committee of HFHS waived the need for obtaining the informed consent for this study. Thirdly, all methods were performed in accordance with the relevant guidelines and regulations, including those of the Declaration of Helsinki.

EpistoNet architecture design.
To build an efficient classification model, we propose a method using a mixture of deep CNN experts to detect COVID-19 from chest X-ray images. The identification and extraction of relevant features from X-ray images is a challenging task that requires multiple neural network architectures to operate directly on the given data and find patterns that help in the detection and classification of COVID-19 infection. For this purpose, we have designed an ensemble model that is able to exploit discriminative features and obtain higher accuracy than individual CNN models on the HFHS dataset.
Mixture of experts model. Mixture of experts is a type of ensemble based on the divide-and-conquer principle, where each individual model specializes in a given part of the input space and learns a different aspect of the problem.
As shown in Fig. 3, the MoE architecture is composed of k expert models supervised by a gating network. The gating network is a discriminator network that is trained together with the experts on the same input and decides which expert(s) to use for the final classification task. The output of the gating network can be interpreted as the probability that input x is assigned to expert i (see Eq. (1)). The gating network employs the softmax function for activation; in Eq. (2), z_j is the output of the gating network. The softmax function makes the outputs of the gating network sum to one. This network of experts can potentially improve the accuracy and the reliability of the overall classification system 24.
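The gating mechanism can be illustrated with a minimal sketch. It assumes the common dense formulation in which the softmax of the gating scores z_j weights the class probabilities produced by the experts; the exact form of Eqs. (1) and (2) is not reproduced in this text, and the function names and all numeric values below are made up for illustration.

```python
import math

def softmax(z):
    """Numerically stable softmax: outputs are positive and sum to one."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def moe_predict(gate_logits, expert_probs):
    """Combine expert class probabilities using softmax gate weights.

    gate_logits  : one raw gating score z_j per expert
    expert_probs : one [p_covid, p_non_covid] pair per expert
    """
    weights = softmax(gate_logits)
    n_classes = len(expert_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, expert_probs))
        for c in range(n_classes)
    ]

# Toy values (assumptions, not results from the study): 3 experts, 2 classes.
gate_logits = [2.0, 0.5, -1.0]
expert_probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
mixture = moe_predict(gate_logits, expert_probs)
```

Because the gate weights sum to one and each expert outputs a probability vector, the mixture is itself a valid probability vector, dominated here by the first expert.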
Data partitioning using K-means clustering method. In order to effectively discriminate COVID-19 from non-COVID-19 X-ray images, we decided to split our main dataset into k clusters and train explicitly localized expert networks on each cluster, each capable of differentiating between visually similar images. To this end, we first applied a cluster-based pre-processing step to our dataset of 2000 images and partitioned them into 5 clusters of variable size using the K-means clustering method. K-means is an unsupervised machine learning technique commonly used for clustering unlabeled data into k clusters.
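As a rough illustration of the partitioning step, the sketch below implements plain Lloyd-style K-means using only the standard library. The two-dimensional toy points stand in for image feature vectors; the features actually clustered in the study are not specified in this text, so the data, the `kmeans` helper, and its parameters are assumptions.

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return centers, labels

# Toy data (assumption): 5 loose groups of 10 points each, clustered with k = 5.
rng = random.Random(1)
blobs = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (5.0, 5.0)]
points = [
    (cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
    for cx, cy in blobs
    for _ in range(10)
]
centers, labels = kmeans(points, k=5)
```

Each label then selects the subset of images on which one localized expert would be trained. K-means depends on its initialization, so in practice several restarts are commonly compared.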
Optimization of expert networks using Epistocracy algorithm. To further improve the accuracy of each convolutional neural network, we used the Epistocracy algorithm 25. The Epistocracy algorithm is a multi-population, self-adaptive optimization method that uses different explorative and exploitative techniques to search the problem space and find the optimal solution. To avoid stagnation and prevent premature convergence, the algorithm employs multiple mechanisms such as dynamic population allocation and regression-based leadership adjustment. The algorithm uses a stratified sampling method called Latin Hypercube Sampling (LHS) 26 to distribute the initial population evenly for an efficient exploration of the search space. Figure 4 shows the flow diagram of the proposed algorithm.
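To make the stratified initialization concrete, here is a minimal standard-library sketch of Latin Hypercube Sampling. It is not taken from the Epistocracy implementation; the `latin_hypercube` helper and the two example dimensions (a dropout rate and a neuron count, with hypothetical ranges) are assumptions chosen for illustration.

```python
import random

def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points so that each dimension's range is split into
    n_samples equal strata and every stratum is hit exactly once."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        # One uniform draw inside each stratum, then shuffle the strata order
        # so that dimensions are paired at random.
        column = [low + (i + rng.random()) * width for i in range(n_samples)]
        rng.shuffle(column)
        columns.append(column)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]

# Hypothetical search space: dropout rate and number of neurons (rounded later).
bounds = [(0.0, 0.5), (16.0, 512.0)]
population = latin_hypercube(10, bounds)
```

Compared with independent uniform draws, this guarantees that no region of any single hyper-parameter range is left unsampled in the initial population.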
As illustrated in Fig. 4, the Epistocracy algorithm is made of two key components: Governors and Citizens. Citizens are individual solutions that are randomly and uniformly generated. In each iteration, all individuals are evaluated with a pre-defined fitness function. Governors are the top-performing individuals, who are selected through the Select() function to lead the population and to influence and evolve the generation of the new population via the Lead() function. In the Epistocracy algorithm, citizens can directly vote for governors and affect their position in the government.
The architecture of each expert model is made of a base and a head. The base model is a popular CNN model pre-trained on ImageNet for transfer learning. The head model, which consists of fully connected layers, is automatically constructed using the Epistocracy algorithm. By repeatedly evolving each architecture and optimizing its hyper-parameters, the Epistocracy algorithm can effectively produce a model fine-tuned for classification.
Neural network architecture design. Using the Epistocracy algorithm, the architecture of each neural network in the initial population is generated on a modular basis, in which each module consists of 1 dense layer and 1 dropout layer (see Fig. 5). The variable MAX_DENSE_LAYERS defines the maximum number of dense layers allowed in the head model. The last module of the architecture contains only 1 dense layer with two neurons, which performs the binary classification task. Each of the fully connected layers is randomly switched on or off with a given probability to create architectures of variable length. Figure 5 illustrates a modular example for the fully connected layers.
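The on/off encoding can be sketched as a small decoder. The bit layout used here (one bit per dense layer and one per dropout layer in each module, followed by an always-on classification layer) and the value of MAX_DENSE_LAYERS are assumptions; the example bit string was chosen so that it yields the layer sequence given in the Fig. 5 caption.

```python
MAX_DENSE_LAYERS = 4  # assumed value; the last dense layer is the classifier

def decode_chromosome(bits):
    """Map an on/off bit string to a sequence of head layers.

    Layout assumed here: (dense, dropout) per module, followed by a final
    classification dense layer whose bit is forced on.
    """
    n_modules = MAX_DENSE_LAYERS - 1
    if len(bits) != 2 * n_modules + 1:
        raise ValueError("unexpected chromosome length")
    slots = ["dense", "dropout"] * n_modules + ["dense (classification)"]
    bits = list(bits[:-1]) + [1]  # the classification layer is always on
    return [name for name, bit in zip(slots, bits) if bit == 1]

layers = decode_chromosome([1, 1, 1, 0, 1, 1, 1])
# layers == ['dense', 'dropout', 'dense', 'dense', 'dropout', 'dense (classification)']
```

In a full pipeline each active slot would additionally carry its own hyper-parameters (number of neurons, activation, dropout rate), which this sketch omits.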
In addition to generating the architecture of the neural networks, the hyper-parameters of the fully connected layers are also individually randomized. These hyper-parameters are randomly selected to increase the diversity of the population and the possibility of finding an optimal one. The number of neurons, the activation method, and the dropout rate are randomly selected from Table 2. The Epistocracy algorithm strives to find the optimal CNN architecture in an efficient amount of time.
CNN fitness score function. A model is first created by calling a unique function specific to the desired CNN architecture such as VGG16. This function takes the mixed list of hyper-parameter values and returns a model built with those values. Included within this function is functionality to map each hyper-parameter value to its respective place in the layer construction of the model. The new model is then trained on the input dataset using global variables for training parameters (such as number of epochs, etc.). Once training is complete, the validation accuracy score for the model is retrieved from the training history and returned by the function.
Epistocracy parameters. When running Epistocracy, the population size (number of individuals) was set to 100. The mutation rate was 20%, and the crossover rate (the share of individuals recombined genetically in each generation) was 80%. Epistocracy was run for 20 full generations without early stopping. After the run completes and the various architectures have been generated, the top-performing one is returned.
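To show how these settings enter a search loop, the following sketch runs a generic elitist evolutionary search with the reported population size, number of generations, mutation rate, and crossover rate. It is deliberately not the Epistocracy algorithm: the governor/citizen roles, voting, and the Select() and Lead() functions are not specified here, and the fitness function is a toy stand-in for validation accuracy. N_GENES, N_ELITE, and the per-child interpretation of the two rates are assumptions.

```python
import random

POP_SIZE, GENERATIONS = 100, 20           # values reported in the text
MUTATION_RATE, CROSSOVER_RATE = 0.2, 0.8  # values reported in the text
N_GENES, N_ELITE = 7, 5                   # assumptions for this toy problem

def fitness(bits):
    """Stand-in for validation accuracy: fraction of genes switched on."""
    return sum(bits) / len(bits)

def evolve(seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(N_GENES)] for _ in range(POP_SIZE)]
    best_per_generation = []
    for _ in range(GENERATIONS):
        pop.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(pop[0]))
        elite = [ind[:] for ind in pop[:N_ELITE]]  # top performers survive unchanged
        children = []
        while len(children) < POP_SIZE - N_ELITE:
            a, b = rng.sample(pop[:POP_SIZE // 2], 2)  # parents from the better half
            child = a[:]
            if rng.random() < CROSSOVER_RATE:          # one-point crossover
                cut = rng.randrange(1, N_GENES)
                child = a[:cut] + b[cut:]
            if rng.random() < MUTATION_RATE:           # flip one random gene
                i = rng.randrange(N_GENES)
                child[i] = 1 - child[i]
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness), best_per_generation

best, history = evolve()
```

Keeping the elite unchanged means the best fitness never decreases from one generation to the next, which makes the "top performing one is returned" step well defined.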
The proposed architecture of EpistoNet. The proposed EpistoNet decision tree is then designed using MoE I and MoE II, as shown in Fig. 6.

Experimental results and discussion
Extensive experiments were performed to evaluate the performance of the proposed model in classifying COVID-19 from chest X-ray images. In our experiments, we set the training and validation ratios to 80% and 20%, respectively. We held out 20% of the entire dataset, namely 500 of the 2500 images, for testing.

Evaluation of deep convolutional neural networks for detection of COVID-19.
To identify the best classification model, we employed deep convolutional neural networks of different depth and complexity and evaluated their performance using the HFHS testing dataset. We applied transfer learning with ImageNet weights to initialize training and to facilitate feature extraction from the input data. As shown in Table 3, VGG16 achieved the highest accuracy (86%) among eight different CNN models. This result is treated as the baseline for assessing the accuracy of the proposed model.

Evaluation of cluster-based CNN models.
To determine the optimal number of clusters, we employed the elbow method. We tried different numbers of clusters k and plotted k against the inertia, which is the average of the squared distances of the samples from their respective cluster centers. As shown in Fig. 7, for the given data, the optimal number of clusters is 5, where the inertia starts decreasing in a linear fashion. To design our classifier, we employed eight state-of-the-art CNN models on each cluster (see Table 4). To train each CNN model, we used 50 epochs and a batch size of 32. The input data was split into training and [...]
Figure 5. This specific chromosome would result in the following sequence of layers: dense, dropout, dense, dense, dropout, dense (classification layer). The last layer is always on since this classification layer is a required component of the model. Number 1 indicates "on" and 0 indicates "off".
Table 4. In Table 4, models M1-M8 are VGG16, VGG19, Xception, InceptionV3, InceptionResNetV2, ResNet50V2, EfficientNetB7, and MobileNetV2, respectively.
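The elbow criterion used to choose the number of clusters can be illustrated on toy data. The sketch follows the definition given above (average squared distance to the nearest cluster center); some libraries, such as scikit-learn's `inertia_` attribute, report the sum instead, which changes the scale but not the shape of the curve. The one-dimensional points and the hand-picked centers are assumptions made so that the example stays deterministic.

```python
def inertia(points, centers):
    """Mean squared distance from each point to its nearest center."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
    return total / len(points)

# Toy 1-D data with three obvious groups; centers are chosen by hand.
points = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
candidate_centers = {
    1: [(10.5,)],
    2: [(0.5,), (15.5,)],
    3: [(0.5,), (10.5,), (20.5,)],
    4: [(0.0,), (1.0,), (10.5,), (20.5,)],
}
curve = {k: inertia(points, c) for k, c in candidate_centers.items()}
```

On this toy data the inertia falls steeply up to k = 3 and only marginally afterwards, so the elbow sits at k = 3; the study reports the analogous bend at k = 5 for the HFHS images.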

Optimization and fine-tuning of CNN parameters.
To further optimize the performance of CNN models, we applied the Epistocracy algorithm to generate the optimal architecture of the neural network classifier. As shown in Table 5, the CNN models were noticeably improved, confirming the capability of Epistocracy algorithm in the optimization of complex and non-linear problems.
Next, we designed a mixture of experts consisting of 5 optimized CNN models. To choose a gating network for the mixture of experts, we trained and tested different CNN models. According to the experimental results (see Table 6), InceptionV3 achieved the highest accuracy for detection of COVID-19 cases among all CNN models, whereas InceptionResNetV2 achieved the highest accuracy for non-COVID-19 cases. Therefore, to improve the performance of the classification model, we built two mixtures of experts with different gating networks. In the first mixture of experts (MoE I), we used InceptionV3, and in the second mixture of experts (MoE II), we used InceptionResNetV2.
As shown in Fig. 8, we employed 5 expert networks and a gating network to compute the weights for each expert and dynamically combine the inputs. The weights of the gating network are adjusted during the general training of the model on the HFHS training dataset.
To optimize the overall performance of each mixture of experts, we once again employed the Epistocracy algorithm to design the architecture and optimize the hyper-parameters of the classification layers. Figure 9 displays the confusion matrices corresponding to MoE I and MoE II. As summarized in Table 7, MoE I and MoE II achieved 93% classification accuracy on non-COVID-19 and 95% accuracy on COVID-19 chest X-rays, respectively, using the HFHS testing dataset.
To classify COVID-19 from CXR images, we developed EpistoNet, a decision tree-based ensemble built from MoE I and MoE II. Given a new X-ray image, MoE II first classifies the image. Any image classified by MoE [...]
Table 5. Improving the accuracy of expert models using Epistocracy algorithm. The numbers in bold represent the improved accuracy after using Epistocracy algorithm.
[...] In Fig. 10, the Grad-CAM visualization heatmaps of some testing images are shown. From Fig. 10a, it is apparent that our model focuses on lung opacities, which are the main indicators of COVID-19 infection. In Fig. 10b, the misclassification is due to the indistinguishability of the texture or the presence of medical devices and wires in the images, which result in very similar probability values for each class.
EpistoNet performance analysis. EpistoNet outperforms the individual models on the HFHS testing dataset. During the initial testing on the Henry Ford dataset, before any optimization, the highest accuracy achieved was 86%, by VGG16. After partitioning the training-validation dataset into 5 clusters using the K-means clustering algorithm, applying the Epistocracy optimization technique, and building two mixtures of discriminative experts out of the 5 best individual convolutional neural network models, the final ensemble reached 95% accuracy on COVID-19 images and 93% accuracy on non-COVID-19 images.
The experimental results demonstrate that EpistoNet can accurately and reliably detect COVID-19 infection from X-ray images. The accuracy of the proposed model compared to the related work is quite encouraging, given the limited amount of labeled data and the differences in the quality and quantity of samples used for training and testing. Using EpistoNet, the diagnosis of the Coronavirus disease can be done automatically, rapidly, at a low cost, and with high accuracy. With isolation of suspicious cases and treatment of infected patients, the spread of the disease can be significantly reduced.

Conclusion
In this study, we proposed EpistoNet, an ensemble of Epistocracy-optimized mixtures of discriminative experts for automatic detection of COVID-19 infection from chest X-rays. Each mixture of experts consists of 5 deep convolutional neural networks and a gating network. We evaluated the performance of various state-of-the-art convolutional neural networks using the HFHS dataset. Transfer learning was utilized to obtain a better initialization state for the classification of COVID-19 disease. The Epistocracy algorithm was also employed to build and optimize the head models, which are composed of neural networks of variable length. The experimental results show that EpistoNet can effectively classify COVID-19 vs. non-COVID-19 infections, even with a limited dataset. The accuracy rates achieved by EpistoNet for the classification of COVID-19 were found to be higher than those of stand-alone VGG16 or similar models trained on the HFHS dataset. Other approaches lack the generalizability of our method to unseen data because their pre-processing steps and hyper-parameter fine-tuning are specific to their own datasets. The EpistoNet pipeline requires only a minimal pre-processing step, and the Epistocracy algorithm is recruited to systematically optimize the models' hyper-parameters without any human intervention. EpistoNet can be effectively leveraged as a fast, cheap, and portable tool to provide diagnostic aid to healthcare professionals such as physicians and radiologists for the early detection and urgent treatment of patients with COVID-19, mitigating the devastating impact of COVID-19 on lives and livelihoods.