A bagging dynamic deep learning network for diagnosing COVID-19

COVID-19 is a serious ongoing worldwide pandemic. Using X-ray chest radiography images to diagnose COVID-19 automatically is an effective and convenient means of providing diagnostic assistance to clinicians in practice. This paper proposes a bagging dynamic deep learning network (B-DDLN) for diagnosing COVID-19 by intelligently recognizing its symptoms in X-ray chest radiography images. After a series of image preprocessing steps, we pre-train convolution blocks as a feature extractor. For the extracted features, a bagging dynamic learning network classifier is trained based on a neural dynamic learning algorithm and the bagging algorithm. B-DDLN connects the feature extractor and the bagging classifier in series. Experimental results verify that the proposed B-DDLN achieves 98.8889% testing accuracy, the best diagnosis performance among the existing state-of-the-art methods on the open image set. It also provides evidence for further detection and treatment.


Bagging dynamic deep learning network
In this section, the proposed bagging dynamic deep learning network (B-DDLN) is designed and analyzed in detail in four stages. First, the construction of the B-DDLN is described. Second, the principle of the dynamic learning network is presented for designing a classifier. Third, the neural dynamic learning algorithm (NDLA) of a dynamic learning network is derived in detail. Fourth, based on bootstrap aggregation and a voting decision among several dynamic learning networks trained by NDLA with various mapping functions, a bagging dynamic learning network classifier is proposed to enhance the generalization performance of a single dynamic learning network.

Construction of B-DDLN diagnosis model.
Scientific Reports | (2021) 11:16280 | https://doi.org/10.1038/s41598-021-95537-y

For the application of medical image classification, a convolutional neural network (CNN) is very effective because its convolution blocks extract differentiated image features 19,20 . However, the generalization performance of the fully connected layers in CNNs may not be strong enough to discriminate and classify deep convolutional features 28 . For diagnosing COVID-19, the accuracy and precision cannot be stabilized at a satisfactory level when only a CNN is used. To enhance the generalization performance of the diagnosis model, B-DDLN is proposed in this paper, which consists of two modules: a feature extractor and a bagging classifier. In the feature extractor, the convolution blocks consist of convolutional layers, pooling layers, batch normalization, ReLU layers and shortcut connections. They are designed and pre-trained as a whole feature extractor for X-ray chest radiography images, and the bagging dynamic learning network classifier based on NDLA and the bagging algorithm is applied to classify these image features and generate diagnostic results. Figure 1 shows how the proposed B-DDLN is constructed. We use images from the training set to pre-train five designed convolution blocks as a feature extractor. Suppose a three-channel image tensor g_s ∈ R^(224×224×3) is input into this feature extractor; its corresponding feature vector h(g_s) ∈ R^(1×512) can be obtained. After gathering all the features of the training images from the feature extractor, we use these features to train the bagging dynamic learning network classifier and evaluate the complete proposed B-DDLN. The bagging dynamic learning network classifier consists of N dynamic learning networks, where N denotes the number of dynamic learning networks.
These N dynamic learning networks are trained using N subsets randomly collected from all the extracted training feature samples, together with NDLA under N different types of mapping functions.
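The subset-collection step above can be sketched as follows. This is a minimal numpy sketch assuming bootstrap sampling with replacement (the paper says only that subsets are "randomly collected"), with toy arrays standing in for the extracted 512-dimension feature vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_subsets(features, labels, n_learners):
    """Draw one bootstrap subset (sampling with replacement) per unit
    dynamic learning network; n_learners corresponds to N in the paper."""
    l = features.shape[0]
    subsets = []
    for _ in range(n_learners):
        idx = rng.integers(0, l, size=l)  # l indices drawn with replacement
        subsets.append((features[idx], labels[idx]))
    return subsets

# toy stand-ins for extracted training features and binary labels
X = rng.standard_normal((100, 512))
y = rng.integers(0, 2, size=100)
subsets = bootstrap_subsets(X, y, n_learners=3)
```

Each subset then trains one unit network, so the N networks see different resamplings of the same training features.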
As the proposed B-DDLN diagnosis model combines convolution blocks as a feature extractor with the bagging dynamic learning network classifier, the construction principle is termed the B-DDLN construction algorithm. Algorithm 1 illustrates the steps in detail, using the following notation:

X : An l × m input matrix, where x_si (1 ≤ s ≤ l, 1 ≤ i ≤ m; s, i ∈ Z) ∈ X represents the value of the ith dimension of the sth sample.
V : An m × n matrix storing the weights connecting the input and hidden layers, where v_ij (1 ≤ i ≤ m, 1 ≤ j ≤ n; i, j ∈ Z) ∈ V represents the weight connecting the ith input neuron with the jth hidden neuron.
I : An l × n matrix input into the hidden layer.
Q : An l × n hidden output matrix corresponding to I.
W : An n × q matrix storing the weights connecting the hidden and output layers, where w_jr (1 ≤ j ≤ n, 1 ≤ r ≤ q; j, r ∈ Z) ∈ W represents the weight connecting the jth hidden neuron with the rth output neuron.
Y : An l × q predicted diagnosis output matrix.
f_j(·) (1 ≤ j ≤ n; j ∈ Z) : Activation functions in the jth hidden neuron.
g(·) : Activation function in the output neurons.
Ȳ, L : l × q label matrices, where Ȳ is encoded in −1/+1 format and L is encoded by one-hot vectors.
P : An l × q class possibility matrix calculated by the softmax formula.
ε : Training error of the dynamic learning network calculated by the cross-entropy formula.

Figure 1. The convolution blocks consist of convolutional layers, pooling layers, batch normalization, ReLU layers, and shortcut connections. In the feature extractor, five convolution blocks are designed and pre-trained for extracting the features of X-ray chest radiography images, and a bagging dynamic learning network classifier consisting of N unit dynamic learning networks is responsible for recognizing these features to generate the final diagnosis results.

Topology and principle of dynamic learning network.
The feed-forward output of the dynamic learning network is formulated as

Y = g(QW), (1)

where the hidden output matrix Q = [f_1(i_1), . . . , f_n(i_n)] and the hidden-layer input I = XV. In Eq. (4), i_j ∈ R^(l×1) denotes the jth column vector of I ∈ R^(l×n). For the activation functions in the hidden and output neurons, two cases are listed as examples.

Case 1: All the activation functions are set as the softsign function, i.e.,

f_j(z) = g(z) = z / (1 + |z|) (1 ≤ j ≤ n; j ∈ Z). (6)

Case 2: The activation functions of the output neurons are still set as the softsign function, while those of the hidden neurons are the power-softsign function, whose expression is given in Eq. (7).

For the sth sample x_s = [x_s1, . . . , x_sm] ∈ R^(1×m) (1 ≤ s ≤ l; s ∈ Z) from X, the output through the dynamic learning network y_s = [y_s1, . . . , y_sq] ∈ R^(1×q) is calculated using Eq. (1), and the corresponding class probability vector P_s is obtained using the softmax formula

P_sr = exp(y_sr) / Σ_{c=1}^{q} exp(y_sc), (9)

where P_s1, . . . , P_sq ∈ [0, 1] and Σ_{r=1}^{q} P_sr = 1. If P_sr = max{P_s1, . . . , P_sq} (1 ≤ r ≤ q; r ∈ Z), then x_s is predicted as belonging to the rth class according to the Bayesian decision principle based on the minimum classification error probability 44 . The class probability matrix P for the l input samples is calculated in the same way.
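The feed-forward pass and softmax decision above can be sketched in a few lines. This is a minimal numpy sketch for Case 1 (softsign everywhere); the matrix shapes follow the paper's notation, while the toy dimensions and random weights are illustrative assumptions:

```python
import numpy as np

def softsign(z):
    # Eq. (6): f(z) = g(z) = z / (1 + |z|)
    return z / (1.0 + np.abs(z))

def forward(X, V, W):
    """Feed-forward pass of one dynamic learning network.

    X: (l, m) inputs; V: (m, n) input-to-hidden weights;
    W: (n, q) hidden-to-output weights.
    """
    I = X @ V            # hidden-layer input matrix, l x n
    Q = softsign(I)      # hidden output matrix
    Y = softsign(Q @ W)  # predicted diagnosis output matrix, l x q
    # row-wise softmax gives the class possibility matrix P
    expY = np.exp(Y - Y.max(axis=1, keepdims=True))
    P = expY / expY.sum(axis=1, keepdims=True)
    return Y, P

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 5))   # l=4 samples, m=5 dimensions (toy sizes)
V = rng.standard_normal((5, 8))   # n=8 hidden neurons
W = rng.standard_normal((8, 2))   # q=2 classes
Y, P = forward(X, V, W)
pred = P.argmax(axis=1)  # pick the class with maximum probability
```

Taking the row-wise argmax of P implements the minimum-error Bayesian decision described in the text.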
Learning algorithm of dynamic learning network. The neural dynamic learning algorithm (NDLA) has been favored in recent years owing to its rapid convergence and high precision 40,45 . Its design formula is expressed as

ė(t) = −λΦ(e(t)), (10)

where t ∈ R denotes continuous time, e(t) ∈ R denotes the deviation between the prediction output and the expectation at time t, ė(t) denotes the derivative of the deviation e(t) with respect to t, λ > 0, λ ∈ R denotes the NDLA parameter, and Φ(·) denotes a mapping function that is monotonically increasing and odd 38,40,43,45 . NDLA is conducted on a digital computer, so the continuous-time expression (i.e., Eq. (10)) is transformed into a discrete one by the Euler discretization formula 46 , which yields

e(k + 1) = e(k) − h̄λΦ(e(k)), (11)
where h̄ > 0, h̄ ∈ R denotes the discrete step, and k > 0, k ∈ Z denotes the discrete time and the kth training round. Equation (11) is further transformed as

e(k + 1) = e(k) − αΦ(e(k)), (12)

where α = h̄λ > 0, α ∈ R denotes the NDLA design coefficient. Thus, Eq. (12) is the discrete NDLA design formula, equivalently written as e(k + 1) − e(k) = −αΦ(e(k)) (Eq. (13)).
Theorem 1 A deviation-based learning rule is formulated as the discrete NDLA design formula, i.e., e(k + 1) − e(k) = −αΦ(e(k)), where α > 0 is a proper value within a suitable range. If the mapping function Φ(·) in NDLA is a monotonically increasing and odd function, then the deviation between the prediction output and the expectation, e(k), converges absolutely to zero as the discrete time k increases.
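The convergence claimed in Theorem 1 is easy to observe numerically. The sketch below iterates the discrete rule e(k+1) = e(k) − αΦ(e(k)) for three monotonically increasing, odd mapping functions; the step count and α = 0.1 are illustrative choices, not values from the paper:

```python
import numpy as np

def ndla_iterate(e0, phi, alpha, rounds):
    """Iterate the discrete NDLA rule e(k+1) = e(k) - alpha * phi(e(k))."""
    e = float(e0)
    for _ in range(rounds):
        e = e - alpha * phi(e)
    return e

# three odd, monotonically increasing mapping functions from the paper
mappings = {"linear": lambda e: e, "tanh": np.tanh, "sinh": np.sinh}

for name, phi in mappings.items():
    e_final = ndla_iterate(e0=1.0, phi=phi, alpha=0.1, rounds=500)
    # for each mapping, |e(k)| shrinks toward zero as k grows
```

With a linear mapping the deviation decays geometrically (e(k) = (1 − α)^k e(0)); the nonlinear mappings behave similarly near zero since Φ(e) ≈ e there.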
For the mapping function, the linear, tanh and sinh types are applied as examples in this paper, i.e.,

Φ(e) = e, (14)
Φ(e) = tanh(e), (15)
Φ(e) = sinh(e). (16)

The discrete NDLA design formula is used to train the dynamic learning network, in which the weight matrix W is iterated. As shown in Fig. 2, when the kth training round is finished, the diagnosis deviation matrix E(k) is calculated by

E(k) = Y(k) − Ȳ, (17)

Figure 2.
Diagram of the neural dynamic learning algorithm (NDLA). If the training error ε(k), calculated from the class possibility matrix P(k) and the label matrix L, satisfies ε(k) ≥ ε′, where ε′ denotes a threshold, the weight matrix W(k) connecting the hidden and output layers is updated to W(k + 1) in the (k + 1)th training round, given the NDLA design coefficient α, the mapping function Φ(·), the hidden output matrix Q, the diagnosis deviation matrix E, the predicted diagnosis output matrix Y, and the label matrix Ȳ.

where Y(k) denotes the output through the dynamic learning network and Ȳ denotes the labels of the samples in −1/+1 format. Before the next round, the training error ε(k) is first calculated by the cross-entropy formula, i.e.,

ε(k) = − Σ_{s=1}^{l} Σ_{r=1}^{q} L_sr ln P_sr(k), (18)

where L_sr ∈ L and P_sr(k) ∈ P(k). The class probability matrix P(k) is calculated according to Eq. (9). Suppose the threshold of the training error is ε′ ∈ R. If ε(k) < ε′, the training process stops. Otherwise, W continues to be updated.
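The stopping test on the cross-entropy error can be sketched as follows. This is a minimal numpy sketch: the per-sample averaging and the threshold value are illustrative assumptions (the paper's exact normalization of Eq. (18) is not reproduced here), and the toy matrices stand in for L and P(k):

```python
import numpy as np

def training_error(L, P):
    """Cross-entropy between one-hot labels L and class possibility
    matrix P (both l x q); averaged over samples here for scale."""
    eps = 1e-12  # guard against log(0)
    return -np.mean(np.sum(L * np.log(P + eps), axis=1))

L = np.array([[1, 0], [0, 1], [1, 0]])              # one-hot labels
P = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]])  # toy possibilities
err = training_error(L, P)

threshold = 0.5          # hypothetical epsilon'
stop = err < threshold   # stop updating W once the error is below threshold
```

When `stop` is true, training quits; otherwise the weight matrix W is updated for another round.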
For the training details of the dynamic learning network in the (k + 1)th training round, the diagnosis deviation matrix E(k + 1) is obtained by the following equation, i.e.,

E(k + 1) = E(k) − αΦ(E(k)), (19)

where Φ(·) is applied element-wise. When Φ(·) is a monotonically increasing and odd function, E(k) rapidly converges to 0 as k increases according to Theorem 1, which indicates that lim_{k→+∞} Y(k) = Ȳ according to Eq. (17). Through the softmax formula (i.e., Eq. (9)) and the encoder, lim_{k→+∞} P(k) = P̄ is obtained, where P̄ denotes a constant targeted class probability matrix. Thus, the training error ε(k) satisfies lim_{k→+∞} ε(k) = ε̄ according to Eq. (18), which indicates that ε(k) theoretically converges to a targeted constant ε̄ as k increases.
Substituting Eqs. (17) and (20) into the design formula yields Eq. (25), which shows the iterative relation between W(k + 1) and W(k). A training round is thus finished. Without loss of generality, the softsign function is applied as an example of g(·). However, the inverse function of the softsign function cannot be found. Under this circumstance, an approximate expression ĝ(·) is used instead.

Bagging dynamic learning network classifier. To further enhance the generalization of the dynamic learning network, the bootstrap aggregating (bagging) algorithm is utilized to construct a more robust bagging dynamic learning network classifier, as illustrated in Fig. 1. In the bagging algorithm, some of the samples are randomly collected from all the training features as a subset, which is used to train one dynamic learning network. This process is repeated several times, yielding several trained dynamic learning networks. In the combination strategy, these trained dynamic learning networks first predict the testing samples. Then, with regard to these predicted results, the final class is determined based on the principle of plurality voting, i.e., the majority rule 43,47 . Specifically, we initialize a zero vector a_t = [a_t1, . . . , a_tq] ∈ R^(1×q) as the vote statistic. If the kth dynamic learning network determines that the testing sample x_t = [x_t1, . . . , x_tm] ∈ R^(1×m) belongs to the rth class (k ∈ [1, N], r ∈ [1, q]; k, r ∈ Z), a_tr is increased by one. Based on the majority rule, the index corresponding to the maximum number of votes among all the elements in a_t is taken as the final predicted class.
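The vote-statistic procedure described above can be sketched directly. A minimal sketch, with class indices 0..q−1 standing in for the paper's 1..q:

```python
import numpy as np

def plurality_vote(predictions, q):
    """Combine class predictions from N unit networks by plurality voting.

    `predictions` holds one predicted class index (0..q-1) per network.
    """
    a = np.zeros(q, dtype=int)   # vote statistic a_t
    for r in predictions:
        a[r] += 1                # each unit network casts one vote
    return int(np.argmax(a))     # index with the most votes wins

# N = 3 networks, q = 2 classes (e.g. 0 = normal, 1 = COVID-19)
final = plurality_vote([1, 0, 1], q=2)  # two of three networks vote class 1
```

Here two of the three networks vote for class 1, so the ensemble's final prediction is class 1 regardless of the dissenting network.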
The structure and principle of the bagging dynamic learning network classifier are shown in Fig. 3, where the dynamic learning networks, trained by NDLA with various mapping functions Φ(·), are simultaneously trained on various subsets drawn randomly from the shuffled training features. Suppose there are two kinds of diagnosis results (positive and negative) and N dynamic learning networks. When x_t is entered into this proposed bagging dynamic learning network classifier, the dynamic learning networks independently judge the diagnosis result to which it belongs, and the final diagnosis result is then determined according to the majority rule. For example, if the predicted result of the first dynamic learning network is positive (COVID-19) while those of the majority of the remaining networks are negative, the final diagnosis result is negative.

Results
In this section, the application of the proposed B-DDLN to diagnosing the presence of COVID-19 by analyzing X-ray chest radiography images is described. This description covers the description and preprocessing of the image set, feature extraction, analysis and random under-sampling of samples, the experimental results of the proposed B-DDLN, and comparisons between B-DDLN and other diagnosis models. Figure 4 shows a flowchart of the entire experiment, in which the feature extractor and the bagging classifier are successively trained using the training images and the corresponding features, respectively, to construct the B-DDLN diagnosis model. Testing images are used to evaluate the proposed B-DDLN diagnosis model and obtain the diagnosis results.
Description and preprocessing of the image set. In this study, we combine and modify four data repositories to create a binary-classification COVID-19 image set by leveraging the following types of patient and normal cases from each of the data repositories:

• COVID-19 patient cases from the COVID-19 open image data collection, which was created by assembling medical images from websites and publications 48 . This data collection currently contains hundreds of frontal-view X-ray images, which is a necessary resource for developing and evaluating tools to aid in the diagnosis of COVID-19 49 . An example is shown in Fig. 5A.
• COVID-19 patient cases from the Figure 1 COVID-19 Chest X-ray Dataset Initiative, which was built to enhance diagnosis models for COVID-19 disease detection and risk stratification. An example is shown in Fig. 5B.
• COVID-19 patient and normal cases from the ActualMed COVID-19 Chest X-ray Dataset Initiative. A COVID-19 example is shown in Fig. 5C and a normal example is shown in Fig. 5E.
• COVID-19 patient and normal cases from the COVID-19 Radiography Database (winner of the COVID-19 dataset award from the Kaggle community), which was created by a team of researchers from Qatar University, Doha, Qatar and the University of Dhaka, Bangladesh, along with their collaborators from Pakistan and Malaysia, in collaboration with medical doctors. A COVID-19 example is shown in Fig. 5D and a normal example is shown in Fig. 5F.
After gathering the X-ray chest radiography images from these collections, there are 2284 images in the newly formed image set (816 from COVID-19 patient cases and 1468 from normal cases). Preprocessing of the X-ray chest radiography images is necessary for improving the quality of the image data and enhancing the image features. First, the image channel sequence is changed from BGR to RGB. Second, the sizes of all the images are uniformly set to 224 × 224. We then design five convolution blocks and pre-train them. When these convolution blocks finish pre-training as a feature extractor h(·), a 224 × 224 × 3 preprocessed X-ray chest radiography image tensor g_s is input into h(·), and its corresponding 512-dimension extracted feature vector h(g_s) is obtained as output. The features extracted from the training images are then utilized to train the bagging dynamic learning network classifier, and the features extracted from the testing images are used to evaluate it.
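The two preprocessing steps (channel reordering and resizing) can be sketched as follows. A minimal numpy-only sketch using nearest-neighbour resizing; the paper's actual interpolation method is not specified, so that choice is an assumption here:

```python
import numpy as np

def preprocess(bgr_array, size=224):
    """Swap BGR -> RGB, then resize to size x size (nearest neighbour)."""
    rgb = bgr_array[..., ::-1]            # reverse the channel order
    h, w, _ = rgb.shape
    rows = np.arange(size) * h // size    # nearest source row per output row
    cols = np.arange(size) * w // size    # nearest source column per output column
    return rgb[rows][:, cols]

# toy "blue" BGR image: channel 0 (blue in BGR order) set to 255
toy = np.zeros((300, 400, 3), dtype=np.uint8)
toy[..., 0] = 255
out = preprocess(toy)  # after the swap, the 255 channel sits at RGB index 2
```

The result is a 224 × 224 × 3 tensor g_s ready to be fed into the feature extractor h(·).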

One-way analysis of variance. To investigate whether different values of an attribute have a significant effect on the category, one-way analysis of variance (one-way ANOVA) is carried out using the features extracted from the training images. Each training feature sample is a 512-dimension vector. The analytic results of the first three dimensions by one-way ANOVA are obtained as examples and listed in Table 1. The corresponding box-plots are shown in Fig. 6. It can be seen from Table 1 that the p-value is much smaller than 0.05, which suggests that a correlation exists between these three dimensions of the features and the sample category. The F-values and p-values in Table 1 and Fig. 6 support this conclusion. For the bagging classifier, three dynamic learning networks are trained by NDLA with the linear (i.e., Eq. (14)), tanh (i.e., Eq. (15)) and sinh (i.e., Eq. (16)) mapping function types. Furthermore, the bagging dynamic learning network classifier is constructed from these three dynamic learning networks by the bagging algorithm.
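The F-statistic underlying the one-way ANOVA can be computed directly. A minimal numpy sketch for the two-group case (one feature dimension, COVID-19 vs. normal samples); the toy values are illustrative, not the paper's data:

```python
import numpy as np

def one_way_anova_F(group_a, group_b):
    """F-statistic of a one-way ANOVA with two groups: the ratio of
    between-group to within-group mean squares."""
    groups = [np.asarray(group_a, dtype=float), np.asarray(group_b, dtype=float)]
    all_vals = np.concatenate(groups)
    grand_mean = all_vals.mean()
    k, n = len(groups), all_vals.size
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# well-separated toy groups yield a large F (hence a tiny p-value)
F = one_way_anova_F([0.1, 0.2, 0.15, 0.12], [0.9, 1.0, 0.95, 1.05])
```

A large F for a feature dimension corresponds to the small p-values reported in Table 1, indicating that the dimension separates the two categories well.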

Classification performance of B-DDLN.
In this study, two cases of activation functions are designed in the dynamic learning network. In case 1, all the activation functions in the hidden and output neurons are set as the softsign function (i.e., Eq. (6)). In case 2, the activation functions in the output neurons are also designed as the softsign function, while those in the hidden neurons are set as the power-softsign function (i.e., Eq. (7)). For classifying the features extracted by the convolution blocks of B-DDLN, the trends of the classification errors are recorded during training. As shown in Table 2, most testing samples fall in the TP and TN areas, which illustrates that the proposed B-DDLNs in the two cases achieve high diagnosis accuracy. The accuracies of the dynamic learning networks with the corresponding hyper-parameters in the two cases are listed in Table 3. In the process of training the proposed dynamic learning network, proper sets of hyper-parameters are adjusted and selected by 10-fold cross-validation so that the testing performance, including accuracy, reaches its best level. Early stopping is also adopted to avoid over-fitting. From Table 3, we can see that the highest testing accuracy of a single dynamic learning network reaches 98.3333% in a short time. After plurality voting, the bagging dynamic learning networks in the two cases achieve 98.8889% accuracy, which shows good precision for diagnosing COVID-19. The number of hidden neurons is no more than 60, which suggests that the dynamic learning network is a lightweight network. For comparisons in the same application domain, we use the same images and 10-fold cross-validation to train several state-of-the-art methods under the same operating environment, and we apply the same testing images to evaluate them. Testing accuracy is used as the main indicator to measure the classification performance of the deep learning models; the results are listed in Table 4.
In Table 4, the testing accuracy of the proposed B-DDLN is an average value, while those of the others are the highest values. From Table 4, we can see that the testing accuracy of the proposed B-DDLN reaches the top level compared with the other deep learning models, which demonstrates the superior precision of the proposed B-DDLN in diagnosing COVID-19. In addition, the structure of the proposed B-DDLN is simpler and more lightweight than most of the existing state-of-the-art models, which suggests that B-DDLN can be more easily deployed in a medical software system with a higher speed of diagnostic prediction.
For the same extracted image features, comparison results for classification performance between the proposed B-DDLN, especially its bagging dynamic learning network classifier, and other classifiers are listed in Table 5, where the corresponding experiments are conducted under the same conditions. From Table 5, we can see that the specificity and sensitivity of the proposed B-DDLN are 0.9875 and 0.9900, respectively, which indicates that misdiagnoses and missed diagnoses of COVID-19 are very rare. Most of the evaluation indicators simultaneously remain the highest among all the classifiers. These findings substantiate the clear advantages of the proposed B-DDLN in diagnosing the presence of COVID-19.
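The reported indicators follow directly from confusion-matrix counts. The sketch below uses a hypothetical testing-set split chosen to be consistent with the reported sensitivity 0.9900, specificity 0.9875 and 98.8889% accuracy; the paper's actual counts may differ:

```python
def diagnosis_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity and accuracy from confusion-matrix counts.

    Sensitivity = TP / (TP + FN): few missed COVID-19 diagnoses.
    Specificity = TN / (TN + FP): few false COVID-19 alarms.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# hypothetical counts: 100 positive and 80 negative testing samples
sens, spec, acc = diagnosis_metrics(tp=99, fn=1, tn=79, fp=1)
```

With these counts, sensitivity is 0.9900, specificity is 0.9875, and accuracy is 178/180 ≈ 98.8889%, matching the figures reported for B-DDLN.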

Discussion
From the experimental and comparison results, we can see that the proposed B-DDLN, which consists of a feature extractor and a bagging dynamic learning network classifier, achieves the best diagnostic performance among the compared state-of-the-art models in the same application domain. The proposed B-DDLN is trained and constructed in a short time, and shows superior classification performance without over-fitting. The diagnosis accuracy for distinguishing COVID-19 from normal cases stays at the top level when using the proposed B-DDLN, owing to NDLA and the bagging algorithm. The proposed B-DDLN diagnosis model is lightweight enough to be deployed as a medical software system under the restrictions of high real-time requirements and limited hardware resources. However, the proposed B-DDLN has certain limitations, and some improvements can be made. First, the extracted features of the X-ray chest radiography images are not detailed and separable enough; the feature extractor needs to be improved by remodeling and optimizing its structure and pre-training method. Second, the forms of the labels for the input images, i.e., Ȳ and L, are not unified, which complicates training the dynamic learning networks. Third, the calculation method for the class possibility needs to be improved to enhance confidence. Furthermore, the proposed B-DDLN should be applied to diagnosing different types of diseases to verify its generalization performance.

Conclusion
In this paper, B-DDLN has been proposed for diagnosing the presence of COVID-19 using X-ray chest radiography images. After some preprocessing steps, the training images have been used to pre-train convolution blocks as a feature extractor. For the training features extracted by this extractor, a bagging dynamic learning network classifier based on the neural dynamic learning algorithm (NDLA) and the bagging algorithm has been trained. Experimental results have verified that the proposed B-DDLN diagnosis model possesses high training efficiency and considerably increases diagnosis accuracy under the restrictions of high real-time requirements and limited hardware resources. The diagnosis accuracy of the proposed B-DDLN has reached 98.8889%, the highest among all the compared state-of-the-art methods. In addition, the corresponding performance indicators of the bagging dynamic learning network classifier have reached the top level among all the classifiers, further substantiating that the proposed B-DDLN possesses excellent diagnostic ability. Such accurate diagnosis outcomes provide strong evidence for the early detection of COVID-19, which helps to provide suitable treatments for patients and maintain human health.
In the future, the proposed B-DDLN diagnosis model can be trained to diagnose not only the presence of COVID-19 but also other types of pneumonia, such as SARS and community-acquired pneumonia, which cannot be detected by nucleic acid amplification testing alone. Given its excellent diagnostic performance, a medical software system based on the proposed B-DDLN diagnosis model could be developed for multi-class diagnosis of pneumonia by automatically recognizing X-ray chest radiography images, generating diagnostic reports, and providing corresponding therapeutic schemes. Such a fully functional system would be extremely useful for clinicians.