A comprehensive study on classification of COVID-19 on computed tomography with pretrained convolutional neural networks

Imaging data have been reported to be useful for the rapid diagnosis of COVID-19. Although computed tomography (CT) scans show a variety of signs caused by the viral infection, these visual features are difficult and time-consuming for radiologists to recognize when a large number of images must be read. Artificial intelligence methods for automated classification of COVID-19 on CT scans have been found to be very promising. However, the investigation of pretrained convolutional neural networks (CNNs) for COVID-19 diagnosis using CT data is still limited. This study presents an investigation of 16 pretrained CNNs for classification of COVID-19 using a large public database of CT scans collected from COVID-19 patients and non-COVID-19 subjects. The results show that, using only 6 epochs for training, the CNNs achieved very high performance on the classification task. Among the 16 CNNs, DenseNet-201, which is the deepest net, is the best in terms of accuracy, balance between sensitivity and specificity, F1 score, and area under curve. Furthermore, transfer learning with the direct input of whole image slices and without data augmentation provided better classification rates than transfer learning with data augmentation. Such a finding alleviates the tasks of data augmentation and manual extraction of regions of interest on CT images, which are adopted by current implementations of deep-learning models for COVID-19 classification.


Scientific Reports | (2020) 10:16942 | https://doi.org/10.1038/s41598-020-74164-z | www.nature.com/scientificreports/
augmentation for training the pretrained deep-learning model with new image data. The rationale is that transfer learning can relieve the need for acquiring a large amount of training data by reusing a developed model as the starting point for training a new model with a different task. The data augmentation was performed by using a large dataset of chest X-ray images. The rationale for image data augmentation is to increase the size of the training dataset with plausible examples in order to improve the performance of the deep-learning model and its ability to generalize by exposing it to samples of high variance. Another recent work reported on the use of ten pretrained CNNs for classifying CT scans of COVID-19 and non-COVID-19 subjects [13]. These authors reported that ResNet-101 and Xception provided the best classification results on training and testing a CT dataset consisting of 106 COVID-19 patients and 86 non-COVID-19 subjects. The CNNs were trained and tested with regions of interest extracted from the CT scans that were defined by a radiologist.
Other previous works on the classification of COVID-19 on CT scans were reported in [14][15][16]. A 3D deep-learning network was developed for the detection of COVID-19 from 4356 3D chest CT scans obtained from 3322 patients [14]. The network extracted both 2D local and 3D global features from the CT scans. This network, called COVNet, was built on the pretrained ResNet-50. In [15], the pretrained Inception was modified to detect COVID-19 using extracted regions of interest on CT scans obtained from 180 cases of COVID-19 and 79 cases of SARS-CoV-2. In [16], a total of 618 CT scans were used, consisting of 219 CT scans from 110 COVID-19 patients, 224 CT scans from 224 patients with Influenza-A viral pneumonia, and 175 CT scans from healthy people. Pulmonary regions of interest were extracted from the CT scans, and pretrained ResNet-18 was used for image feature extraction. Finally, the Noisy-or Bayesian function was used to classify the image regions into three types: COVID-19, Influenza-A-viral-pneumonia, and irrelevant-to-infection.
However, it should be noted that the CT datasets used in the studies reported in [13][14][15][16] are not publicly available. In this study, a comprehensive investigation of 16 pretrained CNNs for classification of COVID-19 using a publicly available CT database is presented. These pretrained CNNs reflect a variety of computational complexity and accuracy based on their training and testing on the ImageNet database [17]. Findings of this investigation would facilitate the timely deployment of AI-assisted tools to hospitals and clinics, in terms of ease of both data preparation and software implementation, for fighting against the pandemic.

COVID-19 CT database.
The COVID-19 CT database used in this study is publicly available [18], and its details are described in [12]. The database consists of 349 CT images containing clinical findings of COVID-19 from 216 patients, and 397 CT images obtained from non-COVID-19 subjects. These CT images were collected from COVID-19-related papers published in medRxiv, bioRxiv, NEJM, JAMA, Lancet, and others. Figure 1 shows [...]
These pretrained networks were trained on more than a million images from the ImageNet database [17]. The pretrained networks can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals. As a result, these networks have learned rich features representing a wide range of images. The properties of these networks are described in Table 1.
To enable the reproduction of the results reported in this study, configurations for the transfer learning are described as follows. First, the layer graph from the pretrained network was extracted. If the network was a SeriesNetwork object, such as AlexNet, VGG-16, or VGG-19, then the list of layers was converted to a layer graph. In most pretrained networks, the last layer with learnable weights is a fully connected layer. This fully connected layer was replaced with a new fully connected layer with the number of outputs equal to the number of classes in the new dataset, which is 2 in this study. In some pretrained networks, such as SqueezeNet, the last learnable layer is a 1-by-1 convolutional layer instead. In this case, the convolutional layer was replaced with a new convolutional layer with the number of filters equal to the number of classes.
For the option of data augmentation in this study, random reflection, translation, and scaling were carried out. Random reflection was done in the top-bottom direction: each image was reflected vertically with probability 0.5. The horizontal and vertical translation distances, measured in pixels, were each selected randomly from a continuous uniform distribution over the range [−30, 30]. Similarly, the horizontal and vertical scale factors were each selected randomly from a continuous uniform distribution over the interval [0.9, 1.1].
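The sampling of augmentation parameters described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the function names, the dictionary keys, and the order in which reflection, scaling, and translation are composed are all assumptions, and only the probability and the two ranges come from the text.

```python
import random

# Ranges and probability taken from the text; all names are illustrative.
TRANSLATION_RANGE_PX = (-30.0, 30.0)
SCALE_RANGE = (0.9, 1.1)
REFLECTION_PROBABILITY = 0.5


def sample_augmentation(rng):
    """Draw one set of random augmentation parameters for a single image."""
    return {
        # Top-bottom (vertical) reflection applied with probability 0.5.
        "reflect_vertically": rng.random() < REFLECTION_PROBABILITY,
        # Translations in pixels, continuous uniform on [-30, 30].
        "translate_x": rng.uniform(*TRANSLATION_RANGE_PX),
        "translate_y": rng.uniform(*TRANSLATION_RANGE_PX),
        # Scale factors, continuous uniform on [0.9, 1.1].
        "scale_x": rng.uniform(*SCALE_RANGE),
        "scale_y": rng.uniform(*SCALE_RANGE),
    }


def transform_point(x, y, params, height):
    """Map a pixel coordinate through reflection, scaling, then translation.

    The order of operations is an assumption; the text does not state it.
    """
    if params["reflect_vertically"]:
        y = (height - 1) - y
    x, y = x * params["scale_x"], y * params["scale_y"]
    return x + params["translate_x"], y + params["translate_y"]


# A fresh set of parameters would be drawn for every image in every epoch.
params = sample_augmentation(random.Random(0))
```

Because the parameters are redrawn each time an image is presented, the network rarely sees exactly the same input twice, which is the intended source of additional variance.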
The original whole CT images were converted into RGB images and resized to fit the input image size of each pretrained CNN. For the training options, the stochastic gradient descent with momentum optimizer was used with the following settings: momentum = 0.9; gradient threshold method = L2 norm; mini-batch size = 10; maximum number of epochs = 6; initial learning rate = 0.0003, which remained constant throughout training; and factor for L2 regularization (weight decay) = 0.0001. The training data were shuffled before each training epoch, and the validation data were shuffled before each network validation.
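To make the roles of these training options concrete, the following Python sketch shows one possible form of a single stochastic-gradient-descent-with-momentum update on a flat list of weights. It is a sketch under stated assumptions rather than the toolbox implementation used in the study: the function names are illustrative, the `threshold` argument is hypothetical (the text names the L2-norm method but gives no threshold value), and applying weight decay before gradient clipping is an assumed ordering.

```python
import math

# Values stated in the text.
MOMENTUM = 0.9
INITIAL_LEARNING_RATE = 0.0003  # held constant throughout training
L2_REGULARIZATION = 0.0001
MINI_BATCH_SIZE = 10
MAX_EPOCHS = 6


def clip_by_l2_norm(gradient, threshold):
    """Rescale the gradient so that its L2 norm does not exceed threshold."""
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm > threshold:
        return [g * threshold / norm for g in gradient]
    return list(gradient)


def sgdm_step(weights, velocity, gradient, threshold=math.inf):
    """One SGD-with-momentum update; returns (new_weights, new_velocity).

    threshold is hypothetical: with the default of infinity, gradients
    are left unclipped.
    """
    # Weight decay adds L2_REGULARIZATION * w to the loss gradient.
    # Doing this before clipping is an assumption.
    regularized = [g + L2_REGULARIZATION * w for g, w in zip(gradient, weights)]
    clipped = clip_by_l2_norm(regularized, threshold)
    # The velocity accumulates a decaying sum of past gradient steps.
    new_velocity = [MOMENTUM * v - INITIAL_LEARNING_RATE * g
                    for v, g in zip(velocity, clipped)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity
```

In an actual training run, `gradient` would be the loss gradient averaged over one mini-batch of 10 images, and the step would be repeated over all mini-batches for 6 epochs.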

Statistical measures of classification performance.
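Five statistical measures were used for evaluating the two-class classification performance of the pretrained CNNs: accuracy, sensitivity, specificity, F1 score, and the area under the receiver operating characteristic (ROC) curve (AUC).
The sensitivity (SEN) is the percentage of COVID-19 patients who are correctly identified as having the infection:

$$\mathrm{SEN} = \frac{TP}{P} \times 100 = \frac{TP}{TP + FN} \times 100, \qquad (1)$$

where TP is called true positive and denotes the number of COVID-19 patients who are correctly identified as having the infection, FN false negative, denoting the number of COVID-19 patients who are misclassified as having no infection, and P the total number of COVID-19 patients.
The specificity (SPE) is the percentage of non-COVID-19 subjects who are correctly identified as having no infection of COVID-19:

$$\mathrm{SPE} = \frac{TN}{N} \times 100 = \frac{TN}{TN + FP} \times 100, \qquad (2)$$

where TN is called true negative and denotes the number of non-COVID-19 subjects who are correctly identified as having no infection of COVID-19, FP false positive, denoting the number of non-COVID-19 subjects who are misclassified as having the infection, and N the total number of non-COVID-19 subjects.
The percent accuracy (ACC) of the classification is defined as

$$\mathrm{ACC} = \frac{TP + TN}{P + N} \times 100. \qquad (3)$$

The F1 score is defined as the balance between precision (TP divided by the sum of TP and FP) and sensitivity:

$$F_1 = \frac{2\,TP}{2\,TP + FP + FN}. \qquad (4)$$

The ROC is a probability curve created by plotting the TP rate against the FP rate at various threshold settings, and the AUC summarizes the performance of a classifier over all thresholds. For a useful classifier, the AUC lies between 0.5 and 1, where the value 0.5 represents the performance of a random classifier and the value 1 indicates a perfect one. Thus, the higher the AUC is, the better the classifier performs. The AUC was calculated using trapezoidal integration to estimate the area under the ROC curve.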
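The measures defined in this section can be computed directly from the confusion-matrix counts and from the ranked classifier scores. The Python sketch below is illustrative only: the function names are assumptions, the tie-handling rule in the ROC construction is one common choice that the text does not specify, and any counts passed to the functions in an example would be made-up numbers rather than results from this study.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return SEN, SPE, ACC (in percent) and the F1 score from counts."""
    positives = tp + fn  # P: total number of COVID-19 cases
    negatives = tn + fp  # N: total number of non-COVID-19 cases
    return {
        "sensitivity": 100.0 * tp / positives,
        "specificity": 100.0 * tn / negatives,
        "accuracy": 100.0 * (tp + tn) / (positives + negatives),
        "f1": 2.0 * tp / (2.0 * tp + fp + fn),
    }


def roc_auc(scores, labels):
    """Area under the ROC curve by trapezoidal integration.

    scores: predicted COVID-19 scores; labels: 1 = COVID-19, 0 = non-COVID-19.
    Both classes must be present in labels.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # (FP rate, TP rate) pairs along the curve
    tp = fp = 0
    # Lower the decision threshold one sample at a time, highest score first.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    # Sum the trapezoids between consecutive ROC points.
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Using the count form of the F1 score avoids mixing a precision expressed as a fraction with a sensitivity expressed in percent.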

Results
To compare the results with those obtained from previous reports, the dataset was randomly split into 80% for training and 20% for testing. The data splitting was repeated 5 times to obtain the average and standard deviation for each CNN. The whole CT images, resized to fit the input image size of each pretrained CNN, were used as the data input in both training and testing phases. The network training was performed both with and without data augmentation, and the results are reported in Tables 2 and 3. The two CNNs using data augmentation that have accuracy > 90%, sensitivity > 90%, specificity > 90%, and F1 score > 0.9 are ResNet-50 and Inception-v3. The networks without data augmentation have average AUC values higher than or equal to those of the networks with data augmentation.
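The repeated random-split protocol can be sketched as follows in Python. This is a minimal illustration under assumptions: `evaluate` is a hypothetical callback standing in for training and testing one network, the split is not stratified by class, and the sample standard deviation is used, none of which the text specifies.

```python
import random
import statistics


def random_split(n_samples, train_fraction, rng):
    """Shuffle sample indices and split them into train and test lists."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]


def repeated_evaluation(n_samples, evaluate, repeats=5,
                        train_fraction=0.8, seed=0):
    """Mean and standard deviation of a metric over repeated random splits.

    evaluate is a hypothetical callback (train_idx, test_idx) -> float that
    would train a network on one split and return, e.g., its test accuracy.
    """
    rng = random.Random(seed)
    scores = [evaluate(*random_split(n_samples, train_fraction, rng))
              for _ in range(repeats)]
    return statistics.mean(scores), statistics.stdev(scores)


# 349 COVID-19 + 397 non-COVID-19 images, as described in the text.
train_idx, test_idx = random_split(349 + 397, 0.8, random.Random(0))
```

With 746 images in total, this rounding gives 597 training images and 149 test images per split.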
In summary, without data augmentation, the best classifier is DenseNet-201, which has the best accuracy, best balance between sensitivity and specificity, top F1 score, and top AUC. Figure 2 shows the plot of accuracy versus relative training time obtained from the 16 pretrained CNNs without data augmentation.

Discussion
The benchmark results using the same database reported in [12], with a fixed data split of about 80% for training and 20% for testing, have accuracy = 84.7%, sensitivity = 76.2%, and F1 score = 0.85, using a fine-tuned pretrained DenseNet with data augmentation. The results obtained from the 16 CNNs without data augmentation are better than these benchmark results.
The study published in [13] applied 10 pretrained CNNs to a different COVID-19 database, which is not publicly available, with the same ratio of training and testing data, and reported that ResNet-101 was the best classification model among the 10 networks. ResNet-101 achieved accuracy = 99.51%, sensitivity = 100%, and specificity = 99.02%. Although a different database was used, the results obtained in this study are comparable. However, the input data processing and training reported in [13] require considerable effort, because the regions of interest must be extracted by a radiologist, which is subjective and time-consuming and likely hinders the real-time application of the pretrained networks.
The work reported in [14] requires the pre-processing of 3D CT scans by extracting the regions of interest using a U-Net for image segmentation. The pre-processed images were then passed to COVNet for the prediction. The sensitivity and specificity obtained from COVNet were 87% and 92%, respectively, using a dataset that is not publicly available. Another work [15], on the classification of COVID-19 CT images collected from 259 patients, modified the pretrained Inception network and achieved accuracy = 79.3%, sensitivity = 67%, and specificity = 83%, with another test achieving accuracy = 85.2%. Similarly, the input images are extracted regions of interest, such as small patchy shadows and interstitial changes, and multiple ground glass and infiltrates in both lungs. The study reported in [16] used the concatenation of two pretrained ResNet-based networks and the Bayesian function for screening COVID-19 patients using CT imaging. The data pre-processing of the classification procedure requires 3D segmentation, extraction of regions of interest (such as ground-glass appearance, striking peripheral distribution along with the pleura, and independent focus of infections), and data augmentation. The overall [...] [14][15][16] in terms of accuracy and implementation of input data.
Although the use of regions of interest or cropped images is widely adopted for deep learning, including other classification problems [19][20][21][22], this study finds that the direct input of CT images, resized to fit the input size of the pretrained CNN, combined with transfer learning without data augmentation can achieve very high classification performance, better than that obtained using data augmentation. Such findings are useful for the rapid deployment of AI tools to meet the urgent demand for curbing the pandemic, because they can relieve the tasks of manual detection of regions of interest by experienced radiologists, employment of image segmentation methods, and additional data collection.
Using the described network-training configuration with only 6 epochs, the CNNs could provide very high classification performance. Figures 3 and 4 show one of the training processes of DenseNet-201 (the best network) and some features obtained from the deep learning of the best network, respectively.
As the numbers of COVID-19 and non-COVID-19 CT images used in this study are 349 and 397, respectively, the binary classification in this study was not much disadvantaged by the class imbalance problem, in which the class distributions are highly imbalanced. With imbalanced data, classifiers tend to yield low predictive accuracy for the minority class. Medical datasets are often not balanced in the class labels because of the limited samples collected from patients and the cost of acquiring annotated data. Many techniques proposed for addressing class imbalance can be applied to medical imaging, such as "deep domain adaptation" [23] for handling the shortage of large amounts of labeled data, the weighted loss method, which updates the loss function to result in the same loss for all classes, downsampling by removing images from the majority class, and oversampling by adding more images to minority classes using artificial data augmentation [24,25]. Open challenges in imbalanced data and exploration for solutions can be found in [26].

Conclusions
AI-based medical diagnosis systems based on deep learning of medical imaging are increasingly recognized to be clinically useful. However, the development of suitable deep-learning networks and effective training strategies for clinical applications is a topic of research that needs to be explored [27]. Through a comprehensive investigation of 16 pretrained CNNs using a certain parameter specification and training strategy for the networks, this study discovers the very high performance of several of these networks for COVID-19 diagnosis using CT images.
The network configuration of the pretrained models can be implemented for classification of other imaging modalities, such as X-ray, for the detection of COVID-19.
Most AI studies on chest CT used for differentiating COVID-19 pneumonia from other causes of pneumonia consider both three-class classification problems (COVID-19 pneumonia, non-COVID-19 pneumonia, and healthy) and two-class classification (COVID-19 pneumonia and healthy) [2]. Owing to the limited amount of publicly available data, this study is concerned with the two-class classification. However, extension of the use of pretrained CNNs to the three-class classification of COVID-19 imaging data is straightforward.
The findings reported in this study benefit the development of fast and efficient diagnostic tools using imaging data and contribute to the development of more accurate point-of-care diagnostic and detection tools for containing the coronavirus pandemic.