Introduction

The novel coronavirus disease (COVID-19) outbreak is caused by a strain of coronavirus known as severe acute respiratory syndrome coronavirus 2, which originated in Wuhan in the Hubei province of China at the end of 20191. After the disease had spread across the world, the World Health Organization declared COVID-19 a pandemic on March 11, 20202. The website of the World Health Organization lists the total number of reported patients with COVID-19 and the associated deaths; at the time of writing this paper, 163,869,893 patients and 3,398,302 deaths had been reported3.

COVID-19 is diagnosed using reverse transcription polymerase chain reaction (RT-PCR) in many clinical situations. However, the sensitivity of RT-PCR for detecting COVID-19 is not very high; for example, one study reported that the sensitivity of RT-PCR (71%) was lower than that of chest computed tomography (98%)4. Owing to this low sensitivity, the effectiveness of chest X-ray (CXR) imaging and computed tomography in the diagnosis of COVID-19 has been investigated5. The combination of CXR and artificial intelligence, such as deep learning (DL)6, has been extensively examined for the automatic diagnosis of COVID-197,8,9,10,11,12,13,14. Because CXR is widely available and relatively inexpensive, the combination of CXR and artificial intelligence could be employed for COVID-19 screening without requiring medical doctors.

Recent advances in DL have shown promising diagnostic performance for the automatic classification of various diseases of the skin, retinal fundus, brain, and other organs6,15,16,17. DL-based automatic diagnosis is reportedly accurate and has performed well in the classification of COVID-19 pneumonia, non-COVID-19 pneumonia, and the healthy on CXR images7,8,9,10,11,12,13. Elgendi et al. compared the performance of 17 DL models with and without different geometric augmentations and examined the influence of data augmentation on the automatic classification of COVID-19 pneumonia; their results demonstrated that removing the geometric augmentation steps actually improved the performance of the DL models13. Monshi et al. optimized the data augmentation and the DL hyperparameters for classifying COVID-19 pneumonia, and their proposed CovidXrayNet, based on EfficientNet-B0, achieved state-of-the-art accuracy18. Karakanis et al. proposed a new approach to classifying COVID-19 pneumonia that exploits a conditional generative adversarial network to generate synthetic images for augmenting the limited amount of data; their lightweight (ResNet8-based) DL model achieved competitive performance19. These technical advances make DL models for the classification of COVID-19 pneumonia more accurate and robust. However, the performance of DL models has mainly been investigated using public CXR databases, and comparisons of diagnostic performance between DL models and radiologists have been limited14.

Our study aimed to develop and validate a DL model for the automatic diagnosis of COVID-19 pneumonia, non-COVID-19 pneumonia, and the healthy using CXR images. To develop and validate our DL model, two public datasets and one private dataset of CXR images were used; the CXR images of the private dataset were collected from six hospitals. To compare diagnostic performance, both our DL model and six radiologists evaluated the CXR images of the private dataset. In addition, code-available DL models for diagnosing COVID-19 were compared with our DL model. The major contributions of this study are as follows: (i) two large public datasets of CXR images were constructed and are available online; (ii) our DL model was validated with CXR images of our private dataset of clinical cases; (iii) the diagnostic performance of our DL model was compared with that of six radiologists.

Methods

This retrospective study was approved by the institutional review boards of six hospitals (Kobe University Graduate School of Medicine, Kobe City Medical Center General Hospital, Kobe City Nishi-Kobe Medical Center, Hyogo Prefectural Kakogawa Medical Center, Kita Harima Medical Center, and Hyogo Prefectural Awaji Medical Center); the requirement for informed consent was waived owing to the retrospective nature of the study. This study complied with the Declaration of Helsinki and the Ethical Guidelines for Medical and Health Research Involving Human Subjects in Japan (https://www.mhlw.go.jp/file/06-Seisakujouhou-10600000-Daijinkanboukouseikagakuka/0000080278.pdf).

Proposed DL model

EfficientNet20 was used as our DL model. Using EfficientNet-B5 pretrained with noisy student21, transfer learning was performed for the automatic classification of CXR images into COVID-19 pneumonia, non-COVID-19 pneumonia, and the healthy. The implementation of our DL model was based on the open-source software (https://github.com/jurader/covid19_xp) of a prior study10; whereas VGG1622 was used as the pretrained model in the prior study10, EfficientNet with noisy student was used in the current study. The outline of the DL model is shown in Fig. 1, and its details are described in the Supplementary information. Grad-CAM was used for visual explanation of the diagnoses made by our DL model23.

Figure 1
figure 1

Our DL model. Abbreviation: DL, deep learning.
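
As a rough illustration of this transfer-learning setup, the sketch below loads an EfficientNet-B5 backbone pretrained with noisy student and replaces its classification head with a three-class layer. It assumes PyTorch and the timm library (with its tf_efficientnet_b5_ns weights); the optimizer, learning rate, and training loop are illustrative only, and the actual settings are given in the Supplementary information.

```python
# Minimal transfer-learning sketch, assuming PyTorch and timm; hyperparameters are illustrative.
import timm
import torch
from torch import nn, optim

NUM_CLASSES = 3  # the healthy, non-COVID-19 pneumonia, COVID-19 pneumonia

# EfficientNet-B5 weights obtained with the noisy-student procedure;
# num_classes replaces the original 1000-class head with a 3-class layer.
model = timm.create_model("tf_efficientnet_b5_ns", pretrained=True, num_classes=NUM_CLASSES)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-4)

def train_one_epoch(loader, device="cuda"):
    """One epoch of fine-tuning; `loader` yields (batch of CXR tensors, class indices)."""
    model.to(device).train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```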

Datasets

CXR images with anterior-posterior or posterior-anterior views from two public datasets and one private dataset were used in the current study. One public dataset was the COVIDx dataset12,24. The other public dataset was constructed from two public datasets, the PadChest dataset25,26 and the BIMCV-COVID19+ dataset27,28; hereafter, we refer to this second public dataset as COVIDBIMCV. CXR images of the private dataset (COVIDprivate) were retrospectively collected from the six hospitals. The details of the three datasets are described in the Supplementary information.

Table 1 shows the total number of CXR images and the numbers of CXR images of COVID-19 pneumonia, non-COVID-19 pneumonia, and the healthy in the COVIDx, COVIDBIMCV, and COVIDprivate datasets. The total number of CXR images was 14,258, 11,253, and 455 in the COVIDx, COVIDBIMCV, and COVIDprivate datasets, respectively, and the number of COVID-19 pneumonia cases was 617, 1475, and 177, respectively.

Table 1 Numbers of CXR images in the COVIDx, COVIDBIMCV, and COVIDprivate datasets.

The patient characteristics of the COVIDprivate dataset are shown in Table 2. The numbers of CXR images of the healthy, non-COVID-19 pneumonia, and COVID-19 pneumonia in the COVIDprivate dataset were 139, 139, and 177, respectively. The COVIDprivate dataset included 198 males and 257 females, aged 61.0 ± 18.6 years. The CXR examination dates in the COVIDprivate dataset ranged from January 13th, 2015 to December 22nd, 2020.

Table 2 Patients’ characteristics in the COVIDprivate dataset.

Dataset splitting and model training

Because the development and test sets were predefined for the COVIDx dataset, they were used as provided in the current study. For the COVIDBIMCV and COVIDprivate datasets, 100 and 50 CXR images, respectively, were randomly selected as the test set for each of COVID-19 pneumonia, non-COVID-19 pneumonia, and the healthy; the remaining CXR images were used as the development sets. Thus, the number of CXR images in the development set was 13,958, 10,953, and 305 in the COVIDx, COVIDBIMCV, and COVIDprivate datasets, respectively, and the test set size was 300 in the COVIDx and COVIDBIMCV datasets and 150 in the COVIDprivate dataset.

The development set of each dataset was further divided into training and validation sets. The validation set size was 300 in the COVIDx and COVIDBIMCV datasets and 90 in the COVIDprivate dataset. A combined training set was constructed from the training sets of the three datasets for training the DL model. For each development set, five different random divisions into training and validation sets were performed, and model training with transfer learning and performance validation were carried out for each division; therefore, five different trained models were obtained. To predict the diagnosis from a CXR image of the test set, an ensemble of the five trained models was used. A schematic illustration of the dataset splitting, model training, and prediction with our DL model is shown in Fig. 2.

Figure 2
figure 2

Schematic illustration of dataset splitting, model training, and prediction with our DL model. Abbreviations: COVIDx, Public dataset used for COVID-Net; COVIDBIMCV, Public dataset obtained from the PadChest dataset and the BIMCV-COVID19+ dataset; COVIDprivate, Private dataset collected from six hospitals.
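
The five random training/validation divisions and the test-time ensemble can be sketched as follows; the split sizes follow the description above, while the function names, the stratified splitting, and the `predict_proba` interface of the trained models are hypothetical.

```python
# Sketch of the five random development-set divisions and the five-model ensemble.
import numpy as np
from sklearn.model_selection import train_test_split

def make_divisions(dev_images, dev_labels, val_size, n_divisions=5, seed=0):
    """Return five random (training, validation) divisions of one development set."""
    divisions = []
    for i in range(n_divisions):
        tr_x, va_x, tr_y, va_y = train_test_split(
            dev_images, dev_labels, test_size=val_size,
            stratify=dev_labels, random_state=seed + i)  # stratification is an assumption
        divisions.append(((tr_x, tr_y), (va_x, va_y)))
    return divisions

def ensemble_predict(models, test_images):
    """Average the class probabilities of the five trained models and take the argmax."""
    probs = np.mean([m.predict_proba(test_images) for m in models], axis=0)
    return probs, probs.argmax(axis=1)
```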

Comparison with other DL models

Three code-available DL models were used for comparison. The first was the COVID-Net model trained with the COVIDx dataset12; its pretrained model is available at https://github.com/lindawangg/COVID-Net (COVIDNet-CXR4-A). The second was the DL model of Sharma et al.11, whose pretrained model is available at https://github.com/arunsharma8osdd/covidpred (Combined model 3 [101 epochs]). The third was DarkCovidNet9, which is available at https://github.com/muhammedtalo/COVID-19. Since no pretrained model of DarkCovidNet was available, it was trained from scratch by the authors.

Observer study by the radiologists

To compare our DL model with the radiologists' diagnostic ability, an observer study was performed with six radiologists whose experience ranged from 10 months to 15 years. The radiologists visually evaluated the CXR images of the test set of the COVIDprivate dataset and determined the diagnosis for the three-category classification of COVID-19 pneumonia, non-COVID-19 pneumonia, and the healthy. Apart from the CXR images, the radiologists were blinded to all clinical information of the test set of the COVIDprivate dataset. Since the combined training set used for our DL model was too large for the radiologists, the development set of the COVIDprivate dataset was provided for the radiologists' training before the observer study. Training and interpretation times were not limited.

Performance evaluation

For our DL model, performance was evaluated using the classification metrics of the three-category classification (class-wise precision, recall, F1-score, and three-category classification accuracy) in the three test sets29. For the radiologists and the code-available DL models, the same evaluation was performed in the test set of the COVIDprivate dataset with 150 CXR images. In addition, the class-wise area under the curve (AUC) of the receiver operating characteristic (ROC) analysis was calculated for COVID-19 pneumonia, non-COVID-19 pneumonia, and the healthy29. For the ROC analysis of the radiologists, a consensus interpretation score of the six radiologists was determined by majority voting over the individual interpretations14; the score ranged from 0 to 6.
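
Assuming predicted class probabilities from the ensemble, the class-wise metrics and one-vs-rest AUCs described above can be computed with scikit-learn as follows; the class ordering is an assumption.

```python
# Sketch of the performance metrics: class-wise precision/recall/F1, accuracy, and class-wise AUC.
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, roc_auc_score

CLASSES = ["healthy", "non-COVID-19 pneumonia", "COVID-19 pneumonia"]  # assumed ordering

def evaluate(y_true, probs):
    """y_true: ground-truth class indices; probs: (n_images, 3) ensembled probabilities."""
    y_true = np.asarray(y_true)
    y_pred = probs.argmax(axis=1)
    report = classification_report(y_true, y_pred, target_names=CLASSES, digits=4)
    accuracy = accuracy_score(y_true, y_pred)
    # Class-wise AUC, computed one-vs-rest for the three-category task.
    aucs = {c: roc_auc_score((y_true == i).astype(int), probs[:, i])
            for i, c in enumerate(CLASSES)}
    return report, accuracy, aucs
```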

Statistical analysis

The 95% confidence intervals (CI) of the classification metrics were calculated using 2000 bootstrap samples14. In addition, the class-wise AUC was compared between our DL model and the consensus interpretation of the radiologists using DeLong's test. To control the family-wise error rate, Bonferroni correction was applied, and a p value less than 0.01666 was considered statistically significant. Statistical analyses were performed using the scikit-learn package30 in Python and the pROC package31 in R (version 4.0.4, https://www.r-project.org/).
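
The bootstrap estimate of the 95% CI can be illustrated as follows; resampling individual CXR images with replacement is an assumption, and DeLong's test itself was performed with the pROC package in R rather than in Python.

```python
# Bootstrap 95% CI of the three-category classification accuracy (2000 resamples).
import numpy as np
from sklearn.metrics import accuracy_score

def bootstrap_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample images with replacement
        stats.append(accuracy_score(y_true[idx], y_pred[idx]))
    lower, upper = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper
```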

Results

Table 3 shows the diagnostic performance of the four DL models, including our DL model, and the six radiologists in the test set of the COVIDprivate dataset. The three-category classification accuracy of our DL model was 0.8667 (130/150), whereas those of the six radiologists ranged from 0.5667 (85/150) to 0.7733 (116/150). The 95% CIs of the three-category classification accuracy were 0.8067–0.9200 for our DL model and 0.7067–0.8400 for the radiologist with the best accuracy (Radiologist 3). Thus, the three-category classification accuracy of our DL model was better than that of any of the six radiologists.

For our DL model, the class-wise F1-scores for the healthy and COVID-19 pneumonia were higher than that for non-COVID-19 pneumonia, indicating that the diagnostic performance of our DL model was better for the healthy and COVID-19 pneumonia than for non-COVID-19 pneumonia. For the six radiologists, on the other hand, the class-wise F1-scores for the healthy were higher than those for COVID-19 pneumonia and non-COVID-19 pneumonia; hence, their diagnostic performance was higher for the healthy than for COVID-19 and non-COVID-19 pneumonia.

The three-category classification accuracies of the three code-available DL models were 0.6467 (97/150), 0.4267 (64/150), and 0.4000 (60/150); COVID-Net12 achieved the highest accuracy among the three. Although the three-category classification accuracy of COVID-Net (0.6467) was comparable to those of the six radiologists, those of the other two code-available DL models (0.4267 and 0.4000) were worse than those of the six radiologists. The class-wise F1-scores of the three code-available DL models for COVID-19 pneumonia were 0.3636, 0.5684, and 0.4160; the DL model of Sharma et al.11 achieved the highest class-wise F1-score for COVID-19 pneumonia among them, and its score (0.5684) was higher than those of two radiologists (Radiologist 5 and Radiologist 6). However, the class-wise F1-score of the DL model of Sharma et al. for the healthy was 0.0000. Table S1 of the Supplementary information shows the diagnostic performance of our DL model in the test sets of the COVIDx and COVIDBIMCV datasets.

Table 3 Class-wise precision, recall, F1-score, and three-category classification accuracy of four DL models and six radiologists in the COVIDprivate dataset.

Table 4 shows the class-wise AUC and its 95% CI for our DL model in the test sets of the COVIDx, COVIDBIMCV, and COVIDprivate datasets, as well as for the consensus of the six radiologists in the test set of the COVIDprivate dataset. Figure 3 shows the class-wise ROC curves of our DL model and of the consensus of the six radiologists in the test set of the COVIDprivate dataset. The class-wise AUC and its 95% CI of our DL model were 0.9914 (0.9837–0.9990) for the healthy, 0.9772 (0.9601–0.9942) for non-COVID-19 pneumonia, and 0.9934 (0.9871–0.9996) for COVID-19 pneumonia. Those of the consensus of the six radiologists were 0.9656 (0.9401–0.9911) for the healthy, 0.8654 (0.8022–0.9286) for non-COVID-19 pneumonia, and 0.8740 (0.8164–0.9316) for COVID-19 pneumonia. The difference in class-wise AUC between our DL model and the consensus of the six radiologists was statistically significant for COVID-19 pneumonia (p value = 0.001334), but not for the healthy or non-COVID-19 pneumonia (p values = 0.07252 and 0.02617, respectively). Table S2 of the Supplementary information presents the confusion matrix of the three-category classification for our DL model in the test set of the COVIDprivate dataset. Table S3 of the Supplementary information shows the class-wise AUC and its 95% CI for our DL model when changing the data splitting between the test and development sets. Figures S1 and S2 of the Supplementary information show the class-wise ROC curves of our DL model in the test sets of the COVIDx and COVIDBIMCV datasets, respectively.

Table 4 Class-wise AUC and its 95% CI of our DL model and consensus of six radiologists.
Figure 3
figure 3

Class-wise ROC curves in the COVIDprivate dataset. Note: (A) consensus of the radiologists and (B) our DL model. Abbreviations: DL, deep learning; COVIDprivate, private dataset collected from six hospitals; AUC, area under the curve; ROC, receiver operating characteristic.

Figure 4 shows the CXR images and the Grad-CAM results for the healthy, non-COVID-19 pneumonia, and COVID-19 pneumonia. The Grad-CAM result in Fig. 4A illustrates that our DL model focused on non-specific areas when diagnosing the healthy. Figure 4B shows that our DL model focused on the infiltration shadow in the right lung field when diagnosing non-COVID-19 pneumonia. Figure 4C shows that our DL model focused on the ground-glass shadows in the peripheral areas of both lung fields when diagnosing COVID-19 pneumonia.

Figure 4
figure 4

Results of Grad-CAM for our DL model. Note: (A) the healthy, (B) non-COVID-19 pneumonia, and (C) COVID-19 pneumonia. Each panel consists of a CXR image and the corresponding Grad-CAM result. One trained model of our DL model was used for Grad-CAM. Abbreviations: DL, deep learning; CXR, chest X-ray.
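
For reference, Grad-CAM can be sketched for a PyTorch classifier such as ours as below; the choice of target layer (for example, the last convolutional block of the EfficientNet backbone) and the preprocessing are assumptions.

```python
# Minimal Grad-CAM sketch for a PyTorch image classifier.
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_class, target_layer):
    """Return a heatmap in [0, 1] of size (H, W) highlighting regions supporting target_class."""
    activations, gradients = {}, {}
    h_fwd = target_layer.register_forward_hook(
        lambda m, i, o: activations.update(value=o))
    h_bwd = target_layer.register_full_backward_hook(
        lambda m, gi, go: gradients.update(value=go[0]))

    model.eval()
    logits = model(image.unsqueeze(0))     # image: (3, H, W) preprocessed CXR tensor
    model.zero_grad()
    logits[0, target_class].backward()     # gradients of the target class score
    h_fwd.remove()
    h_bwd.remove()

    acts, grads = activations["value"], gradients["value"]
    weights = grads.mean(dim=(2, 3), keepdim=True)            # global-average-pooled gradients
    cam = F.relu((weights * acts).sum(dim=1, keepdim=True))   # weighted activations + ReLU
    cam = F.interpolate(cam, size=image.shape[1:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
    return cam[0, 0].detach()
```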

Discussion

The results of this study indicate that it is possible to construct an accurate DL model using the two public datasets (COVIDx and COVIDBIMCV) and one private dataset (COVIDprivate). Our DL model, based on EfficientNet with noisy student, achieved accurate diagnosis of COVID-19 pneumonia, non-COVID-19 pneumonia, and the healthy. The three-category classification accuracy of our model was 0.8667, and those of the six radiologists ranged from 0.5667 to 0.7733. The difference in class-wise AUC between our model and the consensus of the six radiologists was statistically significant for COVID-19 pneumonia (p value = 0.001334).

Using the two public datasets and one private dataset, our DL model achieved higher diagnostic performance than the three code-available DL models and the six radiologists. In particular, for COVID-19 pneumonia, the class-wise AUC of our DL model was significantly higher than that of the consensus of the six radiologists. In DL, a large amount of training data is necessary for accurate classification. While COVID-Net used more than 10,000 CXR images to develop and evaluate its model12, we used more than 20,000 CXR images for our DL model. We believe that the dataset size was a major factor in the diagnostic performance of our DL model. Another reason for the superiority of our DL model could be the use of a pretrained model constructed with noisy student21. Noisy student is a relatively new method for increasing the robustness of DL models; the EfficientNet20 model pretrained with noisy student could be useful in improving our DL model.

The results of the three code-available DL models demonstrate that their classification metrics are not satisfactory. Although the three-category classification accuracy of COVID-Net was the highest among the three DL models, its F1-score for COVID-19 pneumonia was the worst. For the other two models, the three-category classification accuracy was lower than those of the six radiologists. Many studies have used DL models for the automatic classification of COVID-19 pneumonia, non-COVID-19 pneumonia, and the healthy using CXR images7,8,9,10,11,12,13,14,18,19; Table 5 summarizes these previous studies. While most of these models were developed and validated using CXR images from public datasets, they were not validated on clinical cases. Our results indicate that most of the previously published DL models for COVID-19 pneumonia may not be useful in clinical situations.

Table 5 Summary of COVID-19 DL models on CXR images.

The three-category classification accuracy of the six radiologists ranged from 0.5667 to 0.7733, revealing large variability in the radiologists' diagnostic performance in the classification of COVID-19 pneumonia, non-COVID-19 pneumonia, and the healthy using CXR images. Conversely, this suggests that the radiologists' diagnostic performance could be improved by using our DL model. The effectiveness of our DL model as a computer-aided diagnosis system should be evaluated in future studies.

There are certain limitations to our study. First, although our DL model was developed and validated using two public datasets and one private dataset, it was not evaluated by external validation. The clinical usefulness of our DL model should be further evaluated by external validation32. Second, our DL model focused on the three-category classification of COVID-19 pneumonia, non-COVID-19 pneumonia, and the healthy; it ignored lung cancer and other diseases that are considered important to detect on CXR images. This three-category classification may be considered unnatural from a clinical viewpoint. However, we speculate that it was justified owing to the higher priority of the three-category classification during the COVID-19 pandemic. Third, our observer study was conducted on CXR images obtained from relatively large hospitals. Since CXR can be performed in various hospitals and clinics, further studies are warranted to determine whether our DL model is effective in small hospitals and clinics; the outputs of our DL model should be adjusted to the circumstances in which it is used. Fourth, we focused on the automatic classification of COVID-19 pneumonia, non-COVID-19 pneumonia, and the healthy using CXR images, and the diagnostic performance of radiologists assisted by our DL model was not evaluated. Thus, we did not evaluate the usefulness of our DL model as a computer-aided system. If radiologists doubt the results of our DL model, their diagnostic performance may not be improved by using it. Therefore, in the future, it is crucial to build trust between the radiologists and the DL model for its implementation in clinical practice33. Fifth, although the results of Grad-CAM (for example, Fig. 4) could help radiologists comprehend the classification results of our DL model, the effectiveness of the Grad-CAM results was not validated in the current study.

In conclusion, it is feasible to create an accurate DL model for the three-category classification of COVID-19 pneumonia, non-COVID-19 pneumonia, and the healthy. The diagnostic performance of our model was significantly better than that of the consensus interpretation of the six radiologists for COVID-19 pneumonia.