Assisting scalable diagnosis automatically via CT images in the combat against COVID-19

The pandemic of Coronavirus Disease 2019 (COVID-19) is causing enormous loss of life globally. Prompt case identification is critical. The reference method is the real-time reverse transcription PCR (RT-PCR) assay, whose limitations may curb its prompt large-scale application. COVID-19 manifests with chest computed tomography (CT) abnormalities, some even before the onset of symptoms. We tested the hypothesis that the application of deep learning (DL) to 3D CT images could help identify COVID-19 infections. Using data from 920 COVID-19 and 1,073 non-COVID-19 pneumonia patients, we developed a modified DenseNet-264 model, COVIDNet, to classify CT images to either class. When tested on an independent set of 233 COVID-19 and 289 non-COVID-19 pneumonia patients, COVIDNet achieved an accuracy rate of 94.3% and an area under the curve of 0.98. As of March 23, 2020, the COVIDNet system had been used 11,966 times with a sensitivity of 91.12% and a specificity of 88.50% in six hospitals with PCR confirmation. Application of DL to CT images may improve both efficiency and capacity of case detection and long-term surveillance.

The world has been suffering from the COVID-19 pandemic since its outbreak in December 2019 [1][2][3]. COVID-19 is highly contagious, and infected patients can be asymptomatic but infectious 4. As of July 11, 2020, there had been over 12 million confirmed COVID-19 cases and 556,335 deaths worldwide 5. Community transmission has been increasingly reported in more than 180 countries 5. Until an effective and safe vaccine for COVID-19 becomes available in clinical settings, improving the efficiency of current clinical pathways and the capacity for patient management is crucial to successfully combating the COVID-19 pandemic and a possible future resurgence 6,7. Case identification is an important first step for subsequent clinical triage and treatment optimization. The reference detection method uses the real-time reverse transcription PCR (RT-PCR) assay to detect viral RNA 1. Several limitations of this assay may curb its prompt large-scale application [8][9][10].
Chest computed tomography (CT) can effectively capture the manifestations of COVID-19 infections, even asymptomatic ones [10][11][12]. Deep learning, an artificial intelligence (AI) technology, has achieved impressive performance in the analysis of CT images [13][14][15][16]. Chest CT aided by deep learning promises to reduce the burden of prompt mass case detection, especially when RT-PCR testing is in short supply 17. We developed an automated, robust deep learning model, COVIDNet, which directly analyzes 3D CT images to assist the screening and diagnosis of COVID-19 infected patients. Furthermore, as of March 23, 2020, the COVIDNet system had been employed in 6 hospitals in China with PCR confirmation. We provide clinical insights into the image features extracted by COVIDNet and propose a practical scenario for how the developed tool might improve clinical efficiency.

Results
Two independent cohorts totaling 2,800 patients were retrospectively recruited for model development and a secondary test (Fig. 1). The model development cohort enrolled 920 COVID-19 patients and 1,073 non-COVID-19 patients, all of whom were randomly divided at the patient level into three non-overlapping sets (training, validation, and initial test) at a ratio of approximately 3:1:1 (Fig. 1, Supplementary Tables 1-4). The secondary test cohort consisted of 233 COVID-19 patients and 289 non-COVID-19 pneumonia patients (Fig. 1, Supplementary Tables 5-7). The training and validation datasets included the images of all scans for each patient and were used to train and fine-tune the COVIDNet system. In contrast, the initial and secondary test datasets used only the first CT scan of each patient, so that the reported performance reflects the first diagnosis.
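A patient-level split of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the function name `split_patients`, the seed, and the placeholder patient IDs are hypothetical, and the article does not say whether the split was stratified by class.

```python
import random

def split_patients(patient_ids, ratios=(3, 1, 1), seed=0):
    """Randomly split unique patient IDs into train/validation/test sets.

    Splitting on patient IDs (not on individual scans) keeps every scan
    of a given patient inside a single set, so the sets never overlap.
    """
    ids = sorted(set(patient_ids))   # one entry per patient, stable order
    rng = random.Random(seed)        # the seed is an illustrative assumption
    rng.shuffle(ids)
    total = sum(ratios)
    n_train = len(ids) * ratios[0] // total
    n_val = len(ids) * ratios[1] // total
    train = set(ids[:n_train])
    val = set(ids[n_train:n_train + n_val])
    test = set(ids[n_train + n_val:])  # remainder goes to the test set
    return train, val, test

# Hypothetical IDs standing in for the 1,993 development-cohort patients.
patients = [f"P{i:04d}" for i in range(1993)]
train, val, test = split_patients(patients)
print(len(train), len(val), len(test))
```

An exact 3:1:1 split of 1,993 patients would not reproduce the reported initial test size of 372, which is consistent with the ratio being described as approximate.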
The initial test dataset included 372 patients, 41.4% (154/372) of whom were confirmed COVID-19 cases. COVIDNet achieved an accuracy of 96.0% and an AUC of 0.986 on this dataset. Additional performance measures are shown in Table 1 and Extended Data Fig. 1. The model performance was based on the first CT scan of each patient. Note that our development dataset might overlap with the dataset of Li et al. 18 by 17 patients, including 14 patients in the training set and 3 patients in the initial test set. Retraining and retesting without these patients yielded similar results (Supplementary Table 8 and Extended Data Fig. 2).
The t-SNE representation of chest CT showed two clear clusters, color-coded by the class labels (Fig. 2). Most cases are located within their respective clusters, suggesting that COVIDNet successfully extracted distinct CT features of COVID-19 pneumonia. We selected three groups of representative cases (G1, G2, G3 in Fig. 2), and presented their CT manifestations along with the probability of COVID-19 in Extended Data Table 8. A typical manifestation of COVID-19 pneumonia is multiple ground-glass opacity (GGO) in the subpleural area of bilateral lungs. Radiologists confirmed a similar manifestation in the COVID-19 cluster (for example, the G1 red points). As for the misclassified COVID-19 cases (the G3 red points), three cases had no definite finding, which was difficult to identify only using the images; the other cases had extensive GGO with partial consolidation or combined with pleural effusion and interstitial edema, which were not the typical manifestations of COVID-19. The misclassified non-COVID-19 cases (the G2 blue points) consisted of one bacterial and two influenza B patients, which were classified as COVID-19 due to the appearance of extensive GGO.
After the secondary test, we deployed the COVIDNet system in 6 hospitals to assist radiologists in screening suspected patients upon initial contact. The application pipeline of COVIDNet is illustrated in Fig. 3 and Extended Data Fig. 3 [...] insights for regions with strained supplies of nucleic acid test kits, with limited testing facilities, or facing a community transmission epidemic. As of March 23, 2020, the COVIDNet system had been used to process 11,966 CT scans in six hospitals with PCR confirmation, resulting in a sensitivity of 90.52% and a specificity of 88.50% (Fig. 4 and Supplementary Table 11).

Discussion
The COVID-19 pandemic continues to spread widely around the world. Until an effective vaccine becomes available for clinical use, the fight against SARS-CoV-2 will continue for the foreseeable future. Accurate and prompt diagnosis of COVID-19 infection is essential for patient management. The criteria specified in the current COVID-19 clinical management guideline have faced several challenges 20. As the primary diagnostic tool, the nucleic acid test has several disadvantages 21. Early clinical manifestations of COVID-19 are fever, cough, and dyspnea, which are similar to those of non-COVID-19 viral pneumonia. Although the chest radiograph is the initial screening imaging tool in some countries, chest CT has been a vital part of the COVID-19 diagnostic pathway. COVID-19 infection with GGO as the main CT presentation can easily be confused with other viral pneumonia and fungal pneumonia. COVID-19 infection with consolidation as the main CT manifestation may be confused with a bacterial infection. Our research showed that COVIDNet offers a powerful tool for screening suspected COVID-19 patients. It could distinguish COVID-19 from other pneumonia infections promptly and accurately. The secondary test showed COVIDNet's robustness against seven other types of pneumonia with confirmed pathogen evidence and across various CT devices, as well as its faster and more accurate performance compared with expert radiologists. Our results also showed that the radiologists from the COVID-19 designated hospitals performed better than those from the non-epidemic regions. The excellent inter-rater reliability among the radiologists, together with their overall poorer performance compared with COVIDNet, suggested that COVIDNet provided more unbiased results and captured clinically important features of COVID-19 infections that might not have been detected by the human experts, given that all COVID-19 cases were confirmed via nucleic acid test.
One recent study developed a deep learning screening tool for COVID-19 18. Its authors extracted features from each axial CT slice of a patient independently and aggregated the stack of features right before making the classification decision. In contrast, our COVIDNet model directly extracted spatial features from the entire 3D CT scan using a true three-dimensional deep learning model. We also demonstrated the generalizability of our model on a secondary test dataset through comparison with expert radiologists. Most importantly, COVIDNet had been deployed for clinical use in 6 hospitals in China with PCR confirmation as of March 23, 2020.
The lack of methods for visualizing how deep learning works has been one of the major bottlenecks for its application in medical settings. To further investigate how COVIDNet made classification decisions, we visualized the extracted features using t-Distributed Stochastic Neighbor Embedding (t-SNE) 22. The results showed that COVIDNet indeed extracted image features that could separate COVID-19 from the other types of pneumonia. We reported image signatures from representatives of the correctly classified COVID-19 cases, the misclassified COVID-19 cases, and the misclassified non-COVID-19 cases. Such image signatures could offer useful insights for clinical decisions. However, because the outlines of lesion regions in CT images are indistinct, classifying COVID-19 and non-COVID-19 pneumonia through image labeling and segmentation, the traditional way to illustrate differences between diseases, would be subjective. When facing an outbreak of COVID-19, often with a severe shortage of medical personnel, prompt and accurate image review and interpretation might be a key limiting factor for appropriate clinical decision-making. COVIDNet can rapidly detect clinically relevant lung lesions from hundreds of CT images. Together with the probability sorting, COVIDNet may greatly improve screening and diagnosis efficiency. In addition, COVIDNet can automatically quantify the proportion of image abnormalities, supporting further clinical decisions. However, CT presentations of patients with COVID-19 vary dramatically with the stage of the disease, especially for those with underlying diseases and complications. Other types of pneumonia may also share image abnormalities with COVID-19. Deep learning technology may not perform well under these circumstances.
Therefore, a patient's epidemiological and clinical information needs to be closely integrated for further clinical decision-making in the diagnosis and treatment of COVID-19. In addition, the scores produced by the model are not calibrated. Even though they can serve as a proxy for the classification confidence, their interpretation is rooted in accumulated experience obtained by integrating the model with clinical practice. Moreover, the slight overlap in hospitals between the model development cohort and the external validation cohort may make the external validation imperfect. Above all, no one can ignore that this virus is evolving in directions that we do not yet know 23. In this situation, COVIDNet would serve as an effective tool for routine screening in clinical settings where chest CT is prescribed. The screening role of COVIDNet may be limited in regions where chest radiography, rather than CT, is the primary investigation method.
In conclusion, we have developed an automated classification neural network model, COVIDNet, specifically designed to distinguish COVID-19 from seven other types of pneumonia with confirmed pathogens through analyzing patients' 3D chest CT scans. In principle, the model can be deployed anywhere in the world with CT imaging capability at a low cost and provide radiological decision support where COVID-19 imaging diagnosis expertise is scarce, especially when facing COVID-19 outbreaks. Our results warrant further validation in future studies.

Table 1. COVIDNet diagnostic performance on the initial test dataset. 372 patients were considered, including 154 COVID-19 patients and 218 non-COVID-19 patients. The diagnostic metrics of accuracy, AUC, sensitivity, specificity, PPV, NPV, and F1 score were calculated, as well as the corresponding CIs. AUC, area under the ROC curve; PPV, positive predictive value; NPV, negative predictive value; CI, confidence interval.
COVID-19 (n = 154)
Non-COVID-19 (n = 218)

Methods
Datasets. We retrospectively recruited two cohorts for model development and the secondary test, with a total of 2,800 patients (1,430 COVID-19 patients and 1,370 non-COVID-19 patients). Only non-contrast scans were included in this study. The model development dataset consisted of CT scans from 2,278 pneumonia patients, who suffered from either COVID-19 or other types of pneumonia. We collected 1,197 COVID-19 cases between January 5, 2020 and March 1, 2020 from ten designated COVID-19 hospitals in China (Supplementary Table 1). These COVID-19 cases were confirmed by positive results from RT-PCR assays testing nasal or pharyngeal swab specimens. We also randomly selected 1,081 non-COVID-19 patients with chest CT abnormalities, according to the criteria listed in Supplementary Table 2, from patients who were hospitalized between November 18, 2018 and February 21, 2020 in three other general hospitals in China (Supplementary Table 4).
We excluded 285 patients after all images were screened by two senior radiologists with 30 years of work experience (Fig. 1). The exclusions fell under two circumstances: 244 COVID-19 patients for whom the time between the first CT scan and the first positive nucleic acid test was longer than two weeks, and 41 patients with large breathing or body motion artifacts (33 COVID-19 patients and 8 non-COVID-19 patients).
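The exclusion counts can be reconciled with the development-cohort sizes reported in the Results by simple arithmetic. The short check below uses only numbers stated in the text; the variable names are illustrative.

```python
# Counts taken from the text; variable names are illustrative.
covid_collected = 1197
non_covid_collected = 1081

excluded_late_pcr = 244            # COVID-19, CT-to-PCR interval > two weeks
excluded_artifact_covid = 33       # motion artifacts, COVID-19
excluded_artifact_non_covid = 8    # motion artifacts, non-COVID-19

covid_dev = covid_collected - excluded_late_pcr - excluded_artifact_covid
non_covid_dev = non_covid_collected - excluded_artifact_non_covid
total_excluded = (excluded_late_pcr + excluded_artifact_covid
                  + excluded_artifact_non_covid)
development_total = covid_dev + non_covid_dev

print(covid_dev, non_covid_dev, total_excluded, development_total)
# prints: 920 1073 285 1993
```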
For the secondary test cohort, we collected 233 COVID-19 cases between March 2, 2020 and March 13, 2020 from four COVID-19 designated hospitals. We also randomly selected 289 patients who were hospitalized between February 22, 2020 and March 1, 2020 in two general hospitals in China. Two of the COVID-19 hospitals and one non-COVID-19 hospital also contributed patients to the model development cohort. The inclusion criteria for COVID-19 and non-COVID-19 patients were the same as described above.
All CT scans of the two cohorts were performed upon the first contact, with patients in the supine position at full inspiration, and covered the whole chest.

Model evaluation.
The performance of the model was evaluated using accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score. The receiver operating characteristic (ROC) curve and confusion matrix were generated based on the classification results. The area under the ROC curve (AUC) was also calculated. Bootstrapping with 10,000 replications was used to calculate the 95% confidence interval for each metric. Evaluation results were obtained and visualized using Python libraries, including NumPy.
Furthermore, the performance of COVIDNet was compared with that of eight independent expert radiologists with 6-23 years of experience on the diagnosis of COVID-19 using the secondary test dataset. Four radiologists were from the COVID-19 designated hospitals and the other four were not. To ensure that the radiologists could concentrate on the trial, each of them read CT images for no more than two hours per day under the supervision of one research assistant. Before the radiologists started reading the CT images, the research assistant informed each radiologist about the CT signs in the guidelines to eliminate knowledge bias. The true pneumonia class was blinded to all the radiologists. We also used Cohen's κ coefficient to evaluate the inter-rater agreement among COVIDNet and the eight radiologists (Supplementary Table 9) 19. We categorized κ coefficients as follows: poor (0 < κ ≤ 0.20), fair (0.20 < κ ≤ 0.40), moderate (0.40 < κ ≤ 0.60), good (0.60 < κ ≤ 0.80), and excellent (0.80 < κ ≤ 1.00).
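The evaluation steps described above could be sketched as follows. This is an illustrative, standard-library-only sketch rather than the authors' code: it assumes binary labels with 1 denoting COVID-19, uses a percentile bootstrap (the article does not state which bootstrap interval was used), and all function names and the toy labels are hypothetical. The AUC is omitted for brevity.

```python
import random

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, with 1 = COVID-19."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def safe_ratio(num, den):
    return num / den if den else float("nan")

def metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": safe_ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": safe_ratio(tp, tp + fn),
        "specificity": safe_ratio(tn, tn + fp),
        "ppv": safe_ratio(tp, tp + fp),
        "npv": safe_ratio(tn, tn + fn),
        "f1": safe_ratio(2 * tp, 2 * tp + fp + fn),
    }

def bootstrap_ci(y_true, y_pred, name, n_boot=10000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample patients with replacement."""
    rng = random.Random(seed)  # seed is an illustrative assumption
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = metrics([y_true[i] for i in idx],
                        [y_pred[i] for i in idx])[name]
        if value == value:     # drop NaN from degenerate resamples
            stats.append(value)
    if not stats:
        return float("nan"), float("nan")
    stats.sort()
    lo = stats[int((alpha / 2) * len(stats))]
    hi = stats[min(len(stats) - 1, int((1 - alpha / 2) * len(stats)))]
    return lo, hi

def cohen_kappa(a, b):
    """Cohen's kappa: (observed - chance agreement) / (1 - chance)."""
    n = len(a)
    p_o = sum(1 for x, y in zip(a, b) if x == y) / n
    p_e = sum((a.count(l) / n) * (b.count(l) / n) for l in set(a) | set(b))
    return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)

def kappa_category(kappa):
    """Map kappa to the categories listed in the text."""
    bounds = [(0.20, "poor"), (0.40, "fair"), (0.60, "moderate"),
              (0.80, "good"), (1.00, "excellent")]
    for upper, label in bounds:
        if 0 < kappa <= upper:
            return label
    return "outside the listed ranges"

# Toy labels for illustration only; these are not study data.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
point = metrics(y_true, y_pred)
ci = bootstrap_ci(y_true, y_pred, "accuracy")
kappa = cohen_kappa(y_true, y_pred)
print(point["sensitivity"], ci, round(kappa, 3), kappa_category(kappa))
```

Resampling whole patients rather than individual predictions matches the patient-level evaluation described in the text; with several scans per patient, the resampling unit would need to stay the patient.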
To further understand the model's classification decisions, we visualized the distribution of the extracted features using t-Distributed Stochastic Neighbor Embedding (t-SNE) 22, an unsupervised non-linear dimension reduction algorithm commonly used to visualize high-dimensional data. It projects the high-dimensional feature maps taken right before the final fully-connected layer of COVIDNet onto a two-dimensional space, converting similarities between pairs of original data points into similarities between the corresponding projected points. Because t-SNE preserves local structure, the projection can reveal clusters among the data.
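For reference, the similarity conversion described above can be written out using the standard t-SNE formulation. The notation is that of the general algorithm, not of this article: here x_i is assumed to denote the feature vector of scan i taken before the final fully-connected layer, y_i its two-dimensional projection, n the number of scans, and σ_i a per-point bandwidth set by the chosen perplexity.

```latex
\begin{align}
  % Similarity of x_j to x_i in the original feature space (Gaussian kernel)
  p_{j|i} &= \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
                  {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
  \qquad
  p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n} \\
  % Similarity of y_i and y_j in the 2-D map (Student t kernel, one degree of freedom)
  q_{ij} &= \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
                 {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}} \\
  % Objective minimized over the map points y_i
  C &= \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
\end{align}
```

Because the Kullback-Leibler objective penalizes placing similar points (large p_ij) far apart much more than placing dissimilar points close together, the map mainly preserves local neighborhoods, which is why clusters tend to be visible while distances between clusters are less informative.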