Deep learning-based model for detecting 2019 novel coronavirus pneumonia on high-resolution computed tomography

Computed tomography (CT) is the preferred imaging method for diagnosing 2019 novel coronavirus (COVID-19) pneumonia. We aimed to construct a deep learning-based system for detecting COVID-19 pneumonia on high-resolution CT. For model development and validation, 46,096 anonymous images from 106 admitted patients, including 51 patients with laboratory-confirmed COVID-19 pneumonia and 55 control patients with other diseases, were retrospectively collected at Renmin Hospital of Wuhan University. Twenty-seven consecutive patients were prospectively enrolled at Renmin Hospital of Wuhan University to compare the efficiency of radiologists in detecting COVID-19 pneumonia with that of the model. An external test was conducted at Qianjiang Central Hospital to estimate the system's robustness. The model achieved a per-patient accuracy of 95.24% and a per-image accuracy of 98.85% in the internal retrospective dataset. For the 27 internal prospective patients, the system achieved performance comparable to that of an expert radiologist. In the external dataset, it achieved an accuracy of 96%. With the assistance of the model, the reading time of radiologists decreased by 65%. The deep learning model showed performance comparable to that of an expert radiologist and greatly improved the efficiency of radiologists in clinical practice.

In December 2019, a novel coronavirus disease (hereinafter referred to as COVID-19) was first reported in Wuhan. Subsequently, the outbreak spread widely in China and abroad [1][2][3] .
The clinical manifestations of COVID-19 pneumonia are complicated and may include fever, cough, myalgia, headache, and gastrointestinal symptoms at onset 4 . Although nucleic acid detection is considered the determinant for identifying COVID-19 infection and more rapid detection kits for the novel coronavirus have come into mass production, computed tomography (CT) is still the most efficient modality for detecting pneumonia and evaluating its severity 5 . An updated case series demonstrated that CT findings were positive in all 140 laboratory-confirmed COVID-19 patients, even in the early stage 4,6 . In the fifth version of the diagnostic manual of COVID-19 issued by the National Health Commission of China, the radiographic characteristics of pneumonia were included in the clinical diagnostic standard in Hubei Province 7 . The cases of COVID-19 reported within 1 day on Feb 13, 2020 in Wuhan included 13,332 clinically diagnosed cases 8 . This highlighted the importance of CT in the diagnosis of COVID-19 pneumonia. Owing to the outbreak of COVID-19, thousands of patients waited in line for CT examination at the designated fever outpatient hospitals in Wuhan and other cities. As of Feb 14, there were 5,534 suspected cases, 38,107 confirmed patients receiving treatment in hospital, and 77,323 cases under medical observation in Hubei province 9 . Most of them needed to undergo CT examination; however, there were fewer than 4,500 radiologists in the cities of Hubei according to the China Health Statistical Yearbook (2018) 10 . Meanwhile, because the lung infection foci are small in the early stage of COVID-19 infection, thin-slice scanning (2.5 mm, 1.25 mm or even 0.625 mm) is usually needed for diagnosis instead of conventional CT (5 mm), which makes reading more time-consuming.
All of these factors overload radiologists, delay the diagnosis and isolation of patients, affect patients' treatment and prognosis, and ultimately affect the control of the COVID-19 epidemic.
Deep learning, an important breakthrough in artificial intelligence (AI) over the past decade, has great potential for extracting fine-grained features in image analysis through the convolution kernels of deep convolutional neural networks (DCNNs) 11 . Our group has also applied this technique to minor lesion detection and real-time assistance for doctors in gastrointestinal endoscopy [12][13][14][15][16] .
In the present research, we constructed and validated a deep learning-based system for identifying viral pneumonia on CT. Our model has performance comparable to that of an expert radiologist but takes much less time. The module and source code developed in this work are shared with researchers worldwide at https://github.com/endo-angel/ct-angel, and an open-access website has been made available to provide free access to the present system (https://121.40.75.149/znyx-ncov/index).
... 17 . For patients whose CT scans were stored in the retrospective databases, informed consent was waived by the Ethics Committee. All methods were carried out in accordance with relevant guidelines and regulations.

Method
Diagnostic testing for COVID-19. Patients' respiratory secretions were collected and transferred to a sterile test tube containing virus transport medium. Fluorescent RT-PCR analysis of the samples was performed using the COVID-19 nucleic acid detection kit developed by Shanghai Geneodx Biotechnology Co., Ltd. This detection kit was approved by the National Medical Products Administration (NMPA) of China on January 26, 2020 and recommended by the Centers for Disease Control and Prevention (CDC) 18 . The rapid, high-precision COVID-19 detection kit greatly accelerated the confirmation of human COVID-19 infection.

Datasets. As shown in Fig. 1, a total of 46,096 CT scan images from 51 COVID-19 pneumonia patients and 55 control patients with other diseases at Renmin Hospital of Wuhan University were collected for developing the model to detect COVID-19 pneumonia. After images without good lung fields were filtered out, 35,355 images were selected and split into training and retrospective testing datasets. The images enrolled in the training dataset covered almost all common CT features of COVID-19 pneumonia, as presented in Fig. 2. Three radiologists with more than 5 years of clinical experience labelled the infection lesions of COVID-19 pneumonia patients in the training dataset and selected the images containing COVID-19 pneumonia lesions in the testing dataset; their labels were combined by consensus. For prospective testing of the model, 13,911 images from 27 consecutive patients undergoing CT scans on Feb 5, 2020 at Renmin Hospital of Wuhan University were further collected. All of these CT scans were obtained at Renmin Hospital of Wuhan University. To estimate the robustness of the system, an external dataset of 100 patients (13,734 images from 50 COVID-19 patients and 17,030 images from 50 normal control patients) was retrospectively collected from Qianjiang Central Hospital, China. The instruments used in this study included Optima CT680, Revolution CT and Bright Speed CT scanners (all GE Healthcare).
Training algorithm. This work is built on top of UNet++, a novel and powerful architecture for medical image segmentation 19 . ResNet-50 was used as the backbone of UNet++ as previously described 20 . ResNet-50 21 was pretrained on the ImageNet dataset 22 , and all of its pretrained parameters were loaded into UNet++. The network architecture of UNet++ is shown in Fig. 3. Briefly, UNet++ consists of an encoder and a decoder connected through a series of nested dense convolutional blocks, so that the semantic gap between the feature maps of the encoder and decoder is bridged prior to fusion. The encoder extracts features by down-sampling; the decoder maps the features back to the original image size by up-sampling and classifies each pixel, thereby achieving segmentation. We first trained UNet++ to extract valid areas in CT images using 289 randomly selected CT images and tested it on another 600 randomly selected CT images. The prediction schematic of the model is shown in Fig. 4. Raw images were first input into the model, which output prediction boxes framing suspicious lesions. Valid areas were then extracted and unnecessary fields were filtered out to avoid possible false positives. To make predictions per case, a rule linking the prediction results of consecutive images was added: each CT image with the above prediction results was divided into four quadrants, and a result was output only when three consecutive images were predicted to have lesions in the same quadrant.
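The per-case rule described above (a lesion must appear in the same quadrant on three consecutive images) can be illustrated with a minimal Python sketch. The paper does not specify how a prediction box is assigned to a quadrant; assigning it by the box centre is an assumption made here, and the names `quadrant_of` and `case_is_positive` are hypothetical and are not taken from the released source code.

```python
from typing import List, Set

QUADRANTS = ("upper_left", "upper_right", "lower_left", "lower_right")


def quadrant_of(cx: float, cy: float, width: int, height: int) -> str:
    """Map a box centre (cx, cy) to an image quadrant.

    Assumption: image coordinates with the origin at the top-left corner,
    and a box belongs to the quadrant that contains its centre.
    """
    vertical = "upper" if cy < height / 2 else "lower"
    horizontal = "left" if cx < width / 2 else "right"
    return f"{vertical}_{horizontal}"


def case_is_positive(per_image_quadrants: List[Set[str]], run_length: int = 3) -> bool:
    """Return True if any quadrant has a predicted lesion on `run_length`
    consecutive images.

    per_image_quadrants[i] is the set of quadrants in which image i
    (in scan order) has at least one predicted lesion box.
    """
    streak = {q: 0 for q in QUADRANTS}
    for quadrants in per_image_quadrants:
        for q in QUADRANTS:
            # Extend the run if the quadrant is flagged, otherwise reset it.
            streak[q] = streak[q] + 1 if q in quadrants else 0
            if streak[q] >= run_length:
                return True
    return False


# Hypothetical scan: the upper-left quadrant is flagged on three images in a row.
scan = [set(), {"upper_left"}, {"upper_left", "lower_right"}, {"upper_left"}, set()]
print(case_is_positive(scan))
```

Requiring a run of consecutive images in one quadrant suppresses isolated single-image detections, which matches the stated aim of avoiding false positives at the case level.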

Testing of the model in retrospective data.
To evaluate the performance of the model on CT scan images, five metrics were calculated: accuracy, sensitivity, specificity, positive prediction value (PPV) and negative prediction value (NPV). They were defined as follows: accuracy = true predictions/total number of cases, sensitivity = true positive/positive, specificity = true negative/negative, PPV = true positive/(true positive + false positive), NPV = true negative/(true negative + false negative). Here, "true positive" is the number of correctly predicted COVID-19 pneumonia cases/images, "false positive" is the number of non-COVID-19 pneumonia cases/images mistakenly predicted as COVID-19 pneumonia, "positive" is the number of cases/images of COVID-19 pneumonia patients enrolled, "true negative" is the number of correctly predicted non-COVID-19 pneumonia cases/images, "false negative" is the number of COVID-19 pneumonia cases/images mistakenly predicted as non-COVID-19 pneumonia, and "negative" is the number of non-COVID-19 pneumonia cases/images enrolled.

Figure 2. Representative images of COVID-19 pneumonia. More than six common computed tomography (CT) features of COVID-19 pneumonia were covered in the selected images. 1(a-d), the lesions were mainly ground-glass-like, with thickened blood vessels passing through and air bronchogram signs in 1(c); 2(a-d), the lesions were mainly ground-glass changes, and paving stone-like changes were observed in 2(d); 3(a-c), the lesions became solid over a large range, with air bronchogram signs inside; 4, the lesion was located in the lower lobes of both lungs and was mainly a grid-like change with ground-glass lesions; 5(a,b), the lesions were mainly consolidation; 6(a,b), the lesions were mainly large ground-glass shadows, showing white lung-like changes, with air bronchogram signs.

The performance of the model in consecutive prospective patients. Twenty-seven patients were enrolled in the prospective dataset at Renmin Hospital of Wuhan University. Sixteen (59.26%) patients were diagnosed with viral pneumonia by the expert radiologist, and the other eleven patients were not. Two other radiologists reviewed the CT images, approved the expert's results, and summarized that the CT characteristics of ...

Table 3. The performance of the deep learning model on both retrospective and prospective datasets. PPV positive prediction value; NPV negative prediction value.

Comparison between the efficiency of the radiologist with and without the assistance of AI. When the expert radiologist first read the CT scan images of the 27 prospective patients, the average reading time needed to determine whether each patient had viral pneumonia was 116.12 s per case (IQR 85.69-118.17). After a washout period of 10 days, the same expert radiologist re-read the CT images of the 27 prospective patients with the assistance of the AI model. The results for determining whether each patient had viral pneumonia did not change, while the average reading time of the expert decreased by 65%. This indicates that the efficiency of radiologists could be greatly improved with the assistance of AI. A website has been made available to provide free access to the present model (https://121.40.75.149/znyx-ncov/index) (Fig. 6). CT scan images can be uploaded by both clinicians and researchers as a second-opinion consulting service, especially in other provinces or countries unfamiliar with the radiologic characteristics of COVID-19. Cases of COVID-19 pneumonia have also been made available on the open-access website, which might be a useful resource for radiologists and researchers fighting COVID-19 pneumonia. Furthermore, the module and source code developed in this work are shared with researchers worldwide at https://github.com/endo-angel/ct-angel.

Discussion
As of Feb 14, 2020, the National Health Commission had reported 66,492 confirmed cases, 1,523 deaths and 8,969 suspected cases 23 . Given such a large number of patients and the high contagiousness of the novel coronavirus (with an estimated reproduction number R0 of 2.2 to 6.47), timely diagnosis and isolation are the keys to preventing further spread of the virus [24][25][26][27][28] . CT is the most efficient modality for screening and clinically diagnosing COVID-19 pneumonia 5,7 . However, the number of radiologists is small relative to the needs of the patients, especially in Hubei province, China, which could greatly delay the diagnosis and isolation of patients, affect patients' treatment and prognosis, and ultimately affect the overall control of the COVID-19 epidemic. Deep learning, a technology that has shown great performance in extracting fine-grained features from radiology data, may help alleviate this problem 11 . Recently, Ardila D, et al. achieved end-to-end lung cancer screening on low-dose chest CT with an AUC of 94.4% 29 . Chae KJ, et al. successfully used a convolutional neural network to classify small (≤ 2 cm) pulmonary nodules on CT scan images 30 . However, little research has been conducted on detecting viral pneumonia 11,29,30 . Most previous studies detected pneumonia on X-ray images using deep learning and did not focus on viral pneumonia. Furthermore, CT is more sensitive and more commonly used than X-ray for identifying COVID-19. In our previous work, we succeeded in applying deep learning to minor lesion detection and real-time assistance for doctors in gastrointestinal endoscopy [12][13][14][15][16] . Here, we applied this technique to the identification of COVID-19 pneumonia in CT images. Results from both retrospective and prospective patients showed that the model was comparable to an expert radiologist and holds great potential to reduce diagnosis time (Fig. 7).
Early diagnosis and early isolation of suspected patients are the most important ways to prevent the spread of the epidemic 19 . Owing to the sudden outbreak of COVID-19, radiology departments are overloaded and patients have to wait a long time for a chest CT scan, which greatly increases the risk of cross-infection. Radiologists' daily workload in Hubei province is currently huge, and patients have to wait several hours for a CT scan report. Given the current number of suspected patients and close contacts, the radiologists in the hardest-hit province, Hubei, China, may not be sufficient to keep pace with the rapid spread of the virus, which has a high estimated R0 of 2.2 to 6.47 [25][26][27][28] . It can be inferred that newly infected cases would appear before radiologists meet the demands of existing patients, so that the overall burden on radiologists grows like a snowball. Relieving the pressure on radiologists is essential for controlling the spread of the virus. In the present study, our model achieved performance comparable to that of expert radiologists in a much shorter time. It holds great potential to relieve the pressure on radiologists in clinical practice and to contribute to the control of the epidemic.
Timely diagnosis and early treatment of infected patients are important for patients' prognosis 31 . The fatality rate of COVID-19 patients in Hubei province is significantly higher than that in other regions, probably because of delayed treatment and a shortage of medical resources 8,32 . Improving diagnostic efficiency is important for improving patient outcomes. In the present study, our model helped the expert radiologist complete the same work in a much shorter time, which greatly improves the efficiency of diagnosis in clinical practice and may contribute to better patient outcomes. In addition to relieving radiologists' pressure and accelerating diagnosis, artificial intelligence also has the potential to reduce missed diagnoses of COVID-19 patients. The lung infection foci are sometimes mild in the early stage of COVID-19 infection 5 and require careful observation on 0.625 mm thin-slice scans. Radiologists vary in skill and can be affected by their subjective state and outside pressure. One missed diagnosis could lead to multiple further infections. The model is highly sensitive and stable, and is not affected by workload or working hours. As a preliminary screening tool, it might help radiologists improve sensitivity and reduce missed diagnoses.

Figure 7. Abstract diagram. Computed tomography (CT) is the most efficient modality for screening and clinically diagnosing COVID-19 pneumonia. However, compared to the needs of the patients, the number of radiologists is quite small. After artificial intelligence is applied to identifying COVID-19 pneumonia in CT images, the efficiency of diagnosis is greatly improved. Artificial intelligence holds great potential to relieve the pressure on frontline radiologists, accelerate the diagnosis, isolation and treatment of COVID-19 patients, and therefore contribute to the control of the epidemic.
Notably, the per-patient sensitivity is better than the per-image sensitivity, while the other per-patient metrics are worse than their per-image counterparts. Each patient has a large number of CT images (about 500), most of which are negative images without lesions. Specificity equals the number of true negatives divided by the number of all negatives. When specificity is calculated per image, the number of negatives increases hundreds of times, while the number of false positives does not increase nearly as much; therefore, the per-image specificity is higher than the per-patient specificity. The same principle applies to accuracy and PPV. For sensitivity, a few images with suspicious lesions may be missed in a COVID-19 patient (lowering the per-image sensitivity), while the probability that all images with suspicious lesions in a patient are missed is much lower (so the per-patient sensitivity is higher).
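The effect described above can be reproduced with a short Python sketch of the five metrics defined in the Method section. The confusion-matrix counts below are hypothetical and were chosen only to illustrate the argument (20 patients with about 500 images each); they are not the study's results, and the function name `diagnostic_metrics` is an assumption.

```python
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute the five metrics from confusion-matrix counts.

    positives = tp + fn (all COVID-19 pneumonia cases/images enrolled)
    negatives = tn + fp (all non-COVID-19 pneumonia cases/images enrolled)
    """
    positives = tp + fn
    negatives = tn + fp
    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "sensitivity": tp / positives,
        "specificity": tn / negatives,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical per-patient counts: 10 COVID-19 patients, 10 controls.
per_patient = diagnostic_metrics(tp=10, fp=2, tn=8, fn=0)

# Hypothetical per-image counts for the same 20 patients (10,000 images):
# most images are lesion-free, so negatives grow far faster than false positives.
per_image = diagnostic_metrics(tp=180, fp=10, tn=9780, fn=30)

for name in ("sensitivity", "specificity", "accuracy", "ppv"):
    print(f"{name}: per patient {per_patient[name]:.3f}, per image {per_image[name]:.3f}")
```

With these illustrative counts, a few lesion-bearing images are missed (per-image sensitivity below 1) although every patient is still detected, whereas the thousands of lesion-free images push per-image specificity, accuracy and PPV above the per-patient values.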
On the basis of the accuracy and efficiency of the model in detecting COVID-19 pneumonia, a cloud-based open-access artificial intelligence platform was constructed to provide assistance in detecting COVID-19 pneumonia worldwide. CT scan images can be uploaded freely by both clinicians and researchers for use as an assistant tool, especially in other provinces or countries unfamiliar with the radiologic characteristics of COVID-19. This free open-access website can read images in batches, provide high-level auxiliary diagnostic services to different hospitals free of charge, and overcome the boundaries of regions and manpower. Cases of COVID-19 pneumonia have also been made available on the open-access website, which might be a useful resource for radiologists and researchers fighting COVID-19 pneumonia.
In summary, the deep learning-based model achieved performance comparable to that of an expert radiologist in a much shorter time. It holds great potential to improve the efficiency of diagnosis, relieve the pressure on frontline radiologists, accelerate the diagnosis, isolation and treatment of COVID-19 patients, and therefore contribute to the control of the epidemic.