A novel deep learning-based quantification of serial chest computed tomography in Coronavirus Disease 2019 (COVID-19)

This study aims to explore and compare a novel deep learning-based quantification with the conventional semi-quantitative computed tomography (CT) scoring for the serial chest CT scans of COVID-19. A total of 95 patients with confirmed COVID-19 and 465 serial chest CT scans were involved, including 61 moderate patients (moderate group, 319 chest CT scans) and 34 severe patients (severe group, 146 chest CT scans). Conventional CT scoring and deep learning-based quantification were performed for all chest CT scans for two study goals: (1) assessing the correlation between these two estimations; and (2) exploring the dynamic patterns using these two estimations in the moderate and severe groups. The Spearman's correlation coefficient between these two estimation methods was 0.920 (p < 0.001). The predicted pulmonary involvement (CT score and percent of pulmonary lesions calculated using deep learning-based quantification) increased more rapidly and reached a higher peak on day 23 from symptom onset in the severe group, whereas in the moderate group it peaked on day 18 and the lesions were absorbed faster. The deep learning-based quantification for COVID-19 showed a good correlation with the conventional CT scoring and demonstrated a potential benefit in the estimation of disease severities of COVID-19.

Chest CT estimation by radiologists. The major CT findings were described using the internationally standard nomenclature defined by the Fleischner Society glossary and peer-reviewed literature on COVID-19, including ground-glass opacity (GGO) and consolidation 7,[15][16][17][18] . A conventional semi-quantitative scoring system (CT score) was used to estimate the pulmonary volume involved by all these abnormalities 7,19 . Each lobe was assigned a score of 0-5 according to its percentage of pulmonary involvement: 0, 0%; 1, < 5%; 2, 6-25%; 3, 26-49%; 4, 50-75%; 5, > 75%. The scores of the five lobes were summed, giving a total CT score ranging from 0 to 25. Two experienced radiologists (BL and LY, with 25 and 22 years of experience in thoracic radiology, respectively) independently performed the estimations on the institutional digital database system (Vue PACS, version 11.3.5.8902, Carestream Health, Canada), and any disagreement was resolved by consensus after discussion. Both radiologists were blinded to the results of the chest CT evaluation using deep learning-based quantification.
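The scoring rule above can be illustrated with a minimal Python sketch. It is not the radiologists' workflow, which relied on visual estimation; the function names are hypothetical, and because the published bins leave small gaps (for example exactly 5%, or values between 25% and 26%), the handling of those boundary values here is an assumption.

```python
def lobe_score(percent_involved):
    """Return the 0-5 score for one lobe from its involvement in percent."""
    if not 0 <= percent_involved <= 100:
        raise ValueError("involvement must be between 0 and 100 percent")
    if percent_involved == 0:
        return 0
    if percent_involved < 5:
        return 1
    if percent_involved <= 25:   # assumption: values from 5% up to 25% score 2
        return 2
    if percent_involved < 50:    # assumption: values above 25% and below 50% score 3
        return 3
    if percent_involved <= 75:
        return 4
    return 5


def total_ct_score(lobe_percents):
    """Sum the scores of the five lobes (total range 0-25)."""
    if len(lobe_percents) != 5:
        raise ValueError("expected one value per lobe (five lobes)")
    return sum(lobe_score(p) for p in lobe_percents)


# Illustrative inputs only, not patient data.
print(total_ct_score([1, 1, 1, 1, 1]))    # a very small lesion in every lobe
print(total_ct_score([30, 0, 0, 0, 0]))   # a single moderately involved lobe
```

With these illustrative inputs, a very small lesion in every lobe yields a total of 5 points, whereas one lobe with 30% involvement yields 3 points, which shows how the summed score depends on the distribution of lesions as well as on their extent.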
Chest CT evaluation using deep learning-based quantification. The deep learning-based quantification was performed using a newly established inflammation module (COVID-Lesion Net) based on an automatic segmentation software package (Yitu CT, YITU Healthcare Technology Co., Ltd., China). This module was developed as a combination of U-net and fully convolutional networks [20][21][22] . To detect the lung lesions effectively, a contracting path and an expansive path were employed in the COVID-Lesion Net structure, which consists of three types of network components: (1) twelve convolutional segments, each comprising a convolutional layer, a batch normalization layer, and an activation layer; (2) three max-pooling layers for down-sampling; and (3) three transpose convolutional layers for up-sampling (Fig. 1). Information from the input CT images was passed through the convolutional segments along the two paths. In addition, concatenation operations were performed between convolutional segments as bridges between the contracting and expansive paths to improve the information propagation within the network. To train and test the COVID-Lesion Net, chest CT images without respiratory artifacts from another 942 confirmed COVID-19 patients (from 1st January 2020 to 1st March 2020) and 1340 healthy persons participating in health examinations (from 1st September 2019 to 1st November 2019) were retrospectively collected between 1st January 2020 and 1st March 2020 and randomly divided into a training set (75%) and a test set (25%); none of these subjects was involved in the present study. The network was trained for 100 epochs with a batch size of 8, using the Adam algorithm as the optimizer. The ground truth region of interest (GT-ROI) for lung lesions was first drawn by a radiologist (LL, with 5 years of experience in thoracic radiology) and then reviewed by a senior radiologist (GC, with 28 years of experience in thoracic radiology), who modified the ROIs when they were not acceptable.
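The random 75%/25% division can be sketched as below. This is only an illustration under stated assumptions: the text does not specify whether the split was made per subject or per image, nor whether patients and healthy persons were split separately, so the sketch pools all subjects and splits at the subject level. The identifiers and the `split_subjects` helper are hypothetical.

```python
import random


def split_subjects(subject_ids, train_fraction=0.75, seed=0):
    """Shuffle the subjects reproducibly and cut them into train and test lists."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)          # fixed seed for a repeatable split
    n_train = int(len(ids) * train_fraction)  # size of the training set
    return ids[:n_train], ids[n_train:]


# Placeholder identifiers standing in for the 942 patients and 1340 healthy persons.
covid_ids = [f"covid-{i}" for i in range(942)]
healthy_ids = [f"healthy-{i}" for i in range(1340)]

train_ids, test_ids = split_subjects(covid_ids + healthy_ids)
print(len(train_ids), len(test_ids))
```

Splitting by subject rather than by image keeps all images of one person in the same set, which avoids leakage between training and test data.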
The Dice coefficient was used to evaluate the performance of this in-house built network on both the training and test sets, using the standard definition $\mathrm{Dice} = 2\,|P \cap G| / (|P| + |G|)$, where $P$ denotes the predicted lesion region and $G$ the GT-ROI. After the lesion detection, the Hellinger distance and intersection over union (IOU) of the lung CT distribution were calculated to reflect the differences between patients with COVID-19 and reference patients (normal CT findings in the training set) 23,24 . Quantification parameters related to lung lesions, including GGO and consolidation, were determined with CT value thresholds of −750 HU and −350 HU, respectively 25 . The bilateral lungs were also segmented by adaptive thresholding and morphological operation [26][27][28] . Afterwards, the volumes of the bilateral lungs and of the pulmonary lesions, including GGO, consolidation, and both, were calculated. Meanwhile, the percentages of GGO, consolidation, and both (equal to 100 × lesion volume/bilateral lung volume) were calculated and reported as "percent of GGO/consolidation/pulmonary lesions".
Study goals. 1. Correlation between conventional CT scoring and the deep learning-based quantification; 2. Exploring the dynamic patterns using conventional CT scoring and the deep learning-based quantification between moderate and severe groups.
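The evaluation and quantification measures described for the deep learning module can be sketched in a few lines of Python. This is a minimal illustration rather than the software's implementation: masks are represented as sets of voxel indices, the Hellinger distance is computed between two normalized histograms, and the way the −750 HU and −350 HU thresholds are applied (consolidation at or above −350 HU, GGO from −750 HU up to −350 HU, lower values not counted) is an assumption, as are all function names.

```python
import math


def dice(pred, truth):
    """Dice coefficient 2|P∩G| / (|P| + |G|) for two sets of voxel indices."""
    pred, truth = set(pred), set(truth)
    if not pred and not truth:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def hellinger(p, q):
    """Hellinger distance between two discrete distributions that each sum to 1."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2
                               for a, b in zip(p, q)))


def lesion_percentages(lesion_hu, lung_voxels, ggo_min=-750, consolidation_min=-350):
    """Percent of GGO, consolidation, and both, relative to the bilateral lung volume.

    lesion_hu holds the HU values of the segmented lesion voxels; because every
    voxel has the same volume, voxel counts can replace volumes in the ratio.
    """
    consolidation = sum(1 for hu in lesion_hu if hu >= consolidation_min)
    ggo = sum(1 for hu in lesion_hu if ggo_min <= hu < consolidation_min)
    return {
        "ggo": 100.0 * ggo / lung_voxels,
        "consolidation": 100.0 * consolidation / lung_voxels,
        "both": 100.0 * (ggo + consolidation) / lung_voxels,
    }


# Illustrative values only, not patient data.
print(dice({1, 2, 3}, {2, 3, 4}))
print(hellinger([0.5, 0.5], [0.5, 0.5]))
print(lesion_percentages([-700, -600, -300, -800], lung_voxels=100))
```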
Statistical analysis. Statistical analyses were performed using IBM SPSS Statistics Software (version 24; IBM, New York, USA). Quantitative data were presented as median with inter-quartile range (IQR) and frequency data were presented as the percentage of the total. The quantitative and counting data of the moderate and severe groups were compared using the Mann-Whitney U test and the Chi-square test, respectively. The Spearman's correlation coefficient between the CT score and the percent of pulmonary lesions assessed using deep learning-based quantification was calculated. The SPSS curve estimation module was used to explore the optimal fitting 7 . A p-value of < 0.05 was considered statistically significant.
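For readers without access to SPSS, the Spearman's correlation coefficient can be reproduced with a short Python sketch: both variables are converted to ranks (ties receive the average rank) and the Pearson correlation of the ranks is taken. The function names and the example values are illustrative assumptions, and the sketch does not compute a p-value.

```python
import math


def average_ranks(values):
    """Rank the values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative values only: CT scores and lesion percentages of five made-up scans.
ct_scores = [2, 5, 5, 9, 14]
lesion_percent = [0.4, 1.1, 1.6, 6.0, 18.5]
print(spearman(ct_scores, lesion_percent))
```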
Ethical approval. This

Results
Clinical characteristics. The details of the patients' clinical information are summarized in Table 1. The median age of the patients was 45 years (IQR: 35-60 years) with an approximately 1:1 ratio of male to female; the median age of severe patients was higher than that of moderate patients (55 years vs. 39 years) but without statisti- [...] Table 2). The Spearman's correlation coefficient between the CT score and the percent of pulmonary lesions assessed by deep learning-based quantification was 0.920 (p < 0.001) (Table 3). In addition, the curve estimation yielded an optimal quadratic fit between the two assessments (r² = 0.924), which was better than the linear fit (r² = 0.850) (Fig. 2).
Table 1. Basic characteristics and clinical outcomes. The quantitative and counting data of the moderate and severe groups were compared using the Mann-Whitney U test and the Chi-square test, respectively, and statistical differences (p < 0.05) between the two groups are marked (*).
[...] (Table 4). Besides, the volume of bilaterally uninvolved lungs was significantly lower in severe patients than in the moderate group (Table 4). Within each group, the CT score and the percent of pulmonary involvement assessed by deep learning-based quantification were significantly correlated at each time point (p < 0.001) (Table 5). However, Spearman's correlation coefficient was higher in the severe group than in the moderate group at each time point (Table 5).
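The comparison of a linear and a quadratic fit by their r² values can be sketched as follows. This is an illustration of the idea only, not a reproduction of the SPSS curve estimation: the least-squares fit is solved through the normal equations in plain Python, the helper names are hypothetical, and the data are synthetic values generated from a made-up quadratic relation between CT score and lesion percentage.

```python
def polyfit(x, y, degree):
    """Least-squares polynomial coefficients [c0, c1, ...], lowest order first."""
    n = degree + 1
    # Normal equations (X^T X) c = X^T y for the monomial basis 1, x, x^2, ...
    a = [[sum(xi ** (r + c) for xi in x) for c in range(n)] for r in range(n)]
    b = [sum(yi * xi ** r for xi, yi in zip(x, y)) for r in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (b[r] - s) / a[r][r]
    return coeffs


def predict(coeffs, xi):
    """Evaluate the fitted polynomial at xi."""
    return sum(c * xi ** k for k, c in enumerate(coeffs))


def r_squared(x, y, coeffs):
    """Coefficient of determination of the fitted polynomial."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(coeffs, xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Synthetic data: CT scores 0-25 and a made-up quadratic lesion percentage.
scores = list(range(26))
percent = [0.04 * s ** 2 + 0.1 * s for s in scores]

r2_linear = r_squared(scores, percent, polyfit(scores, percent, 1))
r2_quadratic = r_squared(scores, percent, polyfit(scores, percent, 2))
print(round(r2_linear, 3), round(r2_quadratic, 3))
```

On such convex data the quadratic model explains more variance than the straight line, which is the pattern reported for the two assessments.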
Dynamic patterns estimated for the moderate and severe groups. CT scoring and the deep learning-based quantification of the 319 chest CT scans in the moderate group and the 146 chest CT scans in the severe group were analyzed separately using SPSS curve estimation. In both groups, the predicted CT score and the predicted percentage of pulmonary lesions calculated by deep learning-based quantification followed similar patterns (Fig. 3A,B). The pulmonary involvement increased more rapidly and peaked on day 23 from symptom onset in the severe group, whereas in the moderate group it peaked on day 18 and was absorbed faster (Fig. 3A,B). In the moderate group, the predicted percentages of GGO and consolidation lesions followed similar patterns, peaking on day 18 from symptom onset (2.65% and 0.72%, respectively) and decreasing afterwards (Figs. 3C and 4). In the severe group, however, the peaks of the predicted percentages of GGO and consolidation lesions (23.03% and 4.99%, respectively) were higher than in the moderate group, and the consolidation started to be absorbed earlier than the GGO lesions (19 days vs. 23 days from symptom onset) (Fig. 3D).
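Once a curve has been fitted to the serial measurements, the day of peak involvement can be read off by evaluating the curve on a grid of days. The sketch below assumes a polynomial curve given by its coefficients (lowest order first); the coefficients are made up for illustration and are not the fitted models of this study, and the function names are hypothetical.

```python
def evaluate(coeffs, day):
    """Evaluate a polynomial with coefficients [c0, c1, ...] at the given day."""
    return sum(c * day ** k for k, c in enumerate(coeffs))


def peak_day(coeffs, first_day, last_day):
    """Return the whole day in [first_day, last_day] with the highest predicted value."""
    days = range(first_day, last_day + 1)
    return max(days, key=lambda d: evaluate(coeffs, d))


# Made-up curve -0.01 * d^2 + 0.4 * d, chosen only so that it peaks at day 20.
demo_coeffs = [0.0, 0.4, -0.01]
print(peak_day(demo_coeffs, 0, 40))
```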

Discussion
This study preliminarily compared a novel deep learning-based quantification with the conventional scoring system in the evaluation of COVID-19 CT manifestations. The results indicated a good correlation between these two estimations and similar findings of the CT patterns between moderate and severe COVID-19, although the correlation was lower in the moderate group than in the severe group at each time point. The deep learning-based quantification could calculate the percentage of the lesions separately for GGO and consolidation, which is an additional capability compared with the conventional scoring system. In previous studies, the CT findings of COVID-19 evolved over time from symptom onset 7,8 . For example, GGO was the major early abnormal finding, but consolidation was increasingly observed with time until the start of recovery 7,8 . Therefore, irregularly timed chest CT scans might affect the longitudinal correlation analysis between the conventional CT scoring and the deep learning-based quantification. To avoid this potential impact, only recovered patients who had undergone serial chest CT scans at relatively regular intervals (median: 8 days) were involved. As a result, 95 patients with serial CT follow-up for more than 1 month were involved. Consistent with previous studies, severe patients were older and had more abnormal laboratory parameters (e.g. lymphocyte count, C-reactive protein, D-dimer, etc.) 6,[29][30][31][32][33] . Besides, moderate patients underwent more chest CT scans than severe patients, resulting from the statistically longer follow-up period in the moderate group compared to the severe group. However, the median interval between two adjacent chest CT scans was the same for both groups.
In addition, no significant difference in the period from symptom onset to admission was found between the two groups, while the severe patients had a significantly longer hospitalization period owing to the treatment requirements. It must be pointed out that a mean of five chest CT scans was performed on each patient, which raises the issue of radiation exposure. However, under the pandemic pressure in Wuhan, China, during that period, the shortage and high false-negative rates of the RT-PCR tests (about 2-33%) led clinicians to choose chest CT, which was cheaper and faster in China, as the first modality for the screening or follow-up of suspected or confirmed COVID-19 patients 4,5,13,34 . Since the shortage of RT-PCR tests in China has been resolved, chest CT is no longer recommended as the first-line modality. Thus, it will be impossible to obtain serial CT data like those in this study again.
Although some teams have developed similar deep learning-based tools for the diagnosis and risk stratification of COVID-19, none was compared with the conventional radiologist-based estimation over the whole course of this disease [35][36][37] . In this study, all 465 serial chest CT scans were involved in the correlation analysis between conventional CT scoring and the novel deep learning-based quantification. The results demonstrated a good correlation between these two estimations in the Spearman's correlation analysis (r = 0.920, p < 0.001). Moreover, the optimal fitting resulted in a quadratic equation (r² = 0.924), which was nearly linear with a relatively low slope when the CT score was less than 5 points. This may imply a risk of over-estimating lesion areas with conventional CT scoring when the lesions are very small but distributed in multiple lobes. For instance, if there was a very small GGO in each lobe, the CT score might be 5 points, while the deep learning-based quantification could yield a lower value with higher precision. Supporting this, the correlation between the two methods was higher in the severe group, which presented more rapid progression and more extensive pulmonary involvement than the moderate group (peak percent of pulmonary lesions: 27.91% vs. 3.37%), leading to a longer disease course until radiological resolution.
Another advantage of this deep learning-based quantification was the quantification of the lung volume and the percent of lung involvement for different types of lesions, which was previously impossible in conventional estimation by radiologists because of the heavy workload, especially when mixed lesions were present 7,8 . The novel quantification modality enabled the dynamic pattern analysis in different groups with precise quantification of both GGO and consolidation 25 . The dynamic patterns quantified for the moderate and severe patients were similar to a cubic fitting in a previous study 7 . Furthermore, the results demonstrated that severe patients presented significantly lower lung volume than moderate patients at each time point, which might be attributed to the impairment of pulmonary function caused by COVID-19 or to the age factor. Therefore, the volume of bilateral lungs might correlate with COVID-19 severity, which is worth further exploration. On the other hand, although the predicted percent of consolidation reached its peak at a similar time (18-19 days from symptom onset) in both moderate and severe groups, the predicted peaks of GGO and total pulmonary lesions were delayed in the severe group (23 days from symptom onset, each). It was speculated that the absorption of a large area of consolidation might be accompanied by a temporary increase of GGO, reported as the "melting sugar" sign, which simultaneously demonstrates a decrease of solid components and an increase of the lesion area 1 . This phenomenon was not typical in moderate patients, in whom the dynamic changes of GGO and consolidation seemed more synchronized. This was the major difference in the absorption stage between the two severities.
There are limitations in this study. First, although the conventional CT score, which is widely used in the CT estimation of COVID-19, was chosen as the reference, to date there has been no gold standard for lesion area quantification in viral pneumonia. Thus, whether the deep learning-based quantification is more accurate than the CT score is still uncertain. Second, all the deep learning training and validation data came from a single center rather than multiple centers. Therefore, more samples from more centers are necessary for further model training to establish a better model.
In summary, this study evaluated a novel deep learning-based quantification for COVID-19, which showed a good correlation with the conventional CT scoring. The results indicated the potential application of deep
An exemplary illustration of a CT pattern in a moderate patient with COVID-19. Images are from a patient who presented with fever for 6 days and was subsequently diagnosed with moderate COVID-19. After admission, serial chest CT scans were performed, which demonstrated a dynamic pattern (first row); the lesions were automatically segmented using the COVID-Lesion Net module and color-coded from cold to warm colors with increasing density (consolidation, orange; GGO, blue) (second row). On admission (Day 6), a subpleural lesion with mixed lesions as a so-called "halo sign" [consolidation (6.56 cm³