Introduction

The World Health Organization (WHO) officially declared coronavirus disease 2019 (COVID-19) a pandemic on March 11, 2020, and as of October 2022, more than 680 million confirmed cases and 6.5 million deaths from COVID-19 had been reported worldwide1. Although COVID-19 is a highly contagious disease, the mortality rate is relatively low (1–3.5%), except in elderly patients with multiple underlying medical conditions. However, severe pneumonia develops in 15–20% of those affected, and intensive care unit (ICU) treatment is required in 5–10%2.

Currently, vaccines and treatments against severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) are increasingly being developed worldwide, and although the number of COVID-19 patients is still high, these measures have reduced the severity of the disease and improved its prognosis. However, treatment depends on the severity of COVID-19 symptoms, and severe cases require admission to the ICU. Reverse transcriptase-polymerase chain reaction (RT-PCR) is the most reliable diagnostic method for COVID-193, but computed tomography (CT) also has a high sensitivity for the diagnosis of COVID-19. The most common CT findings in patients with COVID-19 are ground-glass opacity (GGO) and consolidation4,5. In the early stages of COVID-19, clinical and imaging features are most important for establishing a diagnosis, assessing changes in severity, and adjusting treatment plans6.

Since the start of the pandemic, COVID-19 pneumonia has been evaluated on CT, and objective numerical CT scores that indicate the extent of lung involvement have been reported7,8,9. Most CT scores are read by a diagnostic radiologist, who anatomically divides the CT image into five lobes, three on the right and two on the left, and assigns a score for the extent of pneumonia to each lobe. The resulting score can be useful in predicting patient prognosis. However, scoring from CT can be difficult and is not possible at every facility. In Japan, a presumptive diagnosis is commonly made from the CT images, subject to a final check by a radiologist. However, non-radiologists such as primary care physicians, surgeons, and emergency physicians also have the skills to read CT images; our results show that the assessed severity of illness did not differ when a skilled and experienced emergency physician read the images, without the need for a radiologist to perform a real-time, specialized reading. The CT score proposed in this study is relatively simple to calculate and can be read by emergency physicians, who are likely to see patients with COVID-19 first, rather than by radiologists. In this study, the readings of emergency physicians and radiologists were also compared, and the prognostic prediction of the two groups was examined. Thus, our CT score can provide physicians with clues about possible clinical outcomes without relying on radiologists, who are not always readily available, especially in resource-limited countries. This study aimed to develop a CT score that can be used by non-radiologist physicians to predict prognosis and to correlate it with clinical outcomes.

Methods

This was a single-center, retrospective study of patients with COVID-19 (excluding outpatients with mild COVID-19) who underwent standard treatment. All patients were intubated and received intensive care at Yokohama City University Hospital between June 2020 and September 2021. All patients had positive RT-PCR results and underwent chest CT at the time of tracheal intubation. The study compared two groups of patients: those who died or were placed on extracorporeal membrane oxygenation (ECMO) and those who survived on ventilator management alone.

Age, sex, and weight were recorded as individual-specific information. The times from onset to oxygen administration, from onset to hospitalization, from onset to tracheal intubation, and from onset to CT imaging were recorded, as were the times from admission to CT imaging and from intubation to CT imaging. Systolic blood pressure, heart rate, respiratory rate, body temperature, and SpO2 were recorded as vital signs on admission. For ventilation-related information, we examined airway pressure at intubation, airway pressure at intubation > 26.5 cmH2O, maximum airway pressure during hospitalization, maximum positive end-expiratory pressure (PEEP) during hospitalization, and tidal volume during hospitalization. Medical history included the presence or absence of maintenance dialysis, hypertension, and diabetes mellitus. The Coronavirus Clinical Characterisation Consortium (4C) mortality score, Acute Physiology and Chronic Health Evaluation (APACHE II) score, Sequential Organ Failure Assessment (SOFA) score, and Simplified Acute Physiology Score (SAPS II) were evaluated as prognostic scores. For laboratory findings on admission, we examined the patients’ white blood cell (WBC) and platelet counts, and creatinine, bilirubin, blood urea nitrogen (BUN), and brain natriuretic peptide (BNP) levels. We also examined the blood gas analysis on admission, including the PaO2/FIO2 (P/F) ratio, pH, and HCO3–. Other variables considered were the maximum C-reactive protein (CRP) level during hospitalization, prone positioning during hospitalization, tracheostomy during hospitalization, and the presence of bimodal inflammatory changes during hospitalization (i.e., CRP improved once and fell below 4, but an inflammatory response with CRP of 10 or higher then persisted). The primary outcome was death or ECMO management. The secondary outcome was the comparison of the prognostic value of CT scores between emergency physicians and radiologists. The study was approved by the Ethics Committee of Yokohama City University (B200200048) and all patients provided written informed consent. The research was performed in accordance with the Declaration of Helsinki.

Clinical workflow and disease staging

All patients underwent routine blood tests and arterial blood gas (ABG) testing. Patients with pneumonia on CT who required oxygen at 7 L/min or higher via a reservoir mask to maintain an SpO2 of 93% were intubated and ventilated. Patients who required ICU or ventilator management were classified as having severe COVID-19 in accordance with the guidelines of the Japanese Ministry of Health, Labour, and Welfare10. All patients with COVID-19 were treated with remdesivir (200 mg/day on day 1 and 100 mg/day on days 2 through 10) as an antiviral drug, dexamethasone 6.6 mg/day for 10 days as a steroid, and continuous heparin 10,000 U/day as an anticoagulant. Ventilator management was limited to a maximum PEEP of 15 cmH2O, and airway pressure did not exceed 30 cmH2O. The lung protection strategy targeted a tidal volume of 6–8 mL/kg, and deep sedation and muscle relaxants were used if the patient’s spontaneous breaths were excessively large. If CT showed marked pneumonia on the dorsal side and PaO2/FIO2 (P/F) was below 200, the patient was placed in the prone position. Venovenous extracorporeal membrane oxygenation (VV-ECMO) was introduced when oxygenation could not be maintained despite the above-mentioned respiratory management. The indications for VV-ECMO were as follows: hypoxemia with FIO2 ≥ 0.8 and P/F < 100; respiratory acidosis with pH ≤ 7.2 and plateau pressure > 32 cmH2O; Murray score > 3; or a poor response to therapy despite treatment of the original disease, a lung protection strategy with high PEEP, and prone positioning.

CT score

Radiological terms such as GGO, crazy paving pattern, and consolidation were defined based on the definitions by the Fleischner Society for Chest Imaging8. In all cases, the CT was read using the axial mode only. Each lung was divided into three zones from the apex to the bottom. The upper lung zone extended from the pulmonary apex to the bronchial bifurcation, the middle from the bronchial bifurcation to the right inferior pulmonary vein, and the lower from the right inferior pulmonary vein to the bottom. Thus, the two lungs were divided into six lung zones. Severity scores were assigned to each zone according to the extent of anatomical involvement: 0, no involvement; 1, 5% or less involvement; 2, 6–25% involvement; 3, 26–50% involvement; 4, 51–75% involvement; and 5, > 75% involvement. The resulting CT score was the sum of the scores of the six lung zones (total score range, 0–30) (Fig. 1).
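The scoring rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the zone names, the function names, and the handling of the exact category boundaries (for example, treating exactly 5% as score 1) are hypothetical choices, and in practice a reader estimates the category visually rather than entering a measured percentage.

```python
from typing import Dict

# Hypothetical zone labels: three zones per lung, six in total.
ZONES = (
    "right_upper", "right_middle", "right_lower",
    "left_upper", "left_middle", "left_lower",
)


def zone_score(involvement_pct: float) -> int:
    """Map the estimated percentage of a zone involved to a 0-5 score.

    Boundary handling (e.g. exactly 5% scores 1) is an assumption.
    """
    if not 0 <= involvement_pct <= 100:
        raise ValueError("involvement must be between 0 and 100")
    if involvement_pct == 0:
        return 0
    if involvement_pct <= 5:
        return 1
    if involvement_pct <= 25:
        return 2
    if involvement_pct <= 50:
        return 3
    if involvement_pct <= 75:
        return 4
    return 5


def ct_score(involvement: Dict[str, float]) -> int:
    """Sum the six zone scores; the total ranges from 0 to 30."""
    missing = set(ZONES) - set(involvement)
    if missing:
        raise ValueError(f"missing zones: {sorted(missing)}")
    return sum(zone_score(involvement[zone]) for zone in ZONES)


def mean_reader_score(score_a: float, score_b: float) -> float:
    """Average the totals of two readers, as done for each reader pair."""
    return (score_a + score_b) / 2


# Illustrative values only, not study data.
example = {
    "right_upper": 10, "right_middle": 30, "right_lower": 80,
    "left_upper": 0, "left_middle": 3, "left_lower": 60,
}
total = ct_score(example)  # 2 + 3 + 5 + 0 + 1 + 4
```

Because each reader's total is an integer, the mean of two readers is always a multiple of 0.5.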

Figure 1

CT scores and axial images by lung zone involvement in COVID-19 pneumonia. The lung area was divided into three zones from the apex to bottom. The upper lung zone extended from the pulmonary apex to the bronchial bifurcation, the middle from the bronchial bifurcation to the right inferior pulmonary vein, and the lower from the right inferior pulmonary vein to the bottom.

To evaluate the level of agreement, we calculated Cronbach's alpha for the following pairs: emergency physicians versus radiologists using all patients’ data; the two emergency physicians using all patients’ data; the two radiologists using all patients’ data; and the two emergency physicians using the data of patients with mild to moderate COVID-19. A value ≥ 0.90 was taken to indicate excellent consistency and a value ≥ 0.80 good consistency.
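For readers unfamiliar with the statistic, Cronbach's alpha for a pair of raters can be computed as in the minimal sketch below. It uses the standard formula, alpha = k/(k − 1) × (1 − sum of per-rater variances / variance of the per-patient totals), with sample variances; the function name and the example scores are hypothetical and are not study data.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(ratings: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for k raters scoring the same patients.

    ratings[i] holds rater i's CT scores, with patients in the same order
    for every rater.
    """
    k = len(ratings)
    if k < 2:
        raise ValueError("at least two raters are required")
    n = len(ratings[0])
    if n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("each rater must score the same >= 2 patients")
    # Sum of the sample variances of each rater's scores.
    rater_variance_sum = sum(variance(r) for r in ratings)
    # Sample variance of the per-patient totals across raters.
    totals = [sum(patient_scores) for patient_scores in zip(*ratings)]
    total_variance = variance(totals)
    return k / (k - 1) * (1 - rater_variance_sum / total_variance)


# Illustrative values only: reader B scores every patient 3 points higher.
reader_a = [10, 12, 15, 20]
reader_b = [13, 15, 18, 23]
alpha = cronbach_alpha([reader_a, reader_b])
```

In this example alpha equals 1 even though reader B scores consistently higher: the statistic reflects how consistently the readers rank patients, not whether their absolute scores coincide.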

Statistical analysis

Continuous variables were expressed as median (quartiles), and categorical variables as frequency (%). To compare the two outcome groups, the Mann–Whitney test was used for continuous variables and Fisher’s exact test for categorical variables. Receiver operating characteristic (ROC) curve analysis was used to predict prognosis. To identify factors associated with the outcomes, we used a multiple logistic regression model with forward selection, controlling for CT score of 16.7 or higher, age, weight, and sex. The following factors were analyzed by forward selection: time from onset of illness to oxygen administration, time from onset of illness to tracheal intubation, presence of maintenance dialysis, presence of hypertension, presence of diabetes, maximal CRP during hospitalization, BNP on admission, presence of prone therapy, presence of tracheostomy, presence of bimodal inflammatory response, presence of GGO, presence of crazy paving pattern, presence of consolidation, 4C mortality score, APACHE II score, SOFA score, SAPS II, and presence of airway pressure of 26.5 cmH2O or higher at the time of intubation. Among these variables, the 4C mortality score, airway pressure > 26.5 cmH2O at intubation, and maximum CRP during hospitalization were selected. Statistical significance was set at p < 0.05. We used Stata 13 software (Stata Statistical Software: Release 13. College Station, TX: StataCorp LP) for the statistical analysis.
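As an illustration of the ROC analysis, the sketch below computes the area under the ROC curve and a cut-off value for a score, together with the sensitivity and specificity at that cut-off. It is not the Stata procedure used in the study. In particular, choosing the cut-off by Youden's index and placing candidate thresholds midway between adjacent observed scores are assumptions, since the text does not state how the cut-offs were selected. The function names and example data are hypothetical.

```python
from typing import Sequence, Tuple


def auc(event_scores: Sequence[float], non_event_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen patient with the event
    scores higher than one without it, counting ties as one half.
    """
    wins = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e in event_scores
        for n in non_event_scores
    )
    return wins / (len(event_scores) * len(non_event_scores))


def youden_cutoff(
    event_scores: Sequence[float], non_event_scores: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (cut-off, sensitivity, specificity) maximizing Youden's J.

    Assumption: thresholds lie midway between adjacent observed values and
    a score above the threshold counts as a positive prediction.
    """
    values = sorted(set(event_scores) | set(non_event_scores))
    best = None
    for low, high in zip(values, values[1:]):
        cutoff = (low + high) / 2
        sensitivity = sum(e > cutoff for e in event_scores) / len(event_scores)
        specificity = sum(n < cutoff for n in non_event_scores) / len(non_event_scores)
        j = sensitivity + specificity - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sensitivity, specificity)
    if best is None:
        raise ValueError("need at least two distinct score values")
    return best[1], best[2], best[3]


# Illustrative values only, not study data.
event = [17, 18, 20, 14]        # e.g. death or ECMO
non_event = [11, 13, 16, 12, 18]  # e.g. survived on ventilation alone
area = auc(event, non_event)
cutoff, sens, spec = youden_cutoff(event, non_event)
```

The sketch does not compute confidence intervals or test the difference between two AUCs; those steps require additional methods that are not shown here.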

Ethical approval and consent to participate

The study was approved by the Ethics Committee of Yokohama City University (B200200048). All the patients provided informed consent to participate in the study.

Results

Population, clinical presentation, and laboratory findings

Twelve patients (16.9%) died or were placed on ECMO, and 59 (83.1%) survived on ventilator management alone. In the death or ECMO group versus the survival group, age was 62.5 versus 61 years (p = 0.406), weight was 67.3 versus 72 kg (p = 0.890), and the proportion of men was 75 versus 79.6% (p = 0.718), with no statistical difference between the two groups. As for the vital signs on admission, systolic blood pressure was 125 versus 135 (p = 0.228), heart rate was 75 versus 89 (p = 0.544), respiratory rate was 25 versus 25 (p = 0.993), body temperature was 37.5 versus 37.2 (p = 0.460), and SpO2 was 93.5 versus 94 (p = 0.847), with no statistical difference between the two groups. The time from disease onset to oxygen administration was 8 versus 6 days (p = 0.082), from onset to hospitalization, 7.5 versus 6 days (p = 0.089), from onset to tracheal intubation, 10 versus 8 days (p = 0.187), and from onset to CT scan, 7.5 versus 6 days (p = 0.119). Observations obtained from the ventilator showed that the airway pressure at intubation was 25.5 versus 23 cmH2O (p = 0.150); the area under the ROC curve (AUC) for predicting death or ECMO from airway pressure at intubation was 0.631 (0.438–0.824), with a cut-off value of 26.5, sensitivity of 50%, and specificity of 81%. The maximum airway pressure during hospitalization was 29 versus 24 cmH2O (p < 0.0001), airway pressure higher than 26.5 cmH2O at intubation was observed in 50 versus 18.6% (p = 0.020), the maximum PEEP required was 14.5 versus 12 cmH2O (p = 0.057), and the tidal volume was 6.47 versus 6.96 mL/kg (p = 0.673). There was no statistical difference in patients' history of pre-existing medical conditions: 8.3 versus 10.2% had maintenance dialysis (p = 0.846), 50 versus 42.4% had hypertension (p = 0.627), and 66.7 versus 42.4% had diabetes mellitus (p = 0.124). 
There was no statistical difference in prognostic scores: 12 versus 11 for the 4C mortality score (p = 0.110), 12 versus 10 for the APACHE II score (p = 0.239), 4 versus 4 for the SOFA score (p = 0.225), and 33.5 versus 31 for SAPS II (p = 0.213). For the laboratory findings on admission, WBC was 9300 versus 7600 (p = 0.240), platelet count was 21.15 versus 19.7 (p = 0.914), creatinine was 0.785 versus 0.75 (p = 0.213), bilirubin was 0.55 versus 0.5 (p = 0.703), BUN was 27 versus 20 (p = 0.113), and BNP was 21.8 versus 25.2 (p = 0.920), with no statistical difference between the two groups. Lastly, for the blood gas analysis on admission, P/F ratio was 158.4 versus 148.2 (p = 0.724), pH was 7.351 versus 7.436 (p < 0.0001), and HCO3– was 23.45 versus 24.6 (p = 0.184). Other factors were as follows: prone therapy was performed in 41.7 versus 22% (p = 0.152), bimodal inflammatory response was observed in 58.3 versus 42.4% (p = 0.311), tracheostomy was required in 25 versus 25.4% (p = 0.975), and the maximum CRP level during hospitalization was 24.3 versus 12.2 mg/dL (p = 0.012) (Table 1).

Table 1 Characteristics of patients at baseline.

CT features and disease scoring

The CT score was calculated as the mean of the readings by two emergency physicians and as the mean of the readings by two radiologists. Of the 71 cases included in the study, 12 (16.9%) were in the death or ECMO group. The emergency physicians’ CT scores were 17.75 and 13 for the death or ECMO group and the survival group, respectively, and the radiologists’ CT scores were 21.7 and 18, respectively. The emergency physicians’ CT score predicted death or ECMO with an AUC of 0.718 (0.561–0.875), a cut-off value of 16.75, sensitivity of 67%, and specificity of 76%. The CT score (median [quartiles]) of the death or ECMO group versus the survival group was 17.75 (14.75–20) versus 13 (11–16.5), p = 0.017. The radiologists’ CT score predicted death or ECMO with an AUC of 0.681 (0.518–0.844), a cut-off value of 19.75, sensitivity of 75%, and specificity of 68%. The CT score (median [quartiles]) of the death or ECMO group versus the survival group was 21.7 (19.5–22.7) versus 18 (15–21), p = 0.048. The prediction of death or ECMO did not differ between the emergency physicians and the radiologists (AUC 0.718 [0.561–0.875] versus 0.681 [0.518–0.844], p = 0.238). For the prognostic scores, the 4C mortality score predicted death or ECMO with an AUC of 0.646 (0.494–0.797), a cut-off value of 10.5, sensitivity of 83%, and specificity of 47%; the APACHE II score predicted death or ECMO with an AUC of 0.608 (0.454–0.761), a cut-off value of 9.5, sensitivity of 83%, and specificity of 46%; the SOFA score predicted death or ECMO with an AUC of 0.607 (0.435–0.779), a cut-off value of 5.5, sensitivity of 42%, and specificity of 73%; and SAPS II predicted death or ECMO with an AUC of 0.614 (0.459–0.769), a cut-off value of 30, sensitivity of 92%, and specificity of 46% (Fig. 2).

Figure 2

Performance of CT score and prognostic score for predicting death or ECMO management. The figure on the left shows the ROC curve for the emergency physicians versus radiologists’ results. The figure on the right shows the ROC curve for the 4C mortality score, APACHE II score, SOFA score, and SAPS II versus the CT score of emergency physicians. AUC, area under the curve; CI, confidence interval; EP, emergency physician; RD, radiologist; 4C, Coronavirus Clinical Characterisation Consortium; APACHE, Acute Physiology and Chronic Health Evaluation; SOFA, Sequential Organ Failure Assessment; SAPS, Simplified Acute Physiology Score.

CT image characteristics such as GGO, crazy paving pattern, and consolidation were compared between the death or ECMO group and the survival group: GGO was present in 100 versus 100%, crazy paving pattern in 91.7 versus 78% (p = 0.040), and consolidation in 100 versus 72.9% (p = 0.277). For the emergency physicians’ CT scores by zone, the upper right was 3 versus 2 (p = 0.068), middle right 3 versus 2.5 (p = 0.176), lower right 3.25 versus 2.5 (p = 0.020), upper left 2.5 versus 1.5 (p = 0.205), middle left 2.25 versus 2 (p = 0.174), and lower left 3 versus 2.5 (p = 0.011). For the radiologists’ CT scores by zone, the upper right was 3.75 versus 3 (p = 0.050), middle right 4 versus 3 (p = 0.108), lower right 4 versus 3 (p = 0.136), upper left 2 versus 2 (p = 0.759), middle left 3.75 versus 3 (p = 0.062), and lower left 4 versus 3 (p = 0.123) (Table 2).

Table 2 Frequency of involvement of each section with related CT score and main patterns.

The chest X-rays of patients with severe COVID-19 were also checked for pneumonia with consolidation, and it was found that 33% had bilateral consolidation, 41% had unilateral consolidation, and 25% had no consolidation. We obtained good or excellent consistency in all agreement evaluations (Table 3).

Table 3 Agreement in scores between paired evaluators.

Multivariate analysis on death or ECMO revealed that the odds ratio (95% CI) for CT score 16.7 or higher was 8.762 (1.114–68.865), p = 0.039. For airway pressure 26.5 cmH2O or higher at intubation, odds ratio was 21.460 (1.627–282.957), p = 0.020. The odds ratio for maximum CRP was 1.125 (1.020–1.241), p = 0.018 (Table 4).
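The relationship between each odds ratio, its confidence interval, and its p value can be checked with the short sketch below. It assumes that the reported intervals are Wald intervals computed on the log-odds scale, which is common for logistic regression output but is not stated in the text; the function name is hypothetical.

```python
import math
from typing import Tuple


def wald_from_ci(
    odds_ratio: float, lower: float, upper: float, z_crit: float = 1.959964
) -> Tuple[float, float, float]:
    """Recover (standard error, z statistic, two-sided p) from an OR and CI.

    Assumes a symmetric Wald interval on the log scale:
    log(OR) +/- z_crit * SE.
    """
    log_or = math.log(odds_ratio)
    standard_error = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_or / standard_error
    # Two-sided p value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return standard_error, z, p_value


# Values reported above for CT score >= 16.7 and airway pressure >= 26.5 cmH2O.
se_ct, z_ct, p_ct = wald_from_ci(8.762, 1.114, 68.865)
se_ap, z_ap, p_ap = wald_from_ci(21.460, 1.627, 282.957)
```

Under this assumption the recovered p values are close to the reported 0.039 and 0.020. The wide intervals correspond to large standard errors on the log scale, which is expected with only 12 outcome events.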

Table 4 Logistic analysis of clinical and CT features for COVID-19 pneumonia for death and ECMO.

CT score including moderate COVID-19 pneumonia

We evaluated the CT scores including those of patients who were hospitalized with mild to moderate COVID pneumonia without tracheal intubation during the period of the current study and who underwent CT on admission.

Forty-two patients (37.5%) were classified as having moderate COVID-19 pneumonia and formed the non-intubation group. The emergency physicians’ CT scores were 5.75 and 14.5 for the non-intubation group and the intubation group, respectively. The emergency physicians’ CT score predicted intubation with an AUC of 0.927 (0.882–0.972), a cut-off value of 10.25, sensitivity of 83%, and specificity of 98%. The CT score (median [quartiles]) of the non-intubation group versus the intubation group was 5.75 (2–8.5) versus 14.5 (11.5–17.5), p < 0.0001. The emergency physicians’ CT score predicted death or ECMO with an AUC of 0.826 (0.719–0.932), a cut-off value of 14.25, sensitivity of 83%, and specificity of 74% (Fig. 3; Table 5).

Figure 3

Performance of CT score for predicting intubation and death or ECMO management. The ROC curve shows the emergency physicians’ CT score for predicting intubation and death or ECMO management. AUC, area under the curve; CI, confidence interval; EP, emergency physician.

Table 5 Intubation, frequency of involvement of each section with related CT score, and level of agreement.

Discussion

COVID-19 is usually not severe, but severe cases of pneumonia often lead to death2. Because CT is very useful in determining the severity of the disease, various CT-based scoring methods have been used, and higher scores have been found to be related to the severity of the disease7,8,11,12,13,14,15,16,17,18. Three key points influence scoring by CT. The first is who reads the CT images. Reports on CT scores often indicate that radiologists read the images. In most cases, radiologists with approximately 10 years of experience were selected, but in many cases, there were at least two radiologists with 3–18 years of experience12,13,19,20,21. The second is how to determine the regions of the lung to be scored. Most often, the lungs are read in five lobes (three right and two left lobes) along the anatomical regions7,8,9,11,12,13,14,15,16. A different way to read the lungs is to divide each lung into three zones: upper, middle, and lower. The upper zone extends from the apex of the lung to the bronchial bifurcation, the middle zone from the bronchial bifurcation to the right inferior pulmonary vein, and the lower zone from the right inferior pulmonary vein to the diaphragm, for a total of six zones on both sides17,18. In addition to this division, there is another method in which the lungs are further divided into anterior and posterior sections, for a total of 12 regions19. The third is the method of scoring the divided regions. Most often, each area is scored out of 5 points: 0, no pneumonia; 1, 1–5% pneumonia; 2, 6–25% pneumonia; 3, 26–50% pneumonia; 4, 51–75% pneumonia; and 5, > 75% pneumonia7,8,11,12,13,15. A 4-point scale that does not separate out the range of 5% or less (0, 0%; 1, 1–25%; 2, 26–50%; 3, 51–75%; 4, > 75%)9,14,19 is more convenient than a 2-point scale (0, 0%; 1, 1–50%; 2, > 50%)20,21.

These three points have both advantages and disadvantages. Regarding the first point, having radiologists read the images is clearly advantageous. However, the disadvantage is that radiologists are not necessarily stationed at the hospital where patients with COVID-19 are admitted. If the score is not available to the physician who examines the patient and reads the CT at that time, its usefulness is reduced. Regarding the second point, the division of the lungs, many methods read the lungs in five lobes, but this has the disadvantage of not dividing the left and right lungs equally. In general, the right lung is divided into three lobes (upper, middle, and lower lobes), and the left lung into two lobes (upper and lower lobes), which are not equal in size22. Therefore, scoring each lobe equally on a 5-point scale may not represent the extent to which the lungs are actually affected by pneumonia. Finally, regarding the third point, scoring the absence of pneumonia as zero is reasonable; however, it is difficult to determine the ranges into which the remaining points should be divided. If one believes that a patient is more likely to have a mild case when the extent of pneumonia is small, one might assign a separate point to the 1–5% range. If one believes that a patient with less than 25% involvement does not differ from a patient with less than 5% involvement, one might instead use a single 1–25% range, since fewer categories are simpler. Although advantages and disadvantages exist for each point, all scoring systems associate higher scores with a worse prognosis.

All our patients had severe COVID-19 pneumonia, were intubated, and required ventilator management. We used a 3-zone technique (six zones in total) with a score of 0–5 for each zone, so that the left and right sides were equally divided23. Among severely ill patients, a higher CT score was associated with death or ECMO introduction. CT images showed a higher rate of crazy paving pattern and consolidation in the death or ECMO group. On multivariate analysis, patients with airway pressure higher than 26.5 cmH2O at intubation and a CT score higher than 16.7 were significantly more likely to die or receive ECMO.

This suggests that the CT score is useful in predicting the prognosis of patients with COVID-19 pneumonia. However, not every facility has radiologists, and assigning CT scores can be difficult for physicians who are not radiologists. We therefore asked emergency physicians, who often see COVID-19 pneumonia during initial treatment, to read the CT images and compared their prognostic predictions with those of radiologists. There was no statistical difference in prognostic prediction between emergency physicians and radiologists (AUC 0.718 [0.561–0.875] versus 0.681 [0.518–0.844], p = 0.238). This suggests that emergency physicians may also find the CT score useful as a prognostic predictor. However, a comparison of the two scores showed that the radiologists scored the CT images higher than the emergency physicians (Table 2). If a crazy paving pattern or consolidation was present, it was easy to assume that there was pneumonia in that area; the higher scores given by the radiologists were therefore thought to reflect a difference in the reading of GGO, that is, in distinguishing normal lung from lung with pneumonia. The emergency physicians did not consider a small amount of GGO to be pneumonia, whereas the radiologists did, which may have contributed to the higher scores. The presence or absence of a crazy paving pattern or consolidation affects the severity of the disease and can be read by emergency physicians; this may explain why prognostic prediction did not differ between emergency physicians and radiologists.

We included cases of mild to moderate COVID-19 because the CT score can also predict whether intubation is necessary. The scores differed so markedly that even emergency physicians could easily determine the need for intubation. However, in this study, the CT score was found to be useful specifically in determining whether ECMO is needed in severe COVID-19 cases. This is because severe COVID-19 pneumonia has various presentations, which makes it difficult to determine the severity of the disease. In addition, there are ICUs where ventilator management is available but ECMO is not. In such cases, the score offers the advantage of a faster decision to transfer the patient to a facility where ECMO can be used.

However, not every facility is staffed by radiologists and equipped to introduce ECMO 24 h a day. COVID-19 is a pandemic disease, and it is not always possible to treat patients in hospitals with such resources. A complementary method is a deep-learning reading system, the usefulness of which has been previously reported24. We conducted this study in the hope that our CT score could help in situations where such resources are not available. The present results suggest that CT scoring of patients with COVID-19 is important and useful for determining future treatment strategies.

Limitations

This study has several limitations. First, the use of selected representative axial CT images may not allow an accurate assessment of pulmonary opacity if the distribution of lesions in each lung lobe is unbalanced. Second, because our hospital was responsible for treating severe COVID-19, not all cases were evaluated by CT under the same conditions, as some CT scans may have been performed at the transfer site. Third, this study was conducted only in severely ill patients and not in moderately ill patients. Fourth, long-term mortality and post-discharge symptoms were not considered. Further studies are needed to determine whether the CT score derived from representative CT images can predict the long-term prognosis of patients after discharge.

Conclusions

A higher value of our CT score predicted a greater likelihood of death or ECMO management. The predictive ability of the CT score did not differ between radiologists and emergency physicians, suggesting that it can be used by non-radiologists. A CT score obtained at the time of admission allows for early preparation and transfer to a hospital that can manage patients who need ECMO.