Predefined and data driven CT densitometric features predict critical illness and hospital length of stay in COVID-19 patients

The aim of this study was to compare whole lung CT density histograms to predict critical illness outcome and hospital length of stay in a cohort of 80 COVID-19 patients. CT chest images on segmented lungs were retrospectively analyzed. Functional Principal Component Analysis (FPCA) was used to find the main modes of variations on CT density histograms. CT density features, the CT severity score, the COVID-GRAM score and the patient clinical data were assessed for predicting the patient outcome using logistic regression models and survival analysis. ROC analysis predictors of critically ill status: 87.5th percentile CT density (Q875)—AUC 0.88 95% CI (0.79 0.94), F1-CT—AUC 0.87 (0.77 0.93) Standard Deviation (SD-CT)—AUC 0.86 (0.73, 0.93). Multivariate models combining CT-density predictors and Neutrophil–Lymphocyte Ratio showed the highest accuracy. SD-CT, Q875 and F1 score were significant predictors of hospital length of stay (LOS) while controlling for hospital death using competing risks models. Moreover, two multivariate Fine-Gray regression models combining the clinical variables: age, NLR, Contrast CT factor with either Q875 or F1 CT-density predictors revealed significant effects for the prediction of LOS incidence in presence of a competing risk (death) and acceptable predictive performances (Bootstrapped C-index 0.74 [0.70 0.78]).

www.nature.com/scientificreports/ Instead of using predefined formal lung CT density features such as MLA, skewness, or any quantile measurement, it would be ideal to interrogate the whole sample of CT histograms of patient lungs with the disease of interest (e.g.,  and see how the CT histograms vary in the patient cohort and eventually, which combinations of modes of variation are associated with the severity of the disease or the patient outcome. In the present study, we are using a statistical method called Functional Principal Component Analysis (FPCA) to explore the modes of variation of the lung CT density histograms in Covid-19 patients having either non-enhanced CT or contrast-enhanced CT and to extract new data-driven features for the prediction of patients critical-illness status, hospital length-of-stay and mortality. Performances were compared with a priori methods using CT attenuation quantile values from Q50 (median density) to Q875 (87.5th percentile density), mean lung attenuation (MLA), standard deviation, skewness, and kurtosis.

Results
Patient characteristics, clinical, and laboratory findings. Patient characteristics and clinical and laboratory findings are reported in Supplementary Table S1 online. The study included 80 patients (median age 63.5 years, 37 of whom were female). Most patients had at least one comorbidity (71%), from which 30% had 1 comorbidity 49% had 2-3 comorbidities and 21% had 4-5 comorbidities. The mean time between CT and critical illness was 3.7 days (SD 2.7). The mean death time was 24 days (SD 23 days). Critical illness status was defined as meeting one or more of these patient conditions: admission to ICU, requirement for mechanical ventilation, extracorporeal membrane oxygenation (ECMO) or death within 1 month of first presentation to hospital.
CT density curves analysis. The Functional Principal Component Analysis (FPCA) resulted in four Functional Principal Components FPCs explaining respectively 76.7% (F1) 13.5% (F2), 3.8% (F3) and 2.6% (F4) of the variability of lung CT density curves in the Covid-19 patients (see Fig. 1). F1 represents the main mode of variation of the lung CT density curves, from homogeneous low CT density (10th percentile curve) consistent with normal lungs density to the heterogeneous distribution of CT density with high densities (above − 100 HU) (90th percentile curve associated with extensive lungs consolidation and ground glass opacification) see Fig. 1a. Two-way Anova shows F1 significantly larger for both the 'critically-ill' outcome F(1,76) = 51.2-P < 0.001 and the contrast-CT condition F(1,76) = 6.70-P = 0.012. We expect the high F1 values to be strongly associated with unfavorable outcome ("critically ill" condition) in models stratified by contrast CT conditions.
The second mode of variation F2 represents different degrees of shift in CT density from normal homogeneous low lung CT density (90th percentile curve) toward a larger density range around -800 HU to -400 HU (10th percentile curve) see Fig. 1b. Interpretation: high F2 values seem associated with normal lungs while small F2 values seem associated with extensive lungs ground glass opacification but limited consolidation. We expect that the low values of F2 with extensive GGO but no significant consolidation to be moderately associated with unfavorable outcome. The third and fourth modes of variation are associated with small shifts toward higher CT densities: F3 + 46HU (25th percentile curve to 75th percentile curve) and F4 + 61HU (25th percentile curve to 75th percentile curve). See Fig. 1c,d.
Four score values called F1, F2, F3 and F4 quantifying for each patient the lung density curve (once for each mode of variation) were added to the list of existing features for predictive modeling.
CT density features analysis. The CT analysis findings (including a priori CT features and data driven feature F1) are summarized in Supplementary Table S2 (descriptive statistics), Table 1 (univariate analysis) and  Table 2 (multivariate analysis).
Patients with critical illness had significantly different CT density features (all P < 0.0001) with higher CT mean density (MLA), higher CT SD, higher Q875 (87.5th percentile), higher F1 (mean 0.326 vs. − 0.254), lower skewness and lower kurtosis (see Supplementary Table S2 online). Table 1 summarizes all the univariate feature performances (AUC) with optimal cutoff points using the Youden index criteria. Results for the three best CT predictors are stratified into contrast CT and non-contrast CT groups. Notice than for each of these features, the AUCs are slightly higher for the contrast enhanced group than the non-contrast group (non-significant differences). ROC analysis identified as best predictors of critically ill status: Q875-AUC 0.88 (0.79 0.94), F1-AUC 0.87 (0.77 0.93), SD-CT-AUC 0.86 (0.73, 0.93). Remarkably, the frequently used feature trio Mean CT (MLA), skewness and kurtosis showed lower performances than the three previous features: mean CT-AUC 0.84 (0.73 0.91), skewness-AUC 0.83 (0.73 0.90), Kurtosis-AUC 0.84 (0.73, 0.91) and were not retained in the final multivariate models.
Among the clinical variables shown in Table 1, the Neutrophil-Lymphocyte Ratio (NLR) appears the best predictor with AUC 0.74 (0.60 0.85) for the contrast cases and AUC 0.77 (0.56 0.91) for the non-contrast cases and was further successfully included in the final multivariate models. Main univariate ROC curves and AUCs are shown in Supplementary Fig. S1   www.nature.com/scientificreports/ Combined length-of-stay (LOS) and in-hospital mortality assessment. For SD-CT, Q875 and F1 predictors, the patient cohort was stratified in two groups using optimal cutoff points for combined contrast and non-contrast studies found in the univariate analysis: SD-CT ≥ 213.8 HU, Q875 > − 380 HU and F1 > − 0.099. Using Q875 feature (see Fig. 3-dashed lines) the cumulative incidence LOS at 30 days was: 89% (80%, 98%) for low Q875 group compared to 40% (24%, 56%) for high Q875 group. the Q875 based cumulative incidence for Death at 30 days was: 2.4% (0%, 7%) for low Q875 group 1 compared to 21% (8%, 34%) for high Q875 group 2. Both inter-group differences in cumulative incidences for LOS (P < 0.0001) and Death (P = 0.027) were significant.

Multivariate analysis of patient length-of-stay (LOS) with competing risks. The prediction of
the hospital length-of-stay (event: hospital discharge) in presence of a competing risk (death) was further investigated in a multivariate framework using the Fine-Gray regression (FGR) for competing risks 18 . Two multivariate models FGR-F1 and FGR-Q875 were selected using the stepwise backward variable selection adapted to the FGR models and further described in the statistical method section. Each of these models included the covariates: Contrast, Age and NLR with either F1 (FGR-F1 model) or Q875 (FGR-Q875 model). See Table 3. For performance comparison, the Reference model (no covariate) and the FGR-1 model without CT density variables (with only Age, NLR and Contrast) were also computed. See Supplementary Fig. S4. Both models FGR-F1 and FGR-Q875 revealed negative coefficients for F1, Q875, Age and NLR indicating that increased values of these covariates would decrease the incidence of the hospital discharge and thus would increase the hospital length of stay. Conversely, in both models the coefficients for Contrast (enhanced CT = 1) were positive, indicating an Table 1. Univariate (ROC) Analysis. Best variables (all cases, and cases with and without contrast) and other noticeable variables (Age, Lung volume and neutrophil lymphocyte ratio NLR). Cutoff points were calculated using Youden index. 95% confidence intervals. *Non-significant P value with test on AUC for continuous variables. **1 outlier removed. NC non contrast, C contrast, ROC receiving-operating characteristic. www.nature.com/scientificreports/ increased incidence for hospital discharge when the patient got a Contrast CT examination. In other words, the patients with contrast CT had a smaller hospital length-of-stay, all covariates adjusted. See Table 3.
Both the FGR-Q875 model and the FGR-F1 model had the same predictive performances: Bootstrap CV C-index 0.74 [0.70 0.78]. Similarly, using the alternative performance metric, the (Bootstrap resampled) Integrated Brier score (IBS) computed on the whole follow-up period: IBS was 0.16 for both FGR-Q875 and FGR-F1 models. For comparison, the IBS-reference model (no covariate) was 0.20 and IBS-FGR-1 model (no CT density covariate) was 0.19. The Supplementary Fig. S4 presents the prediction errors curves for each of these models. Both FGR-F1 and FGR-Q875 models have very similar performances during the 90 days of the follow-up period, confirming in a multivariate setting the similar predictive value of the CT-density variables F1 and Q875 already observed in the univariate analysis (see Fig. 3).
Semi-quantitative CT severity score and COVID-GRAM scores. The CT severity score was higher in critically ill patients vs. non critically ill patients with a combined mean score amongst both readers of 21 vs 15 (P = 0.002). Univariate AUC was 0.91 (0.80, 0.96) with Reader 1 and AUC 0.83 (0.73 to 0.91) with Reader 2.
The COVID-GRAM score performed poorly with an AUC value of 0.64 (0.52, 0.74). In fact, 79/80 (99%) of our patients were predicted to have medium or high risk for critical illness based on the clinical variables from their medical records compared to the actual value of 35/80 (44%).

Interrater reliability. Interrater reliability was tested by having a second investigator perform separate
Covid CT severity scoring, manual corrections on the lung segmentation (as needed) and measurements on a sample of 20 randomly selected patients.
Interrater reliability for the CT severity score measured with intra-class correlation (ICC) was: 0.90 (0.

Discussion
Parsimonious models combining only one quantitative lung CT density parameter (either SD-CT, Q875, or F1) and one clinical parameter (Neutrophil-Lymphocyte Ratio) allowed accurate prediction of critical illness (ICU, mechanical ventilation/ECMO and/or death) in Covid-19 patients with accuracy (AUC) ranging from 0.82 to 0.92, and prediction of hospital length-of-stay while controlling for the mortality risk. Remarkably, the performances were not adversely affected by the presence of IV contrast in the CT images and were even slightly better in the contrast enhanced group, although the lack of randomization for the IV contrast precludes evaluation of whether this factor leads to better predictions.  www.nature.com/scientificreports/ The three best CT parameters were Q875, F1 and SD-CT. SD-CT is a well-established a priori radiomic global feature related to the spread of the CT histogram largely available in commercial lung imaging software. In this study, the frequently seen parameters MLA, Skewness and Kurtosis 6 performed slightly worse than either SD-CT, Q875, or F1 (see Table 1) and were excluded from the final models. Q875 is another radiomic parameter related to the HU value reached when 87.5% of the lung voxels have been counted (starting with the lowest densities). Q875 is the counterpart for the high CT densities of the 15th percentile density index (PD15) used to quantify the severity of emphysema in lung CT densitometry. F1 parameter is a result of the histogram functional principal component analysis (FPCA) in the patient cohort. Without a-priori knowledge or information about the patient outcome, FPCA extracts the main modes of variation in the sample of CT histograms for the patient cohort. F1 (score) values represent the different degrees of CT histograms shift from homogeneous low lung densities (better outcome) toward heterogeneous much higher densities (worse outcome) see Fig. 1. In the current study, other modes of variations (F2, F3, etc.) were not predictive of the patient outcome. F2 seems to reflect the transition from normal lung densities to extended ground glass opacifications (about-800 HU to -600 HU) and it was not a significant predictor of critically ill status (AUC: 0.53, P:0.60). In this study, the overall results using FPCA are consistent with previous ones for pulmonary disease subtyping 19,20 or patient neurologic outcome prediction 21 confirming the value of the FPCA method: first as a non-specific data driven exploration tool, it offers interpretable modes of variations of the CT histograms in the patient whole cohort. Second, it is a generic method giving accurate predictors related to histogram variations without a priori knowledge or delicate radiomic high dimensional parameter selection. All three CT predictors are highly correlated: Spearman Rho Q875 vs. F1: 0.97, Q875 vs. SD-CT, Rho: 0.90, F1 vs. SD-CT: Rho: 0.92 and practically exchangeable. However, the data driven FPCA approach offers a unique data analysis tool of the CT histograms in the whole patient cohort. Current machine learning research is actively extending the FPCA method with supervised FPCA 22 , multivariate FPCA 23 (neuroimaging data), robust FPCA 24 , etc. offering a rich toolbox for future medical imaging studies. See also Pratt et al. 25 for a recent application in pulmonology.
The good performances of CT density features for patient outcome prediction in COVID patients are concordant with results from a few previous studies 10-12 including Lanza et al. 11 who showed that COVID patients requiring oxygenation and ventilation had higher amounts of compromised lung volumes (− 500 to 100 HU), statistically significant at 6-23% and greater than 23% respectively. Another large study by Colombi et al. showed www.nature.com/scientificreports/ that a percentage of well aerated lung on CT calculated by software of 71% (OR 3.8, 95% CI 1.9, 7.5, P < 0.001) or less was associated with ICU admission or death 12 . Lung volume is known to affect the lung CT density in a complex way: first, the optimal lung inflation is difficult to obtain in severe acute lung disease and spirometry-controlled lung CT is often not feasible. So, partially inflated lungs may increase the apparent lung CT density. Bressem et al. 10 have mentioned the potential Figure 3. Cumulative incidence plot for hospital length-of-stay (LOS) and in-hospital deaths in Covid-19 patients stratified in Low Q875 (resp. F1) patients and high Q875 (resp. F1) patients using an optimal cutoff point for combined contrast and non-contrast studies found in the univariate analysis: Q875 > − 380 HU (resp. F1 > 0.099). Both F1 and Q875 features give similar predictions of cumulative incidences for LOS and Death during the follow-up period. Table 3. Multivariate fine-gray regression (FGR) proportional hazards models FGR-F1 and FGR-Q875 for the subdistribution of a competing risk with primary outcome: hospital length-of-stay (LOS) and the competing risk (death). All variables in both models have significant effect on the cumulative incidence of the primary outcome (LOS) while controlling for the competing risk (death). Negative coefficients indicate that an increasing value of the variables (age, NLR, F1, Q875) is associated with a decreasing incidence of the hospital discharge (that is, an increasing LOS). Conversely, a positive coefficient (for the Contrast CT factor) indicates an increased incidence of hospital discharge (and reduced LOS). exp(Coef.) exponentiated coefficient, se(coef.) standard error of the variable coefficient. N = 80. www.nature.com/scientificreports/ confounding effect of the lung volume variation among patients when using CT density as a biomarker in Covid-19 patients. Second, the lung tissue density is increased with disease severity associated with extended GGO and consolidations. Third, lung CT density appears to be lower in subjects with larger lungs because of greater air spaces 26 . In this study, adding the lung volume feature did not improve the performances of the predictive models. However, a moderate but significant correlation has been observed for the CT parameters: Q875-Rho:  Fig. S3 shows the relationship between Q875 and lung volume for both patient outcome groups. The linear relationship between Q875 and log. Volume in the non-critically ill group and for a large range of lung volumes may be best explained with Robert et al. hypothesis on lung CT density change with normal lung growth 26 . Moreover, the performance of the clinical predictor: Neutrophil-Lymphocyte Ratio (NLR) in either the univariate analysis (Table 1) or in the multivariate best logistic regression models (Table 2-Models 2-3-4) and multivariate models for LOS prediction (Table 2-Models FGR-875 & FGR-F1) supports the conclusion of the recent meta-analysis from Li et al. 27 pointing out the value of this biomarker to predict disease severity and patient mortality in Covid-19 patients.
Our quantitative CT density features were compared with both the COVID-GRAM score and CT severity score to predict critical illness. The CT severity score (Reader-1) alone performed well with AUC 0.91 (0.80 0.96) and intra-class correlation (ICC) 0.90 (0.85 0.94) as previously shown in prior studies 4,28 However, Reader-2 CT severity score was suboptimal and illustrated the inter-reader variability of subjective features. See for example Fig. 2 (odd ratios). The COVID-GRAM score performed poorly in our study to predict critical illness with AUC of 0.64 (0.52, 0.74) 95% CI. A possible explanation of this poor performance compared to the original Chinese study to develop the model by Liang et al. 5 is the presence of older patients (60.8 vs 48.9 years) and a higher prevalence of one of more pre-existing comorbidities (71.3% vs 25.1%) in our study. Remarkably, Al Hassan et al. 29 recently reported similar findings with an AUC of 0.64 for COVID-GRAM score for risk stratification with Covid-19 patients.
Hospital length of stay (LOS) and hospital mortality are mutually related and thus require a competing risks method for proper assessment of the cumulative incidence of each event of interest (discharge or death). Using this method, our study showed that the patient groups with Q875 > − 380 HU, F1 > 0.099 or SD-CT > 213.8 HU were all associated with significantly higher cumulative incidences for longer length of stays while controlling for the hospital mortality. Furthermore, our multivariate Fine-Gray models FGR-F1 and FGR-Q875 extend these results combining the significant covariates age, NLR, contrast CT factor and CT-density F1 or Q875 affecting the incidence of the event of interest (LOS) while controlling for the effect of the competing risk (death). Additionally, significant variables found to be associated with a patient outcome (LOS) in our FGR models can be used to develop individual prognostic scores for treatment adaptation. More generally, the prediction of LOS is valuable in capacity planning to provide accurate predictions of the number of beds required at each level of care.
This study has several limitations: it is retrospective and has a modest sample size resulting in larger confidence intervals or suboptimal statistical power when considering subgroup analysis (such as IV contrast vs. Non-Contrast CT) and prevents us to draw conclusions about the in-hospital mortality due to the low number of death events (14/80). The predictive accuracy results were computed with cross-validation correcting performance for overfitting. However, future work involving multiple sites would be necessary for testing the performances in a fully separated testing dataset.
Another limitation is that CT chest protocols varied based on the clinical indication with almost twice as many pulmonary angiogram studies as non-contrast studies, preventing us to better understand the role of CT contrast in the predictive performances.
Finally, the methods discussed in this study are focused on a global lung CT histogram analysis. Multithreshold lung density analysis methods such as those described in already mentioned studies 10,12,15 or more advanced CT density/texture methods based on local lung pattern classification 30 were not tested and should deserve future attention.
In conclusion, the extensive and diffuse changes in lung CT density affecting the whole lungs in COVID-19 pneumonia patients offered the opportunity to compare predefined and data-driven imaging features related to the lungs CT density histograms. All SD-CT, Q875 and F1 features could accurately predict both critical patient illness and hospital length-of-stay. Combined models with one of these features and the biomarker for inflammation Neutrophil-Lymphocyte Ratio gives the highest predictive performance. This application of CT densitometry provided similar results for both Non-enhanced CT group and the contrast enhanced group. The FPCA method allowed the unsupervised analysis of the lung density histograms in the whole patient cohort to extract interpretable CT density features with high predictive values. This approach may be considered for other predictive models with diffuse lung diseases.

Study population. This study was approved by the UHN Coordinated Approval Process for Clinical
Research (CAPCR) ethics committee for human research at our home hospital (CAPCR ID: 20-5446). All methods were conducted in accordance with guidelines outlined by this committee. Informed consent was waived due to the retrospective collection of patient data, and this was approved by the CAPCR UHN ethics committee. Inclusion criteria were adult patients ≥ 18 years of age with real-time reverse transcriptase polymerase chain reaction (RT-PCR) confirmed COVID-19 (positive after 1-3 tests) who had undergone a CT chest within 24 h of admission to hospital between March 1 and December 15, 2020 and who were not in the ICU or mechanically ventilated at the time of the CT study. The indication for CT included ruling out suspected COVID-19 with a www.nature.com/scientificreports/ non-enhanced CT chest and to assess for a pulmonary embolism with an enhanced CT pulmonary angiogram study in patients with confirmed COVID-19. Exclusion criteria were known malignancy with pulmonary nodules on CT (added density), incomplete clinical data and those with a known superimposed bacterial pneumonia. 502 patients were retrieved, with 87 of these patients with confirmed COVID-19 by RT-PCR. Two patients with known lung cancer and lung nodules, three patients with both COVID-19 and a superimposed bacterial pneumonia and two patients with incomplete medical history and blood work were excluded from the study (see Supplementary Fig. S5).
Patient characteristics on admission. Clinical and laboratory data documented at admission included age, sex, symptoms, blood work (neutrophils, lymphocytes, lactate dehydrogenase, bilirubin) and comorbidities (chronic obstructive pulmonary disease, hypertension, diabetes, coronary disease, heart disease, malignancy, kidney disease, cerebral vascular disease, hepatitis B and immunodeficiency).
Outcomes. The primary outcome was critical illness, defined as one or more of admission to ICU, requirement for mechanical ventilation or extracorporeal membrane oxygenation and death within 1 month of first presentation to hospital. Secondary outcomes were hospital length of stays and mortality.
Computed tomography imaging protocol. We analyzed both non-contrast low dose and normal dose CT chest studies and contrast enhanced CT pulmonary angiogram CT studies from three hospitals in our institution. All patients were examined with either 64-CT Aquilion or 320-CT Aquilion-One scanner (Canon Medical Systems, Otawara, Japan). Chest CT acquisitions parameters were 120 kV and 20-100 mA (low dose), 120 kV and 40-150 mA (normal dose) and 100-120 kV and 0-250 mA (contrast enhanced pulmonary angiograms) according to our hospital protocols. All images were reviewed in lung windows (width: 1200 HU, level: − 700 HU) and mediastinum windows (width: 350 HU, level: 40 HU) with 1-3 mm slice thickness. CT pulmonary angiogram studies administered 70 mL of iodinated contrast (Iopromide 370 mg I/mL) at a rate of 5 mL/sec using a bolus tracking technique triggered to 250 HU in the main pulmonary artery.
COVID Gram score. Clinical and laboratory data at time of hospital presentation was collected from the patients' electronic medical records to calculate the COVID-GRAM score 5 . These 10 parameters include age, dyspnea, conscious, hemoptysis, history of malignancy, number of comorbidities, X-ray abnormality, Neutrophil-Lymphocyte Ratio, lactate dehydrogenase and direct bilirubin. An online calculator was used to calculate a risk score and percentage and place the patient in a low, medium, or high-risk group to predict critical illness 31 .
CT severity score. Subjective assessment of the percentage of ground glass opacities and consolidations on CT chest was performed by two radiologists, one with 4 years of clinical experience as a staff thoracic radiologist, and one a cardiothoracic imaging fellow with 5 years of radiology residency experience. The lungs were evaluated as per the CT Severity Score guidelines 4 by assigning a score of 0-2 (0 = no opacity, 1 = < 50% opacity and 2 = ≥ 50% opacity) in each of the 10 segments in both lungs out of a total score of 40.
CT density analysis. Lung density measurements were performed using Vitrea Advanced Visualization version 7.14 (Canon Medical, Minnetonka, USA) software to perform automatic lung segmentation and calculate a mean CT density (Mean-CT) and standard deviation (SD-CT) density for both lungs combined (see Figs. 4,5). Large pulmonary vessels, airways, mediastinum structures and pleural effusions are excluded from segmentation, and lung parenchyma, interstitial structures and segmental vessels and bronchi were included. Manual correction of the lung segmentation was applied when needed. The primary density analysis was performed by a cardiothoracic radiology fellow, and a second analysis to determine inter-rater reliability on a random sample of 20 cases from the data set was performed by a clinical research analyst with 4 years of experience in cardiothoracic radiology.
CT density curves analysis. Each CT density histogram for both lungs and each lung were converted in smooth curves defined between -1000 HU and 500 HU using Ramsey's smoothing method for frequency distributions 32 and previously described 21 . Quantiles values on the CT histograms from median (50th percentile) to 85.7th percentile were computed and added to the feature list. A Functional Principal Component Analysis (FPCA) was applied to the lung CT density curves following Petersen & Müller's method for frequency distributions 33 . FPCA is a data driven approach akin to the Principal Component analysis (PCA) to explore and quantify the main modes of variation of a sample of curves. The resulting functional principal component scores (FPCs) for the lung region were added to the list of candidate predictors of the patient outcome including lung volume, demographic information and a priori CT density-based features: mean lung CT attenuation, standard deviation, skewness, kurtosis and eight quantile-based features.
Statistical analysis. Statistical analysis was performed using R statistical programming and MedCalc software. A P value of < 0.05 was considered statistically significant. Continuous variables were described using mean and standard deviation or median and interquartile range and categorical variables using numbers and percentage. Mann-Whitney tests and Fisher exact tests were used to compare continuous and binary variables.
Variance ratio F-test were used to compare inter-group variance differences. Inter-rater agreement was assessed using intra-class correlation (ICC). Logistic regression models were applied for prediction of critical illness. A univariate analysis of the predictors for critical illness was performed using a receiver operating characteristic www.nature.com/scientificreports/ (ROC) area under the curve (AUC) metric for continuous predictors and odds ratio (OR) for binary predictors. First quartile to third quartile differences were used for defining OR in non-categorical variables. Optimal cutoff points on ROC curves were determined using the Youden index method.
In the multivariate analysis, predictive accuracy (AUC) of the final logistic regression models were corrected for overfitting using a Bootstrap cross-validation method. Backward variable selection was performed separately for models with qualitative assessment variables (subjective reader scores) and those with quantitative CT Imaging feature combined with 1 clinical biomarker for critically ill outcome. Highly correlated CT density predictors SD-CT, Q875 and F1 were tested in separate models. See Table 2. Final model selection was based on Bootstrap overfitting corrected AUC.
The length-of-stay estimate for Covid patients have recently been addressed with numerous methods 34 . In this study, the hospital discharge time was used as primary end point for hospital length-of-stay (LOS) and the in-hospital death was considered as the competing risk, following Brock et al. approach and publicly available R code 35 . A cumulative incidence plot of hospital discharge was computed in R-programming language using the Aalen-Johansen estimator. The Greenwood-type method was used to estimate the standard errors and confidence intervals of our cumulative incidence plots. The prediction of the hospital length-of-stay (event: hospital discharge) in presence of a competing risk (death) was further investigated in a multivariate framework using the Fine-Gray regression (FGR) for competing risks 18 . FGR applies a proportional hazards model for the direct estimation of the hazard of the cumulative incidence function (CIF) or subdistribution for the primary event (hospital discharge). Quantitative variables of interest, that is, those with significant odd ratio for the ICU prediction (see Fig. 2-bold font) with the age and the CT Contrast factor were combined for a stepwise variable selection using the R-library 'crrstep' 36 . The Bayesian Information Criterion BICcr was used for model selection. BICcr is a modified version of BIC adapted for FGR in using the total number of (uncensored) primary events (since only these events contribute to information of the partial likelihood) as a penalty and allows for a more parsimonious model than AIC but with a less stringent penalty than BIC 36 . The performance of the Fine-Gray Regression models FGR-F1 and FGR-Q875 for the prediction of the hospital discharge (or equivalently the LOS) was evaluated using both Concordance index (C-index) and Brier score. The Brier score is a weighted average of the squared distances between the observed survival status and the predicted survival probability of a model. Performance assessments were computed with the R package 'PEC' 37 . www.nature.com/scientificreports/

Data availability
The R-code for the CT density histogram smoothing and for the functional principal component analysis used in the current study are available from the corresponding author.