Given the profound impact of the COVID-19 pandemic, research on COVID-19 has become a priority across the world, especially as it relates to prediction of outcomes1. Beyond patient demographics and traditional clinical characteristics, social factors have emerged as important predictors of outcomes. For example, prior work from our group and others has shown that a higher social deprivation index (SDI) and a deprived living environment are associated with hospitalization for COVID-192,3, as well as with mortality and other poor outcomes, in both the US and the UK.

The SDI is a neighborhood-level marker of social disadvantage related to a dearth of health care resources4. SDI is especially important in the United States because individuals from regions with higher SDI (more disadvantaged) have a higher risk of disease and often experience limited access to care, resulting in unmet healthcare needs and poor patient outcomes4. High SDI is also associated with worse post-hospitalization outcomes in myriad diseases, an association postulated to result from limited resources for recovery, limited continuity of care, and a greater burden of comorbidities5. While it is well known that SDI plays an important role in processes before and after hospitalization5, its impact on in-hospital processes is less well characterized.

Given emerging evidence regarding the influence of SDI and other social risk indicators on multiple aspects of the COVID-19 pandemic2,6, we sought to examine the importance of SDI in the prediction of in-hospital COVID-19 outcomes, including intubation and in-hospital mortality. Understanding the influence of SDI on the prediction of in-hospital outcomes could provide important insights into care delivery during a pandemic, and could identify a source of disparities in in-hospital outcomes of COVID-19. To address this gap in knowledge, we leveraged one of the largest electronic health record (EHR) datasets of hospitalized patients, derived from three major health systems in New York City (NYC)7, one of the first epicenters, during multiple phases of the pandemic (from March 1, 2020 to February 8, 2021).


Inclusion criteria were: (1) adults (≥ 18 years of age); (2) COVID-19 confirmed by a positive RT-PCR test or ICD-10 diagnosis; and (3) admission to the emergency department (ED) or hospital between March 1, 2020 and February 8, 2021. Patients living in a nursing home prior to their index presentation were excluded because zip codes in the EHR may not represent their residence. The resulting cohort included 30,016 unique patients with confirmed COVID-19.

Exposure: social deprivation index

We linked clinical data to social data at the zip code tabulation area (ZCTA) level using each patient’s residential zip code, and computed the Social Deprivation Index (SDI)8 for 2020 from publicly available sources9. SDI is a composite of six socioeconomic characteristics (income, education, employment, housing, household characteristics and transportation) determined at the ZCTA level. After mapping patients’ residential zip codes onto ZCTAs, we categorized all ZCTAs into quintiles based on SDI score.
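The quintile assignment described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the zip-to-ZCTA crosswalk dictionary, and the handling of scores that fall exactly on a cut point are all assumptions.

```python
from bisect import bisect_left
from statistics import quantiles


def sdi_quintiles(sdi_by_zcta):
    """Map each ZCTA to an SDI quintile (1 = least deprived, 5 = most deprived)."""
    # Four cut points split the ZCTA-level score distribution into five groups.
    cuts = quantiles(sdi_by_zcta.values(), n=5, method="inclusive")
    # Assumption: a score equal to a cut point is placed in the lower quintile.
    return {zcta: bisect_left(cuts, score) + 1 for zcta, score in sdi_by_zcta.items()}


def patient_quintile(zip_code, zip_to_zcta, quintile_by_zcta):
    """Look up a patient's SDI quintile via a (hypothetical) zip-to-ZCTA crosswalk."""
    # Returns None when the zip code or its ZCTA is not found.
    return quintile_by_zcta.get(zip_to_zcta.get(zip_code))
```

Because the cut points are computed over all ZCTAs rather than over patients, the five groups need not contain equal numbers of patients.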


The two outcomes of interest were in-hospital intubation and in-hospital mortality. Intubation was defined as mechanical ventilation during the hospital stay, based on the presence of relevant orders and procedure codes. In-hospital mortality was defined as death during the hospitalization, as recorded in the hospital EHR or reflected in the Diagnosis Related Group.

Patient characteristics

We examined demographics, baseline comorbidities, and vital signs at admission. Demographics included age, sex, race (White or non-White), and ethnicity (Hispanic or non-Hispanic). Established diagnosis codes10 were used to identify baseline comorbidities including hypertension, diabetes, coronary artery disease, heart failure, chronic obstructive pulmonary disease, asthma, cancer, obesity, and hyperlipidemia. Vital signs that were robustly captured by the participating health systems included systolic and diastolic blood pressure, and Body Mass Index (BMI) at admission.

Statistical analysis

To predict the two binary outcomes (intubation and mortality), we considered two methods: logistic regression and random forests (RF). First, we constructed a sequence of logistic regression models to evaluate the incremental effect of each group of predictors. Model 1 included demographics, comorbidities, and vital signs; we added SDI quintile to construct Model 2, added time since the start of the pandemic to construct Model 3, and finally added an SDI by time interaction to construct Model 4.
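One way to write the four nested logistic models, offered as a sketch rather than the authors' exact specification. Here Y is the binary outcome, x collects the demographic, comorbidity, and vital-sign predictors, q indexes the SDI quintile, and t is time since the start of the pandemic. Treating quintile 1 as the reference level and entering time as a single linear term are assumptions; the text does not state how either was coded.

```latex
\begin{aligned}
\text{Model 1:}\quad \operatorname{logit}\Pr(Y=1) &= \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x} \\
\text{Model 2:}\quad \operatorname{logit}\Pr(Y=1) &= \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}
  + \sum_{q=2}^{5} \gamma_q\,\mathbb{1}[\mathrm{SDI}=q] \\
\text{Model 3:}\quad \operatorname{logit}\Pr(Y=1) &= \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}
  + \sum_{q=2}^{5} \gamma_q\,\mathbb{1}[\mathrm{SDI}=q] + \delta\,t \\
\text{Model 4:}\quad \operatorname{logit}\Pr(Y=1) &= \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}
  + \sum_{q=2}^{5} \gamma_q\,\mathbb{1}[\mathrm{SDI}=q] + \delta\,t
  + \sum_{q=2}^{5} \theta_q\,\mathbb{1}[\mathrm{SDI}=q]\,t
\end{aligned}
```

Each model contains the previous one as a special case, so a change in cross-validated AUROC from one model to the next reflects the incremental predictive value of the added terms.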

Similar models (except Model 4) were constructed with random forests, a machine learning algorithm that automatically models complex interactions between predictors. Model performance was estimated by the area under the receiver operating characteristic curve (AUROC) using five-fold cross-validation, and 95% confidence intervals were reported. Missing data in predictors (range 3.6–12%) were imputed with random forest11 to produce an imputed dataset.
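The evaluation metric can be illustrated with a short, dependency-free sketch. It computes AUROC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and averages it over five held-out folds. The function names and the `fit` callback are hypothetical; the sketch does not reproduce the authors' software, their fold assignment, or their confidence-interval method, none of which are specified in the text.

```python
import random


def auroc(labels, scores):
    """AUROC: chance that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    # Pairwise comparison is O(P * N); fine for illustration, slow for large cohorts.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def k_fold_indices(n, k=5, seed=0):
    """Shuffle row indices and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_auroc(labels, features, fit, k=5, seed=0):
    """Mean and per-fold AUROC on held-out folds.

    fit(train_X, train_y) is a hypothetical callback that returns a function
    mapping one feature row to a risk score.
    """
    fold_aucs = []
    for test_idx in k_fold_indices(len(labels), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        predict = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
        fold_aucs.append(
            auroc([labels[i] for i in test_idx], [predict(features[i]) for i in test_idx])
        )
    return sum(fold_aucs) / k, fold_aucs
```

Running `cross_validated_auroc` once per model (Model 1 through Model 4), with the same folds each time, gives directly comparable estimates of discrimination.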

Ethical Information

Weill Cornell IRB (#20-04021948) approved this study and determined that this study meets exemption requirements at HHS 45 CFR 46.104(d). All data management and analysis were conducted in a manner that is HIPAA-compliant.


Patient characteristics

Among N = 30,016 COVID-19 patients, the median (interquartile range [IQR]) age was 59.5 (43.2–72.4) years, 50.8% were male, 63.5% were of non-White race, and 36.4% were of Hispanic ethnicity. The most common comorbid conditions were hypertension (53.6%), hyperlipidemia (38.6%), and diabetes (32.9%) (Table 1). Compared to the group with the lowest SDI (1st quintile), the group with the highest SDI (5th quintile) had a higher proportion of non-White race and Hispanic ethnicity, had a higher prevalence of each of the comorbid conditions, and presented to the hospital earlier in the pandemic.

Table 1 Baseline characteristics of hospitalized COVID-19 patients included in the study across SDI (Social Deprivation Index) quintiles.


In logistic regression, Model 1 (demographics, comorbidities, and vital signs) predicted intubation with moderate accuracy (AUROC = 0.73; 95% CI 0.70–0.75). The addition of SDI in Model 2 did not improve accuracy (AUROC = 0.73; 95% CI 0.71–0.75). The addition of time in Model 3 increased accuracy (AUROC = 0.78; 95% CI 0.76–0.79) compared to Model 2. The addition of an interaction between SDI quintiles and time in Model 4 did not improve prediction (AUROC = 0.78; 95% CI 0.76–0.79). The RF models yielded similar results (Fig. 1).

Figure 1

Five-fold cross-validated AUROCs of four models using random forest (blue) and logistic regression (black). Model 1 included demographics (gender, race, age, ethnicity), vital signs (BMI, systolic and diastolic blood pressure) and comorbidities (see Table 1); Model 2 added SDI quintiles to Model 1; Model 3 added time (weeks since March 1, 2020) to Model 2; and Model 4 (logistic regression only) added a time × SDI quintile interaction to Model 3.

In-hospital mortality

In logistic regression, Model 1 (demographics, comorbidities, and vital signs) predicted mortality accurately (AUROC = 0.80; 95% CI 0.79–0.82). The addition of SDI in Model 2 did not improve prediction (AUROC = 0.81; 95% CI 0.79–0.82). The addition of time in Model 3 increased accuracy (AUROC = 0.84; 95% CI 0.82–0.85) compared to Model 2. The addition of an interaction between SDI quintiles and time in Model 4 did not improve prediction (AUROC = 0.84; 95% CI 0.82–0.85). The RF models yielded similar results (Fig. 1).


This study of over 30,000 patients hospitalized for confirmed COVID-19 in NYC found that neither SDI nor its interaction with time provided incremental value in predicting in-hospital intubation or death. This suggests that, beyond known clinical risk factors, neighborhood-based SDI did not influence outcomes once patients were hospitalized for COVID-19, and that this did not change over the course of the pandemic.

The importance of social determinants of health has come into the spotlight in the United States over the past couple of years in the setting of national events, including COVID-1912. Our group previously showed that SDI was associated with hospitalization for COVID-19 and with all-cause mortality, but did not consider in-hospital events2. This study extends those findings by indicating that SDI does not predict adverse events beyond demographic and clinical predictors once medical attention is sought.

Our findings are reassuring given concerns about the impact of implicit bias related to social determinants of health on provision of care and associated outcomes during the COVID-19 pandemic13,14. Concerns about implicit bias were especially relevant at the peak of the pandemic, when some hospitals had to make plans for rationing care15,16. Our findings indicate that SDI did not have a major influence on in-hospital outcomes at any point in the pandemic, including at the peak. Given these observations, interventions to address disparities related to SDI should focus on the community rather than the hospital. For example, increased efforts to improve vaccination rates in high-SDI regions may be especially important for improving outcomes in vulnerable populations. With emerging data about the long-term sequelae of COVID-1917, a lack of healthcare resources may exacerbate the negative impact of “long COVID,” suggesting the potential utility of additional resources (e.g. paid sick leave, housing support) for COVID-19 survivors living in high-SDI regions.

This study has several strengths. First, it included several major health systems in NYC, making our findings more generalizable than prior studies based on single institutions. Second, the study period spanned nearly 1 year from the beginning of the pandemic, thereby capturing the evolution of clinical knowledge and experience, practice habits, and evidence for therapies over the course of the pandemic. Third, we used both logistic regression and random forests, the latter to model high-level interactions that may not otherwise be easily discerned.

This study also has important limitations. First, findings are limited to NYC and may not be generalizable to other regions of the country. Second, SDI measures neighborhood-level rather than individual-level social disadvantage; consequently, some individual patients may have more or less disadvantage than their neighborhood SDI indicates. Third, the complex interplay between SDI, overall health, and COVID-19 infection biases risk estimates for individual predictors (collider bias)18; we therefore focus on prediction accuracy only. Finally, clinical predictors were limited to those robustly captured across all health systems, and we did not have data on some known predictors (e.g. respiratory rate) and health behaviors (e.g. diet).


SDI did not provide incremental improvement in predicting in-hospital intubation or mortality beyond known demographic and clinical predictors. SDI likely plays an important role in who acquires COVID-19 and in its severity, but once a patient is hospitalized, SDI appears less important. Future interventions to address SDI-related disparities should focus on improving the health of the community before COVID-19 is acquired, such as through vaccination efforts.