Modelling COVID-19 severity in the Republic of Ireland using patient co-morbidities, socioeconomic profile and geographic location, February to November 2020

Understanding patient progression from symptomatic COVID-19 infection to a severe outcome is important for improved diagnosis, surveillance and triage. A series of models was developed and validated to predict hospitalisation, admission to an intensive care unit (ICU) and mortality among patients in the Republic of Ireland. This retrospective cohort study of patients with laboratory-confirmed symptomatic COVID-19 infection used data extracted from national COVID-19 surveillance forms (i.e., age, gender, underlying health conditions, occupation) together with geographically referenced potential predictors (i.e., urban/rural classification, socio-economic profile). Generalised linear models and recursive partitioning and regression trees were used to model COVID-19 progression. The cumulative incidence of symptomatic infection over the study period was 0.96% (n = 47,265), of whom 3781 (8%) required hospitalisation, 615 (1.3%) were admitted to ICU and 1326 (2.8%) died. Model fit improved progressively from hospitalisation [AUC 0.816 (95% CI 0.809, 0.822)] to ICU admission [AUC 0.885 (95% CI 0.88, 0.89)] and death [AUC 0.955 (95% CI 0.951, 0.959)]. Severe obesity (BMI ≥ 40) was identified as a risk factor across all prognostic models; severely obese patients had substantially higher odds of receiving ICU treatment [OR 19.630] or dying [OR 10.802]. Rural living was associated with an increased risk of hospitalisation [OR 1.200 (95% CI 1.143–1.261)]. Urban living was associated with ICU admission [OR 1.533 (95% CI 1.606–1.682)]. The models provide approaches for predicting COVID-19 prognosis, supporting evidence-based decisions on targeted non-pharmaceutical interventions, risk-based vaccination priorities and improved patient triage.

Since the first reported national case on February 29th 2020, the Republic of Ireland, alongside much of the world, has endured three waves of COVID-19 infection and numerous phases of non-pharmaceutical interventions, including business, hospitality and school closures, stay-at-home orders, domestic travel restrictions and nationwide lockdowns 1,2 . As of early April 2021, approximately 238,000 confirmed infections and 4718 deaths had been reported, placing unprecedented pressure on critical care services 1 . The clinical manifestations of COVID-19 infection range from asymptomatic infection to pneumonia, which can progress to acute respiratory distress syndrome, multi-organ failure and, ultimately, death 3,4 . Globally, approximately 80% of reported cases are characterised by absent or mild symptoms, while 15-20% progress to severe pneumonia, causing death in 1-5% of patients 5,6 .
Monitoring the clinical outcomes of patients diagnosed with COVID-19 is vital to understand the epidemiological and healthcare burden of SARS-CoV-2, prioritise high-risk cases in the short term and, perhaps more importantly, provide a robust evidence base for future public health emergency planning. Several risk factors have been statistically associated with COVID-19 outcomes within the scientific literature, including age 7 , gender 8 , underlying chronic conditions 9 , race/ethnicity 10 and occupation 11 . For example, a study cohort of 10,454 COVID-19 patients from Galicia (Spain) reports that seven comorbidities (heart failure, hypertension, rheumatoid arthritis, COPD, asthma, obesity and diabetes) were associated with hospitalisation, three (liver disease, obesity and diabetes) with intensive care unit (ICU) admission, and six (lymphoma/leukaemia, heart disease, dementia, COPD, diabetes and chronic kidney disease) with death 4 . Likewise, a meta-analysis of over 3.1 million reported global cases indicates that male patients have almost three times the odds of requiring ICU admission (OR = 2.84; 95% CI = 2.06, 3.92) and higher odds of death (OR = 1.39; 95% CI = 1.31, 1.47) compared with female patients 8 .
While the abovementioned studies leave little doubt as to the value of, and need for, prognostic modelling of COVID-19 outcomes, it is also important to consider the marked variation between regions in background population health profile (i.e., comorbidity), socioeconomic profile and demographic distribution, and the complex interactions between these potential drivers of severe COVID-19. Accordingly, the current study sought to develop a series of prognostic models to elucidate progression from symptomatic COVID-19 to hospitalisation, intensive care and death in the Republic of Ireland. Several case-specific and geographically referenced predictors were employed for model training and testing, including age, gender, comorbidity profile, area-specific socioeconomic components, urban/rural classification and case classification (i.e., sporadic or cluster-associated).

Methods
Infection data. Confirmed and anonymised case data were obtained from the Computerised Infectious Disease Reporting (CIDR) database (http://www.hpsc.ie/CIDR/), an information system used for the collation of notifiable (communicable) infection data in Ireland 12 . For clarity and comparability, only laboratory-confirmed, symptomatic cases were included for analyses, that is, cases associated with detection of SARS-CoV-2 nucleic acid or antigen in a clinical specimen (laboratory criteria) and exhibiting at least one of the following: sudden onset of cough, fever or shortness of breath, or anosmia, ageusia or dysgeusia (clinical criteria). Primary and secondary case classifications were included as potential predictors: sporadic cases (i.e., not recorded as associated with a confirmed outbreak or cluster) and outbreak index cases (the first case identified as part of a recognised outbreak/cluster) were defined as primary cases, while all other known outbreak cases were defined as secondary cases.
All symptomatic COVID-19 cases with an "epi-date" occurring between 29th February and 30th November 2020 were included for analyses. Address-level data had already been geocoded to Small Areas by the Health Service Executive (HSE) Health Intelligence Unit. Research ethical approval for use of the COVID-19 dataset and associated analyses was granted by the National Research Ethics Committee for COVID-19-Related Health Research (NREC COVID-19) (Application number: 20-NREC-COV-061). All research methods, including data processing and analyses, were performed in accordance with relevant guidelines and regulations. In accordance with the conditions of the National Research Ethics Committee for COVID-19-Related Health Research, informed consent from all participants and legal guardians was waived, with data processing and analyses undertaken using irreversibly anonymised data.
Predictors. Comorbidity, underlying health and occupation. All comorbidities included in the "Underlying Clinical Conditions" section of the Health Service Executive (HSE) Health Protection Surveillance Centre (HPSC) COVID-19 Case Form 13 were extracted for analyses, as follows:
- Chronic heart disease
- Hypertension
- Chronic neurological disease
- Chronic respiratory disease
- Chronic kidney disease
- Chronic liver disease
- Asthma requiring medication
- Immunodeficiency, including HIV
- Diabetes
- Severe obesity (BMI ≥ 40)
- Cancer/Malignancy
The total number of comorbidities was calculated for each case. Data pertaining to an ongoing pregnancy or being ≤ 6 weeks post-partum were extracted for all cases. While > 20 occupational classifications were used for reporting, a binary (Y/N) predictor was created, based on a recent Irish study 11 , to identify cases attributed to healthcare occupations, as this subset is associated with particularly high exposure to infection and subsequent serial testing.
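The case-wise comorbidity count and the binary healthcare-worker predictor described above could be derived along the following lines. This is a minimal sketch rather than the authors' code: the field names, the treatment of a missing entry as "condition absent", and the set of healthcare occupation codes are all hypothetical assumptions.

```python
# Hypothetical field names for the eleven case-form conditions listed above;
# the real export schema of the HPSC case form is not given in the text.
COMORBIDITIES = (
    "chronic_heart_disease", "hypertension", "chronic_neurological_disease",
    "chronic_respiratory_disease", "chronic_kidney_disease", "chronic_liver_disease",
    "asthma_requiring_medication", "immunodeficiency_incl_hiv", "diabetes",
    "severe_obesity_bmi_ge_40", "cancer_malignancy",
)


def comorbidity_count(case: dict) -> int:
    """Count the conditions recorded as present (True) for one case.

    Assumption: a missing or non-True entry is treated as 'condition absent'.
    """
    return sum(1 for name in COMORBIDITIES if case.get(name) is True)


def is_healthcare_worker(occupation: str, healthcare_codes: frozenset) -> str:
    """Collapse a detailed occupation code into the binary Y/N predictor."""
    return "Y" if occupation in healthcare_codes else "N"
```

For example, a case recorded with hypertension and diabetes but no other conditions yields a comorbidity count of 2.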
www.nature.com/scientificreports/
Urban/rural classification. A categorical Small Area (SA)-specific settlement type variable with three levels of measurement was developed using data obtained from the Irish Central Statistics Office (CSO). The CSO settlement type dataset 14 [...] (1) to 'highly rural/remote areas' (6). The classification variable was recoded such that any class which included a built-up area (classes 1-4) was coded as 'urban', class 5 (rural areas with high urban influence) as 'commuter/peri-urban', and all other areas (class 6) as 'rural'.
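The three-level recoding of the CSO settlement classes amounts to a simple lookup. A minimal sketch, assuming the classes arrive as integers from 1 to 6; the function name is hypothetical.

```python
def settlement_category(cso_class: int) -> str:
    """Recode the six-level CSO settlement class into the study's three levels."""
    if cso_class in (1, 2, 3, 4):  # any class that includes a built-up area
        return "urban"
    if cso_class == 5:             # rural areas with high urban influence
        return "commuter/peri-urban"
    if cso_class == 6:             # all remaining (highly rural/remote) areas
        return "rural"
    raise ValueError(f"unexpected CSO settlement class: {cso_class!r}")
```

Raising on an unexpected value, rather than silently defaulting to 'rural', makes any coding error in the source data visible.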
Deprivation index and components. The Pobal Haase-Pratschke (HP) Deprivation Index is a composite measure of deprivation/affluence derived from national population census data and comprising 16 individual components, representing three dimensions of deprivation: demographic profile, social class composition, and labour market situation (Table 1) 15 . The absolute deprivation score reflects any changes to the national economy at SA level between census periods while the relative deprivation index score is a comparative measure of deprivation between SAs during a census period 15 . Deprivation indices (absolute and relative) and component scores were obtained for the most recent (2016) national census of Ireland and attributed to all laboratory-confirmed COVID-19 cases.

Statistical analysis.
To counteract the high proportion of "non-severe" outcomes within the case dataset, a balanced dataset was created via up-sampling. For each dependent variable of interest (i.e., Hospital Inpatient, ICU Admission, Mortality), cases were randomly partitioned, stratified on that outcome, into model training (80%) and validation (20%) subsets, and generalised linear models with a binomial error distribution (logit link; dispersion = 1, number of parameters = number of coefficients) were derived. Models were trained using all available predictors, with variables removed one at a time in a stepwise approach based on the lowest Akaike Information Criterion (AIC) and the least significant variable p value. Each remaining significant variable was then removed in turn to assess its effect on model accuracy, based on the resulting confusion matrices, and only variables contributing significantly to model accuracy were retained. Receiver operating characteristic (ROC) curves and the area under the curve (AUC) were employed to assess the discriminative ability of the developed models; internal validation of the AUROCs was undertaken using 500 bootstrapped samples for model training and validation. The Nagelkerke R² was used to estimate the proportion of variance explained by the selected predictors, and the Brier score was used to assess model performance (calibration). The "best predictors" identified via the validated GLMs were then used to develop "rpart" (Recursive Partitioning and Regression Trees) models to identify individual variable thresholds and the causative pathways from symptomatic infection to each of the three modelled outcomes (i.e., attribute cut-offs ("splitters") and causative order/importance). As for the GLMs, a balanced dataset and an 80/20 training/testing partition were employed. Trees were developed using 10-fold cross-validation, with the tune length (the number of candidate complexity-parameter values evaluated) varying from 2 to 10 during training.
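The class balancing and outcome-stratified 80/20 partition described above can be sketched as follows, in the spirit of caret's upSample and createDataPartition but not a reproduction of either. The row-as-dictionary layout, the function names and the fixed seeds are assumptions for illustration.

```python
import random
from collections import defaultdict


def up_sample(rows, label_key, seed=0):
    """Duplicate randomly chosen rows of each smaller class (sampling with
    replacement) until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced


def stratified_split(rows, label_key, train_frac=0.8, seed=0):
    """Split rows into training and validation subsets by sampling train_frac
    of each outcome class separately, so both subsets keep the class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label_key]].append(row)
    train, valid = [], []
    for group in by_class.values():
        group = group[:]  # copy so the caller's data are not reordered
        rng.shuffle(group)
        cut = round(train_frac * len(group))
        train.extend(group[:cut])
        valid.extend(group[cut:])
    return train, valid
```

With 5 ICU admissions among 100 cases, up_sample returns 190 rows (95 per class), which stratified_split divides into 152 training and 38 validation rows. Because up-sampling duplicates minority-class cases, applying it before partitioning can place copies of the same case in both subsets; applying up_sample to the training subset only, after stratified_split, is a common alternative.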
Final models were selected to maximise the complexity and accuracy of the decision trees, based on the complexity parameter (Cp). Accordingly, the presented models are those with the maximum number of predictors combined with the highest level of accuracy based on true positives (i.e., sensitivity). Final decision trees are presented to highlight successive thresholds (cut-off values (splitters) for continuous predictors, significant categories for categorical predictors) and pathways (i.e., predictor order) identified for progression from symptomatic confirmed COVID-19 infection to each of the modelled outcomes. All statistical analyses were carried out in R version 4.0.3 using the caret, pROC, DescTools, fmsb, glmnet and randomForest packages. All packages are freely available at http://cran.r-project.org.
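To illustrate how a tree "splitter" for a continuous predictor such as age is chosen, the sketch below scans every midpoint between adjacent observed values and keeps the cut that minimises the size-weighted Gini impurity of the two child nodes; Gini impurity is rpart's default splitting criterion for classification trees. This is a simplified illustration, not rpart's implementation (it has no surrogate splits, priors, loss matrix or Cp-based pruning), and the function names are hypothetical.

```python
def gini(pos: int, n: int) -> float:
    """Gini impurity of a node holding pos outcome cases out of n cases."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2 * p * (1 - p)


def best_threshold(values, outcomes):
    """Return (cut, weighted_impurity) for the split 'value < cut' that
    minimises the size-weighted Gini impurity of the two child nodes.
    cut is None if no split improves on the parent node."""
    pairs = sorted(zip(values, outcomes))
    n = len(pairs)
    total_pos = sum(y for _, y in pairs)
    best = (None, gini(total_pos, n))  # baseline: impurity of the unsplit node
    left_pos = 0
    for i in range(1, n):
        left_pos += pairs[i - 1][1]    # outcome cases among the i smallest values
        if pairs[i - 1][0] == pairs[i][0]:
            continue                   # cannot cut between identical values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = i * gini(left_pos, i)
        right = (n - i) * gini(total_pos - left_pos, n - i)
        impurity = (left + right) / n
        if impurity < best[1]:
            best = (cut, impurity)
    return best
```

For ages [30, 35, 40, 70, 75, 80] with outcomes [0, 0, 0, 1, 1, 1], the best split is age < 55.0 with an impurity of 0.0, because the two child nodes are then perfectly pure.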

Results
Descriptive statistics. Overall, 47,265 laboratory-confirmed cases of symptomatic COVID-19 infection (53.4% female; mean age 41.2 years; 0.96% of the national population) were included for analyses (Table 2), all of which occurred between February 29th and November 30th 2020. Of these, 3781 (7.99%) were reported as having been hospital inpatients, 615 (1.3%) were admitted to an intensive care unit (ICU) and 1326 (2.8%) died, of whom 599 (45.2%) had not been classified as hospital inpatients. The odds of progression to severe outcomes typically increased with age, number of comorbidities and deprivation. For example, across the entire study cohort, 21% of cases (n = 37,341) presented with ≥ 1 underlying clinical condition, compared with 60.4%, 78.9% and 84.2% among hospitalised cases, ICU admissions and deaths, respectively (Table 2). Likewise, mean HP deprivation scores were markedly lower among cases associated with hospitalisation (−1.82), ICU admission (−0.28) and death (−1.7) than the mean score across all symptomatic cases (0.24). Patients who died in hospital were typically younger (mean 77.3 years vs 84 years), had a higher comorbidity score (mean 1.96 vs 1.51) and a markedly lower deprivation score (mean −2.41 vs −0.84) than those who died outside of hospital.
Admission to ICU. Approximately 1.3% (n = 615) of symptomatic COVID-19 infections from February 29th to November 30th resulted in admission to an ICU. The training GLM comprised 10 predictors: age, gender, five individual comorbidities, calculated comorbidity number, occupational classification and one geographically-specific variable (urban resident), two of which were "protective" (healthcare worker, presence of a chronic neurological condition) (Fig. 3; Table 3). The validated model returned a bootstrapped AUC of 0.885 (95% CI 0.88, 0.89), a predictive sensitivity (i.e., true positive rate) of 85.2%, a Nagelkerke R² of 0.575 and a Brier score of 0.128. The validated "rtree" model for ICU admission among symptomatic COVID-19 cases is presented in Fig. 4; the model achieved a predictive accuracy of 83.1% on the outcome (ICU admission) class.
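The sensitivity, Nagelkerke R² and Brier score reported for each validated model can be computed from observed outcomes and predicted probabilities as sketched below. This minimal sketch assumes 0/1 outcomes, predicted probabilities strictly between 0 and 1, and an intercept-only null model for the Nagelkerke rescaling; it is not a reproduction of any R package implementation, and the function names are hypothetical.

```python
import math


def sensitivity(y_true, y_pred):
    """True-positive rate: share of observed outcome cases the model flags.
    Assumes at least one observed outcome case."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def brier_score(y_true, prob):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, prob)) / len(y_true)


def log_likelihood(y_true, prob):
    """Bernoulli log-likelihood of the observed outcomes."""
    return sum(math.log(p) if t == 1 else math.log(1 - p)
               for t, p in zip(y_true, prob))


def nagelkerke_r2(y_true, prob):
    """Cox-Snell R^2 divided by its maximum attainable value."""
    n = len(y_true)
    base = sum(y_true) / n  # fitted probability of the intercept-only model
    ll_null = log_likelihood(y_true, [base] * n)
    ll_model = log_likelihood(y_true, prob)
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    max_cox_snell = 1 - math.exp(2 * ll_null / n)
    return cox_snell / max_cox_snell
```

A model that predicts only the overall outcome rate gains nothing over the null model and so scores a Nagelkerke R² of 0, while sharper, correct predictions push the value towards 1; the Brier score moves in the opposite direction, with 0 indicating perfect probabilistic predictions.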

Mortality. Just under 3% (n = 1326) of symptomatic COVID-19 infections occurring between February 29th and November 30th resulted in death. The validated GLM comprised 8 predictors: age, gender, four individual comorbidities, calculated comorbidity number, occupational classification and one geographically-specific variable (urban resident), one of which was "protective" (healthcare worker) (Fig. 5; Table 3). The validated model returned a bootstrapped AUC of 0.955 (95% CI 0.951, 0.959), a predictive sensitivity (i.e., true positive rate) of 90.4%, a Nagelkerke R² of 0.816 and a Brier score of 0.06. The validated "rtree" model for mortality among symptomatic COVID-19 cases is presented in Fig. 6; the model achieved a predictive accuracy of 96.7% on the outcome (mortality) class.
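The bootstrapped AUCs reported above rest on two pieces: the AUC itself, which equals the probability that a randomly chosen case with the outcome receives a higher predicted score than one without it, and a resampling step. The sketch below uses a percentile bootstrap; the text does not state whether the pROC-based intervals used percentile, DeLong or another method, so this choice is an assumption. The pairwise AUC loop is quadratic in sample size and intended only for small illustrations, and the function names are hypothetical.

```python
import random


def auc(y_true, score):
    """ROC AUC as the share of (outcome, non-outcome) pairs in which the
    outcome case scores higher; ties count one half (Mann-Whitney form)."""
    pos = [s for t, s in zip(y_true, score) if t == 1]
    neg = [s for t, s in zip(y_true, score) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, score, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the AUC, resampling cases
    with replacement n_boot times."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples that lack one of the two classes
            stats.append(auc(yb, [score[i] for i in idx]))
    stats.sort()
    alpha = (1 - level) / 2
    lower = stats[int(alpha * n_boot)]
    upper = stats[int((1 - alpha) * n_boot) - 1]
    return lower, upper
```

With perfectly separated scores every resample gives an AUC of 1.0, so the interval collapses to (1.0, 1.0); with overlapping scores the interval widens to reflect sampling uncertainty.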

Discussion
The complete Irish dataset of notified cases of COVID-19 throughout the first two waves of the pandemic was analysed to identify case- and geographically-specific attributes that may serve as predictors for hospitalisation, ICU admission and mortality in patients with laboratory-confirmed, symptomatic COVID-19 infection. Results mirror findings from previous studies, with older age, male gender and increased comorbidity number consistently significant factors within all validated models for COVID-19 severity. Studies have shown that increasing type-2 cytokine production with age likely reduces control of viral replication, leading to prolonged incubation and inflammatory response and thus facilitating the progression of infection 16,17 . Likewise, while symptomatic COVID-19 prevalence was higher among females (53.4%), the burden of severe infection was markedly higher among male cases for all three modelled outcomes; men were approximately 1[...]. A recent review of the sex- and gender-related differences associated with COVID-19 outcomes in Europe proposes numerous potential reasons for this relationship, including gender-specific lifestyle, health behaviours, psychological stress and socioeconomic conditions, in addition to several sex-specific biological mechanisms modulating the course of disease, including hormone-regulated gene expression, innate and adaptive immune responses, and immune-aging 18 . For example, numerous studies have shown that females are generally less susceptible to viral infections and mount higher innate immune responses (more rapid viral recognition and type I interferon production) than their male counterparts, leading to faster viral clearance 19,20 . Accordingly, there is a strong evidence base to suggest that, upon infection with SARS-CoV-2, females may be better equipped than males to respond initially and to attenuate viral invasion and pathogenicity.
Additionally, a recent study in the UK noted significantly higher rates of "behavioural resistance" to protective actions (i.e., non-pharmaceutical interventions) among men, reporting that 80% of those fined for breaking lockdown measures were male 21 , potentially resulting in higher levels of viral exposure, transmission and viral load among males, in concurrence with the aforementioned biological disparities. Accordingly, gendered or sex-specific therapies and/or non-pharmaceutical interventions may be an important area for future research.
COVID-19-related hospitalisation presented as the most analytically complex severe outcome, with numerous comorbidities and socioeconomic factors associated with admission as a hospital inpatient. While the models for predicting hospitalisation demonstrated a good fit (AUROC 0.816, 95% CI 0.809-0.822), the authors suggest that the lower predictive capacity of the hospitalisation model reflects the complexity of disease manifestation, particularly within the community, which is mediated by several socio-behavioural, clinical and biological factors. This may be particularly pronounced for non-clinical and non-biological factors such as individual behaviours, self-efficacy and knowledge, which may lead to increased exposures and are particularly difficult to quantify accurately via routine epidemiological surveillance.
Asthma was associated with an increased likelihood of hospitalisation. Recent research has been divided regarding the influence of asthma on COVID-19 hospitalisation, with some authors suggesting that people with asthma are over-represented among adult hospital admissions because SARS-CoV-2 may trigger an exacerbation of asthma symptoms, as has been reported for other respiratory viruses 20,21 . Likewise, the most common presenting symptoms of COVID-19 (dry cough and shortness of breath) are also common in acute exacerbations of asthma 20 . Conversely, several international studies have reported that asthma is not a significant risk factor for hospitalisation with COVID-19 22,23 , with some suggesting that it may be protective, via increased numbers of eosinophils in the airways of asthmatic patients, or through the potential antiviral and immunomodulatory activities of inhaled asthma medications, particularly steroids 24 . Results from the current study may reflect the high prevalence of asthma in the Republic of Ireland (ROI), which has the fourth highest global prevalence of the disease and where asthma was consistently among the top 20 diagnoses for admission to hospital prior to the pandemic 25 .
From a socio-geographic/economic perspective, patients living in categorically rural areas and in areas characterised by higher (> 17%, Fig. 2) rates of local authority (i.e., publicly supported) housing were also at increased risk of hospitalisation, potentially reflecting a geographical and/or geo-social gradient in disease severity in Ireland. A recent investigation of the socioeconomic associations of COVID-19 hospitalisation among 418,794 participants of the UK Biobank reports a striking gradient in COVID-19 hospitalisation rates according to the Townsend Deprivation Index (a composite measure of socioeconomic deprivation) and household income 26 . Likewise, individual socioeconomic status has been associated with the severity of COVID-19 among hospitalised patients under the age of 70 years in Greater Paris, with housing conditions, as they relate to the capacity to socially distance and to increased co-resident infections, specifically mentioned as probable drivers 27 . Within the current study sample, local-authority housing (%) was significantly correlated with the prevalence of both primary (Spearman's R = 0.375, p < 0.001) and college/university education (Spearman's R = −0.449, p < 0.001), with lower levels of education a globally recognised source of health inequalities 28 .
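The correlations reported above are Spearman rank correlations. A minimal sketch of the coefficient follows, computed as the Pearson correlation of average ranks so that tied values are handled; the p values quoted in the text are not reproduced here, and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                    # extend the run of tied values
        mean_rank = (i + j) / 2 + 1   # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

Because only ranks enter the calculation, the coefficient captures any monotonic association between, for example, Small Area housing and education percentages, without assuming a linear relationship.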
Predictive capacity increased for both ICU admission and mortality, with the models for ICU admission (AUROC 0.885, 95% CI 0.88-0.89) and mortality (AUC 0.955, 95% CI 0.95-0.96) assessed as very good and excellent, respectively. Commonalities were observed across the risk factors identified for both outcomes. Specifically, severe obesity, indicated by a body mass index (BMI) ≥ 40, was a significant marker for both ICU admission (OR 19.6) and death (OR 10.8). The identified risks associated with severe obesity align with pathophysiological mechanisms contributing to respiratory distress; in particular, a BMI ≥ 40 (associated with increased respiratory rate) is recognised as a contributor to multiple respiratory infections, including pneumonia 29 , and has been identified as a primary risk factor for poor COVID-19 prognosis 30,31 . Severe obesity was a particularly significant predictor among COVID-19 patients aged < 41 years (Fig. 4) and < 63 years (Fig. 6) for ICU admission and death, respectively (i.e., substantially below the median ages for both outcomes). Similarly, malignant cancer and immunodeficiency resulting from cancer treatment impair the ability to mount an effective response to clear viral infection and are associated with increased susceptibility to acute clinical deterioration and increased mortality due to increased viral pathogenicity 32,33 , with < 63 years again identified as a significant "splitter" for COVID-related mortality (Fig. 6), suggesting a lack of interaction between this health condition and older age. While residence in categorically rural areas was associated with a higher likelihood of hospitalisation, the opposite was true for admission to ICU, whereby urban dwellers had approximately 1.5 times the odds of requiring critical care (OR 1.533, 95% CI 1.606-1.682), particularly among those aged > 60 years (Fig. 4).
Urban living may be indicative of multiple individual or interacting factors including higher levels of deprivation 34 , higher viral exposures (i.e., close contacts) due to increased household and/or local population density 26 or compounded respiratory illnesses due to lower air quality in urban areas 35 . For example, within the current study sample, while a slightly higher proportion of symptomatic cases with asthma were reported in rural areas (~ 1.5% versus ~ 1% of all symptomatic cases), the likelihood of ICU admission among urban asthma sufferers was significantly higher (OR 15.55; CI 11.28-21.14) than their counterparts in rural (OR 13.22; CI 6.49-25.10) or commuter areas (OR 11.01; CI 3.73-26.99), potentially highlighting a significant interaction between urban pollutant exposures (e.g., particulate matter (PM) 2.5/10) and COVID-19 severity 36 .
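Odds ratios with confidence intervals, such as those contrasting ICU admission among asthma cases across settlement types, can be computed from a 2×2 table using Woolf's log-scale interval. The text does not say how these particular ORs were derived, so this is an assumed, illustrative method; a logistic model would instead give OR = exp(β) with interval exp(β ± 1.96·SE), which has the same log-scale form. The function name is hypothetical.

```python
import math


def odds_ratio_woolf(a, b, c, d, z=1.959964):
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table:
        a = exposed with outcome,   b = exposed without outcome,
        c = unexposed with outcome, d = unexposed without outcome.
    z = 1.959964 gives a 95% interval. Assumes all four cells are non-zero
    (no continuity correction is applied)."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper
```

The interval is symmetric on the log scale, so the point estimate always lies inside it; small cell counts inflate the standard error, which is one reason stratified ORs can carry wide intervals.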
The apparent "protective" association between working as a healthcare worker and both ICU admission and death may indicate clinical heterogeneity between these and other cases arising from diagnostic bias, as healthcare workers are likely to be associated with a high index of suspicion for the disease and, as such, have undergone and continue to undergo serial testing in Ireland. The threshold of clinical criteria for COVID-19 diagnosis in healthcare workers, and the temporal lag between viral exposure, positive diagnosis and subsequent treatment, are likely substantially lower than among the general population owing to these testing protocols 37 , resulting in improved outcomes and the apparent "protective" effect identified 38 . Likewise, the protective effect of chronic neurological disease with respect to ICU admission is thought to reflect clinical processes, specifically the clinical judgement that, for persons with advanced dementia, mechanical ventilation may prolong patient suffering without a clear survival benefit 39 . Other comorbidities have been associated with poor prognosis following ICU admission and may thus reduce the odds of admission through clinical decision-making. Physical factors indicating limited functional capacity are predictive of high mortality in ICU, suggesting that frailty has a significant impact on intensive care outcome; hence, the finding that age was associated with the lowest odds of ICU admission (compared with hospitalisation and mortality) may be unsurprising 8 . Similarly, the finding that 599 (45.2%) of those who died had not been hospitalised is unsurprising given the mean age of this subgroup (84 years), as patients of such advanced age may have been considered too frail to benefit from hospital (and particularly critical) care.
While the presented study permits delineation of severe health outcomes based on clinical and socioeconomic attributes, some limitations should be highlighted. Based on the presented findings, it is likely that some case attributes are indicative of differential healthcare access and thus not entirely explained by the pathophysiological mechanisms driving progression of the disease to increasing clinical severity and death. For example, rurality as a predictor of hospitalisation, in conjunction with urban residence as a predictor of ICU admission, may reflect a lower threshold for rural residents to present to healthcare settings, and subsequently to be admitted for observation, to counter the risk of deterioration in the more remote home environment. Likewise, the choice of hospitalisation as a marker of COVID-19 severity is subject to some spatio-temporal limitations; hospitalisation itself may be affected by many factors, including health-seeking behaviours, availability of care and healthcare policies or thresholds (e.g., more or less severe cases may be admitted to hospital and/or admission may be age-specific), which may be spatially unique and/or temporally fluid depending on the capacity of a national or regional healthcare system to absorb cases (e.g., during localised outbreaks). As such, the authors advise caution when comparing the current study findings with previous or future studies of a similar nature.

Conclusion
The identified nationally-specific risks associated with demographics, underlying health (comorbidities), geographic location and socioeconomic profile, together with the specific importance of predictors, attribute "splitters" and variable interactions, represent a robust evidence base for developing increasingly targeted public-health recommendations, interventions and therapeutic approaches for high-risk groups, e.g., minimisation of social contact among those with elevated BMI, urban residents with asthma or those with immunodeficiency caused by cancer treatment, and thorough respiratory etiquette and hand hygiene among household contacts in specific settings and/or geographic regions. Moreover, communicating the scientific basis for ongoing and future interventions, particularly geographically- or socioeconomically-bespoke interventions, may help to combat pandemic fatigue and increase overall transparency and awareness of ongoing public-health events. Furthermore, the presented models offer a metric by which tailored vaccination schedules may be devised, with prioritisation by age, sex, comorbidity status and region. Lastly, the results offer valuable information for effective patient triage, identifying those at increased risk of disease progression and death based on a suite of factors and not solely on the clinical presentation of the disease.