Deciphering the link between healthcare expenditure, corruption, and COVID-19 mortality

This paper analyzes the determinants of COVID-19 mortality across over 140 countries in 2020, with a focus on healthcare expenditure and corruption. It finds a positive association between COVID-19 deaths and aging populations, obesity rates, and healthcare expenditure while noting a negative association with rural residency and corruption perception. The study further reveals that mortality is positively associated with aging populations in high-income countries and positively associated with obesity in upper-middle to high-income countries. Mortality is positively associated with healthcare expenditure, which likely reflects a country’s preparedness and ability to better track, document, and report COVID-19 deaths. On the other hand, mortality is negatively associated with corruption perception in upper-middle-income countries. Further analyses based on 2021 data reveal COVID-19 deaths are positively associated with the proportion of the population aged 65 and older in low to lower-middle-income countries, with obesity in high-income countries, and with tobacco use across most countries. Interestingly, there is no evidence linking COVID-19 deaths to healthcare expenditure and corruption perception, suggesting a post-2020 convergence in preparedness likely due to proactive pandemic responses, which might have also mitigated corruption’s impact. Policy recommendations are proposed to aid the elderly, address obesity, and combat tobacco use.


Data
The data used in this study comprise total population, the proportion of individuals aged 65 and above, the percentage of rural population relative to the total population, prevalence of tobacco use among adults, mean annual exposure to particulate matter 2.5 (measured in micrograms per cubic meter, µg/m³), government expenditure on education as a percentage of GDP, and current healthcare expenditure as a percentage of GDP, which were obtained from the World Bank's World Development Indicators. For data related to COVID-19 deaths, representing the cumulative number of fatalities up to December 31, 2020, and data regarding the prevalence of obesity among adults (defined as having a Body Mass Index (BMI) greater than or equal to 30), we relied on data provided by the World Health Organization (WHO). Lastly, data on corruption were sourced from Transparency International. All explanatory variables are from 2019, and the variables and their relevant details are described in Table 1. Corresponding summary statistics are presented in Table 2.
It is worth noting that endogeneity is not a significant concern in this paper. The temporal distinction in the data, with the explanatory variables captured during years preceding COVID-19 deaths, significantly reduces the risk of endogeneity. In particular, the exclusion of response variables such as vaccination and lockdown, which are prone to high endogeneity, further strengthens this argument [19]. The data used in this study essentially capture predetermined characteristics that delineate the state of countries prior to the onset of the pandemic. This temporal distinction ensures that the identified associations between explanatory variables and COVID-19 mortality remain robust and are less susceptible to biases arising from endogeneity.

Methodology
To capture the numerous factors that can contribute to COVID-19 mortality, it is important to also control for factors contributing to mortality in the absence of COVID-19. Not accounting for such factors could result in erroneously attributing COVID-19 deaths to some hypothesized factors.
The following represents a specification based on our current understanding of the COVID-19 virus and related hypotheses:
$$\text{logdeaths}_i = \beta_0 + \beta_1\,\text{logpop}_i + \beta_2\,\text{pop65plus}_i + \beta_3\,\text{obesity}_i + \beta_4\,\text{ruralpop}_i + \beta_5\,\text{tobacco}_i + \beta_6\,\text{logpm25}_i + \beta_7\,\text{education}_i + \beta_8\,\text{healthcare}_i + \beta_9\,\text{corruption}_i + \varepsilon_i \qquad (1)$$
where the dependent variable represents the natural logarithm of the number of reported COVID-19 deaths (logdeaths) for country i. This is estimated as a function of the natural logarithm of total population (logpop), the share of the population above 65 years of age (pop65plus), the prevalence of obesity (obesity), the proportion of the total population living in rural areas (ruralpop), the prevalence of tobacco use (tobacco), the natural log of particulate matter 2.5 mean annual exposure (logpm25), percent government expenditure on education (education), percent current healthcare expenditure (healthcare), and the corruption perception index (corruption). Table 3 provides a correlation matrix of the independent variables. This table shows correlations that are low enough to allay concerns about multicollinearity, instability of estimates, and overfitting.
Since the dependent variable is measured in levels, it is crucial to introduce the population as a scale variable. In addition, all variables not expressed in percentages are log-transformed to facilitate the interpretation of parameter estimates as elasticities. The pop65plus variable is introduced as an explanatory variable, in line with previous research suggesting a significant proportion of COVID-19 fatalities are individuals aged 65 and above [3-5]. The obesity variable is included to account for the established contribution of obesity to mortality [5]. The ruralpop variable addresses the disparities in mortality between urban and rural areas [15,16]. The tobacco variable factors in the relationship between smoking prevalence and COVID-19 mortality [5,7]. The education variable controls for the effect of lower education on mortality [7]. Lastly, the logpm25 variable controls for the impact of air pollution on mortality. Previous research has consistently presented evidence supporting a positive relationship between air pollution and COVID-19 mortality [7-14].
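To make the elasticity interpretation concrete, the following worked sketch distinguishes the two kinds of regressors in the model. The notation is assumed here for illustration and does not come from the original text: β denotes the coefficient on a log-transformed regressor such as logpop, and γ denotes the coefficient on a regressor measured in percent such as obesity.

```latex
% Log-log case: the coefficient is an elasticity.
\frac{\partial \ln(\text{deaths}_i)}{\partial \ln(\text{pop}_i)} = \beta
\quad\Longrightarrow\quad
\text{a } 1\% \text{ increase in population is associated with a } \beta\% \text{ change in deaths.}

% Log-level case: the coefficient is a semi-elasticity.
\frac{\partial \ln(\text{deaths}_i)}{\partial\, \text{obesity}_i} = \gamma
\quad\Longrightarrow\quad
\%\Delta\,\text{deaths} = 100\,\bigl(e^{\gamma} - 1\bigr) \approx 100\,\gamma
\text{ per one-percentage-point increase in obesity.}
```

The approximation in the second line is only accurate for small values of γ; for larger coefficients the exact expression should be used.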
The model also incorporates the two variables of interest: healthcare and corruption. The healthcare variable is used to test hypotheses regarding the association between healthcare expenditure and COVID-19 mortality, as suggested by previous research [13,17,18]. On the other hand, the corruption variable is included to assess the association between corruption and mortality, a variable that has received little attention [19]. A higher corruption perceptions index implies a lower level of corruption.
Stepwise regression is used in this study for selecting a parsimonious specification with fewer variables for interpretation. This iterative process of model refinement begins with a stepwise forward selection approach, which involves sequentially adding predictors based on their potential to contribute significantly to explaining the variation in COVID-19 deaths. Following the forward selection step, a backward elimination approach is employed by systematically removing the least statistically significant predictors, ensuring that only the most robust variables remain. These steps allow the model to achieve a balance between complexity and explanatory power. The resulting specification with fewer variables not only yields a more interpretable model but also mitigates the risk of overfitting.
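The forward selection step described above can be illustrated with a minimal sketch. It is not the procedure used in the paper: it assumes adjusted R² as the entry criterion (the text describes selection by statistical significance), omits the backward elimination step, and the function names `adjusted_r2` and `forward_select` are hypothetical.

```python
import numpy as np

def adjusted_r2(X, y):
    """Fit OLS with an intercept and return the adjusted R^2."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    ss_res = resid @ resid
    ss_tot = ((y - y.mean()) ** 2).sum()
    k = A.shape[1] - 1  # number of regressors, excluding the intercept
    return 1.0 - (ss_res / (n - k - 1)) / (ss_tot / (n - 1))

def forward_select(X, y, names):
    """Greedy forward selection (assumed criterion: adjusted R^2)."""
    selected = []
    remaining = list(range(X.shape[1]))
    best = 0.0  # adjusted R^2 of the intercept-only model
    while remaining:
        # Score every candidate when added to the current specification.
        scores = [(adjusted_r2(X[:, selected + [j]], y), j) for j in remaining]
        score, j = max(scores)
        if score <= best:
            break  # no candidate improves the fit; stop
        selected.append(j)
        remaining.remove(j)
        best = score
    return [names[j] for j in selected], best
```

Calling `forward_select(X, y, names)` with a design matrix whose columns match `names` returns the variables in the order they entered, together with the final adjusted R². A significance-based rule would replace the adjusted R² comparison with a test on the candidate's coefficient.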
All estimations are completed using a Least Squares estimator with bootstrapped standard errors. This estimator allays concerns about within-sample distortions and derives estimates of standard errors and confidence intervals based on the underlying distribution of the sample rather than based on some a priori distributional assumptions. All estimations are completed with 500 bootstrap replications.
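A minimal sketch of least squares with bootstrapped standard errors is given below. It assumes a pairs (case) bootstrap, in which whole countries are resampled with replacement; the text does not state which bootstrap variant was used. The function names and the percentile confidence interval are likewise illustrative choices.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients, intercept first."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def bootstrap_ols(X, y, n_boot=500, seed=0):
    """Point estimates with bootstrap standard errors and 95% percentile CIs."""
    rng = np.random.default_rng(seed)
    n = len(y)
    beta_hat = ols(X, y)
    draws = np.empty((n_boot, beta_hat.size))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        draws[b] = ols(X[idx], y[idx])
    se = draws.std(axis=0, ddof=1)                   # bootstrap standard errors
    ci = np.percentile(draws, [2.5, 97.5], axis=0)   # row 0 lower, row 1 upper
    return beta_hat, se, ci
```

The default of 500 replications mirrors the number reported in the text. Because the standard errors come from the empirical distribution of the resampled estimates, no normality assumption on the errors is needed.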
In a preliminary analysis of the hypothesized relationships, scatter plots are derived using the locally weighted scatterplot smoother (lowess). This approach is preferable over a simple linear regression line because it offers more flexibility by not assuming a specific functional form for the relationship between variables [20]. As a result, it can capture complex patterns in the data without imposing rigid assumptions. Figures 1 and 2 illustrate the associations between healthcare spending and COVID-19 mortality and between corruption and mortality, respectively. At first glance, these figures appear to reveal a positive relationship. However, the following section will further investigate these relationships by accounting for other potential confounding factors, aiming to validate the patterns observed in the figures.
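The idea behind the smoother can be sketched as a local linear fit with tricube weights. This is a simplified illustration, not the routine used to produce Figures 1 and 2: it omits the robustness iterations of the full lowess algorithm, and the function name and the `frac` default are assumptions.

```python
import numpy as np

def lowess(x, y, frac=0.5):
    """Simplified lowess: tricube-weighted local linear fit at each x[i]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = max(2, int(np.ceil(frac * n)))  # points in each local window
    fitted = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        h = max(np.sort(d)[k - 1], 1e-12)  # bandwidth: k-th smallest distance
        w = np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3  # tricube weights
        sw = np.sqrt(w)
        # Centre x at x[i] so the local intercept is the fitted value there.
        A = np.column_stack([np.ones(n), x - x[i]])
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        fitted[i] = coef[0]
    return fitted
```

Because each fitted value comes from its own weighted regression, the curve can bend wherever the data do, which is the flexibility the text contrasts with a single regression line.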

Results
The estimation results presented in Table 4 are derived from a stepwise regression, a method that incrementally introduces variables to identify the model with the greatest explanatory power. The initial model in column (1) indicates that the coefficients for logpop and pop65plus are both positive and statistically significant (p < 0.001). After introducing obesity in column (2), the coefficients for logpop and pop65plus are unaffected while the coefficient for obesity is also positive and statistically significant (p < 0.001).
Column (3) presents estimation results after adding ruralpop. The parameter estimates of the original variables are unaffected by this change and that for ruralpop is negative and statistically significant (p < 0.05). Columns (4)-(6) present results after incorporating tobacco, logpm25, and education, respectively. Notably, the parameter estimates for these new variables are not statistically significant while those for the original variables remain unchanged. On the other hand, the variable ruralpop loses statistical significance across these three estimations. Column (7) takes into account the introduction of healthcare, with none of the parameter estimates for tobacco, logpm25, education, and healthcare proving statistically significant. However, the coefficient estimate for ruralpop regains statistical significance (p < 0.05).
Column (8) showcases the results of the full model following the inclusion of corruption. With the exception of logpop, pop65plus, obesity, and ruralpop, none of the parameter estimates for the included variables are statistically significant. Given the consistent lack of statistical significance for tobacco, logpm25, and education, these variables are subsequently removed, and the model is re-estimated in column (9). This column reveals estimation results after the exclusion of these variables. As observed in previous models, the coefficients for logpop, pop65plus, and obesity are positive and statistically significant (p < 0.001). In addition, the coefficient estimate for ruralpop is negative and statistically significant (p < 0.01). Lastly and notably, the coefficient estimates for healthcare and corruption are statistically significant, with positive and negative signs, respectively (p < 0.05).
It is important to note that model accuracy, measured by the adjusted R², significantly improved from 0.601 in column (1) to 0.672 with the introduction of obesity, with a minor increase to 0.68 after the addition of ruralpop. However, the adjusted R² decreased to 0.656 after introducing tobacco, increased slightly to 0.658 with the inclusion of logpm25, and decreased significantly to 0.627 after adding education. Subsequently, the explanatory power increased significantly to 0.636 after introducing healthcare and further to 0.645 after introducing corruption. Remarkably, the last model, in column (9), which excludes the tobacco, logpm25, and education variables, explains 69.1% of the variation in COVID-19 deaths, the highest among all nine estimations, indicating greater model accuracy and explanatory power.
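The reason the adjusted R² can fall when a variable is added follows from its standard definition, sketched below. The symbols n (number of countries in the estimation sample) and k (number of regressors excluding the intercept) are notation assumed here, and the comparison across columns presumes a common sample, which the text does not state.

```latex
\bar{R}^2 \;=\; 1 - \bigl(1 - R^2\bigr)\,\frac{n - 1}{n - k - 1}
```

Adding a regressor never lowers R², but it raises k and therefore the penalty factor (n - 1)/(n - k - 1). With the sample held fixed, the adjusted R² increases only when the added variable's t-statistic exceeds one in absolute value, which is consistent with the declines reported after adding statistically insignificant variables.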
A noteworthy observation from the estimation results is that the coefficient estimate for healthcare loses statistical significance in columns (7) and (8), suggesting a potential interaction effect with tobacco, logpm25, and education. This interaction could be attributed to how these variables influence healthcare expenditure. Nevertheless, the lack of statistical significance and lower adjusted R² values imply that these variables do not contribute significantly to the model's explanatory power.
In summary, focusing on the results in column (9), which show the best goodness-of-fit, it becomes apparent that countries with a larger proportion of people aged 65 and above experience higher COVID-19 mortality rates, and countries with higher obesity rates also report greater COVID-19 mortality. There is also evidence that countries with a greater proportion of people living in rural areas have lower COVID-19 mortality. In addition, countries with higher healthcare expenditure have higher COVID-19 mortality rates, while those with lower corruption levels, as suggested by a higher corruption perceptions index, have lower COVID-19 deaths. Lastly, contrary to prior research, there is no evidence of a connection between tobacco use, air pollution, education, and COVID-19 mortality.

Robustness of the results
There are always concerns about potentially influential observations in cross-country studies. This issue can be addressed using M-estimation with Huber weighting. M-estimation provides a robust method for estimating regression parameters, helping to address potential issues that might arise from the bootstrap resampling process. While bootstrapping is a powerful technique for estimating the sampling distribution of regression coefficients and assessing uncertainty, it can be sensitive to outliers in the data, potentially leading to biased or inefficient parameter estimates. M-estimation mitigates the impact of outliers by assigning them lower weights, thereby reducing their influence on the final estimates and providing more reliable parameter estimates.
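The downweighting mechanism can be sketched with iteratively reweighted least squares. The implementation below is illustrative only: the tuning constant c = 1.345, the rescaled median absolute deviation used as the scale estimate, and the function name `huber_irls` are conventional choices assumed here, not details taken from the paper.

```python
import numpy as np

def huber_irls(X, y, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimator via iteratively reweighted least squares."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # start from the OLS fit
    w = np.ones(len(y))
    for _ in range(max_iter):
        resid = y - A @ beta
        # Robust scale: median absolute deviation, rescaled for normal errors.
        scale = np.median(np.abs(resid - np.median(resid))) / 0.6745
        if scale == 0:
            break
        u = np.abs(resid) / scale
        # Huber weights: 1 inside the threshold, c/|u| beyond it.
        w = c / np.maximum(u, c)
        sw = np.sqrt(w)
        beta_new, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, w
```

Observations whose scaled residual stays below the threshold keep a weight of one, so the estimator behaves like least squares on well-fitted countries and shrinks the contribution of only the poorly fitted ones. The returned weights can be inspected to see which observations were treated as potentially influential.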
Table 5 summarizes the estimation results using the Huber M-estimator. Overall, the results remain largely consistent with those obtained using bootstrap linear regression. This reaffirms the previous findings even after accounting for potentially influential observations.
Another concern regarding the estimation results is the potential variation of the hypothesized relationships across different groups of countries. Given the substantial disparities among countries at various stages of development, the policy recommendations derived from the analysis may lack relevance and generalizability across all countries. One approach to address this heterogeneity is to estimate Eq. (1) for subsamples based on predetermined criteria. Utilizing the World Bank Atlas Method Gross National Income per capita (GNI) Operational Guidelines and Analytical Classifications, countries are classified as low income, lower middle income, upper middle income, or high income. Upon examination of the data, it is noted that the low income, lower middle income, upper middle income, and high income subsamples are too small to be estimated separately. As a result, the first two subsamples are merged to form a combined low income to lower middle income group when GNI ≤ $4,045 while the remaining subsamples are retained unchanged. All estimations are completed using a Least Squares estimator with bootstrapped standard errors followed by M-estimation with Huber weighting, focusing on the parsimonious specification used in previous interpretations.
Table 4. Bootstrap estimation results (2020). Standard errors in parentheses. *p < 0.05, **p < 0.01, ***p < 0.001.
Table 6 summarizes the estimation results. Columns (1) present bootstrap estimation results for Eq. (1). Columns (2) display estimation results for the parsimonious specification that excludes tobacco, logpm25, and education. Columns (3) report Huber M-estimator estimation results for the parsimonious specification. For consistency, interpretations will focus on the results reported in columns (3).
First, the positive relationship between pop65plus and COVID-19 mortality, observed in the full sample, is found to be statistically significant only within the high-income group. Second, the observed positive relationship between obesity and COVID-19 mortality only holds within the upper middle-income and high-income groups. Third, the observed negative relationship between ruralpop and COVID-19 mortality only holds for the low to lower middle-income group. Fourth, the observed positive relationship between healthcare and COVID-19 mortality, as well as the negative relationship between corruption and COVID-19 mortality, only hold in the upper middle-income group.
An additional critical consideration for this study relates to the focus on the year 2020. While the examination of COVID-19 mortality in 2020 offers valuable insights into preparedness in initial responses and resource mobilization, the exclusion of subsequent years, notably 2021, could limit the assessment of policy effectiveness and healthcare system resilience. This is particularly important given that the year 2021 witnessed nearly double the number of deaths compared to the preceding year. To address this concern, all previous estimations in this study are repeated using data from 2021. However, while most data are from the year 2021, exceptions include logpm25 data, sourced from 2019, and data for tobacco and healthcare, which are from 2020. Summary statistics for all variables are reported in Table 7.
Table 8 reports estimation results for the year 2021 using bootstrap regression. Consistent with the previous findings, COVID-19 deaths are positively associated with pop65plus, obesity, and tobacco. The latter variable did not have a statistically significant parameter estimate in the previous estimations. In addition and contrary to the previous estimations, there is no evidence of statistically significant links between healthcare, corruption, and COVID-19 mortality. These results are further reaffirmed in Table 9, which reports the Huber M-estimator regression estimation results.
Table 5. Huber M-estimator estimation results (2020). Standard errors in parentheses. *p < 0.05, **p < 0.01, ***p < 0.001.
The final step in addressing the concern related to excluding data for 2021 is completed by utilizing the World Bank Atlas Method Gross National Income per capita (GNI) Operational Guidelines and Analytical Classifications for the Calendar Year 2021. Countries are categorized as follows: low income if GNI ≤ $1,085, lower middle income if $1,085 < GNI ≤ $4,255, upper middle income if $4,255 < GNI ≤ $13,205, and high income if GNI > $13,205. Consistent with the previous approach, the first two subsamples are merged to form a combined low income to lower middle income group when GNI ≤ $4,255 while the remaining subsamples remain unchanged.
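The 2021 grouping rule stated above translates directly into a small classification function. The thresholds are those given in the text; the function name and group labels are hypothetical and chosen only for illustration.

```python
def income_group(gni_per_capita: float) -> str:
    """Assign a country to one of the three 2021 subsamples by GNI per capita (USD)."""
    if gni_per_capita <= 4255:
        # Low income (<= 1,085) and lower middle income (<= 4,255) are merged.
        return "low to lower middle income"
    if gni_per_capita <= 13205:
        return "upper middle income"
    return "high income"
```

Applying this function to each country's GNI per capita yields the three subsamples on which Eq. (1) and the parsimonious specification are re-estimated.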
Table 10 reports in Columns (1), (2), and (3) the estimation results of Eq. (1), the parsimonious specification that excludes ruralpop, logpm25, and education using bootstrap regression, and those of the parsimonious equation using the Huber M-estimator, respectively. It is worth noting that the parsimonious specification for 2021 differs from the one used with 2020 data due to the lack of statistical significance for ruralpop and the presence of statistical significance for tobacco.
There are a number of observed differences in the presented estimation results. Contrary to previous findings that reported a positive relationship between pop65plus and COVID-19 mortality for the high-income group, the new estimations find that this relationship holds only for the low to lower middle-income group. The new estimations also show the positive relationship between obesity and COVID-19 mortality holds only for the high-income group. Contrary to the previous estimations, there is a positive relationship between tobacco and COVID-19 mortality for the low to lower middle-income and high-income groups. Lastly, there is no link between healthcare, corruption, and COVID-19 mortality across any of the income groups.

Discussion
This paper evaluates the determinants of COVID-19 mortality in 2020 using a sample of more than 140 countries with a specific focus on healthcare expenditure and corruption. Across the overall sample, the research reveals a positive association between COVID-19 deaths and several key factors: the proportion of the population aged 65 and older, the prevalence of obesity among adults, and healthcare expenditure. Conversely, it identifies a negative association between COVID-19 mortality and two factors: the proportion of the population living in rural areas and the corruption perceptions index. These findings highlight the heightened vulnerability of older individuals to severe COVID-19 outcomes, the established link between obesity and susceptibility to severe COVID-19 outcomes, and the connection between greater healthcare expenditure and corruption with higher COVID-19 mortality rates. Moreover, the study highlights disparities in mortality rates between rural and urban areas, with rural areas exhibiting lower mortality rates.
Upon further examination, with subsamples categorized according to countries' levels of development, the study presents five key findings beyond the lack of statistically significant relationships between COVID-19 mortality, tobacco use, air pollution, and education. First, it finds that a higher proportion of the population aged 65 and older is significantly associated with greater COVID-19 deaths, particularly in high-income countries. This is likely due to the fact that the distribution of the population in high-income countries may include a disproportionately larger share of individuals aged 65 years and older compared to lower-income countries. This demographic skew toward older age groups would likely exacerbate the vulnerability to severe COVID-19 outcomes even in the presence of robust healthcare infrastructure.
Second, the prevalence of obesity within a population exhibits a positive and significant association with increased COVID-19 deaths, primarily in upper middle-income and high-income countries. This is likely due to the fact that in upper middle-income and high-income countries, obesity rates tend to be higher due to lifestyle factors and dietary habits.
Third, the analysis identifies disparities between rural and urban areas in low to lower middle-income countries, with fewer deaths observed in countries with a greater share of the population residing in rural areas. This could be explained by the possibility that rural populations might have different social distancing practices or could be less densely populated, thereby reducing the spread of the virus and subsequently lowering mortality rates in these areas. However, it is important to consider that this disparity may also reflect the challenges faced
Fourth, higher healthcare expenditure is found to be associated with higher COVID-19 mortality rates, albeit only in upper middle-income countries. This paper contends that greater healthcare expenditure likely reflects a country's preparedness and ability to better track, document, and report COVID-19 deaths. This suggests that countries with low expenditure on health care may not have the resources to accurately report COVID-19 deaths and may even have an incentive to under-report. It is important to note, however, that while a high expenditure share on healthcare could suggest a large healthcare sector, it is not indicative of the quality of healthcare. For instance, based on World Bank data, the United States spent more than 16.67% of its GDP on health care (second to Vanuatu) in 2019 and recorded the world's highest number of COVID-19 deaths in 2020 in absolute terms (352,000) and the 15th largest in per capita terms. Most European countries that have relatively high expenditure on health care also happen to have a high number of deaths in absolute and in per capita terms. Another important observation is that several countries that have low healthcare expenditure have reported a relatively low number of deaths. In such countries, extreme lockdown and confinement measures would have been warranted after the realization that the pandemic would be too overwhelming and could lead to greater mortality and a collapse of the healthcare system. Of course, fewer reported deaths could also be the result of insufficient resources to track and report all deaths related to COVID-19.
Fifth, higher levels of corruption are linked to more COVID-19 deaths, particularly in upper middle-income countries. While this finding may seem counterintuitive at first glance, given that corruption is often more prevalent in low-income countries, there are a number of plausible explanations. For instance, corruption in upper middle-income countries may involve more complex networks and higher levels of collusion between government officials, private sector actors, and other stakeholders. These corruption networks could have serious implications for healthcare delivery and pandemic response, leading to systemic weaknesses that contribute to higher COVID-19 mortality rates. In contrast, corruption in low-income countries may be more localized or decentralized, with fewer layers of bureaucracy, making it less directly linked to mortality outcomes.
These findings gain added significance with the addition of an analysis using data for 2021. First, the fact that a higher proportion of the population aged 65 and older is significantly associated with greater COVID-19 deaths in the low to lower-middle-income group rather than in the high-income group could be explained by one key factor. In 2021, high-income countries generally had more robust healthcare systems and better access to medical resources, including critical care facilities, advanced treatments, greater access to vaccines and resources for vaccine distribution, allowing them to implement more effective public health measures and containment strategies to protect older populations from COVID-19, such as early lockdowns, widespread testing, contact tracing, and prioritizing older populations for vaccination. These measures could have led to a more significant reduction in COVID-19 mortality rates among older adults in high-income countries compared to low to lower-middle-income countries.
Second, the positive relationship between the prevalence of obesity within a population and COVID-19 mortality reaffirms the previous findings, albeit only for high-income countries, which are known to have the highest rates of obesity. Third, the emergence of a positive relationship between the prevalence of tobacco use and COVID-19 mortality can be explained by the fact that although the prevalence of tobacco use has decreased on average between 2019 and 2021, as shown in Tables 2 and 7, the detrimental health effects of tobacco use, including its impact on respiratory health and immune function, often accumulate over time.
Fourth and more importantly, the absence of evidence linking COVID-19 mortality to healthcare expenditure and corruption is not only compelling but also reassuring. This finding implies that countries may have undertaken substantial investments to strengthen their healthcare infrastructure, expand testing and treatment capabilities, and enhance access to medical resources for COVID-19 patients. These proactive measures, spurred by the exigencies of the pandemic, likely contributed to a convergence in preparedness levels across countries and may have curtailed the impact of corruption on pandemic outcomes. This implies that, when confronted with public health crises as severe as the COVID-19 pandemic, countries may not be unduly constrained by their level of development in mobilizing resources.
This paper faces some limitations, which should be addressed in future research. First, while statistical associations are established, none imply causation. Second, the positive link between healthcare expenditure and COVID-19 mortality may be influenced by unaccounted confounding variables. Third, the study does not evaluate the unique circumstances of individual countries, such as the timing of the pandemic's arrival and the effectiveness of public health measures, which could significantly influence COVID-19 mortality. Fourth, the analysis does not account for variations in healthcare system structures, including differences in healthcare infrastructure, staffing levels, and access to medical resources, which may have a substantial impact on a country's ability to effectively respond to the pandemic. Fifth, the study's findings may be subject to measurement limitations and data quality issues inherent in cross-national datasets.
This research has important policy implications. Given the vulnerability of older individuals to public health crises such as pandemics, especially in low to lower-middle-income countries, policymakers should prioritize targeted protection measures for this demographic group. This includes ensuring equitable access to vaccines, implementing early containment strategies, and strengthening healthcare systems to provide adequate support and care for older populations. In high-income countries, proactive public health interventions should be adopted to tackle the obesity crisis, especially given its exacerbating impact on the COVID-19 pandemic. Beyond initiatives to improve nutrition, promote physical activity, and combat sedentary behaviors, policymakers should consider using measures such as a fat tax or an outright ban of foods that are deemed unhealthy. Policy interventions should also raise awareness about the risks of COVID-19 and similar pandemics for tobacco users and encourage individuals to quit smoking. Policymakers could approach this issue more proactively by strengthening

Figure 1. Plot of ln deaths and healthcare expenditure.

Figure 2. Plot of ln deaths and corruption perceptions index.

Table 1. Description of the variables.

Table 3. Correlation matrix of independent variables.